palantir / atlasdb

Transactional Distributed Database Layer
https://palantir.github.io/atlasdb/
Apache License 2.0

We are miscalculating quorum for Cassandra #3611

Open gmaretic opened 5 years ago

gmaretic commented 5 years ago

CassandraKeyValueServices.waitForSchemaVersions calculates quorum as n/2 + 1, with n taken from config. Both the source of n and the formula are wrong: we should use the number of nodes actually in the cluster (which may change dynamically) and also take the replication factor (rf) into account.
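To illustrate the gap, here is a minimal sketch comparing the node-count formula described above with Cassandra's standard per-keyspace quorum, rf/2 + 1 with integer division. The class and method names are hypothetical and are not taken from the AtlasDB codebase. This is only one way to read "take into account the rf", not a proposed fix.

```java
// Hypothetical illustration; none of these names exist in AtlasDB.
final class QuorumSketch {
    private QuorumSketch() {}

    // The calculation the issue describes: n is a node count read from config.
    public static int quorumFromNodeCount(int configuredNodes) {
        return configuredNodes / 2 + 1;
    }

    // Cassandra's QUORUM for a single replica set: floor(rf / 2) + 1.
    public static int quorumFromReplicationFactor(int rf) {
        if (rf < 1) {
            throw new IllegalArgumentException("rf must be at least 1");
        }
        return rf / 2 + 1;
    }

    public static void main(String[] args) {
        // A 6-node cluster with rf = 3: the two formulas disagree.
        System.out.println(quorumFromNodeCount(6));         // 4
        System.out.println(quorumFromReplicationFactor(3)); // 2
    }
}
```

With 6 configured nodes and rf = 3, the node-count formula asks for 4 nodes while a replica-set quorum needs only 2 of the 3 replicas, so the two notions of "quorum" are not interchangeable.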

gmaretic commented 5 years ago

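We need to make sure we have a quorum of racks available. If this is not easy to do, a super-conservative check we can do is to make sure we have at most (rf - 1) / 2 unreachable nodes. With that few nodes down, any set of rf replicas still has a quorum reachable, wherever the unreachable nodes sit.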
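The conservative check could be sketched as follows, assuming integer division in (rf - 1) / 2 and assuming the caller already knows how many nodes are unreachable. The class and method names are hypothetical, and how the unreachable count is obtained from the cluster is left out.

```java
// Hypothetical illustration of the conservative check; not AtlasDB code.
final class UnreachableNodeCheck {
    private UnreachableNodeCheck() {}

    // Largest number of unreachable nodes the conservative check tolerates.
    public static int maxTolerableUnreachable(int rf) {
        if (rf < 1) {
            throw new IllegalArgumentException("rf must be at least 1");
        }
        return (rf - 1) / 2; // integer division
    }

    // True if the number of unreachable nodes is within the conservative bound.
    public static boolean isWithinBound(int rf, int unreachableNodes) {
        if (unreachableNodes < 0) {
            throw new IllegalArgumentException("unreachableNodes must be non-negative");
        }
        return unreachableNodes <= maxTolerableUnreachable(rf);
    }

    public static void main(String[] args) {
        System.out.println(isWithinBound(3, 1)); // true: 1 <= (3 - 1) / 2
        System.out.println(isWithinBound(3, 2)); // false: 2 > 1
    }
}
```

The bound is deliberately pessimistic: it treats every unreachable node as if it belonged to the same replica set, so it can reject states that a rack-aware quorum check would accept.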