palantir / atlasdb

Transactional Distributed Database Layer
https://palantir.github.io/atlasdb/
Apache License 2.0

Revisit the logic for updating Cassandra nodes #4167

Open gmaretic opened 5 years ago

gmaretic commented 5 years ago

Following up on PDS-94960, where we ended up unable to talk to Cassandra.

Since the host list is refreshed only every 2 minutes, and each refresh queries a single node that is not blacklisted, it is possible that we got unlucky: all the old nodes were decommissioned before we were able to find out about any of the new ones.
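To make the failure mode concrete, here is a minimal, hypothetical model (not AtlasDB code; `RefreshRaceSketch` and `refresh` are illustrative names). It assumes the client can learn about new nodes only by asking a currently reachable host from its known pool. The sketch is even more lenient than querying a single host, because it tries every known host. Even so, once every old node is gone, no host in the pool can report the new ring.

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

public class RefreshRaceSketch {
    // Hypothetical model: the client knows only the hosts in its pool and can learn
    // about new hosts solely by asking a reachable known host for the ring.
    public static Optional<Set<String>> refresh(Set<String> knownPool, Set<String> liveCluster) {
        for (String host : knownPool) {
            if (liveCluster.contains(host)) {
                // A live known host reports the current ring membership.
                return Optional.of(new LinkedHashSet<>(liveCluster));
            }
        }
        // No known host is live, so the new nodes cannot be discovered.
        return Optional.empty();
    }

    public static void main(String[] args) {
        Set<String> knownPool = Set.of("old-1", "old-2", "old-3");
        // Within one refresh interval, new nodes join and every old node is decommissioned.
        Set<String> liveCluster = Set.of("new-1", "new-2", "new-3");

        Optional<Set<String>> refreshed = refresh(knownPool, liveCluster);
        System.out.println(refreshed.isPresent()
                ? "discovered " + refreshed.get()
                : "stuck: no known host is live");
    }
}
```

If even one old node survives until the next refresh, the pool recovers. The stuck state only arises when the whole pool turns over within a single interval.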

jeremyk-91 commented 5 years ago

This is fishy:

    private List<TokenRange> getTokenRanges() throws Exception {
        // Queries the ring from a single randomly chosen, non-blacklisted host in the current pool.
        return getRandomGoodHost().runWithPooledResource(CassandraUtils.getDescribeRing(config));
    }
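One possible direction, sketched under stated assumptions rather than as AtlasDB's actual API: instead of relying on one random good host, try every pool host in random order and then fall back to a list of seed hosts. `RingFetchSketch`, `fetchRing`, `describeRing` and `seedHosts` are all hypothetical names. `describeRing` stands in for the per-host describe-ring call and is assumed to throw when a host is unreachable. The fallback only helps if the seeds still point at live nodes, for example through updated configuration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Function;

public class RingFetchSketch {
    // Hypothetical: describeRing stands in for a per-host "describe ring" call
    // and throws a RuntimeException if the host cannot be reached.
    public static <T> T fetchRing(
            List<String> poolHosts,
            List<String> seedHosts,
            Function<String, T> describeRing,
            Random random) {
        List<String> shuffled = new ArrayList<>(poolHosts);
        Collections.shuffle(shuffled, random);
        // Pool hosts first (random order spreads load), then seeds; duplicates are dropped.
        Set<String> candidates = new LinkedHashSet<>(shuffled);
        candidates.addAll(seedHosts);

        RuntimeException lastFailure = null;
        for (String host : candidates) {
            try {
                return describeRing.apply(host);
            } catch (RuntimeException e) {
                lastFailure = e; // remember the failure and try the next candidate
            }
        }
        throw new IllegalStateException("no candidate host could describe the ring", lastFailure);
    }

    public static void main(String[] args) {
        Set<String> live = Set.of("new-1", "new-2");
        Function<String, List<String>> describeRing = host -> {
            if (!live.contains(host)) {
                throw new RuntimeException("unreachable: " + host);
            }
            return List.of("new-1", "new-2");
        };
        List<String> ring = fetchRing(
                List.of("old-1", "old-2"), // stale pool: every host decommissioned
                List.of("new-1"),          // hypothetical seed host from config
                describeRing,
                new Random(42));
        System.out.println(ring);
    }
}
```

In this scenario the stale pool hosts all fail and the seed host answers, so the ring is recovered instead of the client staying stuck.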