Closed tpetracca closed 7 years ago
Partition Handling Re: #1252 @SerialVelocity actually fixed it, I think I just explained how some of the cases were handled. 😃 For the partition case, if I'm not wrong this is safe for timestamps, I'm not 100% sure about locks:
(1) addAndGets to the DistributedLong will fail at A once it's partitioned off even if it doesn't know anything about B's election, because you need a majority quorum for reads/writes (Atomix docs), and (2) as you say I don't think A can become re-elected without that line in ILP firing.
Default ConsistencyLevel If we don't get an answer soon I could try digging around in a debugger tomorrow.
Assigning to @SerialVelocity - can you take a look and see if you're worried about Tom's other concerns?
@rhero Which other concerns do you mean? Jeremy answered the first part and the second part was answered in the gitter room (generally all operations are atomic unless otherwise specified)
Closed - Jepsen tests and current impl satisfy this.
CC @carrino @SerialVelocity @jeremyk-91
I don't necessarily think we have any problems with the implementation, I just more wanted to put on everyones radar in one place some things that I think are interesting to understand about timelock and atomix.
Leader -> Partitioned -> Re-elected Leader
For this I'm curious about what happens if an election term happens while we're partitioned from the cluster. In #1252 @jeremyk-91 fixed an issue where A loses leadership to B and then regains it. My interpretation of why the fix works is because when B gets elected A will invalidate its cache, and when A get's re-elected A will invalidate its cache again.
What if A doesn't know about the two elections because it was partitioned? I guess this should be impossible; like how could A possibly get elected without https://github.com/palantir/atlasdb/blob/develop/atlasdb-timelock-server/src/main/java/com/palantir/atlasdb/timelock/atomix/InvalidatingLeaderProxy.java#L63 firing?
If we were actually concerned about this we could be the term long into the leaderId, and on the check for if we're the leader we also invalidate the cache if the term long is different then the last one we were aware of.
Default ConsistencyLevel for Operations
I assume the default is ATOMIC, which is what we want but it's unclear and I wasn't able to figure it out running it down in code. I've asked in gitter: https://gitter.im/atomix/atomix
// other things?