palantir / atlasdb

Transactional Distributed Database Layer
https://palantir.github.io/atlasdb/
Apache License 2.0
54 stars 10 forks source link

TimeLock Impl Stuff #1261

Closed tpetracca closed 7 years ago

tpetracca commented 7 years ago

CC @carrino @SerialVelocity @jeremyk-91

I don't necessarily think we have any problems with the implementation, I just more wanted to put on everyones radar in one place some things that I think are interesting to understand about timelock and atomix.

Leader -> Partitioned -> Re-elected Leader

For this I'm curious about what happens if an election term happens while we're partitioned from the cluster. In #1252 @jeremyk-91 fixed an issue where A loses leadership to B and then regains it. My interpretation of why the fix works is because when B gets elected A will invalidate its cache, and when A get's re-elected A will invalidate its cache again.

What if A doesn't know about the two elections because it was partitioned? I guess this should be impossible; like how could A possibly get elected without https://github.com/palantir/atlasdb/blob/develop/atlasdb-timelock-server/src/main/java/com/palantir/atlasdb/timelock/atomix/InvalidatingLeaderProxy.java#L63 firing?

If we were actually concerned about this we could be the term long into the leaderId, and on the check for if we're the leader we also invalidate the cache if the term long is different then the last one we were aware of.

Default ConsistencyLevel for Operations

I assume the default is ATOMIC, which is what we want but it's unclear and I wasn't able to figure it out running it down in code. I've asked in gitter: https://gitter.im/atomix/atomix

// other things?

jeremyk-91 commented 7 years ago

Partition Handling Re: #1252 @SerialVelocity actually fixed it, I think I just explained how some of the cases were handled. 😃 For the partition case, if I'm not wrong this is safe for timestamps, I'm not 100% sure about locks:

(1) addAndGets to the DistributedLong will fail at A once it's partitioned off even if it doesn't know anything about B's election, because you need a majority quorum for reads/writes (Atomix docs), and (2) as you say I don't think A can become re-elected without that line in ILP firing.

Default ConsistencyLevel If we don't get an answer soon I could try digging around in a debugger tomorrow.

rhero commented 7 years ago

Assigning to @SerialVelocity - can you take a look and see if you're worried about Tom's other concerns?

SerialVelocity commented 7 years ago

@rhero Which other concerns do you mean? Jeremy answered the first part and the second part was answered in the gitter room (generally all operations are atomic unless otherwise specified)

gsheasby commented 7 years ago

Closed - Jepsen tests and current impl satisfy this.