Closed benbjohnson closed 1 year ago
If this is the whole history, yeah, it's illegal--there's reads here of values which were never written. But I'm guessing the register did take on values 117, 119, and 121 at points in the past, and if so this could be legal. All 3 are executed by different processes, and so their operations could be executed at any point in the timeline when the value was 117, or 119, or 121 respectively. It'd have to be consistent with earlier operations executed by that specific process, of course, but those aren't shown here.
@aphyr Ok, my understanding was that the identity CAS would force the read to be linearizable but it sounds like it could be re-ordered to a previous state where that value existed since it wouldn't change the history of other clients.
So in order force a linearizable read using a CAS, would the value set to seq-kv
need to have some kind of unique identifier to force it to be the latest? e.g
n0: read("key") => {"value":121, "rnd":1287638"}
n0: cas("key", {"value":121, "rnd":1287638"}, {"value":121, "rnd":8347813"})
Linearizability is not the same property as sequential consistency. Seq-kv is only sequential, not linearizable. There is no meaningful concept of "latest" in this kind of system--at least not in a realtime sense. For a linearizable store, use lin-kv. :-)
For the Grow-Only Counter challenge, it uses seq-kv
and one of the requirements of the g-counter
workload, AFAICT, is that it requires the final value to be correct. We had discussed that one option was to perform an identity CAS on the reads.
I'm sure I'm missing something here. I was thinking that the identity CAS would force the client to effectively catch up with the latest write since there is a total order on the writes and that would allow all counts at the end to be the same. Is that not the case? Or does the g-counter
allow multiple end values to be correct?
Ha! This may actually be too hard--it relies on a subtle understanding of sequential consistency and I didn't hint at it in the challenge--if people consistently struggle we may need to add more explicit hints.
Just like with a real DB, reads and even identity CAS in seq-kv can be arbitrarily stale; they can be safely reordered to execute in the past. In order to read the "current" state you can do two things. One option that I thought most people with real-world DB experience might try is to spam seq-kv with operations; since it's sequential your client must eventually, probabilistically, converge to the most recent logical state. However there's another, more subtle option: perform an operation that could not possibly have occurred before the ops you want your client to observe. If it succeeds, any later op your client performs must observe them! If you'd like to ponder that before the next paragraph, please do--it's not necessarily obvious.
What ought to work (though I have not tried it due to time constraints!) is something like a write of a unique key or value--something that has never been written before. Since that write creates a state of seq-kv that could never have existed before already-completed writes, it must occur after them in the sequential timeline. Later reads by that client must then observe those already-completed writes.
Thanks for the detail, Kyle! That makes sense.
I understood what you wrote, and successfully implemented it, but don't quite understand why it works.
Here's an example of the previous problem:
n0: cas from 0 to 10
n1: cas from 10 to 20
n1: read 20
n0: read 10
This order is valid because n0's last read can observe system state before the cas by n1.
To mitigate this, we add a write <brand new key>
before every read. However, this should still result in similarly unsynced reads.
n0: cas from 0 to 10
n1: cas from 10 to 20
n1: write random_39928
n1: read 20
n0: write random_97712
n0: read 10
This is still a valid system order, for the exact same reason. Both the write and read from n0 can happen against a version of the data before n1's final writes. So how does that random write operation make the system work in a strictly consistent way?
Technically true, but consider: in order to safely reorder the write to an arbitrary point in the past, the system would need to retain every operation--including, say, predicate reads--ever performed, and prove that none of them could have interacted with the write. Practically seq-kv's behavior is sorta closer to ZK-style OSC(U), but it's actually weaker than that because Maelstrom is clever enough to allow reordering of writes to the past--but only where we can prove state equivalency. Unique writes, for us, force unique states!
Got it. I don't have any academic knowledge on the subject, so may be very wrong here, but is it safe to say that this is an implementation detail of this specific data store rather than an intrinsic property of sequentially consistent systems?
the system would need to retain every operation--including, say, predicate reads--ever performed, and prove that none of them could have interacted with the write
This should be possible for a similar kv store to ensure, right?
Yup, completely valid! I should probably be more rigorous about this. :-)
This should be possible for a similar kv store to ensure, right?
Hah, well, uh... I think this degrades to something like every operation costing something awful (maybe O(txns!)
?) while the system tries to compute a legal reordering over the entire transaction history. The implementation I wrote for Maelstrom tries to be extra evil, but even it only tries twice--once at a random past state, and if that fails, at the most recent state. Doesn't even ask about transactions themselves, though we theoretically could. Maybe worth a shot, with some sort of limited window.
Hi @aphyr. Someone recently posted a graph of messages against the
seq-kv
where two separate identity CAS operations succeed:n0
for value121
n1
for value119
Is this legal under Sequential consistency? My understanding from the section in Services was that updates could only interact with past states if total-order is not violated.
tl;dr just wondering if this is a bug or if I'm misunderstanding something. Thanks!