sirikata / sirikata

Sirikata is a BSD-licensed platform for networked 3d environments
http://www.sirikata.com/

CassandraStorage doesn't roll back failed transactions #494

Open ewencp opened 12 years ago

ewencp commented 12 years ago

CassandraStorage just processes the parts of transactions one at a time, which doesn't match the transactional semantics required by the OH::Storage interface. 8fdf319003ee9fdb87e26ee9528f1c2d3da23729 introduces a test which shows how the failure can occur. Just ordering mutation operations last isn't sufficient since, e.g., the last mutation could fail.
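To make the failure mode concrete, here is a minimal sketch (plain Python over a dict, not actual Sirikata or Cassandra code) of a transaction applied one operation at a time, the way the issue describes:

```python
# Hypothetical illustration of the bug: applying a transaction's parts
# sequentially means a late failure leaves earlier mutations committed.
store = {"a": "1"}

def apply_transaction(store, ops):
    """Apply ops in order; return False on the first failure.
    Earlier mutations have already landed -- no rollback."""
    for op, key, value in ops:
        if op == "write":
            store[key] = value
        elif op == "erase":
            if key not in store:
                return False  # fails, but prior writes already happened
            del store[key]
    return True

ok = apply_transaction(store, [("write", "b", "2"),
                               ("erase", "missing", None)])
# ok is False, yet "b" was still written -- transactional semantics violated
```

Ordering mutations last doesn't help here: the failing erase *is* a mutation, and the write before it has already been applied.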

If Cassandra can't implement this directly, we might need to build a layer on top of Cassandra which can do these multi-key updates. Note that this may have also become a bit more complicated with the compare operations added in f88435fcd4d329af387dcf3b7fe08436ca969cc7.

xiaozhou commented 12 years ago

Cassandra supports atomic updates within a single bucket. Can you try with an "invalid read" rather than an "invalid erase"? I doubt whether Cassandra treats erasing a non-existent key as "invalid".

ewencp commented 12 years ago

Invalid reads won't trigger the error because CassandraStorage performs those first. I put some notes in the test about why it's written the way it is -- the test tries to be robust to OH::Storage implementation details such as the order in which reads, writes, and erases are handled, and cases where the order of the keys being operated on doesn't match the order they were requested via calls to OH::Storage methods (e.g. if the OH::Storage implementation stored requested read keys in a std::set, which would cause them to be lexicographically ordered).

Based on my reading of the Cassandra FAQ, Cassandra probably doesn't directly support what we need for this bug or #495. We probably need a layer on top of Cassandra which can provide these guarantees.

xiaozhou commented 12 years ago

I still don't quite get it. Since atomic means all operations occur or none occur, shouldn't the order be irrelevant when performing multi-key transactions?

At least we should modify the "invalid erase" part of the test file, since Cassandra won't return an error if the key being erased does not exist.

jterrace commented 12 years ago

Yes, atomic means they all occur or none of them occur. I think you could implement this on Cassandra, but it would be a bit tricky:

  1. grab a lock (can be implemented with a TTL key)
  2. if the op_count differs from finished_count, resume the previous copying of the table
  3. if the success_count differs from op_count, delete previous tables
  4. begin transaction by incrementing an op_count and get its current value
  5. copy the table to a temporary table named by op_count
  6. perform the actions on the temp table
  7. if fails, delete temp table and unlock
  8. if all succeed, set success_count to op_count and copy the values back to the original table
  9. set finished_count value to op_count
  10. unlock
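The steps above could be sketched roughly like this (a toy in-memory stand-in for a Cassandra bucket; all names here are hypothetical, and the crash-recovery checks in steps 2-3 are stubbed out for brevity):

```python
import time

class Store:
    """Toy in-memory model of a bucket with the counters the steps describe."""
    def __init__(self):
        self.tables = {"main": {}}
        self.counters = {"op_count": 0, "finished_count": 0, "success_count": 0}
        self.lock_expiry = 0.0  # a TTL key stands in for the lock

    def acquire_lock(self, ttl=5.0):
        now = time.time()
        if now < self.lock_expiry:
            return False  # someone else holds the lock
        self.lock_expiry = now + ttl
        return True

    def release_lock(self):
        self.lock_expiry = 0.0

def transact(store, mutate):
    """Steps 1-10 from the comment above, with recovery (2-3) omitted."""
    if not store.acquire_lock():              # 1. grab a lock
        return False
    c = store.counters
    # 2-3. resuming/cleaning up an interrupted transaction omitted here
    c["op_count"] += 1                        # 4. begin transaction
    op = c["op_count"]
    temp = dict(store.tables["main"])         # 5. copy table to temp named by op
    store.tables["temp_%d" % op] = temp
    try:
        mutate(temp)                          # 6. perform the actions on temp
    except Exception:
        del store.tables["temp_%d" % op]      # 7. on failure, delete temp
        store.release_lock()                  #    and unlock
        return False
    c["success_count"] = op                   # 8. commit: copy values back
    store.tables["main"] = dict(temp)
    c["finished_count"] = op                  # 9. mark finished
    store.release_lock()                      # 10. unlock
    return True
```

The lock-copy-commit round trip on every transaction is what makes this expensive, as noted below.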

There might be some errors in there, but I think that's the basic idea and it should work. It would be pretty inefficient, though. Not sure if we want to implement this or not.

ewencp commented 12 years ago

I agree that doing locking and copying for every transaction would probably kill performance.

How about instead we bake leases into the storage interface? Before performing transactions, there would be an operation for grabbing a lease (with renewals as well). Making the leases explicit would allow the user of OH::Storage to control them (e.g. the length of the leases, to trade off cost/availability) and makes it clear that the expected use is for a single OH to be accessing a bucket at a time.
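A lease API along those lines might look roughly like this (purely illustrative Python; `BucketLease` and its methods are made-up names, not an existing Sirikata interface):

```python
import time

class BucketLease:
    """Hypothetical per-bucket lease: one object host grabs it before
    transacting and renews it periodically; mutations are refused once
    the lease is lost."""
    def __init__(self):
        self._holder = None
        self._expiry = 0.0

    def acquire(self, owner, duration=10.0):
        now = time.time()
        if self._holder is not None and now < self._expiry \
                and self._holder != owner:
            return False  # another OH currently holds the bucket
        self._holder, self._expiry = owner, now + duration
        return True

    def renew(self, owner, duration=10.0):
        if self._holder != owner or time.time() >= self._expiry:
            return False  # lease lost; caller must stop issuing mutations
        self._expiry = time.time() + duration
        return True

    def valid(self, owner):
        return self._holder == owner and time.time() < self._expiry
```

The lease duration is the knob the caller tunes: short leases mean faster takeover after a crash but more renewal traffic; long leases mean the bucket stays unavailable longer if the holder dies.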

This would solve #495's issue (two OHs loading an object with the same ID and trying to use the same bucket, a case that needs to be handled when there's migration or just multiple OHs run by a coordinator) by forcing us to do a bit of work up front to lock the entire bucket.

It would also effectively solve this issue since the current approach works as long as the final batchMutate is atomic and nobody else is reading from the bucket at the same time. We would both be able to check the validity of erase commands and using different, non-atomic commands for reads and writes so long as we can guarantee the mutations fail if the lease expires. It would also make implementing the compare functionality simple.