jonhoo / left-right

A lock-free, read-optimized, concurrency primitive.
Apache License 2.0
1.95k stars 94 forks source link

rollback #77

Open ratmice opened 3 years ago

ratmice commented 3 years ago

When looking at how the refresh mechanism works on stream, it ocurred to me that it seems like a (single level) rollback mechanism seems like it would be possible to implement, by:

I think the first/second drop seems tricky though, because you would need to perform the second drop, for the items in the oplog. (i.e. you want to actually drop the values which are being rolled back, but don't exist in the stale map, while not performing the second drop on items in the stale map).

There was a bit of musing on uses for non-collection evmaps and what they are useful for. I think the addition of rollback could make them more obviously useful.

This is not really something I need, but seemed not impossible to implement

dialtone commented 3 years ago

+1 to this functionality or something like it.

Our use case currently involves simply rebuilding the whole hashmap from scratch each time a new configuration file is downloaded rather than updating it in place. We have this implemented in erlang using ets tables that we create and drop periodically, specifically we drop after we're reasonably sure no more readers are still referencing them, although no guarantees. evmap would allow us to rebuild this usecase in a way, however here are some considerations:

1) the forced oplog clearly adds overhead to this work, since data first makes it inside the oplog and then goes in the handle upon publish time. Sadly I'm too new to Rust to think about a way to deal with old readers still accessing this old ReadHandle but if there was one way, perhaps you could just give more direct access to the evmap without going through the oplog, and simply return an error for stale reads.

2) while in the current system we can discard the new ets table if the process that is building it fails to parse the file, in this new evmap world there is no transaction concept and thus you would end up with an oplog that is effectively invalid at this point. This is the case even if one were to implement this by calling first WriteHandle::purge() and then re-inserting all the keys, and you would actually also pay the price of iterating through each key 2 times I think. And in reality if there's a bug in the code that is supposed to generate the new evmap version you'd end up with a quickly growing memory leak inside this oplog.

3) the evmap::WriteHandle::extend(...) call could be helpful here, except it only supports inserts vs all the other operations like this crate does. For example the writer could submit a big transaction of its own with all of the operations needed and make the whole thing effectively atomic via left-right new WriteHandle::extend() call. It's not a big deal now if the new values don't change as they end up in the hashbag, but if they do change then the hashbag has no guarantees on returning the last item inserted, so using left-right original extend provides more atomic functionality.

I realize this comment straddles a bit between evmap and left-right but I hope it's fine anyway as the problem bridges a bit the 2.

jonhoo commented 3 years ago

@dialtone Ah, I think maybe you have a different impression of how evmap works than how it actually works. It doesn't constantly create a new map for writes, it continues to use the same two maps over and over, and just shifts readers from one to the other, with the writer carefully choosing when it's safe to modify each.

The rollback being discussed here is specifically about what you do if you enqueue a number of operations and then want to void them instead of calling publish. In that case, rolling back is pretty easy: simply truncate the oplog down to the operations that are already present in the read map (self.swap_index). I think that's all that's needed here. Not sure if that addresses the use-case you have in mind though?

ratmice commented 3 years ago

@jonhoo fwiw, the rollback I was thinking above is slightly more complex than just tossing away the oplog, but after publish, being able to revert back to the previous published state that existed before the latest call to publish in addition to the current oplog.

Or at least noting that the way that writes are staggered seems to make that possible.

jonhoo commented 3 years ago

Yeah, I think it might be possible to roll back the last publish as well either by:

  1. On rollback wait for readers to depart, swap again immediately (and wait), then clear the write map, then copy in all elements from the reader map (which is now the map from before the prior publish); or
  2. do the same, but instead of clear + copy, walk the oplog and "undo" each operation in it on the write map utilizing the contents of the read map to do so.

2 is far more complex, but also much more efficient!

dialtone commented 3 years ago

@jonhoo hey, I think I may have explained myself badly, sorry about that :).

Yeah I understand evmap always uses the same 2 maps, and in fact it reuses the same pool of linked values as well. My point was simply to explain a usecase in which transactionality of changes was achieved by simply re-creating the whole map each time. This can be avoided if it were possible to queue up a bunch of changes that one could rollback if, for whatever reason, the process of updating the chunk of keys didn't go successfully.