turbofish-org / merk

High-performance Merkle key/value store
Apache License 2.0
226 stars · 36 forks

Database trait implementation #11

Closed davebryson closed 1 month ago

davebryson commented 5 years ago

First off this is great work! Thank you. I'm looking forward to trying this out with rust-abci and seeing how it develops.

I took a stab at refactoring the DB specific logic out of Merk and into a separate trait and associated modules. The work is in my fork HERE. I didn't submit a pull request for any of this since it appears you're focused on refactoring on the develop branch.

I based my fork on your master branch.

I added:

I also refactored merk.rs to use the new trait and removed the mut requirement from get_node.
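For reference, a minimal trait along these lines might look like the following. This is only a sketch: the method names and the in-memory backend are illustrative, not the fork's actual API.

```rust
use std::collections::BTreeMap;

// Hypothetical minimal store abstraction; names are illustrative.
trait Database {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>);
    fn delete(&mut self, key: &[u8]);
}

// In-memory implementation, handy for tests in place of RocksDB.
struct MemDb {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl MemDb {
    fn new() -> Self {
        MemDb { map: BTreeMap::new() }
    }
}

impl Database for MemDb {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.map.insert(key, value);
    }
    fn delete(&mut self, key: &[u8]) {
        self.map.remove(key);
    }
}

fn main() {
    let mut db = MemDb::new();
    db.put(b"alice".to_vec(), b"100".to_vec());
    assert_eq!(db.get(b"alice"), Some(b"100".to_vec()));
    db.delete(b"alice");
    assert_eq!(db.get(b"alice"), None);
}
```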

All tests and benchmarks are updated and everything is passing.

I'm more than happy to work it into the main codebase if/when you're interested in using it.

mappum commented 5 years ago

Awesome, thanks for working on this! And sorry for leaving develop in a half-refactored state, in the future I'll contain half-baked changes in PRs to better communicate what's going on.

The Database trait will be nice for tests, I'll be happy to merge these changes after the current refactor.

mappum commented 5 years ago

BTW, this change will look a lot different after this refactor since it makes some fundamental changes, e.g. supporting concurrency for the backing store.

Also, something to keep in mind is that in some cases I'd prefer keeping things coupled to RocksDB since there are features that might be hard to implement in other stores (like the checkpointing feature), so abstracting the store should primarily be for testing purposes rather than focusing on supporting other backing stores (although this could change in the future).

davebryson commented 5 years ago

Understood. I agree RocksDB is probably the best bet for now. I'm also interested in using columns vs. prefixing keys with a common namespace.

Down the road it may be interesting to look at other potential stores, such as Sled.

A side thought, though: How fast is fast enough for a Tree in a blockchain environment? You're only as fast as the slowest component - you're still bound by the consensus process and state transition logic regardless of how fast the tree is.

mappum commented 5 years ago

How fast is fast enough for a Tree in a blockchain environment? You're only as fast as the slowest component - you're still bound by the consensus process and state transition logic regardless of how fast the tree is.

After Merk I'd like to optimize the state machine as well. While it's true you need to process transactions serially in order to ensure determinism, the vast majority of transactions have no intersection between the sets of keys they read from or write to (Alice sending to Bob can be processed independently from Carol sending to Dave (hey, that's you)).

So to process transactions concurrently, we can simply bin them into N queues, where transactions with intersecting key sets end up in the same bin and do get a canonical ordering. Now we can process transactions in parallel for, say, a 32x throughput increase. Also, I think this can be done in a way where the consensus algorithm does not significantly affect the throughput.
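A rough sketch of that binning idea: transactions whose key sets intersect end up in the same bin (keeping a canonical order), while disjoint bins can run in parallel. The greedy merge strategy below is my own assumption about how this could work, not a spec.

```rust
use std::collections::HashSet;

// Bin transactions so that any two with intersecting key sets share a bin.
// Each bin is (union of key sets, transaction indices in canonical order).
fn bin_transactions(
    txs: &[HashSet<&'static str>],
) -> Vec<(HashSet<&'static str>, Vec<usize>)> {
    let mut bins: Vec<(HashSet<&'static str>, Vec<usize>)> = Vec::new();
    for (i, keys) in txs.iter().enumerate() {
        // Existing bins that share at least one key with this transaction.
        let hits: Vec<usize> = bins
            .iter()
            .enumerate()
            .filter(|(_, (bin_keys, _))| !bin_keys.is_disjoint(keys))
            .map(|(j, _)| j)
            .collect();
        if hits.is_empty() {
            bins.push((keys.clone(), vec![i]));
        } else {
            // A transaction touching several bins links them: merge so it
            // gets a single canonical position relative to all conflicts.
            let mut merged_keys = keys.clone();
            let mut merged_txs: Vec<usize> = Vec::new();
            for &j in hits.iter().rev() {
                let (k, t) = bins.remove(j);
                merged_keys.extend(k);
                merged_txs.splice(0..0, t);
            }
            merged_txs.push(i);
            bins.push((merged_keys, merged_txs));
        }
    }
    bins
}

fn main() {
    let tx = |keys: &[&'static str]| keys.iter().cloned().collect::<HashSet<_>>();
    // Alice->Bob and Carol->Dave are disjoint; Bob->Carol links them.
    let txs = vec![
        tx(&["alice", "bob"]),
        tx(&["carol", "dave"]),
        tx(&["bob", "carol"]),
    ];
    assert_eq!(bin_transactions(&txs).len(), 1); // third tx forces a merge

    let txs2 = vec![tx(&["alice", "bob"]), tx(&["carol", "dave"])];
    assert_eq!(bin_transactions(&txs2).len(), 2); // fully parallel
}
```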

If we know the set of keys already, this can be done efficiently with Bloom filters, since set intersection is just the AND of the filter bits and set union is the OR, and I believe we can design a data structure to make these operations even faster.
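A toy illustration of the Bloom-filter trick, using a single 64-bit filter for brevity (a real filter would be much larger, and the hash scheme here is an assumption). The key property: a zero AND proves two key sets are disjoint, while a nonzero AND may be a false positive, which is safe because it only conservatively forces two transactions into the same bin.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Tiny Bloom filter over one u64 of bits. Intersection check is bitwise
// AND; union is bitwise OR.
#[derive(Clone, Copy, Default)]
struct Bloom(u64);

impl Bloom {
    fn insert(&mut self, key: &[u8]) {
        // Set three bits per key, derived from three seeded hashes.
        for seed in 0..3u64 {
            let mut h = DefaultHasher::new();
            seed.hash(&mut h);
            key.hash(&mut h);
            self.0 |= 1 << (h.finish() % 64);
        }
    }
    // May report true for disjoint sets (false positive), never false
    // for genuinely intersecting ones.
    fn may_intersect(&self, other: &Bloom) -> bool {
        self.0 & other.0 != 0
    }
    fn union(&self, other: &Bloom) -> Bloom {
        Bloom(self.0 | other.0)
    }
}

fn main() {
    let mut a = Bloom::default();
    a.insert(b"alice");
    a.insert(b"bob");
    let mut b = Bloom::default();
    b.insert(b"alice");
    assert!(a.may_intersect(&b)); // shared key is always detected
    assert!(!Bloom::default().may_intersect(&a)); // empty filter never hits
    assert_eq!(a.union(&b).0, a.0 | b.0);
}
```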

I have more thoughts about this but I'll write them up elsewhere.

mappum commented 5 years ago

I'm also interested in using columns vs prefixing keys with a common namespace.

Interesting, is this for a performance increase, or for logically structuring the data for app developers?

davebryson commented 5 years ago

Ok. I see where you're going with all this now. Interesting and exciting stuff!

... is this for a performance increase, or for logically structuring the data for app developers?

I don't know about performance with columns. For now, I was just thinking in terms of "logical structure": account keys, etc., separating them by columns vs. prefixes.
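For comparison, the prefix approach is just key concatenation in a single keyspace (sketch below; the delimiter choice is illustrative, and keys containing the delimiter would need escaping), whereas RocksDB column families keep namespaces physically separate and individually tunable.

```rust
// Namespace a key by prefixing it, e.g. "accounts/alice". A delimiter
// byte keeps prefixes from colliding with each other's key ranges,
// assuming raw keys don't contain it.
fn prefixed_key(namespace: &[u8], key: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(namespace.len() + 1 + key.len());
    out.extend_from_slice(namespace);
    out.push(b'/');
    out.extend_from_slice(key);
    out
}

fn main() {
    assert_eq!(prefixed_key(b"accounts", b"alice"), b"accounts/alice".to_vec());
    assert_eq!(prefixed_key(b"validators", b"alice"), b"validators/alice".to_vec());
}
```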

Side note: After fiddling with a simple abci app using the Database trait and Merk, I'm finding the db: &'a mut dyn Database signature creates an unnecessary battle with the borrow checker (at least for me). I think I'm going to take another stab at it and simplify things a bit, especially in light of RocksDB being the default Merk key/value store. For now, a DB trait is probably overkill. And I'm all about simplicity :smile:

mappum commented 5 years ago

Side note: After fiddling with a simple abci app using the Database trait and Merk, I'm finding the db: &'a mut dyn Database signature creates an unnecessary battle with the borrow checker (at least for me).

One thing to keep in mind is that merk is designed to be written to in batches, once per block. So ABCI apps will require a layer on top that maintains a map of keys/values as the transaction mutates the values, then it should all get put into merk on Commit (I think this might simplify that but I'm not sure).
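A sketch of such a layer, assuming a simple batch representation (the `Batch` type and method names here are hypothetical, not merk's actual API): writes accumulate in a map during the block, reads see uncommitted writes first, and Commit drains everything into one batch.

```rust
use std::collections::BTreeMap;

// None = pending delete; Some = pending write.
type Batch = Vec<(Vec<u8>, Option<Vec<u8>>)>;

#[derive(Default)]
struct StateCache {
    pending: BTreeMap<Vec<u8>, Option<Vec<u8>>>,
}

impl StateCache {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.pending.insert(key, Some(value));
    }
    fn delete(&mut self, key: Vec<u8>) {
        self.pending.insert(key, None);
    }
    // Reads see uncommitted writes first, falling back to the store.
    fn get(&self, key: &[u8], store: impl Fn(&[u8]) -> Option<Vec<u8>>) -> Option<Vec<u8>> {
        match self.pending.get(key) {
            Some(v) => v.clone(),
            None => store(key),
        }
    }
    // Called from Commit: drain the buffered writes into one batch.
    fn commit(&mut self) -> Batch {
        std::mem::take(&mut self.pending).into_iter().collect()
    }
}

fn main() {
    let mut cache = StateCache::default();
    cache.put(b"alice".to_vec(), b"90".to_vec());
    cache.delete(b"bob".to_vec());
    // The uncommitted write is visible even though the store is empty.
    assert_eq!(cache.get(b"alice", |_| None), Some(b"90".to_vec()));
    let batch = cache.commit();
    assert_eq!(batch.len(), 2);
    assert!(cache.pending.is_empty()); // ready for the next block
}
```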

davebryson commented 5 years ago

Thanks. Yes. I have maps in the sample app I'm working on now: one for checkTx and one for deliverTx. The deliverTx map is batched to Merk in abci.commit().

The &mut dyn Database was causing issues in main() when trying to pass it all to rust-abci, since the rust-abci app requires a 'static lifetime. Trying to wrangle the &mut was a real pain; it was just way more complicated than needed. So I switched to an Arc<Database> signature and life got a lot simpler. I think it may actually be more flexible in the long run.
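For what it's worth, the shape that sidesteps the 'static problem might look something like this (a sketch: the in-memory store stands in for the real RocksDB-backed one, and wrapping it in a lock is my assumption about how mutation would be handled). An Arc clone is owned, so it satisfies 'static bounds that a &mut borrow never can.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Stand-in for the real store behind the Database abstraction.
#[derive(Default)]
struct MemStore {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

fn main() {
    let db = Arc::new(RwLock::new(MemStore::default()));

    // Hand a clone to something that demands 'static ownership,
    // e.g. a spawned thread (or a rust-abci app).
    let handle = {
        let db = Arc::clone(&db);
        thread::spawn(move || {
            db.write()
                .unwrap()
                .map
                .insert(b"alice".to_vec(), b"100".to_vec());
        })
    };
    handle.join().unwrap();

    // The original handle still sees the shared state.
    assert_eq!(
        db.read().unwrap().map.get(&b"alice"[..]),
        Some(&b"100".to_vec())
    );
}
```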

I'm going to test it out a bit more and update my fork.

tomtau commented 5 years ago

One thing to keep in mind is that merk is designed to be written to in batches, once per block. So ABCI apps will require a layer on top that maintains a map of keys/values as the transaction mutates the values, then it should all get put into merk on Commit (I think this might simplify that but I'm not sure).

FYI, this is what kvdb-rocksdb is kind of doing: https://github.com/paritytech/parity-common/blob/master/kvdb-rocksdb/src/lib.rs#L257

mappum commented 1 month ago

Closing due to inactivity.