lbryio / lbrycrd

The blockchain that provides the digital content namespace for the LBRY protocol
https://lbry.com
MIT License
2.57k stars 178 forks source link

Reimplement ClaimTrie as Patricia Trie #106

Closed lyoshenka closed 5 years ago

lyoshenka commented 6 years ago

Our Claimtrie is inefficient because it must store lots of empty nodes when storing a long name. The solution is to use this: https://github.com/ethereum/wiki/wiki/Patricia-Tree

shyba commented 6 years ago

For the sake of having this documented, we should be mindful that there is a space efficiency optimization and a time efficiency one mixed in this change. If we are doing one or both is a pending discussion. If we just change the trie without changing how proofs are calculated we could solve the memory issue without a hardfork. While changing the proof calculation to match the new data structure (skipping empty nodes) would require a hardfork.

BrannonKing commented 6 years ago

I was just doing some testing on node counts for this. With a million random strings (standard 60 characters), I end up with 25 million nodes in traditional trie. With the collapsed prefixes, I end up with 1.5 million nodes after a million insertions. That's a substantial difference! Next test: the qp trie. We can talk about where to check in my tests and what to name them next week.

BrannonKing commented 6 years ago

I'm attaching my experiments on this. I originally concluded that the PrefixTrie is the right structure, and that it would drastically reduce our (present) memory usage. I also concluded that we don't need a separate library as it is not a lot of code. Since then a discussion with @shyba pointed out that std::map may be the best container. I did some testing and came to that conclusion as well. I'll leave it here on the shelf and hope that we can get c++11 support in lbrycrd in the very near future. I intend to bring it in without a hard fork (keep the hash calculation functioning the way it is).

lbry_tries.zip

BrannonKing commented 6 years ago

Last week I tested the code here with my same sample data: https://github.com/ytakano/radix_tree. It worked well enough, but it was not as fast as std::map. I didn't see a lot of other libraries that were easily accessible. I did not attempt to use Ethereum's trie code. It appeared complex, and I don't want to mix their code in with ours.