Namestate optimization - Githubissues

pinheadmz commented 3 years ago

We currently have two blockchain optimizations in review:

Flat file block storage (https://github.com/handshake-org/hsd/pull/612)
"Assume valid" checkpoints (https://github.com/handshake-org/hsd/pull/597)

Somewhat surprisingly, we are not seeing a very dramatic performance improvement from either of these, even when combined into a single branch. They are both huge boosts for Bitcoin and its possible that we will see a more dramatic benefit from them as the Handshake blockchain gets bigger.

However, it's occurred to me that in addition to optimizing signature verification and data storage and other Bitcoin-y stuff, what we need to really look at is how Namestate is processed during blockchain sync.

The first thing we can try is simply replace the JavaScript Urkel tree module with the new library written in C: https://github.com/chjj/liburkel

But I think even after data is fetched from Urkel there are performance improvements we can investigate:

Caching

When verifying a block, the CoinView object gets NameState from urkel and caches them here. So even if a block has ten bids for the same name, that name is only pulled from the database once. However, the NameState for a name does not change until the reveal period begins -- thats (5 * 144) + 36 blocks that we can cache a NameState after processing the OPEN. I believe we can be clever about how NameState is cached in chain verification and save some reads from disk. Even during the reveal phase when NameState does get updated (if a reveal has a higher bid) we could theoretically cache those updates as well for 10 * 144 blocks and avoid unnecessary reads (although a higher bid value would still have to be written to disk)

Serialization

In bcoin and hsd there is object called a MemBlock and I think we can use a similar model for NameState.

My first thought about this was, especially during the bidding phase, the only thing we need to know about a name is its height and maybe some of the flags like revoked. So, do we need to NameState.decode(raw) for every name we verify? Do we need JavaScript object instantiation and processing for the entire namestate, for every name?

What would be awesome is if we could just grab the raw buffer from the Urkel tree, and read the U32 height directly from the buffer without decoding anything else. The biggest problem with this idea is unfortunately the NameState serialization isn't optimized for this:

https://github.com/handshake-org/hsd/blob/98a6491cdbfb173b4834a892b9bd55b6839cadbf/lib/covenants/namestate.js#L572-L580

There are a few variable-length fields in the buffer before the height. We'd have to read the name length to know where the data length is, then read the data length to know where the height is. That's not terrible and still might be an optimization but its not as clean as could be if NameState had a fixed-length "header". Re-serializing namestates in Urkel would be a super-brutal hardfork since the tree root committed in each block is based on the serialized namestate of each name in the tree.. and the NameState object has no version number to iterate :-(

pinheadmz commented 3 years ago

Fast-sync

Now that we have checkpoints actually there is another option for us which is an Etehreum-like "fast sync". New bootstrapping nodes (while under the last checkpoint) could simply download the entire Urkel Tree from a peer at a specific tree root that matches the header in the last checkpoint. Name resolution should be deactivated until the node is fully synced anyway. This would allow a new node to skip ALL namestate operations until the last checkpoint is reached, and they could verify the entire Urkel Tree at that point before processing new data.

pinheadmz commented 2 years ago

Reopening this as a placeholder for addressing the cache option. https://github.com/handshake-org/hsd/pull/643 is a great optimization but doesn't actually close this issue.

handshake-org / hsd

Namestate optimization #620

Caching

Serialization

Fast-sync