This PR proposes improvements to the trie-improvements PR.
Necessary Changes
The following is the list of changes that were made in the first commit, to get the code to compile.
Updated Store implementation to match with interface (using [32]byte and []byte instead of ledger.Path and ledger.Payload).
Changed height-related attributes in light nodes/checkpoints to use a uint8 instead of a uint16.
Moved mapper.AllPaths to trie.Paths() since its logic cannot be implemented externally if node.LeftChild() and node.RightChild() are not public methods.
Reworked the node iterator so that it uses node.left and node.right instead of node.LeftChild() and node.RightChild(). In order to have access to those fields however, the iterator must each time assert that the node is a branch and not an extension or a leaf.
Replaced the return with a continue statement, when a leaf is created in the trie insertion method, as this would prevent it from ever getting its hash computed. The only place where we should return is after computing the leaf hash and storing it in the DB.
Used the height of the parent extension, for leaf hash computation, as discussed before.
For some reason we had an outdated badger import somewhere (v1) so this was removed as well.
The following is a list of changes that are required in order to be able to produce hashes that match the Flow Go implementation of a trie.
Have access to the height of the branches and extensions at the moment their hash is computed.
~Can be done by simply adding back the height attribute to those nodes. This uses RAM (1 byte per node) but is a simple solution.~
Done in ea614ff (#525) by recursively passing a depth/height value all the way down the trie, from the root node to each child. This saves up RAM (1 byte per node) but adds limitations.
Nodes are no longer independent and able to compute their own hash by themselves.
This adds a tiny bit of complexity and cognitive overhead.
This also means that we need the node iterator to keep track of heights (and ultimately, paths) by iterating on structures that contain nodes, paths and heights instead of just nodes.
The height at which nodes are stored cannot be 0, it needs to be 1. The Flow trie implementation goes from 256 to 1 (the same way that a uint8 goes from 0 to 255). Going from 256 to 0 would require using a uint16 for heights, depths and extension skip lengths. What we can do though, is use 0 to 255 on our side, so that we have a uint8, and just do +1 whenever we need to compute a hash or -1 when we read from a checkpoint in legacy format.
Currently, the light_node implementation cannot work, since it relies on marshalling/unmarshalling the heights and paths of nodes, but the way it is currently implemented does not give it access to those. It just sees one node, without any context, and is supposed to convert it. This needs to be reworked in order to function with the new nodes. Done in ab6bf63 (#525)
Additionally, we might not even need to store leaf nodes in our light trie implementation, since the extension will contain the hash of its leaf. We can say that if an extension doesn't have a branch or another extension as its child, it is assumed to have a leaf with the same hash as its own.
Payload storage likely needs to be changed so that the whole payload is stored and not just the payload value, as we discussed before. Done in ab6bf63 (#525)
Since the store implementation adds entries to its cache, but also to a DB transaction, which can need to be committed, which can fail, I don't understand why the error return of t he Save method was removed. It is necessary to handle that error, since if an insertion fails and the value is not in the cache like it's supposed to be, the trie will thereafter produce incorrect hashes or be unable to read the payloads for certain leaves. Everything needs to stop immediately if that happens. Done in ab6bf63 (#525)
Add a trie.Clone method for the mapper to be able to produce new tries for its forest.
In order to test the trie, we need external tests (so that we can use mocks without having an import cycle, and also because it makes more sense to have external tests for this kind of logic), which means we need constructors back for the Node types. Not sure why they were removed.
Goal of this PR
This PR proposes improvements to the
trie-improvements
PR.Necessary Changes
The following is the list of changes that were made in the first commit, to get the code to compile.
Store
implementation to match with interface (using[32]byte
and[]byte
instead ofledger.Path
andledger.Payload
).uint8
instead of auint16
.mapper.AllPaths
totrie.Paths()
since its logic cannot be implemented externally ifnode.LeftChild()
andnode.RightChild()
are not public methods.node.left
andnode.right
instead ofnode.LeftChild()
andnode.RightChild()
. In order to have access to those fields however, the iterator must each time assert that the node is a branch and not an extension or a leaf.return
with acontinue
statement, when a leaf is created in the trie insertion method, as this would prevent it from ever getting its hash computed. The only place where we should return is after computing the leaf hash and storing it in the DB.The following is a list of changes that are required in order to be able to produce hashes that match the Flow Go implementation of a trie.
height
attribute to those nodes. This uses RAM (1 byte per node) but is a simple solution.~ea614ff
(#525) by recursively passing adepth
/height
value all the way down the trie, from the root node to each child. This saves up RAM (1 byte per node) but adds limitations.0
, it needs to be1
. The Flow trie implementation goes from256
to1
(the same way that auint8
goes from0
to255
). Going from256
to0
would require using auint16
for heights, depths and extension skip lengths. What we can do though, is use0
to255
on our side, so that we have auint8
, and just do +1 whenever we need to compute a hash or -1 when we read from a checkpoint in legacy format.ea614ff
(#525)light_node
implementation cannot work, since it relies on marshalling/unmarshalling the heights and paths of nodes, but the way it is currently implemented does not give it access to those. It just sees one node, without any context, and is supposed to convert it. This needs to be reworked in order to function with the new nodes. Done inab6bf63
(#525)ab6bf63
(#525)Save
method was removed. It is necessary to handle that error, since if an insertion fails and the value is not in the cache like it's supposed to be, the trie will thereafter produce incorrect hashes or be unable to read the payloads for certain leaves. Everything needs to stop immediately if that happens. Done inab6bf63
(#525)trie.Clone
method for the mapper to be able to produce new tries for its forest.