optakt / flow-dps

Flow Data Provisioning Service
Apache License 2.0

Improve trie structure to reduce memory usage #513

Closed Ullaakut closed 2 years ago

Ullaakut commented 2 years ago

Goal of this PR

This PR implements #507

Checklist

Todo

Tests

All covered.

Does the PR require documentation to be added or updated?

Maybe in the future.

Misc

Ullaakut commented 2 years ago

Status Report

My current implementation uses a single forest type that covers both what we need in the mapper, namely tracing the steps between multiple state commitments and their respective tries, and the features Flow Go needs from a Forest.

All node/trie/forest flattening is also reimplemented in our package, in a way that fits our use case. The problem is that the Flow Go checkpoints contain nodes encoded in their format, not ours. Their checkpoints have no extension nodes or anything like that; they just contain standard tries full of branches.
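To make this more concrete, here is a minimal sketch of what such a forest type could look like: a structure keyed by state commitment that hands back the corresponding trie, so the mapper can step from one commitment to the next. The names (`Forest`, `Commit`, `Trie`) are illustrative placeholders, not the actual types in this PR.

```go
package forest

import "fmt"

// Commit is a state commitment identifying one trie in the forest.
type Commit [32]byte

// Trie stands in for the trie implementation used by the package.
type Trie struct {
	// root node, payload references, etc. omitted in this sketch
}

// Forest indexes every known trie by its state commitment, which is the
// lookup the mapper needs to step from one commitment to the next.
type Forest struct {
	tries map[Commit]*Trie
}

// New returns an empty forest.
func New() *Forest {
	return &Forest{tries: make(map[Commit]*Trie)}
}

// Add registers a trie under its state commitment.
func (f *Forest) Add(commit Commit, t *Trie) {
	f.tries[commit] = t
}

// Trie returns the trie for a given state commitment.
func (f *Forest) Trie(commit Commit) (*Trie, error) {
	t, ok := f.tries[commit]
	if !ok {
		return nil, fmt.Errorf("no trie for commit %x", commit)
	}
	return t, nil
}
```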

Potential solutions include:

* Build the trie as-is from the checkpoint, and therefore save no memory (this is a bad solution, but it's there) 🎉
* Only look at leaves when reading the checkpoint, and reinsert them one by one into a new trie so that the new trie has the optimal structure ❤️
* Build the trie as-is from the checkpoint, and optimize it afterwards by trimming unnecessary branches (a somewhat complex operation, but it could theoretically be done progressively while the trie is in use) 🚀

If you don't want to phrase your opinion but want to "vote" on your favorite solution, feel free to do so using the GH reaction that corresponds to the outlined proposal.

ricardomgoncalves commented 2 years ago

> Status Report
>
> My current implementation uses a single forest type that covers both what we need in the mapper, namely tracing the steps between multiple state commitments and their respective tries, and the features Flow Go needs from a Forest.
>
> All node/trie/forest flattening is also reimplemented in our package, in a way that fits our use case. The problem is that the Flow Go checkpoints contain nodes encoded in their format, not ours. Their checkpoints have no extension nodes or anything like that; they just contain standard tries full of branches.
>
> Potential solutions include:
>
> * Build the trie as-is from the checkpoint, and therefore save no memory (this is a bad solution, but it's there) 🎉
> * Only look at leaves when reading the checkpoint, and reinsert them one by one into a new trie so that the new trie has the optimal structure ❤️
> * Build the trie as-is from the checkpoint, and optimize it afterwards by trimming unnecessary branches (a somewhat complex operation, but it could theoretically be done progressively while the trie is in use) 🚀
>
> If you don't want to phrase your opinion but want to "vote" on your favorite solution, feel free to do so using the GH reaction that corresponds to the outlined proposal.

From what I read, I think the question comes down to whether a long loading phase is acceptable: if it is, ❤️; otherwise, 🚀. I think ❤️ would be simple enough and would bring more benefits.

Maelkum commented 2 years ago

Currently I'm leaning towards the second option. Using the third option would basically mean that, in total, the memory requirements would remain the same as they are now, right? Even though the memory usage would drop eventually, one would still need to have that memory available for the initial phase?

What are the downsides of ❤️ ? (This is the closest I've ever been to writing a country song 😄 )

Ullaakut commented 2 years ago

> Currently I'm leaning towards the second option. Using the third option would basically mean that, in total, the memory requirements would remain the same as they are now, right? Even though the memory usage would drop eventually, one would still need to have that memory available for the initial phase?
>
> What are the downsides of ❤️ ? (This is the closest I've ever been to writing a country song 😄 )

It would mean that initially we load a large trie and progressively trim it down. Basically, the trade-off is more memory usage in exchange for startup speed. For the second solution, the downside is that it might be a bit slower to start up, since instead of just loading nodes we need to recreate the trie completely. We don't need to recompute hashes, however, so it might not be that bad.

Another issue with solution #2 is that checkpoints contain a forest of tries. If there are multiple tries, they are stored as a slice of tries and a slice of nodes, and currently the only way to know which nodes belong to which trie is to build the tries: nodes carry indices pointing to their left and right children, and each trie only has the index of its root node in the node slice. If we want to read only the leaves and insert them, we need to figure out a way to know which trie they belong to. This is what I'm planning on trying to figure out this morning :p
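For illustration, here is a rough sketch of how leaves could be attributed to their trie in such a flattened layout: walk each trie from its root index through the child indices and collect the leaves. The type names and the "index 0 means no child" convention are assumptions for this sketch, not the actual flow-go encoding.

```go
package checkpoint

// FlatNode is one encoded node in the checkpoint's node slice. In this sketch
// index 0 stands for "no child", which is an assumption about the layout.
type FlatNode struct {
	LIndex  uint64 // index of the left child in the node slice
	RIndex  uint64 // index of the right child in the node slice
	Path    []byte // only set on leaves
	Payload []byte // only set on leaves
}

// FlatTrie only knows the index of its root node in the shared node slice.
type FlatTrie struct {
	RootIndex uint64
}

// LeavesOf walks a single trie from its root index and collects its leaves,
// which is one way to know which leaves belong to which trie before
// reinserting them into a freshly built, optimally structured trie.
func LeavesOf(nodes []FlatNode, t FlatTrie) []FlatNode {
	var leaves []FlatNode
	stack := []uint64{t.RootIndex}
	for len(stack) > 0 {
		index := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if index == 0 {
			continue // no child at this position
		}
		node := nodes[index]
		if node.LIndex == 0 && node.RIndex == 0 {
			leaves = append(leaves, node)
			continue
		}
		stack = append(stack, node.LIndex, node.RIndex)
	}
	return leaves
}
```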

Ullaakut commented 2 years ago

Status Report

In the end, the chosen solution was to read all of the nodes and then, one by one, read the leaves from each trie and rebuild a new trie from those leaves. This required the following:

This seems to work; however, another issue came up: creating a new trie from those leaves without having their payloads in the payload store means that, if new leaves are added in the future and we need to recompute the hash of a moved node, we might not be able to, which would result in a fatal error. Because of this, we need to populate our store while we read the checkpoint. Encoded leaves contain payloads, which can be inserted into the store using the leaf's hash as the key.
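As a rough sketch of what populating the store during checkpoint reading could look like: as each leaf is decoded, save its payload under the leaf's hash. The `Store` interface and `Leaf` type below are assumptions made for illustration only.

```go
package checkpoint

// Store is an assumed interface for the payload store keyed by leaf hash.
type Store interface {
	Save(hash [32]byte, payload []byte) error
}

// Leaf is an assumed decoded leaf carrying its hash and payload.
type Leaf struct {
	Hash    [32]byte
	Payload []byte
}

// populateWhileReading drains leaves as the checkpoint reader decodes them,
// saving each payload under the leaf's hash so that recomputing the hash of
// a moved node later can always find the payload it needs.
func populateWhileReading(store Store, leaves <-chan Leaf) error {
	for leaf := range leaves {
		if err := store.Save(leaf.Hash, leaf.Payload); err != nil {
			return err
		}
	}
	return nil
}
```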

This makes processing a checkpoint an order of magnitude or two slower than before. When we didn't process the payloads, processing a 30 GB checkpoint took about 20 minutes on my machine, but now that we need to write payloads (in memory at first, then periodically/on eviction to disk), the process takes over 8 hours (I say over 8 because I stopped it at that point; it was almost done processing the nodes and inserting things on disk, so I assume it would have taken 9-10 hours at most).

This, plus the fact that it takes quite a lot of memory to load all of the nodes from the checkpoint, led me to believe that my workflow's bottleneck is now that the checkpoint I have is too large. I need a smaller one to process, which still contains realistic payloads and a complete trie. Otherwise, I'll have to wait 10 hours between each test run of my changes, which isn't great since what I'm currently working on is what happens after we process all of the nodes, the longest part of the process.

In order to speed things up, I'll try bundling multiple writes together in a single transaction, the same way we do in the Flow DPS. This creates a potentially fatal edge case that will need to be dealt with, however: if a payload is part of a hanging transaction, waiting to be written to disk, it is temporarily no longer accessible, and if a new insertion requires it, it might not find this payload and therefore not be able to recompute the node hash, which would result in a crash. We could prevent this in many ways.
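One possible shape for this, sketched below under the assumption that the store is backed by Badger: buffer payloads in an in-memory pending map, flush them in a single transaction, and have reads check the pending map before hitting disk, so a payload that is part of a hanging batch stays visible to new insertions. Names and structure are an assumption about how the batching could look, not the actual DPS code.

```go
package store

import (
	"fmt"
	"sync"

	"github.com/dgraph-io/badger/v2"
)

// BatchedStore buffers payload writes and flushes them in one transaction,
// while keeping pending payloads readable from memory.
type BatchedStore struct {
	db      *badger.DB
	mu      sync.RWMutex
	pending map[[32]byte][]byte // payloads buffered until the next flush
}

// NewBatchedStore wraps an already opened Badger database.
func NewBatchedStore(db *badger.DB) *BatchedStore {
	return &BatchedStore{db: db, pending: make(map[[32]byte][]byte)}
}

// Save buffers a payload under the leaf hash; it is written on the next Flush.
func (s *BatchedStore) Save(hash [32]byte, payload []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.pending[hash] = payload
}

// Retrieve checks the pending batch before the database, so a payload that is
// part of a hanging (not yet committed) batch remains visible to insertions.
func (s *BatchedStore) Retrieve(hash [32]byte) ([]byte, error) {
	s.mu.RLock()
	if payload, ok := s.pending[hash]; ok {
		s.mu.RUnlock()
		return payload, nil
	}
	s.mu.RUnlock()

	var payload []byte
	err := s.db.View(func(txn *badger.Txn) error {
		item, err := txn.Get(hash[:])
		if err != nil {
			return fmt.Errorf("could not get payload: %w", err)
		}
		payload, err = item.ValueCopy(nil)
		return err
	})
	return payload, err
}

// Flush writes every buffered payload in a single transaction and clears the
// pending batch on success.
func (s *BatchedStore) Flush() error {
	s.mu.Lock()
	defer s.mu.Unlock()

	err := s.db.Update(func(txn *badger.Txn) error {
		for hash, payload := range s.pending {
			key := hash // copy so the slice below does not alias the loop variable
			if err := txn.Set(key[:], payload); err != nil {
				return err
			}
		}
		return nil
	})
	if err != nil {
		return err
	}
	s.pending = make(map[[32]byte][]byte)
	return nil
}
```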

FIXMEs left

To be done later

Ullaakut commented 2 years ago

Status Report

I am currently working on the following fixes:

In Progress

Forest tests. There is a lot of work to do on the forest side; as I said before, this is the most volatile part of the PR at the moment.
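As a small sketch of the kind of case such forest tests could cover, reusing the illustrative `Forest` type from the earlier sketch (this is not the PR's actual test suite): a trie added under a commitment can be retrieved again, and an unknown commitment is an error.

```go
package forest

import "testing"

// TestForestAddAndTrie covers the basic round trip through the forest.
func TestForestAddAndTrie(t *testing.T) {
	f := New()

	commit := Commit{0x01}
	want := &Trie{}
	f.Add(commit, want)

	got, err := f.Trie(commit)
	if err != nil {
		t.Fatalf("could not retrieve trie: %v", err)
	}
	if got != want {
		t.Fatalf("retrieved trie does not match the inserted one")
	}

	if _, err := f.Trie(Commit{0xFF}); err == nil {
		t.Fatal("expected an error for an unknown commitment")
	}
}
```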

Ullaakut commented 2 years ago

As Faye highlighted, there is currently an issue with this implementation: upon inserting new elements into a trie, the original trie gets mutated even when a new trie is created from it, while it should not be. This might be why we have a mismatch on the mainnet-14 checkpoint, and it's a major difference between the two implementations. It needs to be fixed before the PR can be merged.
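For reference, a minimal sketch of the copy-on-write insertion behaviour the fix would need: copy only the nodes along the insertion path and return a new root, so the original trie is never mutated and unchanged subtrees are shared. The types here are simplified placeholders, not the PR's actual node structure.

```go
package trie

// Node is a simplified binary trie node for this sketch.
type Node struct {
	Left    *Node
	Right   *Node
	Path    []byte
	Payload []byte
}

// Insert returns the root of a new trie containing the payload at the given
// path. It copies only the nodes along the insertion path and shares every
// other subtree with the original, which therefore stays untouched.
func Insert(node *Node, path []byte, depth int, payload []byte) *Node {
	if depth == len(path)*8 {
		// Reached the leaf position: create a fresh leaf instead of mutating.
		return &Node{Path: path, Payload: payload}
	}

	clone := &Node{}
	if node != nil {
		*clone = *node // shallow copy, untouched child pointers are shared
	}
	if bitSet(path, depth) {
		clone.Right = Insert(clone.Right, path, depth+1, payload)
	} else {
		clone.Left = Insert(clone.Left, path, depth+1, payload)
	}
	return clone
}

// bitSet reports whether the bit at the given index of the path is set,
// reading bits from the most significant bit of each byte.
func bitSet(path []byte, index int) bool {
	byteIndex := index / 8
	bitIndex := uint(index % 8)
	return path[byteIndex]&(byte(1)<<(7-bitIndex)) != 0
}
```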