onflow / atree

Atree provides scalable arrays and scalable ordered maps.
https://onflow.org
Apache License 2.0
39 stars 13 forks source link

Atree Register Inlining and Data Deduplication #342

Closed fxamacker closed 9 months ago

fxamacker commented 10 months ago

Updates epics #292 https://github.com/onflow/flow-go/issues/1744 Updates #296 #297 #347 #348 https://github.com/onflow/cadence/issues/1854 Required by Cadence Integration with Atree https://github.com/onflow/cadence/issues/2809 Required by Atree Storage Migration:

NOTE: some of the above updated issues will be closed by this PR:

Problem

Memory use on execution nodes, register count, and growth rate can be improved.

Currently, every array or map are stored in its own slab and parent slab refers to child array or map by SlabID.

The current approach can lead to many small slabs, especially for Cadence data structures with multiple nested levels.

Another aspect is duplicate data, especially when encoding nested Cadence composites. The design of atree inlining already requries changing the encoding format, so use this opportunity to also deduplicate data when encoding nested composites.

Changes

Reduce register count by inlining and optimize encoding size by deduplicating Cadence composite types within slabs.

  1. Inlining. This commit inlines child array/map in parent slab when:

    • child array/map fits in one slab (root slab is data slab)
    • encoded size of inlined child array/map is less than the max inline size limit enforced by parent
  2. Data deduplication. This commit optimizes encoding size by:

    • reusing inlined array types
    • reusing seed, digests, and field names of inlined composite types

Also update debugging code to handle inlined array/map element.

TODO

In this PR

Atree Inlining Part 2

Open new PR to add support for inlinability of child values returned by iterator (requires PR #329)

Cadence Integration

Open new PR in onflow/cadence to integrate with Cadence.


codecov-commenter commented 9 months ago

Codecov Report

Attention: 1110 lines in your changes are missing coverage. Please review.

Comparison is base (1e6ec55) 64.93% compared to head (5f1de0a) 62.62%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #342 +/- ## ========================================== - Coverage 64.93% 62.62% -2.31% ========================================== Files 14 15 +1 Lines 8811 10559 +1748 ========================================== + Hits 5721 6613 +892 - Misses 2356 3004 +648 - Partials 734 942 +208 ``` | [Files](https://app.codecov.io/gh/onflow/atree/pull/342?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=onflow) | Coverage Δ | | |---|---|---| | [encode.go](https://app.codecov.io/gh/onflow/atree/pull/342?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=onflow#diff-ZW5jb2RlLmdv) | `80.72% <100.00%> (+4.25%)` | :arrow_up: | | [storable.go](https://app.codecov.io/gh/onflow/atree/pull/342?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=onflow#diff-c3RvcmFibGUuZ28=) | `59.37% <ø> (ø)` | | | [basicarray.go](https://app.codecov.io/gh/onflow/atree/pull/342?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=onflow#diff-YmFzaWNhcnJheS5nbw==) | `49.38% <70.00%> (+0.01%)` | :arrow_up: | | [storage.go](https://app.codecov.io/gh/onflow/atree/pull/342?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=onflow#diff-c3RvcmFnZS5nbw==) | `73.09% <52.00%> (-0.60%)` | :arrow_down: | | [typeinfo.go](https://app.codecov.io/gh/onflow/atree/pull/342?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=onflow#diff-dHlwZWluZm8uZ28=) | `60.32% <60.32%> (ø)` | | | [map\_debug.go](https://app.codecov.io/gh/onflow/atree/pull/342?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=onflow#diff-bWFwX2RlYnVnLmdv) | `43.65% <49.34%> (-8.51%)` | :arrow_down: | | [array\_debug.go](https://app.codecov.io/gh/onflow/atree/pull/342?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=onflow#diff-YXJyYXlfZGVidWcuZ28=) | `47.44% <49.04%> (-4.67%)` | :arrow_down: | | [array.go](https://app.codecov.io/gh/onflow/atree/pull/342?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=onflow#diff-YXJyYXkuZ28=) | `68.10% <62.09%> (-1.97%)` | :arrow_down: | | [map.go](https://app.codecov.io/gh/onflow/atree/pull/342?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=onflow#diff-bWFwLmdv) | `65.30% <58.52%> (-1.78%)` | :arrow_down: | ... and [1 file with indirect coverage changes](https://app.codecov.io/gh/onflow/atree/pull/342/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=onflow)

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

turbolent commented 9 months ago

Fantastic work Faye! 👏

Some notes from the first review session:

I'll do a proper code review after the next review session. So far the work looks great, well done! 👏

fxamacker commented 9 months ago

@ramtinms @turbolent thanks for the fast PR reviews! I resolved all feedback that may require coding changes.

Ramtin, thanks for suggestions and approval! I will open separate PR to update smoke tests.

Bastian, can you please take another look?

UPDATE: I addressed all feedback including comments. I will open separate PR to update smoke tests and extra linting in CI.

fxamacker commented 9 months ago

@turbolent I found and resolved edge case found while adding comments about how inlining works on Sunday.

While at it, I also refactored, added more checks, and added no-op to parentUpdater we discussed last week. Thanks again for good discussions last week. :+1:

turbolent commented 9 months ago

Fantastic work Faye! 👏

Here are the cleaned up notes I took from the last review session last Thursday. They might not be up-to-date anymore, as you might already have addressed some points since then, so please feel free to disregard if that is the case.

I will have another look over the changes of the PR today and will open issues for the TODOs on the Cadence side mentioned above.

turbolent commented 9 months ago

@fxamacker Would be good to sync on remaining work, see the (sorry, long) comment above. If there isn't really anything left to be addressed, this should be good to go 🎉

Additional tests and adjustments (if needed) can be done in follow-up PRs, this one is already very large

turbolent commented 9 months ago

I had a look at adding the container/child address matching checks to Cadence: Just before performing a mutating operation on a container (array/map), we already transfer the child, so checking the invariant does not really help much: if we insert the assertion we can also check the transfer.

On the atree side, it's not easy to check this invariant: children are values at first, and getting the storable (which is needed to get the potential storage ID) is delayed until the very "end", e.g. deep down when the container/parent is not available anymore.

Maybe there isn't much we can do here

fxamacker commented 9 months ago

@turbolent Thanks for detailed review and meeting notes!

I think everything discussed so far have been addressed in this PR. :tada:

BTW, can you please take a look at PR #345? That is the 2nd of 2 atree inlining PRs.