Closed fxamacker closed 9 months ago
Attention: 1110 lines
in your changes are missing coverage. Please review.
Comparison is base (
1e6ec55
) 64.93% compared to head (5f1de0a
) 62.62%.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Fantastic work Faye! 👏
Some notes from the first review session:
Set
and Insert
operations.I'll do a proper code review after the next review session. So far the work looks great, well done! 👏
@ramtinms @turbolent thanks for the fast PR reviews! I resolved all feedback that may require coding changes.
Ramtin, thanks for suggestions and approval! I will open separate PR to update smoke tests.
Bastian, can you please take another look?
UPDATE: I addressed all feedback including comments. I will open separate PR to update smoke tests and extra linting in CI.
@turbolent I found and resolved edge case found while adding comments about how inlining works on Sunday.
While at it, I also refactored, added more checks, and added no-op to parentUpdater we discussed last week. Thanks again for good discussions last week. :+1:
Fantastic work Faye! 👏
Here are the cleaned up notes I took from the last review session last Thursday. They might not be up-to-date anymore, as you might already have addressed some points since then, so please feel free to disregard if that is the case.
Maybe add some general/conceptual comments/explanations of how the inlining works, and what operations/cases need to be supported.
notifyParentIfNeeded
)setCallbackwithChild
) when that child gets mutatedIn the array/map remove operations, only a storable is returned. Similar, the set operations returns a potentially existing existing storable for the given key.
However, there might be values for the storable which have parent callbacks. Those callbacks should theoretically be unset. Loading the value for the storable does not make much sense, as current implementations will allocate another, new value.
I don't quite remember the plan for resolving this. Was the plan to leave the parent callback happening, detect the "invalid" child callback to the "previous" parent, and reject it?
Sometimes "value" is used in a variable name, even though it refers to a just a storable. For example, there was a case for extistingValue
, which was only a Storable
. Maybe check naming of such variables and update accordingly to avoid confusion.
We walked through the example case of inserting a child into a parent, mutating it, removing the child from that parent, mutating it, and then inserting the child into another container, and mutating it.
It would be good to have an integration test for this in both atree and Cadence.
We noticed that a mutation after the removal (or set operation), atree needs to uninline the child
A callback of a child to a previous parent was a fatal error. However, this is likely not an invalid case: The child value is mutated, so notifies its parent (parent callback was not unset, but also could not, see above). Instead of a fatal error, this callback should be a "no-op". Further, when the parent detects such a callback from a previous child, it should now use this opportunity to unset the parent callback from the child.
We talked through the "singleton value" case: StoredValue
might return same, existing value, and might not return a new value. It might be good to not make any assumptions about the nature of the returned value (e.g. that an implementation is required to always return a new value), or at least document the assumptions properly.
This prevents regressions in the future, where such assumptions might be broken by the user of the library, i.e. Cadence.
On the Cadence side, there is the implicit invariant that all values stored in an account always reference values in the same account, and never across accounts. We should add defensive assertions for this invariant, i.e. check that the container and child addresses match.
Finally, we discussed the encoding, in particular, the deduplication of type information.
When types are dedupliated, there seeds are normalized.
In atree comments/docstring, in Cadence comments/docstrings, and in the Cadence user documentation, we should emphasize that iteration order is undefined (but deterministic of course), and no assumptions should be made during execution or across transactions. For example, just because iteration over the keys of a dictionary has a particular order in one transaction does not mean that it will be the same in the next transaction.
The deduplication optimization will change the current behavior, which some users might rely on (but should (have) not).
I will have another look over the changes of the PR today and will open issues for the TODOs on the Cadence side mentioned above.
@fxamacker Would be good to sync on remaining work, see the (sorry, long) comment above. If there isn't really anything left to be addressed, this should be good to go 🎉
Additional tests and adjustments (if needed) can be done in follow-up PRs, this one is already very large
I had a look at adding the container/child address matching checks to Cadence: Just before performing a mutating operation on a container (array/map), we already transfer the child, so checking the invariant does not really help much: if we insert the assertion we can also check the transfer.
On the atree side, it's not easy to check this invariant: children are values at first, and getting the storable (which is needed to get the potential storage ID) is delayed until the very "end", e.g. deep down when the container/parent is not available anymore.
Maybe there isn't much we can do here
@turbolent Thanks for detailed review and meeting notes!
I think everything discussed so far have been addressed in this PR. :tada:
More comments were added about how atree inlining works in commit a75e3886e2987a452487a5ed02559c284a4b096c
Commit 63ea7a74dd58db15fc871c40f4c3c47772dc8c81 handles values with outdated parent callbacks (value was removed or overwritten in parent container):
Commit 8d02bbcd024cf77a14b6feb31ca7a4aadec330c7 renamed existingValue
.
Commit 042eb686fc19f48fe3e78a69b3cbe6de979a4001 unlined removed or overwritten child (if it is inlined)
I double-checked that atree doesn't make assumption about value returned from StoredValue()
.
About "invariant that all values stored in an account always reference values in the same account"
atree.CheckStorageHealth()
checks that all child or referenced slabs have the same address and atree validation functions also checks inlined slabs in memory have the same address as its parent. I opened these issues to add more integration tests:
BTW, can you please take a look at PR #345? That is the 2nd of 2 atree inlining PRs.
Updates epics #292 https://github.com/onflow/flow-go/issues/1744 Updates #296 #297 #347 #348 https://github.com/onflow/cadence/issues/1854 Required by Cadence Integration with Atree https://github.com/onflow/cadence/issues/2809 Required by Atree Storage Migration:
NOTE: some of the above updated issues will be closed by this PR:
Problem
Memory use on execution nodes, register count, and growth rate can be improved.
Currently, every array or map are stored in its own slab and parent slab refers to child array or map by SlabID.
The current approach can lead to many small slabs, especially for Cadence data structures with multiple nested levels.
Another aspect is duplicate data, especially when encoding nested Cadence composites. The design of atree inlining already requries changing the encoding format, so use this opportunity to also deduplicate data when encoding nested composites.
Changes
Reduce register count by inlining and optimize encoding size by deduplicating Cadence composite types within slabs.
Inlining. This commit inlines child array/map in parent slab when:
Data deduplication. This commit optimizes encoding size by:
Also update debugging code to handle inlined array/map element.
TODO
In this PR
Atree Inlining Part 2
Open new PR to add support for inlinability of child values returned by iterator (requires PR #329)
Cadence Integration
Open new PR in onflow/cadence to integrate with Cadence.
main
branchFiles changed
in the Github PR explorer