onflow / atree

Atree provides scalable arrays and scalable ordered maps.
https://onflow.org
Apache License 2.0
39 stars 16 forks source link

Question about design considerations #254

Closed bluesign closed 2 years ago

bluesign commented 2 years ago

Finally I found some time to dive in to atree. It looks like a great piece of software. When trying to understand the architecture, some questions popped up in my mind. I would be really happy if you can answer those.

Thanks in advance

bluesign commented 2 years ago

btw main reason I am asking this is to make comments mutually beneficial.

For example: I have opinion that slabs should backlink to their parent. But not sure if it aligns with current objectives.

fxamacker commented 2 years ago

Hi @bluesign :wave:

Is atree Flow Blockchain specific ? Actually better question which parts are targeted at Flow Blockchains goal ?

Maybe @ramtinms can comment on this and similar questions.

What was the design goals ? I see speed and minimum changes to storage ( minimum writes ) mentioned, can you expand a bit on those ?

Ramtin led the design and communicated the following design goals:

Basically, we wanted to store large dictionaries and large arrays in a scalable way across multiple registers. We wanted to limit register sizes without increasing number of registers any more than necessary.

Maybe Ramtin can help expand a bit more on this if needed.

For example: I have opinion that slabs should backlink to their parent. But not sure if it aligns with current objectives.

This is an interesting idea. How would you use those backlinks?

ramtinms commented 2 years ago

Maybe the only extra requirement that we considered for the purpose of Flow was minimizing bytes read instead of having a fully balanced tree (tolerating size changes up to some point).

For the backlink to parents: we initially had it in the design but we removed it for two reasons,

bluesign commented 2 years ago

Hey @fxamacker @ramtinms,

Thanks for the answers, I am very new to atree and tries, so it may be possible I am missing something obvious, sorry for that in advance. As a context, I am trying to code a "versioned storage (resource) browser" for Flow basically, it has been a challenging task to say at least. (My target is to run it with minimal RAM.) Actually I managed pretty good so far with checkpoint loading (thanks to new checkpoint improvements)

For the backlinks part: I am trying to find some kind of delta update on atree from trie updates (which I am getting from google execution state bucket, I guess probably soon will be able to request from network)

So if account had let's say a storage map at the root, then deep somewhere some value changes, I figured out I have no way of knowing ( traversing from bottom to top ) without traversing all account storage. I think anything deeper than 1 level has this problem.

Actually what @ramtinms said my first guess, copy-on-write optimizations, or slab deduping to optimize for storage. Also I understand that there a lot of nodes ( registers ) and saving every byte counts. But I pretty convinced myself my use-case is not so crazy :)

Another question comes into mind, again totally uneducated bystander comment.

I am trying to see the whole picture, there are a lot of registers, a lot of memory. Why can't let's say atree generate a cryptographically secure hash for OrderedMap and use that in mtrie ? hash(path, value) can easily change to hash(path, hash(value))

Isn't Daily Active User / Total User of Flow enough low to warrant a LRU cache instead of loading all trie into memory ?

Thanks in advance for taking time to answer.

ramtinms commented 2 years ago

@bluesign very interesting questions and ideas.

For the backlinks part: I am trying to find some kind of delta update on atree from trie updates (which I am getting from google execution state bucket, I guess probably soon will be able to request from network)

Resource tracking especially when they are nested is a very interesting use-case for several applications and I totally get the value of it and is something we have tried to see if we can solve it on different levels above atree: for example, one idea was to do solve it by special events, or exposing extra traces about it on a higher level. Another idea of a secondary ownership index built by execution/access nodes. Unfortunately solving it by adding parent pointers at the time was a bit opposite to one of the main objectives of the atree design, which was minimizing bytes touched when resources are moved. For example, if you have a container (A) of container (B) of resources (C), the idea is if you change ownership of B and move it to another container, we are trying to avoid touching anything under B and all resources on C. Touching more registers would make the chunk data packs greatly larger and would slow down the network performance. In practice, when we started integrating into FVM, we hit a few blocks to unlock that level of optimization (we had to compute the storage changes and min balance stuff, so we had to physically move the registers related to B and C to the new owner) So we might be able to revisit and see if we could use backlinks or separate out ownership as a secondary meta table and keep the atree pointers only for inclusion, that way you don't even need to load the data, you can check what is owned by who. Ownership modeling was something we decided not to tackle for V1 as it was not trivial how to solve it in an efficient way and collect more data for V2. I'll make sure to include you in the conversations for V2 when we start it.

Isn't atree and mtree accomplish same thing ? Isn't it some kind of inception to put atree inside mtree?

Atree is a layer above the mtrie, atree mostly tries to solve the problem of scalable array and dictionaries for the cadence, and would only assume it has access to a key/value store. mTrie is used by the protocol to provide provable key/value storage. I love your line of thinking here and I actually have similar ideas to make the atree verifiable and we do provide features such as forests etc on atree level, though that requires a comprehensive security review and proof design. so it was also something that looked requires a lot of work for V1, so we decided to postpone it for future editions.

Isn't Daily Active User / Total User of Flow enough low to warrant a LRU cache instead of loading all trie into memory ?

that's part of the plan to eventually move from all in memory to do hybrid of memory / disk, so far keeping everything in memory was the fastest way, over time we would change that. Faye already looking into if we can store leafes or parts of the tree on disk, without sacrificing that much performance.

bluesign commented 2 years ago

Thanks for the answers @ramtinms

Just one question about:

For example, if you have a container (A) of container (B) of resources (C), the idea is if you change ownership of B and move it to another container, we are trying to avoid touching anything under B and all resources on C. Touching more registers would make the chunk data packs greatly larger and would slow down the network performance.

Is this true now? Isn't changing ownership of B is automatically changing ownership of C? I always imagined changing ownership of B would require migrating everything under B, then it would disturb the tree.

bluesign commented 2 years ago

closing this issue, you both have been very helpful, thanks a lot.

ramtinms commented 2 years ago

@bluesign that was the hope for the V1, to support move of a large collection with minimal changes, using a global slab allocation and a virtual usage and ownership tracker instead of physically moving objects around, though as I mentioned it was not successful during integration and we had to put it on hold until we figure out how to deal with the storage used and minimum balance in a way that doesn't require physical registers to be stored under accounts. in reality ownership can be detached from the object itself and save rewriting the object over and over. It could also make the object ownership tracking way easier, still many ideas still in that space, we just need to explore the governance and FVM requirements further before designing V2.