thrumdev / nomt

Lookup and update API #1

Closed by rphmeier 5 months ago

rphmeier commented 6 months ago

There is quite a lot of logic which is orthogonal to the backend. In particular:

For these use-cases, we can decouple from the backend and work entirely in memory.

From the perspective of the traversal and update APIs, we should require that all required pages are passed into the function, i.e. they are already loaded. Determining the pages to load and loading them should be done with a backend-specific API.

Here is a very rough draft of the trie operations API.

/// Compute the needed page IDs according to the provided keys.
fn needed_page_ids(keys) -> Vec<PageId>;

/// Prove lookup for a single key
/// Fails on missing page or value mismatch.
fn prove_lookup(root, key, expected_value_hash, &PageMap) -> Result<Proof, LookupError>;

/// Prove lookup for multiple keys.
/// Fails on missing page or value mismatch.
fn prove_multi_lookup(root, [(key, expected_value_hash)], &PageMap) -> Result<Proof, LookupError>;

/// Update the trie with new key/value pairs.
/// Produces a set of pages to be altered.
/// Fails on missing page.
fn update(root, [(key, value_hash)], &PageMap) -> Result<PageMap, LookupError>;

One important note: we likely need to be generic over the hash function.
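
For instance (a rough sketch only; the trait name and method are placeholders, not a settled API), the functions above could take a hasher type parameter:

/// Hypothetical abstraction over the node hash function.
trait NodeHasher {
    /// Hash raw node data into a 32-byte trie node.
    fn hash_node(data: &[u8]) -> [u8; 32];
}

The functions above would then be declared generically, e.g. fn prove_lookup<H: NodeHasher>(root, key, expected_value_hash, &PageMap).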

pepyakin commented 6 months ago

There is quite a lot of logic which is orthogonal to the backend

Some of this is contingent on the design of the rocksdb backend. E.g. 64 bytes in a page schema may only be needed for the optimized DB. In fact, the rocksdb implementation might benefit from bigger pages.

fn needed_page_ids(keys) -> Vec<PageId>;

How would that be used?

Also, it seems a number of APIs are missing. Not sure if that's because you meant to lay down only the MVP, but I'll mention them just in case:

  1. value lookups.
  2. re-org awareness.

And just to refine the draft:

fn update(root, [(key, value_hash)], &PageMap) -> Result<PageMap, LookupError>;

Since we discern between the lack of a value and the default value (all zeroes, I suppose), we would need to pass Option<value_hash>.
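
For example, the refined signature might look like this (a sketch on top of the draft above, nothing final):

/// None = the key has no value; Some(h) = the key maps to a value whose hash is h.
fn update(root, [(key, Option<value_hash>)], &PageMap) -> Result<PageMap, LookupError>;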

rphmeier commented 6 months ago

we would need to pass Option<value_hash>.

This comes down to a decision on whether having an empty value and having no value are considered identical or different, i.e. is every key in the mapping by default or not? I will need to review the design document. We might also use hash([]) as the expected value hash in the no-value case. A third option is to use the zero hash for no-value.
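
Spelled out, the candidate conventions are roughly the following (illustrative only; this enum is not proposed API):

/// The three candidate conventions for representing "no value".
enum NoValueConvention {
    /// Pass Option<ValueHash>, with None meaning the key is absent.
    ExplicitOption,
    /// Use hash([]) as the expected value hash for absent keys.
    HashOfEmpty,
    /// Use the all-zero hash as a sentinel for absent keys.
    ZeroHash,
}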


I thought about the traverse API some more, and I think we can do an even better job of decoupling from storage formats. For one, my initial proposal bakes in the idea of having "pages" at the level of this API. As an alternative, we could completely decouple and provide something like a Cursor which allows for traversing the trie in its binary-trie representation.

trait Cursor: Clone {
  type Error: std::error::Error;

  /// Bit-path to the current location.
  fn path(&self) -> BitPath;
  /// The 32-byte node hash at the current location.
  fn node(&self) -> Node;
  /// Traverse to the left child of the current node.
  /// Fails if there is no left child.
  fn seek_left(&mut self) -> Result<(), Self::Error>;
  /// Traverse to the right child of the current node.
  /// Fails if there is no right child.
  fn seek_right(&mut self) -> Result<(), Self::Error>;
  /// Seek upwards by the given number of links.
  /// Fails if the number of links is greater than the path length.
  fn seek_upwards(&mut self, links: usize) -> Result<(), Self::Error>;
}
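
For illustration, a lookup could drive this trait roughly as follows (a hedged sketch; traverse_to_key is a hypothetical helper, BitPath is assumed to iterate bits from the root downwards, and terminal/leaf detection is elided):

/// Walk the cursor down the bit-path of a key, returning the node reached.
/// Any missing child surfaces as the cursor's error, which the caller can
/// interpret as "page not loaded" or "terminal reached", depending on backend.
fn traverse_to_key<C: Cursor>(cursor: &mut C, key: &BitPath) -> Result<Node, C::Error> {
    for bit in key.iter() {
        if bit {
            cursor.seek_right()?;
        } else {
            cursor.seek_left()?;
        }
    }
    Ok(cursor.node())
}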

This cursor API could be implemented for all of the following:

We'd expose similar functions:

fn prove_lookup(root_cursor: &mut impl Cursor, key: BitSlice, expected_value_hash) -> Result<Proof, LookupError>;
fn prove_multi_lookup(root_cursor: &mut impl Cursor, [(key, expected_value_hash)]) -> Result<MultiProof, LookupError>;
fn update(root_cursor: &mut impl Cursor, [(key, value_hash)]) -> Result<TrieUpdate, LookupError>;

The TrieUpdate would include a list of nodes to delete, insert, etc. which could then be passed to the backend code.
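
As a rough illustration (field names here are guesses, not a settled structure), TrieUpdate might look something like:

/// Hypothetical shape of the update output.
struct TrieUpdate {
    /// The new root node after applying the whole batch.
    new_root: Node,
    /// Nodes that no longer appear in the trie and can be pruned.
    deleted: Vec<Node>,
    /// Newly created nodes, keyed by their position in the trie.
    inserted: Vec<(BitPath, Node)>,
}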

pepyakin commented 6 months ago

This comes down to a decision on whether having an empty value and having no value are considered identical or different.

Well, I assumed that those cases are different because nomt is supposed to power sov-sdk and that has Option<StorageValue>.

A special hash value may do, but it seems cleaner to have Option anyway, and there are no obvious wins from using a magic value.

This cursor API could be implemented for all of the following:

This shape of API doesn't allow for parallelization. Once again, is this on purpose?

rphmeier commented 6 months ago

What kind of parallelization do you expect? Traversal seems quite single-threaded. Updates are too, because each update depends on the node alterations made for the previous keys.

pepyakin commented 6 months ago

I meant the loading of pages in parallel, the secret-sauce of NOMT.

rphmeier commented 6 months ago

My intention is that the pages would be loaded in parallel ahead of calling this routine, which then operates entirely in memory.

pepyakin commented 6 months ago

Gotcha. In that case, there should be a backend-agnostic way of communicating to the cursor the information required to compute the needed pages, and it must be part of the public-facing API. What would that look like?

You also say,

The TrieUpdate would include a list of nodes to delete, insert, etc. which could then be passed to the backend code.

so the user would directly interact with the types of a concrete backend?

rphmeier commented 5 months ago

so the user would directly interact with the types of a concrete backend?

Yes, I think so. It is better for the user to be the bridge between the abstract logic and the concrete backend. Backends may differ in subtle ways and I would prefer not to attempt to encapsulate that until we are clear on exactly what that consists of. We may never nail it down entirely.

In that case, there should be a backend-agnostic way of communicating to the cursor the information required to compute the needed pages, and it must be part of the public-facing API. What would that look like?

On that note, the answer above covers this question as well. Some backends are going to be more intelligent than others. For example, a "dumb" rocksdb or in-memory hash-map backend needs only trie node hashes in order to be effective, while our more intelligent paging mechanism will need to inspect the loaded paths deeply in order to determine which page IDs to pre-fetch. That seems difficult to encapsulate in a general way, and it will likely be better to reproduce bits of the tree logic within the backend. We may be able to extract some helper functions that operate only on the schema and abstract nodes, if they prove useful across multiple backends.

I can zoom out: it seems to me that we want to have:

  1. a very low barrier to entry for writing simple backends, such as those used for testing and comparison
  2. a very high degree of specificity in the API for the main backend, which uses the disk-backed hash table.
  3. independently testable code for common tree operations

It may be that the proposed Cursor API is not the best fit for these needs; there are some clear potential issues, such as an excess of in-memory copying during the update operations.

gabriele-0201 commented 5 months ago

You talked about different types of backends. It's not clear to me whether those backends should be aware of the concept of pages or not.

An important distinction we should make is: should each backend be aware of what a Page and PageId are? Should the backend only be responsible for storing and efficiently providing those pages?

If yes, then the only API needed by nomt-core is something to fetch pages on demand. The fetched pages would be moved into the core, and as soon as a batch of updates is done, all modified pages would be returned to the backend, whose mandate is then to update them. In this scenario, all the logic of traversing and managing pages is handled by the core; even the translation between key-path and PageIds could be kept in the core.
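
A hedged sketch of what such a minimal backend interface could look like under this approach (trait and method names are assumptions, not actual API):

/// Hypothetical backend trait for the "core owns the page logic" model.
trait PageBackend {
    type Error: std::error::Error;

    /// Fetch a page by ID; Ok(None) if the page does not exist yet.
    fn load_page(&self, id: PageId) -> Result<Option<Page>, Self::Error>;

    /// Persist the batch of modified pages produced by the core after an update.
    fn commit_pages(&mut self, updated: Vec<(PageId, Page)>) -> Result<(), Self::Error>;
}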

The advantage of this approach is ease of implementation: each backend only needs to do a few tasks, while most of the logic lives in the core. The disadvantage is that fewer kinds of backends could be implemented compared to what was proposed previously.

I don't see great benefit in implementing backends similar to a node_hash -> node map just to benchmark internally. I think we should focus on paging and benchmark it against other existing databases instead.

rphmeier commented 5 months ago

You talked about different types of backends. It's not clear to me whether those backends should be aware of the concept of pages or not.

An important distinction we should make is: should each backend be aware of what a Page and PageId are? Should the backend only be responsible for storing and efficiently providing those pages?

This is an important point. I believe that the generic trie operations should be completely agnostic of the storage format. This is for one crucial reason: we should support proof verification in storage-constrained environments. We are targeting two such environments: Polkadot PVF and Risc0 circuits. What this means is that proof verification and proof data structures should operate on nodes, not pages.
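
Concretely (a hedged sketch, not the actual API; all names here are placeholders), a verification entry point constrained to nodes might look like:

/// Node-only proof verification, suitable for constrained environments
/// (PVF, ZK circuits): no pages and no backend, only nodes and the proof.
fn verify_proof<H: NodeHasher>(
    root: Node,
    key: KeyPath,
    expected_value_hash: Option<ValueHash>,
    proof: &Proof,
) -> Result<(), VerifyError>;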

The other reason I suggest this is that it modularizes the code and reduces a dependency. This is, of the two, the weaker reason.

There is a case to limit this just to the verification routines, so that the proving and update routines operate on pages. That is likely to be faster, but probably not substantially, as this is mostly an I/O workload. I am not sure which is better yet.

I don't see great benefit in implementing backends similar to a node_hash -> node map just to benchmark internally. I think we should focus on paging and benchmark it against other existing databases instead.

Agreed

pepyakin commented 5 months ago

I don't think I have a good grasp of the amount of logic required for every path and every use-case, but I have a gut feeling that it may be beneficial to split up those paths. One implementation would focus on the prover case (be it ZK or collator), and the other would concern itself only with verification. That could make both implementations more straightforward and more efficient. On top of that, the proof is a stable interface that we have to maintain anyway.

Once again, I'm not entirely sure how much sense it makes. It all depends on the amount of duplication it'd yield and how sizeable the benefits are, if any. Just food for thought for now.

rphmeier commented 5 months ago

Even within verification I could expect some benefit in having separate implementations, APIs, and assumptions based on the platform where verification is done.

The maybe counter-intuitive thing is that the verification code is what usually runs during the proving step: generating the proof is done in an arbitrary environment that is not resource- or architecture-constrained, but verifying the proof is done inside a ZK circuit, a PVF, or on a mobile device. That makes me think the general thrust of your point is accurate but slightly misplaced.

In any case, I don't personally have the information to decide what makes a ZK-optimized verification routine at this stage. Are we OK with the approach in #22 which provides basic implementations of both proving and verification?

rphmeier commented 5 months ago

We had a longer conversation about the cursor API and backend independence where we discussed the expected usage of the library.

Goals of Lookup and Update

Let's summarize the goals from that conversation:

  1. Separate page traversal from trie traversal. I don't mind if Cursor is a concrete type that uses pages under the hood, or if it's a trait that in theory allows for different types of backends. Supporting different types of backends is a non-goal, so it only really helps in tests. Whatever leaks the least and simplifies the code the most is best.
  2. Lightweight single-key traverse API. We need to check that a set of loaded pages is sufficient to traverse and read siblings for the keys we are updating and proving. Information should be provided on where the query fails, which in turn will be used to load more pages (see the sketch after this list).
  3. Batch update. Given a set of keys, a set of updates to those keys, and a set of pages which encompass all sibling-traversals for those keys, perform the trie update calculation and produce new pages. Leaving proving to the side for now, this may be done here as well.
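
A hedged sketch of how goals 2 and 3 might surface as functions (all names and error types here are assumptions, not decided API):

/// Outcome of a single-key traversal check (goal 2).
enum TraverseCheck {
    /// All pages needed to traverse and read siblings for this key are loaded.
    Complete,
    /// Traversal stopped; this page must be loaded before continuing.
    MissingPage(PageId),
}

/// Goal 2: check whether the currently loaded pages suffice for one key.
fn check_single_key(root: Node, key: KeyPath, pages: &PageMap) -> TraverseCheck;

/// Goal 3: batch update over keys whose sibling-paths are fully loaded.
/// Produces the new root and the set of modified pages.
fn batch_update(
    root: Node,
    ops: &[(KeyPath, Option<ValueHash>)],
    pages: &PageMap,
) -> Result<(Node, PageMap), UpdateError>;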

Context: Session Flow

This aligns with our intention of operating the DB as a series of sessions, which have three phases (a rough code sketch follows the list):

  1. The Run Phase: a computation is running which performs reads from the DB. We read values synchronously from the flat store and schedule the corresponding pages to be loaded in the Read phase. The Run phase concludes when the underlying computation concludes with a set of intended changes to key-value pairs. All keys in that set which were not previously read are passed to the read phase.
  2. The Read Phase: starts at the same time as the run phase and runs in parallel with it. Continually loads required pages for keys, tests whether those pages are sufficient with the single-key traverse API, fetches key preimages, and so on. This runs until all pages and other data for all keys from the Run phase are read. The Read phase can never conclude before the Run phase.
  3. The Closing Phase: starts when the Read phase concludes. Uses the pages and preimages from the Read phase to perform a root update calculation, prepare a diff for pages, and construct merkle proofs.
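
As an illustration only (every type and function below is invented for this sketch, not actual API), one session could be driven roughly like this:

/// Hypothetical driver for a single session.
fn run_session(db: &Db) -> Root {
    // Run phase: execute the computation, reading values synchronously from
    // the flat store; each read also schedules its pages on the read worker.
    let (reader, read_worker) = db.begin_session();
    let changes: Vec<(KeyPath, Option<Value>)> = run_computation(&reader);

    // Read phase (running in parallel since the start): enqueue any written
    // keys that were never read, then wait until all pages and preimages
    // for every touched key are loaded.
    read_worker.enqueue(changes.iter().map(|(k, _)| *k));
    let pages = read_worker.wait_until_complete();

    // Closing phase: compute the new root, the page diff, and merkle proofs.
    let output = update_and_prove(&pages, &changes);
    db.commit(output)
}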

Timeline of a session: [image]

Run Phase / Read Phase interaction: [image]

Read Phase interacting with page DB + write phase: [image]