vulcanize / cosmos-sdk

:chains: A Framework for Building High Value Public Blockchains :sparkles:
https://cosmos.network/
Other
0 stars 1 forks source link

Ian/adr 040 update #20

Closed i-norden closed 2 years ago

i-norden commented 2 years ago

Currently, at the SDK layer we maintain two additional kv mappings for state storage access:

B1: key => value so that we can lookup value from key without needing to do any SMT tree access or traversal. B2: hash(key) => key so that we can lookup value from within the context of iterating the tree or any other time we don’t have the key preimage.

B2 is not used from within the context of SDK state execution.

The SMT implementation already stores a hash(key) => val mapping (see https://github.com/celestiaorg/smt/blob/master/smt.go#L118 and https://github.com/celestiaorg/smt/blob/master/smt.go#L302), as such this PR proposes we make use of that existing mapping as the inverse index instead of a new one at the SDK layer. This would require one less read op (hash(key) => value vs hash(key) => key, key => value) to get from a leaf node (leafPrefix || hash(key) || hash(value)) to a value and removes the extra write ops and data duplication of B2.

A downside of this change is that we lose the ability to retrieve the key from the context of SMT traversal. In most cases this isn’t an issue. For cases where the hash(key) => key mapping is needed externally (e.g. when representing full state as IPLD on IPFS, we want to be able to retrieve the key from within the context of the IPLD DAG) you can create a preimage mapping by listening to KVPairs.

i-norden commented 2 years ago

@roysc plz ignore the commit mess, I have no intention of merging this here. Just opening it to solicit your feedback before upstreaming (or not). Only file updated is the ADR-040 doc, but you shouldn't need to look at the file changes the PR comment explains it in more detail.

AFDudley commented 2 years ago

We should make sure we've have a performant way of generating those pre-images, compared to how we generate them our current Ethereum stack.

i-norden commented 2 years ago

@AFDudley collecting them efficiently in real time is/will be no issue using ADR-038 listening or the to-be-proposed SMT listening (spec started here: https://github.com/vulcanize/cosmos-sdk/blob/adr_051/docs/architecture/adr-051-state-commitment-listening.md but needs updating).

But as with geth, collecting historical preimages while performing a diff iteration would not be possible without this mapping. But since we are using DB snapshots and because of the B1 bucket (key => value) we can readily iterate over a select key space at a select block height at the underlying database level and extract preimages in that manner.

i-norden commented 2 years ago

@roysc @AFDudley want to re-iterate and follow up on some of what we discussed this morning. In part because I think I explain things better in text.

The issue with this- as Roy pointed out- is that we actually do need the preimages for ics23 non-membership proofs. But since generating these proofs isn't necessary for the core functionality of the SDK, this preimages mapping could still be an optional bucket much like key preimage mapping is turned on or off in geth using the --preimages flag, or the onus of preimage collecting/mapping could be left to an external service that generates that mapping either by iterating the underlying database snapshot or by listening to streamed KVPairs.

Also, I'd like to follow up on this a bit more because I am a little confused since the SMT itself supports non-membership proofs without needing this key preimage mapping e.g. https://github.com/celestiaorg/smt/blob/6e634fe4424005b4bce55a56a7a559412b4152b3/smt.go#L422. So this requirement that the key of neighboring leafs be provided must be a requirement of the ics23 format?

As an aside, we can potentially look at this de-duplication from the other side of the coin: we could implement a version of the SMT that does not maintain this hash(key) => value mapping under-the-hood. The SMT would no longer be able to support direct value lookup (without traversing the tree) like it currently does, but for its use inside the SDK this is fine since we rely on the B1 mapping for that.

roysc commented 2 years ago

So this requirement that the key of neighboring leafs be provided must be a requirement of the ics23 format?

Right, it's specific to ICS23. The format used internally in smt is a simpler one.

As an aside, we can potentially look at this de-duplication from the other side of the coin: we could implement a version of the SMT that does not maintain this hash(key) => value mapping under-the-hood.

This is something I proposed to celestia when I was implementing orphan removal, but they weren't interested. It will be easy enough to fork and make that change, as well as remove the internal key hashing. I think that's the way to go.

github-actions[bot] commented 2 years ago

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.