Compact Trie Proof format

cheme commented 2 years ago

I currently make it a draft as it is something that seems good to switch to to me but is not strictly needed, and current .

It comes form conversation I had with @tomaka, and the realization that the current compact proof format can be a bit awkward.

So opening this is for initial feedbacks.

The design is largely inspired by turbogeth one but without the useless instructions and taking in consideration the possibility to have a sequence of compact proof. (proto implement the eth variant but I don't want to describe them, just mentionning it is not incompatible).

So I create this PR to start discussion here.

Some part are undefined. More notably all the codec question as it is rather boring things that needs to be defined depending on frequency of nodes and other consideration but can be completed in a second step: my proto just use scale codec instead.

The intention would be to replace all existing Compact proof usage and extend to light client rpc too.

The algorithm here is mostly the decoding one, the algorithm for building can also be added, but it can be inferred from the decoding one, and the current implementation is a bit related to parity trie crate node iterator internals.

tomaka commented 2 years ago

Correct me if I'm wrong: right now, the existing compact proof format is used for:

State requests (I haven't checked, but I think you said that?)
Within parachains POVs. However this is purely an implementation detail of specific parachains (i.e. the parachain generates the proof then later decodes it, but the relay chain only sees an opaque blob), and thus isn't part of Polkadot.

So if we modify the format, these are the things that would break/change.

cheme commented 2 years ago

* State requests (I haven't checked, but I think you said that?)

Only warp sync uses it (I mean not the current light client rpc). But we can probably version the protobuf schema somehow.

* Within parachains POVs. However this is purely an implementation detail of specific parachains (i.e. the parachain generates the proof then later decodes it, but the relay chain only sees an opaque blob), and thus isn't part of Polkadot.

Yes, changing the compact proof in cumulus would still required collator to update their client to create a pov compatible with the new pvf (this can be tricky but we did it in the past when adding the first compact proof format), but on the verification side all decoding is in the pvf.

tomaka commented 2 years ago

For what it's worth, my immediate reaction to reading the draft is: I have no idea what to think of this, it's really too complicated and too big of a leap for me to figure out what the advantages and drawbacks of this format are.

cheme commented 2 years ago

Psp may go a bit too far. Short description of format would be (complexity comes mostly from sequence of proof that can be out of the psp and rules to prevent two different encoding for a same state):

Sequence of instruction:

PushKey(nibbles): append nibbles to current cursor
PopKey(nb nibbles): remove n nibbles from the current cursor
value(bytes): set value at the current cursor
hash(index, hash): set child hash in one child of the current cursor

So it is mostly description of proof content, ordered by trie node iteration.

One discutable point is that value is written when node get accessed, and hashes are only written when node get out of scope.

I will try to write the decoding in a recursive pseudo-code form later (which may be clearer than the stack based algorithm describe here).

lamafab commented 2 years ago

This PSP has quite a bit of typos and it's quite hard to follow. Imo, once you officially decide to use your codec2 implementation in Substrate, I will need to spec it for spec.polkadot.network anyway. As of now, this PSP reads mostly like a general overview rather than a specification.

I'll leave this open, but it probably needs to be written differently in the future.

cheme commented 1 year ago

Small note about writing the child hash when poping node and child value after push: it allows saving a key push operation when there is no values. When writing something else it also can be rather awkward if the proof does skip some content, then due to inline node we can have a non sequenced child access. I end up struggling a bit with it with some algo. So it now make sense to change to something strictly sequenced (operation strictly following child index sequence: and value being considered as index -1). The example can become:

KeyPush: b"b" without last nibble (1)
HashChild: SomeOtherHash at 1 (b"alfa" branch)
KeyPush: b"bravo" without first (9)
Value: b"bravo"
KeyPop: 9
KeyPush: b"do" without first nibble (3)
Value: b"verb"
KeyPush: b"g" all nibbes (2)
Value: b"puppy"
KeyPush: b"e" all nibbes (2)
Value: [0; 32]
KeyPop: 7
KeyPush: b"hor" skipping first nibble and last nibble (4)
HashChild: SomeHash at 2 (second nibble of b"r")
KeyPush: b"use" skipping first nibble (5)
Value: b"building" // This is not queried but include because inline value of an included node

so I am closing and will open it again when I got branch rewritten and this written in a simpler way.

w3f / PSPs

Compact Trie Proof format #56