antiochp opened 4 years ago
The paper claims that "All of these checks ensure that the difficulty transitions of queried blocks are valid.", which to me seems incorrect, unless they check every single difficulty adjustment, which they don't. How can they check that even two consecutive difficulty adjustments are correct without knowing the intermediate data? It looks like their verification only checks whether the difficulty adjustments are possible, given certain limits on how fast difficulty can change (at most a factor of 4 over 2016 blocks in Bitcoin). Furthermore, the proofs don't show why this need only be checked at doubling intervals. If that suffices, then only doing it between consecutive sampled points could suffice as well, which doesn't need any additional data to be hashed at internal nodes. I contacted Benedikt to ask for clarification.
> only checks whether the difficulty adjustments are possible, given certain limits on how fast difficulty can change
Yes, that was my interpretation also: that a path in a Merkle proof would include increasing sizes of periods, from a pair of headers at height 0 all the way up to the entire chain at the root, and that this difficulty adjustment check would be performed at each intermediate "aggregate" subtree.
> Furthermore, the proofs don't show why this need only be checked at doubling intervals.
I was assuming it is not so much the doubling that is important, but inclusion under all ancestors above it in the MMR along the path to the root: that no ancestor on the Merkle proof path exhibits an invalid (in aggregate) difficulty adjustment.
Hopefully Benedikt gets back to us with some clarification/confirmation soon. In the meantime I have spent a bit of time looking at what would be involved in supporting "sumtree" ("difftree" here I guess) functionality in our MMR structures - maintaining summary data in addition to the hash at each parent in the MMR.
This may end up being potentially useful for other things, regardless of what the outcome is here.
[WIP] PR coming shortly that explores this.
> at most a factor of 4 over 2016 blocks in Bitcoin
If we have the start time and the end time of the aggregate period (based on first and last header timestamps) is this further constrained?
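As a rough illustration (the constant and function name here are mine, not from any real implementation): the unclamped Bitcoin-style adjustment is fully determined by the period's elapsed time, so knowing the first and last timestamps would seem to pin the transition down beyond the bare factor-of-4 clamp.

```rust
// Illustrative only: Bitcoin-style retargeting clamps each 2016-block
// adjustment so the target moves by at most a factor of 4 per period.
const EXPECTED_SECS: f64 = 2016.0 * 600.0; // 2016 blocks at 10 minutes each

/// Ratio applied to the target at a retarget boundary, clamped to [1/4, 4].
/// Given the period's actual elapsed time the ratio is fully determined,
/// not merely bounded, which is what the question above is getting at.
fn retarget_ratio(actual_secs: f64) -> f64 {
    (actual_secs / EXPECTED_SECS).clamp(0.25, 4.0)
}
```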
WIP PR is here - https://github.com/mimblewimble/grin/pull/3480
Still working on getting this into a state where it compiles, but the basics are there and this approach appears to be doable. Might be a PITA to deal with pre- and post-HF behavior for eternity though.
Please hold off on rewriting `hash_with_index`. I'm entirely unconvinced we need that. In my last email to Benedikt I wrote:
> Wouldn't it suffice to check that the cumulative difficulty change between any pair of successive sample points is consistent with the time interval between them? With all your internal node checks, you seem to be checking pretty much the same thing, since there is always a largest (aligned) 2-power sized interval in between successive samples of which you only check the boundary condition.
(still no response)
Related: Zcash ZIP 221 "FlyClient" - https://zips.z.cash/zip-0221
https://github.com/therealyingtong/zips/blob/master/zip-0221.rst https://github.com/zcash/zcash/pull/4264
@tromp The more I think about this the more convinced I am that you are right to question this.
(I'm just repeating what we know here for my benefit and reference later).
In terms of sampling headers based on difficulty, inclusion proofs would suffice and we don't need aggregate data in the proof itself. i.e. To prove we are sampling a header at some difficulty (rather than some specific pos) we can provide the header immediately above difficulty `x` and the header immediately below difficulty `x`, and show they are consecutive in terms of `pos` (so no other header exists closer to `x`). Proofs will be larger as we need 2 consecutive headers per sampled difficulty.
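A minimal sketch of that bracketing check (the struct and field names are hypothetical, not Grin's actual types, and the two headers' inclusion proofs against the MMR root would be verified separately):

```rust
/// Hypothetical view of a header as carried in an inclusion proof.
struct SampledHeader {
    height: u64,           // chain height (consecutive headers differ by 1)
    total_difficulty: u64, // cumulative difficulty up to and including this header
}

/// Accept a sample at difficulty `x` if `lo` is the last header at or below
/// `x`, `hi` is the first header above it, and the two are consecutive, so
/// no other header lies between them.
fn brackets_difficulty(lo: &SampledHeader, hi: &SampledHeader, x: u64) -> bool {
    lo.total_difficulty <= x
        && hi.total_difficulty > x
        && hi.height == lo.height + 1
}
```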
From the DAA RFC -
> wtema is simpler to state, requires only the last 2 blocks, ...
Am I correct in assuming the new DAA will allow us to validate each sampled header's difficulty based on its 2 previous headers? So if we were to provide 3 headers per sampled difficulty then we could validate the difficulty of each individual sampled header?
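If so, the per-header check might look something like the sketch below, using the generic wtema recurrence in target form. The constants and the plain integer arithmetic are assumptions for illustration, not Grin's actual consensus parameters or code:

```rust
// Generic wtema recurrence in target form (difficulty ~ inverse of target).
// T and N are placeholders, not Grin's actual consensus parameters, and real
// code would use wider or fixed-point arithmetic rather than bare u64 math.
const T: u64 = 60; // target block time in seconds
const N: u64 = 60; // smoothing window in blocks (illustrative)

struct Hdr {
    timestamp: u64,
    target: u64,
}

/// Validate a sampled header's target from its two predecessors only:
/// the solvetime of `prev` is prev.timestamp - prev_prev.timestamp, and
/// that single value determines the expected next target under wtema.
fn valid_target(h: &Hdr, prev: &Hdr, prev_prev: &Hdr) -> bool {
    let solvetime = prev.timestamp.saturating_sub(prev_prev.timestamp);
    let expected = prev.target * ((N - 1) * T + solvetime) / (N * T);
    h.target == expected
}
```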
And then, as you mention above, consecutive sampled headers could be used to construct time periods across which we can validate boundary conditions of the overall difficulty adjustments across each period. Presumably if we wanted more granularity we could sample more headers? Or randomly sample headers based on difficulty as described in FlyClient paper and then additionally sample midpoints between these (or some other subsampling approach)? Although any approach can potentially be simplified by just sampling more headers randomly?
The approach in the FlyClient paper, using nodes in the Merkle proof as doublings of the period, looks only at MMR position, and this feels like it conflicts with the need to look at total difficulty rather than position (due to varying difficulty over time). Looking at `pos` here feels somewhat arbitrary, simply because we happen to have them in the Merkle proofs.
FlyClient (revised Aug2020): https://eprint.iacr.org/2019/226.pdf
Our header MMR contains non-leaf nodes that are defined simply as -
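(Paraphrasing from memory of the pmmr code, not the verbatim source: a parent commits only to its children's hashes, mixed with the parent's own MMR position.)

```rust
// Paraphrased, not Grin's verbatim code: the parent hash commits only to
// the two child hashes plus the parent's own position in the MMR.
parent.hash = (left_child.hash, right_child.hash).hash_with_index(parent_pos);
```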
To implement FlyClient such that we can efficiently verify difficulty transitions, the paper states that we need to redefine non-leaf nodes as -
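Something like the following, using the paper's field names as discussed below; the Rust framing is mine, and `Hash` / `hash_children` are stand-ins for the real hash type and function:

```rust
type Hash = [u8; 32]; // stand-in for the real hash type

/// Sketch of a FlyClient-style aggregate node; field names follow the
/// paper's notation (w, t, Dstart, Dnext, n). The Rust framing is mine.
struct AggNode {
    hash: Hash,   // commits to both children, including their aggregate data
    w: u64,       // total difficulty of all headers beneath this subtree
    t: u64,       // lc.t + rc.t as the paper writes it (see note below)
    d_start: u64, // difficulty target at the start of the period
    d_next: u64,  // difficulty target immediately after the period
    n: u64,       // number of headers (leaves) beneath this subtree
}

/// Stand-in for H(lc, rc): must commit to the children's hashes *and* their
/// aggregate fields, otherwise the aggregate data isn't bound to the root.
fn hash_children(_lc: &AggNode, _rc: &AggNode) -> Hash {
    unimplemented!("illustrative only")
}

fn aggregate(lc: &AggNode, rc: &AggNode) -> AggNode {
    AggNode {
        hash: hash_children(lc, rc),
        w: lc.w + rc.w,      // total difficulty sums across children
        t: lc.t + rc.t,      // per the paper; see timestamp note below
        d_start: lc.d_start, // period starts at the left child's start
        d_next: rc.d_next,   // and ends at the right child's next target
        n: lc.n + rc.n,      // leaf count sums across children
    }
}
```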
i.e. Rather than simply storing intermediate hashes for non-leaf nodes, we store difficulty related information representing contiguous lists of headers (periods of time) beneath subtrees of the header MMR. Otherwise there is no efficient way to determine, for example, `w`, the total difficulty of the set of headers beneath the subtree root, or `Dstart` and `Dnext`, the initial and next difficulty targets within the period beneath the subtree root.

We used to have the concept of a `sumtree` in the early days of Grin. This maintained both the hash and the sum of all commitments beneath a given subtree root in the output MMR. I believe the intent was to leverage the sum for utxo related calculations, but we stopped using this once it became apparent that the output MMR was only ever going to represent the full TXO set and not the UTXO (or STXO) set. The approach discussed in the FlyClient paper is similar conceptually to our `sumtree`, but for difficulty related information.

`lc.t + rc.t` is confusing given `t` is a timestamp related to each leaf node block header. Presumably it would make more sense to store the delta (the time taken to mine a set of blocks) in non-leaf nodes?

`n` does not appear to be necessary given our MMR construction. We can determine `n` based on the MMR `pos` for any given node.

Q) What information is required at each non-leaf node to allow verification of difficulty transitions for a non-leaf node (i.e. across an aggregate time period over multiple block headers, without access to the individual headers)?
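One hedged guess at the shape of the answer, reusing the `AggNode` sketch above. The factor-of-4 bound is a placeholder; the real constraint would be derived from the DAA and would presumably also use the period's elapsed time, `w` and `n`:

```rust
const MAX_ADJUST: u64 = 4; // placeholder bound, not a real consensus value

/// Verify a non-leaf node's aggregate difficulty transition using only the
/// data committed at the node itself, without the individual headers.
fn valid_aggregate_transition(node: &AggNode) -> bool {
    // Placeholder: the end-of-period target must be reachable from the
    // start-of-period target. A real check would derive a tighter bound
    // from the n headers in the period, its elapsed time, and w.
    node.d_next <= node.d_start.saturating_mul(MAX_ADJUST)
        && node.d_start <= node.d_next.saturating_mul(MAX_ADJUST)
}
```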
TODO - If we want to support FlyClient in the future, and if this requires a consensus-breaking change to the way we calculate the header MMR root, do we want to consider making these changes sooner rather than later?
It is not sufficient to simply commit to difficulty information at the individual header leaf nodes. We need to be able to verify difficulty transitions at the aggregate non-leaf node level.
Related: https://github.com/mimblewimble/grin-rfcs/pull/61