mit-dci / utreexo

accumulator for bitcoin utxo set
MIT License
323 stars 60 forks source link

Function detectSubTreeRows() is incorrect #351

Open kcalvinalvin opened 2 years ago

kcalvinalvin commented 2 years ago

detectSubTreeRows() for the below case fails.

position: 839, numLeaves: 300, forestRows: 9 should result in 5 but detectSubTreeRows() returns 8.

How to calculate the correct position:

  1. With 300 leaves and 9 rows, 839's parent is 931.
  2. 931's parent is 977.
  3. 977's parent is 1000, which is a root.
  4. detectRow(1000, 9) returns 5.

However, detectSubTreeRows(1000, 300, 9) returns 8.

Playground: https://go.dev/play/p/LarcZzajT_3

bicrxm commented 2 years ago

What forestRows: 9 tells us?

kcalvinalvin commented 2 years ago

What forestRows: 9 tells us?

It means that the entire accumulator is 9 rows tall.

30                                                             <------- row 4
|-------------------------------\
28                              29                             <------- row 3
|---------------\               |---------------\
24              25              26              27             <------- row 2
|-------\       |-------\       |-------\       |-------\
16      17      18      19      20      21      22      23     <------- row 1
|---\   |---\   |---\   |---\   |---\   |---\   |---\   |---\
00  01  02  03  04  05  06  07  08  09  10  11  12  13  14  15 <------- row 0

This accumulator is 4 rows tall.

theStack commented 2 years ago

According to the comments documentation of detectSubTreeRows, this function only works for leaf positions, so if I understand this correctly, for numLeaves: 300, the position has to be in the range of 0-299.

To validate and also for the fun of it I followed the (left) childs of position 839 down to the bottom and then executed detectSubTreeRows on the leaf:

  1. child(839, 9) -> 654
  2. child(654, 9) -> 284 (smaller than 300, so we must have reached the bottom)
  3. detectSubTreeRows(284, 300, 9) -> 5 🎉

By the way, after putting a debug message into the loop body of detectSubTreeRows and executing the PR description example (position: 839) I saw that an integer underflow with the 8-bit variable h happens (i.e. h has value 0 -> h gets decremented -> h has value 255), which I think is quite creepy and should be avoided. Would it make sense to put in a sanity check or assertion into detectSubTreeRows to ensure position is smaller than numLeaves?

I'm both a golang and an utreexo noob, but find this a really interesting project, would love to contribute. I used this issue as kind of an entrypoint/motivation to understand more about the fascinating world of utreexo forests and trees, so thanks for creating it :)

kcalvinalvin commented 2 years ago

Hey @theStack

That's a good catch! I didn't check up on the comments closely. My excuse would be that there's a new accumulator design that I'm currently implementing that allows for leaves to be in rows other than row 0, which was why I thought this was a bug.

By the way, after putting a debug message into the loop body of detectSubTreeRows and executing the PR description example (position: 839) I saw that an integer underflow with the 8-bit variable h happens (i.e. h has value 0 -> h gets decremented -> h has value 255), which I think is quite creepy and should be avoided. Would it make sense to put in a sanity check or assertion into detectSubTreeRows to ensure position is smaller than numLeaves?

Yeah I think it'd definitely be better to have some sort of check in there. Before the loop, maybe something like

if position < numLeaves {
    return 0, fmt.Errorf("detectSubTreeRows error: requested position of %d is not in row 0 " +
    "in a forest with % leaves and % forestRows", position, numLeaves, forestRows)
}

I'm both a golang and an utreexo noob, but find this a really interesting project, would love to contribute. I used this issue as kind of an entrypoint/motivation to understand more about the fascinating world of utreexo forests and trees, so thanks for creating it :)

Glad to hear that :) On the top of my head, I guess the one thing that's desperately needed is cleaning up the code so that comments/old-unused code is removed. Something that's more involved would be an Pollard.Serialize() that also serializes the cached leaves. We have a Pollard.Serialize() that serializes the roots but it excludes the cached leaves. Having the ability to serialize and deserialize the caches leaves as well would be great. https://github.com/mit-dci/utreexo/blob/1ff40de973fc0c9877898829f96231d8f3cea67f/accumulator/pollardutil.go#L167 https://github.com/mit-dci/utreexo/blob/1ff40de973fc0c9877898829f96231d8f3cea67f/accumulator/pollardutil.go#L189

You may also be interested in https://github.com/mit-dci/libutreexo, the implementation in C++ and maybe https://github.com/utreexo/utreexod, the full node with utreexo accumulators implemented (Go). @dergoegge also has a branch on his fork of Bitcoin Core with utreexo accumulators.