Closed oleganza closed 5 years ago
If the nodes do not validate all txs, but still need to update utxo proofs, they can potentially limit amount of data dramatically by receiving only the tips of perfect trees to re-calculate the changed portion of the proof, but ignoring bulk of the data hidden behind the perfect trees.
If this is doable, it can dramatically reduce the bandwidth requirement to "follow the chain", while allowing full SPV capabilities: detecting spending of one's utxo (payment channel's forced closure) or checking the log-sized fraud proofs to reject bad chain.
Implemented and merged here: #308
Background
Utreexo is a scheme by Tadge Dryja proposed for Bitcoin that optimizes the utxo set storage of a full node.
The goal of Utreexo is to trim the utxo set to a
O(log(utxos))
fromO(utxos)
which is the current state of Bitcoin.This is achieved by:
Utreexo roots also work as a secure commitment to the entire state of the node, and it's necessarily validated by the entire network (in slingshot case), so it allows bootstrapping the state from a recent and widely acknowledged block header, without replaying the entire blockchain from the beginning of times.
(see also #308)
My understanding of the protocol
Tadge presents an algorithm that works at horizontal levels, spanning multiple trees, climbing up at each level. The goal is to convert deletions of the nodes into a new forest of perfect trees, and also more-or-less preserve the order of utxos, so the older ones remain mostly to the left — this allows having shorter proofs (due to smaller trees) for more recent utxos,
My understanding seems to be aligned with this goal and supposed behavior of the algorithm, but I don't fully grok it yet.
However, I have my own variant of the algorithm that may be equivalent to Tadge's (likely), or either flawed (also likely) or even superior (unlikely).
Algorithm
The algorithm first processes all deletions, then processes all additions. To make it match the ZkVM semantics, the interim utxos that are created-and-spent are tracked through "UTXO View" object that provides the interface of add/delete, but filters out those that are both added and deleted during a single tx or a single block.
Additions are simple: add all to the right, then combine into perfect trees (see Tadge's talk): bigger powers of two — to the left, smaller ones — to the right. Since they are so simple, they will be a part of an overall algorithm:
Mark the deleted nodes as deleted and reflect this in the parent nodes' metadata.
Add to the end all the additional items as 1-trees.
Scan through the list of nodes left-to-right, for each power of two, starting with k=1:
If another k-tree node is found, remove it, and merge into 2k-tree node with the first one in-place (so the leftmost node stays where it is), and compute its hash right away. Continue scanning the list, finding again first k-tree, and then possibly finding a second k-tree. If none or only one found, go to next power of two, and jump back to the beginning of the list.
O(n^2)
is wheren
islog(utxos)
(that is, just 30 for 1B utxos), and the list cannot be fully reversed anyway because bigger trees can only grow more to the left, than to the right.TXO bitfields
Bram Cohen's idea is to let nodes store the
O(n/256)
state of all txos as bitfield: one bit for each tx output. 0 if spent, 1 if unspent. Txos are append-only, and the list grows with the blockchain, but it's compressible (most txos are spent), so in the end of the day, it'sO(n/256)
storage forn
utxos: about 125Mb instead of 32Gb if your set is 1B utxos (Bitcoin has ~100M utxos so far, amounting to 3Gb of utxoset storage).The cool thing is that clients only need a static merkle proof of index for their output, that's not changing (modulo reorgs), and nodes can check whether the bit at this index is still 1, or not. If it's 1, they flip it to zero. Merkle proofs are added for newly added txo bits only, so merkle proofs are always O(log(k)) where
k
is not the number of all outputs, but only number of outputs per block.The small downside is still unbounded storage growth (although, 256x factor is a significant difference). The snapshot can be created by committing/verifying the entire bitfield in a block header.
Maybe there's an elegant way to use TXO bitfields as a Thick Node option to keep their storage light? That's not likely, since txo bitfields are not changing positions, while utreexo items necessarily have to change positions.