Closed: oleganza closed this issue 5 years ago
Had a chat with @vickiniu. She helped figure out that a patricia tree is not good for inserting new items with trimmed state: the utxo id is random, and you probably don't have the necessary neighboring hashes to make a proof of insertion. Deletion works, though, because you delete your own utxo, for which you keep the proof.
The Utreexo and MMR TXO proposals have a designated point of insertion (from the right), so it's always known how to recompute the tree. But there you cannot have a proof of non-membership, which is needed for keeping a trimmed set of nonces and for helping SPV clients watch for forced lightning settlement transactions.
One way to solve SPV proofs of non-published utxos:
This way, an SPV client can receive a proof of non-membership for each Nth block. This design, however, does not allow decoupling optimization of a lower-order protocol from a higher-order protocol.
And this does not solve the problem of trimming the nonce set: nonces appear later, and we'd have to scan back through all the blocks to check that they were not published earlier.
By batching + sorting, does this mean that instead of inserting blocks into the "perfect forest" by timestamp, you would sort lexicographically & then insert in that order?
I mean, you keep the perfect forest as in utreexo, but inserted items are ordered lexicographically, so you (probably) can have a shorter proof of non-membership of anything in the inserted batch.
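To make the "sorted batch" idea concrete, here is a minimal sketch (my own illustration, not code from utreexo or ZkVM) of how non-membership in a lexicographically ordered batch reduces to exhibiting the two adjacent entries that bracket the queried item; a real proof would also carry merkle paths showing those two entries are adjacent leaves:

```python
import bisect

def non_membership_witness(sorted_batch, item):
    """Return the (left, right) neighbors bracketing `item` in a sorted batch.
    Their adjacency is the witness that `item` is absent from the batch.
    Raises if the item is actually present."""
    i = bisect.bisect_left(sorted_batch, item)
    if i < len(sorted_batch) and sorted_batch[i] == item:
        raise ValueError("item is in the batch")
    left = sorted_batch[i - 1] if i > 0 else None
    right = sorted_batch[i] if i < len(sorted_batch) else None
    return left, right

batch = sorted([b"c1", b"a9", b"f0", b"d4"])
print(non_membership_witness(batch, b"b2"))  # (b'a9', b'c1')
```

The key property is that sorting makes absence locally checkable: a verifier only needs two leaves and their paths, not the whole batch.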
Update: seems like utreexo allows watching for the payment channel state: instead of proofs of "settlement utxo was not created", the server would provide continuous proofs of the status of the "channel utxo":
So there seems to be no problem with using utreexo for tracking utxos by SPV channel-using clients.
Regarding nonces, there's a problem: neither the patricia tree nor utreexo allows trimming state:
If we try to address nonces and txid uniqueness from first principles, let's look at how other systems do that:
Utreexo merged in #308.
Problem
The blockchain scalability problem has 4 facets:
We are not including "SPV/light clients", as those are more like optimizations of the above with security tradeoffs. The exact nature of "light client" protocols depends on how those 4 problems are addressed.
Current situation
The current way to address these issues is:
Bandwidth is minimized by using aggregated signatures, aggregated zero-knowledge proofs, and keeping the amount of data and computation on-chain at a minimum. It can be further minimized by offloading the majority of payment transactions into the lightning network.
Averaging the best/worst cases to 1 KB per payment, and having 500 payments per second, the average required bandwidth is 500 KB/s at maximum load. With a more optimized network, where only 50 payments/sec happen on-chain at 0.5 KB per payment, the bandwidth could be brought down to 25 KB/s.
CPU cost is likewise kept at a minimum: all expensive operations are minimized, unified, and processed in batch with a state-of-the-art implementation. A 2x2 transaction can be verified in about 2 ms, allowing ≈500 payments/sec throughput per 3 GHz CPU core (≈1000 inputs/sec). There's room for about 30–50% further optimization.
State is mostly a UTXO set that grows slowly with the user base and has a favorable access pattern (~Zipf's law): older UTXOs are spent more rarely than newer UTXOs. The UTXO set consists of N 32-byte hashes of "output structures", so for N users each having M payment channels or idle outputs, the state is `N*M*32` bytes. So 10M users with 4 outputs on average produce a ≈1.28 GB state (only a smaller fraction of fresh utxos needs to reside in RAM).

ZkVM also uses nonces that ensure tx uniqueness. These must all be stored to be checked against reuse, but they can have an expiration date.
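The state-size formula above is easy to sanity-check numerically; this tiny calculation reproduces the ≈1.2 GB figure quoted for 10M users with 4 outputs each:

```python
# Back-of-the-envelope UTXO-set size: N users * M outputs * 32-byte hashes.
def utxo_state_bytes(users, outputs_per_user, hash_size=32):
    return users * outputs_per_user * hash_size

size = utxo_state_bytes(10_000_000, 4)
print(size / 1e9)  # 1.28 (GB)
```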
Another piece of state is a trail of "recent block headers" that can be referenced by nonces. These should cover at least 24h, so that multi-party transactions can get co-signed with a fresh nonce and published without having to retry because the block reference slid out of range. They are sufficiently small to not be a bottleneck (a few megabytes at most).
Bootstrapping requires one of two things:
Option (2) reduces the bootstrap cost to the cost of cloning an existing node's state, which is a significant upside. The downsides are:
a. a different security model: one needs to trust that the entire network has followed the rules up to the recent point, and if everyone prunes the history, there may be no data left available for audit;
b. extra load on the network nodes to verify and update state commitments, in addition to verification of transactions.
Considering the cost of the alternative and the existence of fairly efficient schemes for state commitments (especially compared to the cost of verifying zero-knowledge proofs), the need for a state commitment scheme is evident. Currently we use a patricia merkle tree (aka a "radix tree") for secure commitment and incremental updates to the set of utxos and nonces.
Optimizing state size
The utxo set growth is the next biggest concern.
Solving this problem via economics with "rent fees" or the like is a measure of last resort, since that solution would create its own technical issues. It is much better to solve it at a technical level, so we do not need to introduce additional assumptions about the behavior of participants. For instance, by making the nodes keep an O(1) or O(log(n)) commitment to the UTXO set and letting users provide proofs of membership for their own UTXOs, plus an efficient algorithm to update that commitment using only the partial data from the users' proofs.
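A minimal sketch of the O(log(n)) commitment idea, in the spirit of utreexo's "perfect forest": the node keeps only one merkle root per set bit of the item count and merges equal-height trees on insertion like a binary counter. This is my own simplified illustration (leaf hashing and merge rule are assumptions); it omits deletion and the proof bookkeeping a real accumulator needs:

```python
import hashlib

def h(a, b):
    """Parent hash of two sibling subtree roots."""
    return hashlib.sha256(a + b).digest()

class ForestAccumulator:
    """Keeps O(log n) roots for a set of n inserted items."""
    def __init__(self):
        self.roots = {}  # height -> root hash of a perfect subtree

    def insert(self, leaf: bytes):
        height, node = 0, hashlib.sha256(leaf).digest()
        # Carry like binary addition: merge while a tree of equal height exists.
        while height in self.roots:
            node = h(self.roots.pop(height), node)
            height += 1
        self.roots[height] = node

acc = ForestAccumulator()
for i in range(5):
    acc.insert(i.to_bytes(8, "big"))
print(sorted(acc.roots))  # [0, 2] — five items = perfect trees of sizes 1 and 4
```

The node's state is just the dictionary of roots, regardless of how many items were inserted; users supply merkle paths against these roots when spending.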
Similar to the UTXO set, nonces pose two issues of their own:
Again, a proper technical solution would let nodes keep an O(1) or O(log(n)) commitment to the set of nonces, allowing users to prove non-membership of their nonces. This could eliminate the need for an expiration time entirely, streamlining the ergonomics of higher-order protocols.
Optimizing bandwidth
Finally, once the scalability of the blockchain state is solved, the last problem is data/CPU bandwidth.
Data bandwidth can be significantly optimized with a well-oiled payment channel network (lightning): most financial operations are payments, and even some sophisticated financial contracts are still structured as payments (e.g. futures). This means that, in principle, the entire blockchain network activity can consist mostly of closing and opening of channels and occasional high-value transfers.
CPU bandwidth follows the data bandwidth, but we can also get an additional performance win from batching multiple proofs. E.g., verifying an entire block of 50–100 transactions is 20–24x faster than verifying them one by one, and this ratio holds whether those transactions are small or big multi-party joint transactions. This means validators can minimize consensus latency by verifying proposed blocks in batches, and lower-power devices can keep up with the chain at a lower CPU cost. Only nodes that relay individual new transactions cannot enjoy batching to the fullest, but partial batching can still be applied there.
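Plugging in the figures above (≈2 ms per 2x2 transaction verified alone, ≈20x speedup for batched block verification) gives a feel for the per-block CPU cost; the 20x factor is the lower end of the range quoted above:

```python
# Rough per-block verification cost with batch verification.
ms_per_tx_alone = 2.0   # ~2 ms per 2x2 transaction, verified individually
batch_speedup = 20.0    # lower end of the 20-24x batching win quoted above
txs_per_block = 100

block_verify_ms = txs_per_block * ms_per_tx_alone / batch_speedup
print(block_verify_ms)  # 10.0 ms per 100-tx block instead of 200 ms
```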
Can we run on a phone?
With a 10 tx/sec network load and 1 KB transactions, a user has to download >800 MB per day, spending 14.4 min on a 1 MB/sec link. The CPU cost with use of batching is pretty low: just a minute per day.
On an hourly basis it's 36 MB, 36 seconds per download, and 3.6 sec of verification.
It seems like ≈1 GB per day of traffic is intolerable for a consumer device, but pretty feasible for any online shopping cart, a merchant's point-of-sale terminal, or accounting software running in a closet of a 5-person firm.
We can optimize bandwidth by remembering recent transaction outputs and referring to them by a short index instead of a fully serialized output structure (128 bytes for predicate+anchor+value). Then, a single lane contains not `(4+3)*32` bytes of commitments, but only `3*32`. On small transactions, the r1cs proof would dominate, but on larger, aggregated transactions we can achieve up to 40% bandwidth savings for realistic aggregation sizes (16x16–32x32).

Note: even with radical savings, i.e. a hypothetical vector commitment for values (32 bytes instead of 64; not sure if that's even possible without spending more bandwidth on proofs), packing the predicate into the value commitment (MW-style), and minimizing VM format overhead to zero, we can only push the 40% gains to 70% gains, which is roughly "3x with crazy hard-core optimizations vs 2x with a simple optimization".
So, taking the simple optimization of 32x32 aggregations and 336 bytes per logical payment (2x2), and a cap of 10 payments per second, the daily bandwidth is 290 MB instead of 800 MB as in our earlier example. To make this runnable on phones, we'd need to trim the throughput 10x more, to just 1 payment per second, so the monthly data fits in under 1 GB. That's pretty unrealistic.
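The daily-bandwidth figures in the two scenarios above come straight from a one-line formula, reproduced here for checking:

```python
# Daily full-node bandwidth: payments/sec * bytes/payment * seconds per day.
def daily_mb(payments_per_sec, bytes_per_payment):
    return payments_per_sec * bytes_per_payment * 86_400 / 1e6

print(round(daily_mb(10, 1000)))  # 864 — the ">800 MB per day" baseline
print(round(daily_mb(10, 336)))   # 290 — with 32x32 aggregation, 336 B/payment
```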
SPV costs
Given that light clients (individual mobile devices) are not going to be able to casually run full nodes in the background (although they absolutely can if the user dedicates them to it, e.g. in a point-of-sale application), we need a simplified security model for them, aka "simplified payment verification".
(1) The client only tracks block headers and confirms a payment via a merkle proof of its inclusion. The bandwidth depends linearly on the frequency of blocks (every 1 sec in the most loaded case, 3–5 sec more realistically) and not on their contents. For 1000 outputs per block per second, and 1 KB of header data including consensus proofs, the daily bandwidth becomes 84 MB (≈2.5 GB/month). With a 5-second interval (100 payments/sec throughput), it becomes 17 MB (≈0.5 GB/month).
This can be optimized further by using skiplist-style references that skip over 1024 blocks at a time, bringing the bandwidth down by a large factor.
(2) SPV clients are likely to be users of payment channels, which requires watching the chain for forced channel settlements. To avoid simply outsourcing the watching for such settlements to other parties, we need a more efficient way to query a subset of block data per block that allows detecting the settlement.
ZkVM makes the output created by a contract fully deterministic, so the user can request merkle paths for a prefix of that (unexpected) utxo ID. Quorum members can then produce those log-sized proofs for all utxos created with such a prefix per block. As for the proof that "nothing with this prefix exists at this time" (the typical case), it is simply a joint merkle path for the two neighbour items around the prefix, of size `log(outputs_per_block)*32` (≈300 bytes per block), which is 25 MB/day (1 block/sec) or 5 MB/day (5 blocks/sec).

Payment channels normally have a large contest period, so the forced-settlement utxo is guaranteed to exist over some number of blocks N. This means the user may request proofs of inclusion only every N/10 blocks to have guaranteed detection of the settlement with at most 10% of the contest period wasted.