stellar / slingshot

A new blockchain architecture under active development, with a strong focus on scalability, privacy and safety

blockchain: scalability plan #230

Closed oleganza closed 5 years ago

oleganza commented 5 years ago

Problem

The blockchain scalability problem has 4 facets:

  1. Bandwidth needed to receive all transactions happening on the network.
  2. CPU cost to verify all these transactions.
  3. State size that every node must maintain to prevent double-spending.
  4. Bootstrapping cost: how a new node is supposed to join the network.

We are not including "SPV/light clients", as those are more like optimizations of the above with security tradeoffs. The exact nature of "light client" protocols depends on how those 4 problems are addressed.

Current situation

We currently address these issues as follows:

Bandwidth is minimized by using aggregated signatures, aggregated zero-knowledge proofs and keeping the amount of data and computation on-chain at a minimum. It can be further minimized by offloading the majority of payment transactions into the lightning network.

Averaging the best/worst cases to 1 KB per payment, and assuming 500 payments per second, the average required bandwidth is 500 KB/s at maximum load. With a more optimized network, where only 50 payments/sec happen on-chain at 0.5 KB per payment, the bandwidth could be brought down to 25 KB/s.
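
A quick sanity check of these two estimates (the payment sizes and rates are just the assumptions stated above):

payment_size = 1000 # bytes, best/worst cases averaged
puts "max load:  #{500 * payment_size / 1000} KB/s"       # 500 payments/sec at ~1 KB
puts "optimized: #{50 * (payment_size / 2) / 1000} KB/s"  # 50 payments/sec at ~0.5 KB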

CPU cost is likewise kept at a minimum: all expensive operations are minimized, unified and processed in batches with a state-of-the-art implementation. A 2x2 transaction can be verified in about 2 ms, allowing ≈500 payments/sec throughput per 3 GHz CPU core (≈1000 inputs/sec). There is room for roughly another 30-50% of optimization.
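
The same throughput figures, derived from the 2 ms verification cost (assuming a 2x2 transaction counts as one payment with two inputs):

tx_cpu_cost = 0.002 # sec per 2x2 tx on a 3GHz core
puts "#{(1 / tx_cpu_cost).round} payments/sec"  # => 500
puts "#{(2 / tx_cpu_cost).round} inputs/sec"    # => 1000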

State is mostly a UTXO set that grows slowly with the user base and has a favorable access pattern (~Zipf's law): older UTXOs are spent more rarely than newer UTXOs. The UTXO set consists of 32-byte hashes of "output structures", so for N users each having M payment channels or idle outputs, the state is N*M*32 bytes. So 10M users with 4 outputs on average produce a ≈1.2 GB state (only a smaller fraction of fresh utxos needs to reside in RAM).
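
The state-size arithmetic, using the 10M-user example above:

users = 10_000_000
outputs_per_user = 4
bytes = users * outputs_per_user * 32
puts "#{(bytes / 1024.0**3).round(2)} GB state"  # => 1.19 GB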

ZkVM also uses nonces that ensure tx uniqueness. These must all be stored to be checked against reuse, but they can have an expiration date.

Another piece of state is a trail of "recent block headers" that can be referenced by nonces. These should cover at least 24h, so that multi-party transactions can get co-signed with a fresh nonce and published without having to retry because the block reference slid out of the window. These are sufficiently small to not be a bottleneck (a few megabytes at most).

Bootstrapping requires either of two things:

  1. Replaying the entire blockchain, the cost of which is bandwidth multiplied by time (the optimized network with 25 KB/s bandwidth would then require downloading 769 GB per year of history),
  2. or bootstrapping the state from a committed snapshot verified by the entire network.

Option (2) reduces the bootstrap cost to the cost of cloning an existing node's state, which is a significant upside. The downsides are:

a. A different security model: one needs to trust that the entire network has followed the rules up to a recent point, and if everyone prunes the history, there may be no data left available for audit.

b. Extra load on the network nodes to verify and update state commitments, in addition to the verification of transactions.

Considering the cost of the alternative and the existence of fairly efficient schemes for state commitments (especially compared to the cost of verifying zero-knowledge proofs), the need for a state commitment scheme is evident. Currently we use a patricia merkle tree (aka "radix tree") for secure commitment and incremental updates to the set of utxos and nonces.

Optimizing state size

The utxo set growth is the next biggest concern.

  1. UTXO set size is proportional to the number of users, therefore the strategy "push most activity into payment channels" does not affect its growth. And the growth of the federated network can be inhibited by the requirement to keep a large state, limiting the number and diversity of participants (1B utxos == 32 GB).
  2. UTXO set size, like bandwidth, is a costly externality: it is a one-time cost for the user who creates a utxo, and a perpetual cost for everyone else who maintains it. This means that more utxos than necessary are likely to be created.

Solving this problem via economics, with "rent fees" or the like, is a measure of last resort, since that solution would create its own technical issues. It is much better to solve it at a technical level, so that we do not need to introduce additional assumptions about the behavior of participants. For instance, by making the nodes keep an O(1) or O(log(n)) commitment to the UTXO set and letting the users provide proofs of membership of their own UTXOs, plus an efficient algorithm to update that commitment with only the partial data from the users' proofs (a membership proof of this kind is sketched below).
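
To make the "users carry their own proofs" idea concrete, here is a minimal sketch of verifying a merkle membership proof against a 32-byte root. This is illustrative only, not the actual slingshot commitment scheme; the hashing and path format are assumptions:

require 'digest'

def leaf_hash(utxo_id)
  Digest::SHA256.digest("leaf" + utxo_id)
end

def node_hash(left, right)
  Digest::SHA256.digest("node" + left + right)
end

# path: list of [sibling_hash, :left or :right] pairs from leaf to root.
def verify_membership(root, utxo_id, path)
  h = leaf_hash(utxo_id)
  path.each do |sibling, side|
    h = (side == :left) ? node_hash(sibling, h) : node_hash(h, sibling)
  end
  h == root
end

A node that keeps only the root can check such a proof with O(log(n)) hashes, and, using the same sibling hashes, recompute the new root after that leaf is spent.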

Similar to the UTXO set, nonces pose two issues of their own:

  1. if the set of nonces is capped by time, then we allow unbounded growth that must be countered by some additional rules.
  2. if the set of nonces is capped by size, then we risk DoS attacks where malicious users exhaust the quota for the honest users.

Again, a proper technical solution would permit nodes to keep an O(1) or O(log(n)) commitment to the set of nonces, allowing users to prove non-membership of their nonces (sketched below). This could allow us to eliminate the need for expiration times entirely, streamlining the ergonomics of higher-order protocols.
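
One way such a non-membership proof could look, assuming (purely for illustration, not the actual slingshot design) that the nonce set is committed as a merkle tree over the sorted list of nonces: to prove nonce x is absent, exhibit two adjacent leaves that bracket x.

require 'digest'

H = ->(s) { Digest::SHA256.digest(s) }

# Verify a leaf at position `index` against the root using its sibling hashes.
def verify_at(root, leaf, index, siblings)
  h = H.("leaf" + leaf)
  siblings.each do |sib|
    h = index.even? ? H.("node" + h + sib) : H.("node" + sib + h)
    index >>= 1
  end
  h == root
end

# Non-membership of x: leaves i and i+1 are both in the tree and bracket x.
# (Edge cases, e.g. x below the smallest leaf, are omitted from this sketch.)
def verify_non_membership(root, x, left, right, i, left_path, right_path)
  left < x && x < right &&
    verify_at(root, left, i, left_path) &&
    verify_at(root, right, i + 1, right_path)
end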

Optimizing bandwidth

Finally, once the scalability of the blockchain state is solved, the last problem is data/CPU bandwidth.

Data bandwidth can be significantly optimized with a well-oiled payment channel network (lightning): most financial operations are payments, and even some sophisticated financial contracts are still structured as payments (e.g. futures). This means that, in principle, the entire blockchain network activity can consist mostly of closing and opening of channels and occasional high-value transfers.

CPU bandwidth follows the data bandwidth, but we can also get an additional performance win from batching multiple proofs. E.g. verifying an entire block of 50-100 transactions is 20-24x faster than verifying them one by one, and this ratio is the same whether those transactions are small or big multi-party joint transactions. This means validators can minimize consensus latency by verifying proposed blocks in batches, and lower-power devices can keep up with the chain at a lower CPU cost. Only the nodes that relay new transactions cannot enjoy batching to the fullest, but partial batching can still be applied there.
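
The batching win in numbers, using the ~2 ms per-tx cost and the ~20x factor from above:

tx_cpu_cost = 0.002 # sec per tx, verified one by one
block = 100 # txs
puts "sequential: #{block * tx_cpu_cost} s"      # => 0.2 s
puts "batched:    #{block * tx_cpu_cost / 20} s" # => 0.01 s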

Can we run on a phone?

With 10 tx/sec network load and 1 KB transactions, the user has to download >800 MB per day, spending 14.4 min on a 1 MB/sec link. The CPU cost with use of batching is pretty low: about a minute and a half per day.

On an hourly basis it's 34 MB: 36 seconds of downloading and 3.6 sec of verification.

It seems that ≈1 GB per day of traffic is intolerable for a consumer device, but pretty feasible for any online shopping cart, a merchant's point-of-sale terminal, or accounting software running in the closet of a 5-person firm.

tx_size = 1000 # bytes
tx_cpu_cost = 0.002 # sec
tx_per_sec = 10
blockchain_bandwidth = tx_per_sec*tx_size

# user
sync_interval = 24*3600
blob_size = sync_interval*blockchain_bandwidth
batched_cpu_cost = ((1/20.0)*tx_per_sec*tx_cpu_cost)*sync_interval # assumes the ~20x batch-verification speedup described above

user_speed = 1_000_000 # bytes/sec
download_time = blob_size / user_speed.to_f
verification_time = batched_cpu_cost

# user downloads 823.9 MB per day for 14.4 min, verifies for 1.4 min
puts %{user downloads #{blob_size/(1024*1024.0)} MB per day for #{download_time/60.0} min, verifies for #{verification_time/60.0} min}

We can optimize bandwidth by remembering recent transaction outputs and referring to them by a short index instead of a fully serialized output structure (128 bytes for predicate+anchor+value). Then a single lane contains not (4+3)*32 bytes of commitments, but only 3*32. On small transactions the r1cs proof dominates, but on larger, aggregated transactions we can achieve up to 40% bandwidth savings for realistic aggregation sizes (16x16 to 32x32).

 2x2 tx:    full size: 1483,  opt size: 1231 (marginal cost per 2 lanes: 1483 or 1231: 16% better)
 4x4 tx:    full size: 2059,  opt size: 1555 (marginal cost per 2 lanes: 1029 or 777:  24% better)
 6x6 tx:    full size: 2609,  opt size: 1853 (marginal cost per 2 lanes:  869 or 617:  28% better)
 8x8 tx:    full size: 3147,  opt size: 2139 (marginal cost per 2 lanes:  786 or 534:  32% better)
 12x12 tx:  full size: 4209,  opt size: 2697 (marginal cost per 2 lanes:  701 or 449:  35% better)
 16x16 tx:  full size: 5259,  opt size: 3243 (marginal cost per 2 lanes:  657 or 405:  38% better)
 24x24 tx:  full size: 7345,  opt size: 4321 (marginal cost per 2 lanes:  612 or 360:  41% better)
 32x32 tx:  full size: 9419,  opt size: 5387 (marginal cost per 2 lanes:  588 or 336:  42% better)
 64x64 tx:  full size: 17675, opt size: 9611 (marginal cost per 2 lanes:  552 or 300:  45% better)
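
The per-payment columns in this table can be reproduced from the raw sizes, assuming a KxK transaction packs K/2 logical 2x2 payments (2 lanes each):

# {lanes => [full size, optimized size]} taken from the table above
sizes = {
  2 => [1483, 1231], 4 => [2059, 1555], 6 => [2609, 1853],
  8 => [3147, 2139], 12 => [4209, 2697], 16 => [5259, 3243],
  24 => [7345, 4321], 32 => [9419, 5387], 64 => [17675, 9611],
}
sizes.each do |k, (full, opt)|
  payments = k / 2
  f, o = full / payments, opt / payments
  puts "#{k}x#{k}: #{f} or #{o} per 2 lanes: #{(100 * (1 - o / f.to_f)).floor}% better"
end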

Note: even radical savings with a hypothetical vector-commitment for values (32 bytes instead of 64; not sure if that's even possible without wasting bandwidth on more proofs), packing the predicate into the value commitment (MW-style) and minimizing VM format overhead to zero would only push the 40% gains to 70% gains, which is roughly "3x with crazy hard-core optimizations vs 2x with simple optimization".

So, taking the simple optimization with 32x32 aggregations and 336 bytes per logical payment (2x2), and a cap of 10 payments per second, the daily bandwidth is 290 MB instead of 800 MB as in our earlier example. To make this runnable on phones, we would need to trim the throughput 10x more, to just 1 payment per second, so that the monthly data fits under 1 GB. That's pretty unrealistic.
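
Checking that against a 1 GB/month mobile data budget:

bytes_per_payment = 336 # optimized 2x2 cost from the table above
daily = bytes_per_payment * 10 * 24 * 3600
puts "10 payments/sec: #{daily / 1_000_000} MB/day, #{(daily * 30 / 1e9).round(1)} GB/month" # 290 MB, 8.7 GB
puts "1 payment/sec:   #{(daily * 3 / 1e9).round(2)} GB/month"                               # 0.87 GB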

SPV costs

Given that light clients (individual mobile devices) are not going to be able to casually run full nodes in the background (although they absolutely can if the user dedicates them to it, e.g. in a point-of-sale application), we need a simplified security model for them, aka "simplified payment verification".

(1) The client only tracks block headers and confirms payments via merkle proofs of their inclusion. The bandwidth depends linearly on the frequency of blocks (every 1 sec in the most loaded case, 3-5 sec more realistically) and not on their contents. For blocks of 1000 outputs every second, with 1 KB of header data including consensus proofs, the daily bandwidth becomes 84 MB (≈2.5 GB/month). With a 5 second interval (100 payments/sec throughput), it becomes 17 MB (≈0.5 GB/month).
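
The header-only bandwidth, assuming ≈1 KB per block of header and consensus proofs:

header = 1024 # bytes per block
[1, 5].each do |interval| # block interval in seconds
  per_day = header * (24 * 3600) / interval
  puts "#{interval}s blocks: #{(per_day / 1024.0**2).round(1)} MB/day, #{(per_day * 30 / 1024.0**3).round(2)} GB/month"
end
# => 1s blocks: 84.4 MB/day, 2.47 GB/month
# => 5s blocks: 16.9 MB/day, 0.49 GB/month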

This can be optimized further by using a skiplist-style reference that skips over 1024 blocks, bringing the bandwidth down by a correspondingly large factor.

(2) SPV clients are likely to be users of payment channels, which requires watching the chain for forced channel settlements. To avoid simply outsourcing the watching for such settlements to other parties, we need a more efficient way to query a subset of block data per block that allows detecting a settlement.

ZkVM makes the output created by a contract fully deterministic, so the user can request merkle paths for a prefix of that (unexpected) utxo ID. Quorum members can then produce those log-sized proofs for all utxos created with such a prefix per block. As for the proof that "nothing with this prefix exists at this time" (the typical case), it is simply a joint merkle path for the two neighbour items around the prefix, of size log(outputs_per_block)*32 (≈300 bytes per block), which is 25 MB/day (1-second blocks) or 5 MB/day (5-second blocks).
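
The proof-stream sizes, with log2(1000) ≈ 10 path elements of 32 bytes rounded to ~300 bytes per block:

proof = 300 # bytes per block
puts "1s blocks: #{proof * 86_400 / 1_000_000} MB/day"  # => 25 MB/day
puts "5s blocks: #{proof * 17_280 / 1_000_000} MB/day"  # => 5 MB/day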

Payment channels normally have a large contest period, so the forced-settlement utxo is guaranteed to exist over some number of blocks N. This means the user may request proofs of inclusion only every N/10 blocks to have guaranteed detection of the settlement with at most 10% of the contest period wasted.

oleganza commented 5 years ago

Had a chat with @vickiniu. She helped figure out that a patricia tree is not good for inserting new items with trimmed state: the utxo ID is random, and you probably don't have the necessary neighboring hashes to make a proof of insertion. Deletion works, though, because you delete your own utxo, for which you keep the proof.

The Utreexo and MMR TXO proposals have a designated point of insertion (from the right), so it's always known how to recompute the tree. But there you cannot have a proof of non-membership, which is needed for keeping a trimmed set of nonces and for helping SPV clients watch for forced lightning settlement transactions.

oleganza commented 5 years ago

One way to solve SPV proofs of non-published utxos:

  1. batching the utreexo commitment over N blocks,
  2. sorting each batch lexicographically.

This way, an SPV client can receive a proof of non-membership for each Nth block. This design, however, does not allow decoupling the optimization of a lower-order protocol from a higher-order protocol.

And this does not solve the problem of trimming the nonce set: nonces appear later, and we'd have to scan back through all the blocks to check that they were not published earlier.

vickiniu commented 5 years ago

By batching + sorting, do you mean that instead of inserting blocks into the "perfect forest" by timestamp, you would sort lexicographically and then insert in that order?

oleganza commented 5 years ago

I mean, you keep the perfect forest as in utreexo, but inserted items are ordered lexicographically, so you (probably) can have a shorter proof of non-membership of anything in the inserted batch.

oleganza commented 5 years ago

Update: it seems that utreexo allows watching the payment channel state: instead of proofs that "the settlement utxo was not created", the server would provide continuous proofs of the status of the "channel utxo":

  1. First, the server has to provide a "watch the chain and update proofs" service to the light client anyway.
  2. Second, such an update should make it clear when the utxo is destroyed. In that case, the client can ask for the tx (via a merkle proof to the block's txroot), from which it will learn the new utxo, request its proof, and then decide whether or not to contest it.
  3. The server cannot lie and hide the fact that the utxo was updated, because it has to stream an up-to-date merkle path for the utxo per each commitment (which could be every N blocks for batch efficiency).
  4. The server can only deny service, in which case the client can switch servers (or have a redundant utxo tracker in the first place), ask a full-state node for an up-to-date proof, and then subscribe to it or another node that would watch the utxos.
  5. Clients can pay fees to their quorum slices, who would likely provide these services; those fees would cover both tx publication and utxo tracking. Confidentiality of txs makes the fees fair: servers cannot discriminate by the contents of txs and charge higher fees if one wants to watch an expensive payment channel. Fees will mostly reflect operating expenses and uptime/connectivity record.

So there seems to be no problem with using utreexo for tracking utxos by SPV channel-using clients.

oleganza commented 5 years ago

Regarding nonces, there's a problem: neither a patricia tree nor utreexo allows trimming state:

  1. A patricia tree allows efficient proofs of non-membership, but it's impossible to prove an insertion if one has trimmed state: every new nonce is random and falls into a random place in the tree.
  2. Utreexo allows insertion with trimmed state, but cannot prove non-membership.
  3. MMR TXOs allow insertion with trimmed state, and allow non-membership proofs for pre-existing items, which does not apply to nonces: we need to prove their absence not after deletion, but before insertion.

oleganza commented 5 years ago

If we try to address nonces and txid uniqueness from first principles, let's look at how other systems do it:

  1. In Bitcoin, issuance is done by miners only, and uniqueness is provided simply by having the issuance tx commit to the block height.
  2. In Ethereum and Stellar, accounts do the issuing and nodes keep track of ALL accounts, but all transfers and issuances within an account are made unique via a sequence number.
  3. In Blockstream Elements, issuance is made unique by piggybacking on txid uniqueness (link), which means (1) you don't know the assetid until you form a concrete transaction; (2) you need to spend some other utxos (pegged bitcoins, at least) to make the tx unique. Even if we need to do something like (3), it's better ergonomics to do so with nonces, not with issuances, so the asset ID can be clean and more deterministic.
  4. In TxVM, nonces are all remembered (1, 2) and capped by an expiration date, which is a capped variant of (2): "simply store all the things". But that has bad DoS vectors, which are a non-issue in Sequence, but an issue in an open network.
  5. Semi-related note: Zcash/Monero remember all "nullifiers"/"key images" to prevent reuse, across all transactions. Zcash has (or will add?) "epochs", which are like our expiration time on nonces, except in their case it's an expiration time on money: if you haven't moved your funds from epoch to epoch, they become unspendable.

oleganza commented 5 years ago

Utreexo merged in #308.