spacemeshos / pm

Project management. Meta-tasks related to research, dev, and specs for the Spacemesh protocol and infrastructure.
http://spacemesh.io/
Creative Commons Zero v1.0 Universal
2 stars 0 forks source link

Finalize design: State database/data structure #13

Open lrettig opened 3 years ago

lrettig commented 3 years ago

We can easily switch this unless and until we have external validators/need state to be part of consensus.

Main tradeoff here is hashing/proof length vs. I/O.

Ethereum is moving away from the Hexary patricia merkle trie design, in favor of a binary trie structure. We should understand these tradeoffs, especially wrt stateless clients and light clients in future.

Resources:

See also: https://github.com/spacemeshos/go-spacemesh/issues/3173

lrettig commented 1 year ago

The plan for now is to avoid a state trie entirely, and instead to do the simplest possible thing, e.g., some sort of key-value store. @dshulyak can elaborate on the current plan/implementation. We won't have light client support at genesis, but plan to add this later, and can revisit the trie structure at that time.

We should understand the verkle tree structure:

lrettig commented 1 year ago

I'm leaving this in the 0.3 milestone for now because there are a few things I think we should consider before finalizing this decision. (If we are sure we don't want a trie at genesis, it can be pushed to a later milestone.)

dshulyak commented 1 year ago

performance: are we sure that our solution will be at least as efficient as a trie?

we are using native db reads to get an account state. it is not even comparable to reading them from a trie. mostly because we are solving a much simpler problem.

consider ethereum state (https://arxiv.org/pdf/2108.05513.pdf 6.5 Storage trie). ethereum stores account state (nonce, balance, contract, account state hash) in a trie. in order to get a single account state it has to traverse the trie, thats log(n) reads to the database (not exactly log(n) with particia, and pretty sure eth has some optimizations, but in general disk operations are amplified).

erigon "flattened" that trie (by storing leafs and commitments separately), which allowed them to remove this additional log(n) factor from state reads. so they are using native db reads as well. but in general ethereum solves much more complex problem that we don't even scratch:

below are simple benchmarks from the current implementation. applying 100_000 txs that are using different 100_000 accounts takes about 800ms to execute, with a heap growing to ~300mb. about 2.1B gas in ethereum. with real vm we are likely to have much lower throughput. so i don't think that there is any performance concerns for a simplified vm that will be used for genesis.

goos: linux
goarch: amd64
pkg: github.com/spacemeshos/go-spacemesh/genvm
cpu: 12th Gen Intel(R) Core(TM) i9-12900KF
BenchmarkValidation
BenchmarkValidation/SpawnWallet
BenchmarkValidation/SpawnWallet-24             36490         29768 ns/op        2784 B/op         45 allocs/op
BenchmarkValidation/SpendWallet
BenchmarkValidation/SpendWallet-24             40701         27886 ns/op        2920 B/op         52 allocs/op
BenchmarkWallet
BenchmarkWallet/Accounts100k/Txs100k
BenchmarkWallet/Accounts100k/Txs100k-24            2     836748089 ns/op    333534656 B/op   7388836 allocs/op
BenchmarkWallet/Accounts100k/Txs1kk
BenchmarkWallet/Accounts100k/Txs1kk-24             1    2090298001 ns/op    1981615256 B/op 28004202 allocs/op

gas metering considerations:

we shouldn't use vm that i wrote for genesis for gas metering considerations. for now i basically used random numbers for gas. svm will likely will have much more overhead, plus we should include state commitments into storage considerations.

related: how will caching work?

accounts state is cached during block execution, in order to avoid reading it multiple times. maybe i will have to flush some of it to lower the size of the heap, if we plan to have larger blocks (1_000_000 txs).

access lists

definitely not for genesis. we have 3 simple template types, with predictable accounts access

accommodating expected future state growth, and state pruning

accounts historical state can be pruned. but this definitely won't be a problem for genesis.

other questions seems to be ethereum specific, or optimizations for problems that spacemesh doesn't face for now.

lrettig commented 1 year ago

@WilfredTA will take a look at this after he's done a little more work on gas price research. Plan A is to stick with what @dshulyak already implemented, i.e., a basic key-value store, but severely overestimate gas prices for genesis (by at least an order of magnitude, if not two), to account for greater cost (compute, I/O) in future when we switch to a trie of some kind.