QIP-17: Storage rent within Qtum-x86

Abstract

This would implement a rent mechanism for all storage consumed in Qtum-x86. The existing EVM infrastructure would be untouched (for now). The rent mechanism would be a somewhat "passive" implementation, requiring no special contract logic in order to be aware of rent, nor to make rent payments on consumed storage. The design would effectively put a cap on the total amount of data a node would need to store, while also putting a cap on the total search space lightweight SPV nodes need to effectively evaluate and interact with smart contracts.

Motivation

Disk space consumption has long been a concerning topic in the blockchain space. There is a fear that any blockchain, especially those with smart contracts and additional state, will inflate beyond the storage capabilities of most computers and servers in as little as 10 years. This disk space inflation would then cause centralization of full nodes to only those capable of affording very expensive computers with very large storage arrays.

Specification

The changes to the Qtum-x86 design incurred by this can be broken into 3 parts:

DeltaDB Repropagation and rent behavior
Smart Contract behavior for "sleeping" state
Method of "waking" sleeping state

First, the terms used for the different states of... state:

"active" state, this is state that has had an appropriate rent payment keeping it alive and easily accessible on the blockchain
"sleeping" state, this is state which has not had a rent payment in the appropriate period, and cannot be directly accessed by smart contracts without being repropagated via transaction
"waking" state, this is the action of restoring sleeping state into active state so that it can once again be accessed directly by smart contracts

DeltaDB Repropagation and Rent Behavior

Every piece of state owned by a contract, including its own bytecode, has a rent timer denoted by block height. Once this timer elapses to 0, the active state is moved to the sleeping state, and nodes are safe to prune most of the data concerning that state from their internal databases. When a piece of state is accessed or modified, an implicit rent payment is made, which is factored into the gas price of the operation. When the rent payment is paid by accessing the data, the piece of state will have its timer reset to RENT_TERM. There is no method of pre-paying in order for a piece of state to have its timer set to something larger than RENT_TERM.
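The timer behavior described above can be sketched as follows. This is only an illustrative model, not the DeltaDB implementation: the `StateEntry` class, its field names, and the RENT_TERM value of 1000 blocks are all assumptions made for the example.

```python
from dataclasses import dataclass

RENT_TERM = 1000  # hypothetical value; the QIP leaves the real constant open


@dataclass
class StateEntry:
    """One piece of contract state with its rent timer (illustrative only)."""
    value: bytes
    paid_at: int  # block height of the most recent implicit rent payment

    def remaining_rent(self, height: int) -> int:
        # Blocks left before the entry falls asleep; never negative.
        return max(0, self.paid_at + RENT_TERM - height)

    def is_active(self, height: int) -> bool:
        return self.remaining_rent(height) > 0

    def access(self, height: int) -> bytes:
        # Accessing active state is the rent payment: the timer is reset to
        # exactly RENT_TERM and can never be pushed beyond it.
        if not self.is_active(height):
            raise LookupError("state is sleeping and must be woken first")
        self.paid_at = height
        return self.value


entry = StateEntry(value=b"abc", paid_at=100)
```

Because the timer is always reset to RENT_TERM rather than extended, accessing the same entry many times in one block gives no more remaining rent than accessing it once.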

With DeltaDB's current consensus model, reading a piece of state (usually) does not involve adding a new delta (state change/notification) to the DeltaDB proof tree. With this storage rent proposal, each state access would cause a "repropagation" of state by submitting a delta to the DeltaDB proof. Although this would have no effect on contracts or even on most developers working on the blockchain, it causes a number of side-effects:

SPV (light wallet) nodes can prove when a state's latest rent payment was made
Conversely, it allows for proof that a state and most of its proof overhead can be deleted from the (SPV or full) node's internal databases
SPV nodes can obtain censorship-resistant proofs of state data more quickly: because a contract's most frequently used data is propagated more often, fewer blocks need to be scanned
Other than the overhead of waking state, this would cause no additional block space consumption, as the DeltaDB proof tree collapses into a single 32 byte hash in the blockheader, and nothing else
With the ability to prove data older than RENT_TERM is no longer needed, the maximum theoretical disk space consumption of the Qtum-x86 blockchain can be greatly reduced, especially when using pruning. With pruning, most waking state transactions will not be required after 500 blocks have passed since their inclusion.
Naturally this would work to limit data being stored by nodes to only what is needed for ongoing consensus. Proofs will always be available in the blockchain, such as for use cases like proof of publishing. However, these proofs are typically added once and only occasionally accessed afterwards, and thus are non-essential for consensus purposes.
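As a rough illustration of why repropagation makes sleeping state provable, the sketch below treats each block as a set of keys that had a delta submitted. The `deltas_by_height` mapping and the function name are hypothetical; real DeltaDB proofs are merkle paths checked against block headers, which this sketch does not model.

```python
from typing import Dict, Set


def provably_asleep(deltas_by_height: Dict[int, Set[bytes]], key: bytes,
                    tip: int, rent_term: int) -> bool:
    """True if no delta for `key` appears in the last `rent_term` blocks."""
    # Only the most recent rent_term blocks matter: any older payment has
    # already expired, so nothing before the window needs to be scanned.
    window = range(tip - rent_term + 1, tip + 1)
    return not any(key in deltas_by_height.get(h, set()) for h in window)


# Key b"a" was last propagated at height 10, key b"b" at height 14.
deltas = {10: {b"a"}, 14: {b"b"}}
```

With a rent term of 5 blocks and the tip at height 15, key `b"a"` has no delta in blocks 11 through 15 and can be treated as sleeping, while `b"b"` is still active.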

The following data would need to be kept by a node for each piece of sleeping state:

Block height in which the state was last propagated (i.e., when the last rent payment was made)
A hash of the key indexing the data
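A minimal sketch of the record a node might retain for sleeping state, holding only the two fields listed above. The `SleepingRecord` name and the use of SHA-256 are assumptions; the QIP does not name the hash function or the storage layout.

```python
import hashlib
from typing import NamedTuple


class SleepingRecord(NamedTuple):
    last_propagated: int  # block height of the last rent payment
    key_hash: bytes       # hash of the key indexing the data


def to_sleeping_record(key: bytes, last_propagated: int) -> SleepingRecord:
    # SHA-256 is an assumption made for this example only.
    return SleepingRecord(last_propagated, hashlib.sha256(key).digest())


record = to_sleeping_record(b"balances/alice", 120_000)
```

The value itself is not part of the record, so the per-key overhead stays fixed regardless of how large the sleeping data was.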

Smart Contract Behavior

Most proposals for storage rent designs require explicit and error-prone management of rent and awareness by smart contracts. In this design, everything is implicit and operations for checking rent are not required outside of specialized contract designs. With this proposal comes some changes to how unstoppable exceptions behave, including the one that occurs when a contract attempts to access sleeping state.

Rather than consuming all gas like in Ethereum's exception model, only the portion of gas used by the contract up until the point of the exception will be consumed, as well as an "exception tax"
All modified state will be reverted, as is similar in Ethereum's exception model, and these reverted states will not be propagated in DeltaDB at all
All active state which is accessed up until the exception will have a rent payment and thus be propagated in DeltaDB
If an active state is modified and never actually read by the execution, the state will NOT have a rent payment and thus not be propagated in DeltaDB. The new state will be propagated IFF the execution does not end in an exception
If the execution has attached waking state, this state will count as "accessed" and thus be propagated in DeltaDB and restored, despite an exception in execution. Note there is a "wake tax" on restored state which must be levied before execution of the contract begins. If too little gas is sent to cover the wake tax, then no restorations will take place, all gas will be consumed, and the execution is effectively ignored other than producing a failing execution receipt
If the execution has attached waking state, but the state is already awake, then this attached already awake state is ignored and no gas is charged for its inclusion. This prevents punishing people for cautiously including waking state that will soon expire to ensure contract execution does not fail. The attached state which is asleep will be woken and the wake tax levied.
In the above case of attached state which is already awake, this state will be considered accessed and thus be propagated in DeltaDB and INDEX_TAX+PROP_TAX will be charged per state key.
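The gas side of these rules can be sketched as a small settlement function. This is a simplified model under stated assumptions: `EXCEPTION_TAX` is given an arbitrary value, the function and parameter names are invented for the example, and capping the charge at the gas sent is an assumption rather than something the proposal spells out.

```python
EXCEPTION_TAX = 500  # hypothetical constant; the QIP does not give a value


def settle_execution(gas_sent, wake_tax, gas_used, raised):
    """Return (gas_charged, waking_applied, changes_committed)."""
    if gas_sent < wake_tax:
        # Wake tax is levied before execution: if it cannot be covered,
        # nothing is restored, nothing runs, and all gas is consumed.
        return gas_sent, False, False
    charged = wake_tax + gas_used
    if raised:
        # Only the gas used up to the exception plus the exception tax is
        # consumed, rather than all gas as in Ethereum's model.
        charged += EXCEPTION_TAX
    # Woken state stays restored even when execution ends in an exception;
    # modified state is committed only on success.
    return min(charged, gas_sent), True, not raised
```

For example, `settle_execution(10_000, 2_000, 3_000, raised=True)` charges 5,500 gas, keeps the attached state awake, and reverts the contract's own modifications.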

This implicit rent payment and exception design means that most contracts never need to worry about rent mechanisms for proper operation. However, for contracts that require some self-awareness about it, some additional system interfaces will be added:

uint32_t remainingRent(uint8_t* key, size_t keylen); -- This will return the remaining rent left for a particular state key, in terms of blocks. If the state is sleeping or has not been written to, 0 will be returned
uint32_t remainingExternalRent(UniversalAddressABI* target, uint8_t* key, size_t keylen); -- This will behave the same as remainingRent, but acts on external contracts
uint32_t remainingExternalBytecodeRent(UniversalAddressABI* target); -- This will behave the same as remainingExternalRent, but checks an external contract's bytecode rather than a state key. Note there is no need for an internal version of this because by act of the contract being executed, the rent remaining will always be RENT_TERM
uint32_t BlockData->RENT_TERM -- this is a block-constant (potentially modifiable by DGP over time) for what the max rent term is
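For the specialized contracts that do want rent awareness, one possible pattern is to poll the remaining rent and renew keys that are close to expiring. The sketch below models this in Python against a mock host; `MockHost`, `keep_alive`, and the RENT_TERM value are hypothetical stand-ins, and only `remaining_rent` is meant to mirror the semantics of remainingRent described above.

```python
RENT_TERM = 1000  # hypothetical value for BlockData->RENT_TERM


class MockHost:
    """Stand-in for the node-side state store (illustrative only)."""

    def __init__(self, height=0):
        self.height = height
        self.paid_at = {}  # key -> block height of last rent payment

    def write(self, key):
        self.paid_at[key] = self.height

    def remaining_rent(self, key):
        # Mirrors remainingRent: 0 if sleeping or never written.
        paid = self.paid_at.get(key)
        if paid is None:
            return 0
        return max(0, paid + RENT_TERM - self.height)

    def size_check(self, key):
        # A zero-byte read still counts as an access, so it renews rent.
        if self.remaining_rent(key) == 0:
            raise LookupError("state is sleeping or does not exist")
        self.paid_at[key] = self.height


def keep_alive(host, keys, threshold):
    """Renew every active key with at most `threshold` blocks of rent left."""
    renewed = []
    for key in keys:
        if 0 < host.remaining_rent(key) <= threshold:
            host.size_check(key)
            renewed.append(key)
    return renewed


host = MockHost(height=0)
host.write(b"a")
host.height = 950
host.write(b"b")
host.height = 980
renewed = keep_alive(host, [b"a", b"b", b"c"], threshold=100)
```

Here only `b"a"` is renewed: it has 20 blocks left, `b"b"` still has 970, and `b"c"` was never written so it reports 0 and is skipped.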

Gas model

The gas model for storage is not completely designed for Qtum-x86 as of yet, but this proposal would completely change it if it were. So, this is as good a place to lay down the design as any.

Definitions:

PROP_TAX -- The tax charged on any propagation that makes it into the DeltaDB tree
READ_TAX(size) -- A tax charged on reading state from the node database. This is not a flat cost, and will probably have a minimum cost plus a cost per byte after a certain length. This factors in costs such as copying the data into VM memory
EXEC_TAX -- The tax charged for initializing a new VM instance in order to execute a contract
SHORT_READ_TAX(size) -- This tax is charged for reading state previously read from the database in the current execution. It is otherwise similar to READ_TAX
INDEX_TAX(key_size) -- This tax is charged for indexing anything in the database. This is designed to be relatively cheap and includes the cost of hashing the key if needed
WRITE_TAX(size) -- This tax is charged for writing state to the database
EXTERNAL_TAX -- A small additional tax for accessing external account state
LIBEXEC_COST -- A fixed cost for executing any trusted library contract. Note to prevent abuse, there is a hard limit on the size of trusted library contracts
STORE_REFUND(size) -- A refund given for modifying state, assuming the state is reduced to 0
DIRTY_STORE_REFUND(old_size) -- Similar to STORE_REFUND, but this is a larger refund given if the state is reduced to 0 and it was created in the current execution (ie, it never got to the database)
PROP_REFUND -- A refund given if state is unestablished at beginning of execution, established during, but then reduced to 0 before execution is complete, meaning propagation is not necessary. This does not apply to modifications with any state other than null as the beginning and ending state. ie, PROP_TAX will still be charged if a state goes from "abc" -> 0 -> "abc" but will be refunded if a state goes from 0 -> "abc" -> "xyz" -> 0.
CLEANING_REFUND(size) -- This is an additional refund given as an incentive factor for cleaning up storage. This is only given when the state is reset to 0, and not given when the state is resized
WAKE_TAX(size) -- This is an additional cost for restoring state to be "active"
SLEEPING_REFUND -- This is a fixed refund for destroying sleeping state
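The PROP_REFUND condition depends only on the value of a key at the start and end of the execution. A minimal sketch of that check, assuming a key's values during one execution are recorded as a list whose first element is the value at the start (with an empty byte string standing for null/0); the function name and this representation are assumptions:

```python
from typing import Sequence


def prop_refund_applies(history: Sequence[bytes]) -> bool:
    """history[0] is the value at start of execution, history[-1] at the end."""
    starts_null = history[0] == b""
    ends_null = history[-1] == b""
    # The state must actually have been established at some point in between.
    established = any(value != b"" for value in history)
    return starts_null and ends_null and established
```

This reproduces the two cases from the definition: `[b"", b"abc", b"xyz", b""]` qualifies for the refund, while `[b"abc", b"", b"abc"]` does not.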

Operations:

Note, in cases of reading that

Initial Contract execution: PROP_TAX + READ_TAX(size) + EXEC_TAX
First-time execute external contract: PROP_TAX + READ_TAX(size) + EXEC_TAX + EXTERNAL_TAX
Second-time execute external contract: SHORT_READ_TAX(size) + EXEC_TAX + EXTERNAL_TAX
Recursive contract execution: SHORT_READ_TAX(size) + EXEC_TAX
Contract self-destruct: PROP_TAX + STORE_REFUND(size) + CLEANING_REFUND(size)
Internal first-time size-check: PROP_TAX + INDEX_TAX(key_size) -- This can be used to force a (cheap) rent payment without actually reading the state into memory. This is currently just a state read where 0 bytes are read (state reads always return the actual size of the data)
External first-time size-check: PROP_TAX + INDEX_TAX(key_size) + EXTERNAL_TAX
Internal second-time size-check: INDEX_TAX(key_size) -- second-time in this case means after a previous size check or state read
External second-time size-check: INDEX_TAX(key_size) + EXTERNAL_TAX
Internal first-time read: PROP_TAX + READ_TAX(size) + INDEX_TAX(key_size)
Internal second-time read: SHORT_READ_TAX(size) + INDEX_TAX(key_size) -- This includes state which was written and then read afterwards in the same execution
First-time write to new state: PROP_TAX + WRITE_TAX(size) + INDEX_TAX(key_size)
First-time write to new state, setting to 0: INDEX_TAX(key_size) -- this is a no-op, and so should not be done in normal usage
Second-time write to new state: WRITE_TAX(size) + STORE_REFUND(size) + INDEX_TAX(key_size)
Second-time write to new state, setting to 0: DIRTY_STORE_REFUND(old_size) + INDEX_TAX(key_size) + PROP_REFUND
First-time write to existing state: PROP_TAX + WRITE_TAX(size) + STORE_REFUND(old_size) + INDEX_TAX(key_size)
First-time write to existing state, setting to 0: PROP_TAX + STORE_REFUND(old_size) + INDEX_TAX(key_size) + CLEANING_REFUND(size)
Second-time write to existing state: WRITE_TAX(size) + STORE_REFUND(old_size) + INDEX_TAX(key_size)
Second-time write to existing state, setting to 0: PROP_TAX + STORE_REFUND(old_size) + INDEX_TAX(key_size) + CLEANING_REFUND(size) -- same as first-time write
Second-time write to existing state previously set to 0 at 1st write: WRITE_TAX(size) + INDEX_TAX(key_size) -- Basically the same as a normal second-time write
First-time write to sleeping state: PROP_TAX + WRITE_TAX(size) + INDEX_TAX(key_size) -- note there is no refund possible for shrinking the data in this case, but WAKE_TAX is expected to be higher than the refund would be anyway. Also note second-time write is the same as writing to normal existing (dirty) state
First-time write to sleeping state, setting to 0: PROP_TAX + INDEX_TAX(key_size) + SLEEPING_REFUND -- This should in theory balance out to around 0 with average (<32 byte) key sizes
Note: After first-write, sleeping state is treated the same as other state in regards to behavior with setting to 0, resizing, etc
External first-time read: PROP_TAX + READ_TAX(size) + INDEX_TAX(key_size) + EXTERNAL_TAX
External second-time read: SHORT_READ_TAX(size) + INDEX_TAX(key_size) + EXTERNAL_TAX
Free-standing code execution: EXEC_TAX -- a "free-standing" execution is just code in a UTXO, executed once with no state stored, so there is no cost other than execution
Pre-execution restoration of sleeping state: WAKE_TAX(size) + INDEX_TAX(key_size) + PROP_TAX -- note this happens before execution of the origin contract.
Note: After restoration of sleeping state, all gas costs are the same as normal active storage.
Trusted library execution: PROP_TAX + LIBEXEC_COST -- This notably does not incur per-length costs. Other than the gas cost of actual execution of the code, this is fixed cost. This is to make trusted library executions more predictable
Note: All trusted library reads/writes on a contract's behalf are the same as normal contract execution and do not incur the EXTERNAL_TAX

Although this list of operations is quite large, it is actually very formulaic in nature and shouldn't be too difficult to actually implement in code. It is very regular and should require very little handling of edge cases. Each of the defined constants or functions above should be self-explanatory and are expected to factor in all costs of a node and the greater network.
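To illustrate how regular the formulas are, the read and size-check rows can be expressed with two small functions. Every number below is a made-up placeholder, since the proposal does not fix any tax values, and the function names are hypothetical.

```python
# All constants and curves are placeholder assumptions for illustration.
PROP_TAX = 200
EXTERNAL_TAX = 50


def index_tax(key_size):
    return 20 + key_size


def read_tax(size):
    return 100 + 2 * size


def short_read_tax(size):
    return 10 + size


def read_cost(size, key_size, first_time, external=False):
    """Internal/external, first-time/second-time read rows."""
    cost = index_tax(key_size)
    # A first-time read propagates the state and loads it from the database;
    # later reads in the same execution pay only the short read tax.
    cost += PROP_TAX + read_tax(size) if first_time else short_read_tax(size)
    if external:
        cost += EXTERNAL_TAX
    return cost


def size_check_cost(key_size, first_time, external=False):
    """Size-check rows: a read of 0 bytes, so no per-byte read tax."""
    cost = index_tax(key_size)
    if first_time:
        cost += PROP_TAX
    if external:
        cost += EXTERNAL_TAX
    return cost
```

The same two switches (first-time versus second-time, internal versus external) account for all eight read and size-check rows, which is the regularity the paragraph above refers to.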

One danger of this approach is that refunds must be conservative to avoid a case where it could be beneficial, for example, to write data to state and then modify the state to a smaller size, rather than simply writing the smaller state in the beginning. Refund behavior is different from Ethereum, in that any surplus after execution will be sent back to the gas payee, up to the total gas sent to the contract. If refunds larger than the gas sent with the contract were allowed, this could be exploited to print Qtum by using artificially high gas prices so that refunds could take place at a higher gas price than the initial gas payment.
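The refund cap can be sketched in a few lines. This is a simplified model: the function name is hypothetical, and gas amounts are plain integers with no gas price applied.

```python
def gas_returned(gas_sent, gas_charged, refunds):
    """Gas sent back to the payee after execution, capped at gas_sent."""
    surplus = gas_sent - gas_charged + refunds
    # Capping at gas_sent means an execution can at best be free; it can
    # never pay out more than was put in.
    return max(0, min(surplus, gas_sent))
```

With 10,000 gas sent and 4,000 charged, refunds of 1,000 return 7,000 gas, while refunds of 9,000 return only the original 10,000 rather than 15,000.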

Account Abstraction Layer Modification

In order for contract transactions to be properly pruned of extraneous data, all contract execution and creation transactions will be spent by the AAL and condensed. This would greatly simplify other future QIPs as well, such as a proposal for UTXO-based "one-time owned" state. Currently, contract executions are only spent by the AAL when the funds from the execution are actually spent by the smart contract. Further, the contract creation transaction will only be spent when the contract self-destructs. This allows for some additional features in SPV nodes to track contracts, but comes at the cost of keeping duplicated and somewhat irrelevant data in the UTXO set. This functionality would be effectively superseded by DeltaDB's SPV targeted methods of access and tracking.

New Node Classification

Currently there are three primary types of nodes in the Qtum ecosystem:

Archival Node -- This node contains the entire blockchain. The UTXO set is pruned to not be duplicated but all of the spent transaction data is kept on disk in a slower to access format. This node is capable of any use case including staking, general wallet, historical data analysis, development, etc.
Full Node With Pruning -- This node is similar to a full node, and does download and verify the entire blockchain, but deletes data that is provably not in use. This includes in particular spent transaction data and old block data. This node is capable of all the use cases of a full node with the exception of historical analysis.
SPV Node -- This type of node functions by downloading data on demand that is considered "relevant" to the current wallet, and the entire blockchain's block headers for verification and proof purposes. This type of node is very lightweight and is often used on mobile devices and for "fast syncing" wallets. This node is decentralized, but is subject to the potential of censorship, as there is no proof that the full nodes it connects to are not withholding data which should exist. The ultimate security of this is mostly regarded as stable, but is subject to sybil attacks. This type of node is in general only usable as a wallet and for some limited types of smart contract development. Notably, it cannot be used for staking.

With the new functionality and provable behavior provided from this proposal, a new node classification is proposed:

Fast Dev Node. This uses the general security paradigms of SPV nodes, and requires download of the following data initially:

Full EVM data (this can't be avoided yet) using the state root of the best block
All block headers for the blockchain (same as SPV)
DeltaDB proofs and data from the most recent RENT_TERM blocks (verified against the block headers' DeltaDBRoot). Modified data can be pruned as processed
Relevant UTXOs and proofs for the current controlled wallet (same as SPV)

Data downloaded on demand includes:

Staking UTXO for new blocks
UTXO proof (ie, block hash and merkle path) for proving that a UTXO exists before accepting a block spending it as the stake
Requests for proofs of UTXOs spent (same as an SPV node) for the stake UTXO
Requests for proofs of UTXOs spent and created for relevant addresses (same as SPV)
Requests for transactions (not just UTXOs) interacting with contracts and tracking of DeltaDB state changes within RENT_TERM. Transaction data is pruned after execution, leaving only the DeltaDB tracking
Ongoing block header downloads

This would not be considered safe for staking and security critical purposes, as its core security is still only as secure as SPV. However, this would be fully capable of being used for smart contract development and as a more featured version of an SPV node. Historical contract execution can be ignored, but ongoing new contract executions can be completely executed and tracked. The initial syncing process for this would only be somewhat slower than SPV, primarily due to the need to download full EVM and pruned DeltaDB data. The bandwidth costs would also only be slightly higher than an SPV node, with the primary cost being that all transactions with a smart contract execution must be downloaded in full.

Rationale

This specific implementation of rent is different from most others which have been proposed. One of the big fears is that smart contracts are already quite complex and adding more complexity with rent would greatly increase the number of bugs and exploits in smart contract code. This proposal takes a different route by leveraging DeltaDB's unique characteristics in order to make the rent system beneficial to the ecosystem, without requiring any additional smart contract logic in most cases.

Furthermore, a big focus of this proposal is to separate what nodes need for consensus from everything else. It is perfectly acceptable to have data on the blockchain which is seldom accessed and never updated, but this data should be provably irrelevant for nodes. Of course, it will still be provable that the data is contained within the blockchain at a certain block height; however, it is not expected to be directly stored and accessible by most nodes on the network. Furthermore, this proof can be done without consuming any blockchain resources that would incur gas costs. This type of historical data would be relegated to Archival Nodes. For applications that only need access to, say, a certain smart contract's sleeping data, the application could use a partial-archival node. This is basically a standard pruning node, but which stores the full historical data for the relevant smart contract.

Strategy

This will be implemented in the initial release of Qtum-x86. It becomes very hard to change this storage model after release, so it is greatly advantageous to launch Qtum-x86 with this already implemented.

TODO

Need calculations for what the theoretical data cap would be for different RENT_TERM values
Need calculations for how much larger the DeltaDB merkle trees would be for a full block of contract executions, as well as a typical block.
This does not completely eliminate the need for "archival nodes" which store all data. In order to sync a full node with no trust, all "sleeping" data would still be required and/or need to be reconstructed from block data in order to prove the blockchain's current state is valid.

(This QIP is unfinished and will be updated as details are established)
