vyperlang / vyper

Pythonic Smart Contract Language for the EVM
https://vyperlang.org
Other
4.81k stars 788 forks source link

VIP: [research] Static balance sheet analysis for Vyper contracts #1277

Open charles-cooper opened 5 years ago

charles-cooper commented 5 years ago

Simple Summary

Create a way for contract developers to ensure that balance sheet invariants (e.g., the total amount a user can withdraw is equal to to the total amount that user has deposited) are preserved.

Abstract

Balance sheet analysis of smart contracts is prone to error. Rather than have developers prove ad-hoc invariants about cash flows using constraint-based checkers (cf. https://github.com/Zilliqa/scilla/issues/18 for one approach to this), this VIP proposes a DSL for handling funds based on double-entry accounting principles which maintains balance-sheet invariants by construction. This DSL aims to be easy to read, write and reason about, and familiar to non-programmer accountants. (It is unclear whether this is within the scope of Vyper core or should be an external tool which is why this VIP is marked as [research], but because it requires syntactic support, I believe it is more appropriate as a core Vyper feature).

Motivation

One of Vyper's goals is to create readable, auditable code which is difficult to be misleading. One tricky area for contract developers is accounting! Fortunately, accountants have developed techniques over the years to help with doing accounting precisely.

This proposal centers around two concepts: the journal entry and the chart of accounts. A journal entry is a collection of debits and credits. Here is an example journal entry:

Account Debit Credit
Equity 1.5 -
Assets - 1.5
-- -- --
Total 1.5 1.5

This journal entry balances, that is, the debits and credits sum to the same amount. A journal entry MUST balance in order to be valid. This preserves the balance sheet invariant, which is (in this case) that Equity == Assets. In fact, the chart of accounts defines multiple balance sheet invariants.

But first, a digression into some definitions around accounts relevant to this proposal. The chart of accounts is a hierarchical list of accounts. The basic chart of accounts has three top-level accounts: Equity, Liabilities and Assets. Equity and Liability accounts are credit-normal, while Asset accounts are debit-normal. This means that debits increase and credits decrease the balance of an Asset account, while debits decrease and credits increase the balance of an Equity or Liability account. The reason for this (cue sweeping statements) stems from how corporate accounting developed in the West; accounting entities do not have value. They may hold property (Asset accounts) but it is always canceled by claims by first lienholders (Liability accounts) and second lienholders (Equity accounts). This is represented by the accounting equation, Assets = Equity + Liabilities. So, when assets increase (are debited), liabilities or equity must increase (get credited) and vice versa.

Here is an example chart of accounts:

Equity:
  - Fees accrued
Assets:
  - Total ether
Liabilities:
  - User deposits:
    - User 1 deposits
    - User 2 deposits

Here is an example journal entry involving such accounts. A user deposits 1.1 ether into the contract, and the contract deducts 0.1 as a fee:

Account Debit Credit
Total ether 1.1 -
User 1 deposits - 1
Fees accrued - 0.1
-- -- --
Total 1.1 1.1

Clearly, the top-level balance sheet invariant is preserved:

Equity + Liabilities = Assets
0.1 eth + 1 eth = 1.1 eth

But there is another, 'phantom' balance sheet invariant implied by the chart of accounts which is equally important: User deposits = User 1 deposits + User 2 deposits. To show how journal entries maintain both invariants at the same time, suppose User 1 transfers ownership of their deposit to User 2 (and just for fun, charge a fee of 0.1 ether).

Account Debit Credit
User 1 deposits 1 -
User 2 deposits - 0.9
Fees accrued - 0.1
-- -- --
Total 1 1

The balance sheet invariants are now:

User 1 deposits + User 2 deposits = User deposits
0 eth + 0.9 eth = 0.9 eth

Equity + Liabilities = Assets
0.2 eth + 0.9 eth = 1.1 eth

In normal accounting, the intermediate invariants are implied (and also easily checked by simply summing the account balances). As a smart contract programmer, it can be useful to know that all of these invariants are maintained every step of the way (and since summing the account balances could have unbounded gas cost, to have the ability to store the balances of the intermediate accounts).

Specification (tentative)

A couple common use cases shape the syntax in this VIP. One is the use case of mapping identifiers to balances, and the other is the need to define custom debit and credit functions to interact with the outside world.

A data structure which handles funds is marked using the @funds decorator. In order so that the compiler knows what statements to generate for debits and credits, it must also be marked either @debitnormal or @creditnormal. In order to define a nested account, arguments can be provided to the @funds decorator. For example, the nested account

User deposits:
  [list of Token (TOKEN_ARG)]
    [list of User deposit by (USER_IDENTIFIER)]

can be defined like

@funds('TOKEN_ARG', 'USER_IDENTIFIER')
user_deposits: map(ERC20, map(address, uint256))

Defining a custom debit/credit can be done by defining a special function which is then decorated with @debit/@credit and its scope. For example, to debit and credit a phantom token balance, it could be defined as follows:

@debit('erc20_balance')
def _erc20_balance_debit(tok: ERC20, amt: uint256):
  tok.transferFrom(msg.sender, self, amt)
@credit('erc20_balance')
def _erc20_balance_credit(tok: ERC20, amt: uint256):
  tok.transfer(msg.sender, amt)

Finally, funds can only be touched with the special journal control structure. A journal entry can be created as follows:

journal:
  debits:
    - account: <account>
      amount: <amount>
    ...
  credits:
    - account: <account>
      amount: <amount>
    ...

The compiler should throw unless it can verify that the sum of the debits is equal to the sum of the credits (or the journal entry does not have at least one debit and at least one credit). Also, any function which has a journal entry should have a non-reentrant lock associated with it (cf. #1204).

Examples

Here is a sample token, with just mint, burn and transfer functions defined:

@funds('user')
@creditnormal
balances: public(map(address, uint256))

@funds
@debitnormal
supply: public(uint256)

# Default implementations generated by compiler
##@debit('user')
##def _debit_user(user: address, amt: uint256):
##  self.balances[user] -= amt
##@credit('user')
##def _credit_user(user: address, amt: uint256):
##  self.balances[user] += amt
##@debit('supply')
##def _debit_supply(amt: uint256):
##  supply += amt
##@credit('supply')
##def _credit_supply(amt: uint256):
##  supply -= amt

owner: address

def __init__():
  self.owner = msg.sender

def mint(amt: uint256):
  assert msg.sender == self.owner
  journal:
    debits:
    - account: supply
      amount: amt
    credits:
    - account: user(msg.sender)
      amount: amt

def burn(amt: uint256):
  journal:
    debits:
    - account: user(msg.sender)
      amount: amt
    credits:
    - account: supply
      amount: amt

def transfer(to: address, amt: uint256):
  journal:
    debits:
    - account: user(msg.sender)
      amount: amt
    credits:
    - account: user(to)
      amount: amt

Here is an example smart contract which illustrates some current error-prone techniques and how this proposal improves the situation, defined once using current Vyper, and once using this VIP. It is a payment splitter where a user can define multiple recipients for a single payment, held in escrow until the recipient withdraws the funds. The first implementation contains three accounting bugs which are easily caught by the journal entry format, and one bug which is prevented by the re-entrancy lock.

# Current Vyper

balances: map(ERC20, map(address, uint256))

def split_payment(tok: ERC20, amt: uint256, recipient1: address, recipient2: address):
  require(tok.transferFrom(recipient1, self, amt))
  if recipient1 != 0x00 and recipient2 != 0x00:
    self.balances[recipient1] += amt/2 # rounding error, 1 wei might be lost
    self.balances[recipient2] = amt/2 # typo! (should be `+=`)
  elif recipient1 != 0x00:
    self.balances[recipient1] += amt
  # missing case, funds are now irretrievable

def withdraw(tok: ERC20, amt: uint256):
  require(tok.transfer(msg.sender, amt)) # re-entrant
  self.balances[msg.sender] -= amt
# Vyper with this VIP

@funds('erc20', 'user')
@creditnormal
balances: map(ERC20, map(address, uint256))

# define single-entry debits and credits
# must be defined since there is no @funds('erc20_balance') defined.
# implied: @nonreentrant('balances')
@debit('erc20_balance')
def _debit_erc20(tok: ERC20, amt: uint256):
  require(tok.transferFrom(msg.sender, self, amt))
  # Perhaps the compiler should generate a data structure
  # for the intermediate balance and modify it here:
  # self.erc20_balances[tok] += amt

# implied: @nonreentrant('balances')
@credit('erc20_balance')
def _credit_erc20(tok: ERC20, amt: uint256):
  require(tok.transfer(msg.sender, amt))
  # Perhaps the compiler should generate a data structure
  # for the intermediate balance and modify it here:
  # self.erc20_balances[tok] -= amt

# Default implementation generated by the compiler.
##@nonreentrant('balances')
##@debit('erc20.user')
##def _debit_erc20_user(tok: ERC20, user: address, amt: uint256):
##  self.balances[tok][user] -= amt # implicit underflow check

# Default implementation generated by the compiler.
##@nonreentrant('balances')
##@credit('erc20.user')
##def _credit_erc20_user(tok: ERC20, user: address, amt: uint256):
##  self.balances[tok][user] += amt

def split_payment_fail(tok: ERC20, amt: uint256, recipient1: address, recipient2: address): 
  if recipient1 != 0x00 and recipient2 != 0x00:
    journal:
      debits:
      - account: erc20_balance(tok)
        amount: amt
      credits: # fail. compiler cannot prove that amt/2 + amt/2 == amt
      - account: erc20.user(tok)(recipient1)
        amount: amt/2
      - account: erc20.user(tok)(recipient2)
        amount: amt/2
  elif recipient1 != 0x00:
    journal:
      debits:
      - account: erc20_balance(tok)
        amount: amt
      credits:
      - account: erc20.user(tok)(recipient1)
        amount: amt
  # no lost funds because no transfer is made.

def split_payment_better(tok: ERC20, amt: uint256, recipient1: address, recipient2: address):
  if recipient1 != 0x00 and recipient2 != 0x00:
    journal:
      debits:
      - account: erc20_balance(tok)
        # OK because compiler can find matching amounts on
        # both sides of the entry.
        amount: amt/2
      - account: erc20_balance(tok)
        amount: amt/2
      credits:
      - account: erc20.user(tok)(recipient1)
        amount: amt/2
      - account: erc20.user(tok)(recipient2)
        amount: amt/2
  elif recipient1 != 0x00:
    journal:
      debits:
      - account: erc20_balance(tok)
        amount: amt
      credits:
      - account: erc20.user(tok)(recipient1)
        amount: amt
  # else: no lost funds because no transfer is made

def withdraw(tok: ERC20, amt: uint256):
  # implicit lock
  journal:
    debits:
    - account: erc20.user(tok)(msg.sender)
      amount: amt
    credits:
    - account: erc20_balance(tok)
      amount: amt

Unresolved design issues / design weaknesses

Backwards Compatibility

Possibly add a feature to nonreentrancy in private functions such that their reentrancy properties percolate up to all calling functions.

Dependencies

563

1204

Copyright

Copyright and related rights waived via CC0

jacqueswww commented 5 years ago

From Meeting: we will investigate this properly post RC1.

charles-cooper commented 5 years ago

Also: investigate more concise syntax

charles-cooper commented 5 years ago

A couple syntactic alternatives for the journal control structure: Use function call syntax for debits/credits

journal:
  debit(...)
  [more debits]
  credit(...)
  [more credits]

Throw out the journal keyword and use debits and credits sections

debits:
  debit(...)
credits:
  credit(...)

Throw out the journal keyword and use indentation to mark the debits and credits section

debit(...)
[more debits]
  credit(...)
  [more credits]

So far I like the first one best. Here is split_payment_better using this syntax, it's much more concise:

def split_payment_better(tok: ERC20, amt: uint256, recipient1: address, recipient2: address):
  if recipient1 != 0x00 and recipient2 != 0x00:
    journal:
      debit(erc20_balance(tok), amt/2)
      debit(erc20_balance(tok), amt/2)
      credit(erc20.user(tok)(recipient1), amt/2)
      credit(erc20.user(tok)(recipient2), amt/2)
  elif recipient1 != 0x00:
    journal:
      debit(erc20_balance(tok), amt)
      credit(erc20.user(tok)(recipient1), amt) 

Finally, we could get rid of the control structure entirely and just have a sequence of debits and credits

debit(...)
debit(...)
credit(...)
credit(...)

But it would make error messages more opaque if a user accidentally orphans one of the statements.

charles-cooper commented 5 years ago

Suggestion from @jacqueswww since the debits and credits must correspond (at least until we do an SMT checker), we can coalesce each pair into a transfer function. That shortens our function to

def split_payment_better(tok: ERC20, amt: uint256, recipient1: address, recipient2: address):
  if recipient1 != 0x00 and recipient2 != 0x00:
    journal:
      transfer(erc20_balance(tok), erc20.user(tok)(recipient1), amt/2)
      transfer(erc20_balance(tok), erc20.user(tok)(recipient2), amt/2)
  elif recipient1 != 0x00:
    journal:
      transfer(erc20_balance(tok), erc20.user(tok)(recipient1), amt) 

Or without the journal keyword

def split_payment_better(tok: ERC20, amt: uint256, recipient1: address, recipient2: address):
  if recipient1 != 0x00 and recipient2 != 0x00:
    transfer(erc20_balance(tok), recipient1, amt/2)
    transfer(erc20_balance(tok), recipient2, amt/2)
  elif recipient1 != 0x00:
    transfer(erc20_balance(tok), erc20.user(tok)(recipient1), amt) 

I kind of like the special control structure for clarity but it seems it's not necessary.

charles-cooper commented 4 years ago

Cf. https://github.com/flintlang/flint/pull/383 for a related approach with linear types

fubuloubu commented 4 years ago

Interesting implementation here vis a vis pooled values: https://diligence.consensys.net/blog/2020/05/an-experiment-in-designing-a-new-smart-contract-language/