vyperlang / vyper

Pythonic Smart Contract Language for the EVM
https://vyperlang.org
Other
4.86k stars 794 forks source link

VIP: Stack/Queue Support #484

Open fubuloubu opened 6 years ago

fubuloubu commented 6 years ago

Preamble

VIP: 484 Title: Stack/Queue Support Author: @fubuloubu Type: Standard Track Status: Draft Created: 2017-11-19

Simple Summary

Allow stacks and queues, which are infinite size lists with constant gas interaction methods (Create, Retrieve, Append, and Pop methods)

Abstract

One of Viper's core concepts is to avoid unpredictable gas costs per transaction by limiting the usage of potentially exploitable data structures such as non-fixed lists. However, there are use cases for a variable-sized list that would have methods to interact with that data structure in order to prevent variable gas costs (e.g. push(), pop(), etc.) I know @vbuterin, @DavidKnott, and I discussed one potential implementation (basically using a mapping under the hood), but I couldn't find that in the issues and wanted to formally record it before tomorrow's call.

Motivation

By allowing these limited type of data structures, we will enable more potential use cases without violating this major precept of the language. This will help adoption and encourage thoughtful programming patterns.

Specification

The crowd-funding example here is a great use case for a queue-type data structure. When a user participates in the crowdfunding campaign, they are added via the following code:

def participate():
    assert block.timestamp < self.deadline
    nfi = self.nextFunderIndex
    self.funders[nfi] = {address: msg.sender, value: msg.value}
    self.nextFunderIndex = nfi + 1

This could be reduced to the following:

def participate():
    assert block.timestamp < self.deadline
    self.funders.push({address: msg.sender, value: msg.value})

This method would have constant and more optimal gas cost due to optimizations that can happen behind the scene for the self.funders datatype, instead of managing the funder index for the funder mapping.

When the crowd-funding campaign does NOT succeed, typically there is a method to refund all participants. Because the self.funders list is variable-length, we want to avoid exceeding the gas limit by refunding a defined number of participants at a time. The code to do this in the example contract is:

def refund():
    assert block.timestamp >= self.deadline and self.balance < self.goal
    ind = self.refundIndex
    for i in range(ind, ind + 30):
        if i >= self.nextFunderIndex:
            self.refundIndex = self.nextFunderIndex
            return
        send(self.funders[i].sender, self.funders[i].value)
        self.funders[i] = None
    self.refundIndex = ind + 30

We can reduce this to:

def refund():
    assert block.timestamp >= self.deadline and self.balance < self.goal
    for funder in self.funders.popFirst(30):
        send(funder.address, funder.value)
    # To do this one at a time, we would do:
    #funder = self.funders.pop()
    #send(funder.address, funder.value)

    # Also perhaps at this point we could do
    if self.funder.length == 0:
        selfdestruct(self.owner) # Remove campaign contract

The popFirst() method of the variable list would take the up to the first 30 items in the list, remove them from the list, and perform the given operation over the iteration. This works because the method removes the associated members from the list meaning those members would no longer be returned if the popFirst() method were to be called again. The user(s) of this method would need to make sure that they called this method enough times to completely exhaust the list, and perhaps popFirst() would throw if the method is called when there are zero members of the returned list to prevent calling a method relying on the operation of this key method over and over again.

As a more complete example, imagine there was a method to undo your contribution to a crowd-funding campaign while it was still in process. That might look like:

def undo(index): # index could possible be a hash instead
    assert block.timestamp < self.deadline
    funder = self.funders.popAt(index)
    assert msg.sender == funder.address
    send(funder.address, funder.value)

In this example, the popAt() needs the specific index that you are looking to use to remove the entry from the list. In order to find this entry, you might get the index from a receipt from when you were added to the list. This might be problematic if num is used for indexing these types of lists, so I propose a method more similar to NoSQL-style databases (e.g. [Firebase]()) where a hash is used for the index (and the members are kept as a singly-linked list) so that you directly reference entries instead of needing to keep a running count of the index as the queue is added to or popped from. This might look like:

var_list = {length: num, list: [hash, item, nextHash] -> [hash, item, nextHash] -> ... [hash, item, 0]}

I'm not sure if this would be implemented as an update to the existing list type (usages of range(0, N) would instead use either i in list syntax if the length of the list is known/within gas limits, or be limited to popping chunks in order to operate over the entire list). This type would only be allowed as global, and would also need a length member to know the length of the list (@public globals would have get_[list]_length() and get_[list]_item_at(index) getter methods).

I hope this fully describes the feature I am proposing, and have adequately shown that this feature is:

  1. Useful
  2. Clearer than current solutions
  3. Does not violate Viper precepts such as variable gas

Backwards Compatibility

Depending on implementation, this may or may not affect the operation of the fixed-length list data structures we already have in place. It is still early, so we should consider this change now

Copyright

Copyright and related rights waived via CC0

fubuloubu commented 6 years ago

Alternatively, more of a functional syntax for pop()/popAt(idx):

with self.funders.pop() as funder:
    send(funder.address, funder.value)
fubuloubu commented 6 years ago

And if you wanted to get the information about your donation:

@constant
def get_record(index):
    return self.funder.retrieveAt(index)
fubuloubu commented 6 years ago

Made a mistake, would have to be a doubly linked list to support popAt() and retrieveAt().

I actually think we can do a trivial POC in Viper as is. I'll look into this tomorrow.

fubuloubu commented 6 years ago

@jacqueswww What do you think about revisiting this data type as a new feature?


Stacks/Queues are a very useful datatype for Plasma frameworks, which seek to manage a potentially unlimited length list (of root hashes), but only need constant-time access via a restricted API. We could design such a datatype under the hood to maintain Vyper's restrictions on constant gas computatability, but add the ability to iterate over them via:

recipient: address

Contributor = struct({
    acct: address,
    amt: uint256
})
contributors: List(Contributor)

@public
def __init__():
    self.recipient = msg.sender

@public
@payable
def contribute():
    contributors.append({acct: msg.sender, amt: msg.value})
    # NOTE: compiler error if appending mismatched datatype

@public
def getRefund(idx: uint256):
    c: Contributor = contributors[idx]  # Can access via an index, throws if not in [0, len)
    assert c.acct == msg.sender
    contributors.remove(idx)  # Can delete entry in list (protects against re-entrancy here)
    # NOTE: This is probably the hardest function to provide...
    msg.sender.send(c.amt)

@public
def refundAll():
    assert msg.sender == self.recipient
    # Refund up to 30 funders
    for idx in range(30):
        if contributors.length == 0:  # could also be `len(contributors)`
            break  # Stop iteration if stack is empty
        c = contributors.pop(0)  # throws if length is 0, hence the check above
        c.acct.send(c.amt)

The last function could also have a special API for iteration like:

@public
def refundAll():
    assert msg.sender == self.recipient
    # Refund up to 30 funders
    for c in contributors.popFirst(30):  # Obtains <= 30 members of the array
        c.acct.send(c.amt)
    # NOTE: `for x in List.popFirst(N)` is a macro substitute for the above loop
    #       `for x in List.popLast(N)` is the same macro using `List.pop()` instead

This could serve as a basis for the API of such a type: https://docs.python.org/3/tutorial/datastructures.html

fubuloubu commented 6 years ago

The List.remove(idx) functionality would be really hard..., but a pure stack/queue could definitely be supported as a minimal PoC. If we disallowed removals, you could still access in constant time (by using a mapping behind the scenes).

Adding removals would make that very difficult to contain without a lot of overhead.


Perhaps NOT calling it a list, and using .push(item) and .pop() -> item API would make it clearer that it doesn't support Python's List APIs

charles-cooper commented 5 years ago

I really like this idea of somehow supporting iterators/generators at the language level.

BTW, remove(idx) is doable in constant time/gas with a doubly-linked list.

struct Node : 
  val: int128
  next: uint256
  prev: uint256
nodes: map(uint256, Node)
head: uint256
tail: uint256
def remove(ix: uint256) : 
  assert head <= ix
  assert ix <= tail
  node = nodes[ix]
  prev = node.prev
  next = node.next
  next.prev = prev
  prev.next = next
  del nodes[ix]
  # handle edge cases for head/tail/empty list/singleton list
fubuloubu commented 5 years ago

Nice! Yeah, that seems obvious it is "O(1)" in retrospect haha.

Yeah, I would think this would probably be very valuable if Node.val was another struct like a Block, Exit or Deposit (as in Plasma)