vyperlang / vyper

Pythonic Smart Contract Language for the EVM
https://vyperlang.org
Other
4.89k stars 800 forks source link

VIP: 'Bigmath' functions #1086

Open charles-cooper opened 6 years ago

charles-cooper commented 6 years ago

Simple Summary

Add 'bigmath' functions which make it easier for contract authors to do more precise arithmetic.

Abstract

Add functions for dealing more precisely with overflow and underflow, especially where the implementations are not obvious. Specifically, a bigmul function which returns the result of multiplying two 256-bit uints as a pair of 256-bit uints (emulating a 512-bit uint), and a bigdiv function which returns the underflow portion of dividing the high word of a 512-bit uint into a 256-bit uint.

Notes

Motivation

Precise arithmetic is hard. For instance, the naive method to calculate multiplication by a fraction is subject to overflow attacks (consider 10 * 254/255 using 8-bit uints returns 0, instead of 9). Contract authors need a way to keep track of over and underflow without necessarily reaching for a full-fledged bignum implementation. Furthermore, efficient implementations (which take O(1) operations) are not obvious (the naive method being long multiplication or division). The two proposed functions provide a safe, efficient way to keep track of over and underflow, and in theory also provide enough machinery that higher-level abstractions like floating point arithmetic or bignum implementations could be built on top of them.

As an example, multiplying by a rational number could be implemented using these functions as follows:

# Sample implementation
def mul_frac(a: uint256, numerator: uint256, denominator: uint256) -> uint256:
  (r1, r0) = bigmul(a,numerator)
  assert r1 / denominator == 0 # overflow protection
  return bigdiv(r1, denominator) + r0 / denominator # Could be one bit of rounding error here

Specification

Implement the following interface, either as builtin functions or as part of a 'standard library'.

# Returns the result of multiplying two uint256s as an unsigned 512-bit integer, represented as a pair of uint256s.
# The high word should be the first value of the returned pair, and the low word should be the second value of the returned pair.
def bigmul(a: uint256, b: uint256) -> (uint256, uint256):

# Returns the result of ((a << 256) / b) % (2**256). If 512-arithmetic were represented using pairs of uint256, this would be the low word of the quotient of dividing the high word of a by b.
def bigdiv(a: uint256, b: uint256) -> uint256:

The following C code can be taken as a suggestion of how to implement this efficiently.

#include <stdio.h>
#include <stdint.h>

uint16_t bigmul1(uint8_t a, uint8_t b) {
  uint16_t a1 = a;
  return a1 * b;
}

// Return the result of multiplying two uint8_t as uint16_t.
uint8_t mulmod(uint8_t a, uint8_t b, uint8_t k) {
  uint16_t a1 = a;
  uint16_t c = a1*b;
  return (uint8_t)(c % k);
}

// Same thing but without depending on native uint16 multiplication
// Credits: https://medium.com/wicketh/mathemagic-full-multiply-27650fec525d
// Interestingly, the author claims that using this function, and checking the high-word for overflow supposedly takes less gas than a standard safeMul
uint16_t bigmul2(uint8_t a, uint8_t b) {
  uint8_t r0 = a * b; // i.e. mulmod(a,b, 256)
  uint8_t r1 = mulmod(a,b, UINT8_MAX);
  r1 = (r1 - r0) - (r1 < r0 ? 1 : 0); // chinese remainder

  uint16_t r = r1; // r1 is the high word, r0 is the low word.
  return (r << 8) + r0;
}

// Calculate ((a << 8) / b) mod 256, using native words
uint8_t bigdiv1(uint8_t a, uint8_t b) {
  uint16_t a1 = a;
  return (a << 8) / b;
}

// Same thing but without depending on native uint16_t division
uint8_t bigdiv2(uint8_t a, uint8_t b) {
  uint8_t m = UINT8_MAX % b;
  uint8_t r = UINT8_MAX / b;
  return r * a + (m + 1) * a / b;
}

// Run some sample cases, test bigdiv2 against bigdiv1
int main() {
  printf("%d\n", bigmul1(100,5));
  //printf("%d\n", mulmod(100,5, 7));
  printf("%d\n", bigmul2(100,5));
  printf("%d\n", bigdiv1(1,251));
  printf("%d\n", bigdiv2(1,251));
  printf("%d\n", bigdiv1(1,255));
  printf("%d\n", bigdiv2(1,255));
  printf("%d\n", bigdiv1(1,1));
  printf("%d\n", bigdiv2(1,1));
  printf("%d\n", bigdiv1(1,2));
  printf("%d\n", bigdiv2(1,2));
  printf("%d\n", bigdiv1(1,3));
  printf("%d\n", bigdiv2(1,3));
  for (int i = 0; i < 256; i++) {
    for (int j = 1; j < 256; j++) {
      int r1 = bigdiv1(i,j);
      int r2 = bigdiv2(i,j);
      if (r1 != r2) {
          printf("Bad pair (%d,%d) -> %d vs %d \n", i,j, r1,r2);
      }
    }
  }
  printf("end\n");
}

Backwards Compatibility

Existing code may have collisions with the function names.

Copyright

Copyright and related rights waived via CC0

fubuloubu commented 5 years ago

2**512 is astronomically big. It is much harder to make safety guarantees once you go down this path of multiple-register arthimitic. Do you have a use case you can point to that requires doing math that large?

P.S. we have a decimals type.

charles-cooper commented 5 years ago

Sure, any time you want to do accounting precisely. Consider that ERC20 tokens typically have 18 decimal places (roughly 60 bits). Once you start wanting to trade at different prices, make partial fills, do ring trades, the issue of bit truncation starts to become important.

Keep in mind, 2**256 is really big but we still check for overflow because not accounting for overflow results in bugs, and is an attack vector. This is just a generalization for more complex operations.

fubuloubu commented 5 years ago

Every arthimetic operation performed in Vyper is done by first validating it cannot overflow/underflow (reverting if it does), similar to the SafeMath library in common use on Solidity contracts. So that is not a concern.

You are indeed correct that most ERC20 tokens use 18 decimal places as their default, mostly by convention. That leaves over 60 decimal places for the whole number portion, which is much larger than any reasonable project I've encountered. These numbers are large enough to be very, very precise for most reasonable use cases, but if you have a specific example for me to examine, I would like to see it.

These data types are much larger than even what IEEE 64-bit floats can account for, and are much more precise because they are tracked as hard integers.

fulldecent commented 5 years ago

Here is a use case we can study:

https://github.com/compound-finance/compound-money-market

You can deposit funds, maybe a few Ether, and you earn interest PER BLOCK. This might be closer to quantization errors than other projects.


I believe adding functions to the standard library is always a huge deal. Instead, there should be a "contribs" package with all kinds of stuff like this. No need to justify adding something to contribs, just put it in. Then after the market clearly shows this is useful it gets added to the standard library.

jacqueswww commented 5 years ago

As discussed in call, we will leave this ticket open and marked as 'invalid' until we have a library interface we can use. See #885 #483 and #584 for details.

nrryuya commented 5 years ago

One use-case of big integer arithmetic is RSA Accumulator. https://github.com/LayerXcom/verified-vyper-contracts/pull/70

vbuterin commented 5 years ago

IMO if Vyper is going to go down this path, you should consider instead just adding bigmath functions that use bytes[n] variables and the MODEXP precompile. So possibly:

You'd probably want to implement this via a library, similar to the in-Vyper RLP decoding feature.

BTW if you want an efficient way to get the high order bits of 512-bit multiplication, one possible EVM-specific trick is something like (x*y) // 2**256 = mulmod(x, y, 2**256 - 1) - (x*y) (but I haven't verified this, there may be edge cases where it doesn't work so you might need a couple extra clauses).

charles-cooper commented 5 years ago

@vbuterin Thanks for the feedback! I was unaware of the MODEXP precompile and it would probably be a better fit for RSA accumulation. In fact, exposing MODEXP as a library function could be a good thing to expose generally although we would need to think about the API to expose for such arbitrary-width integer types.

BTW, the edge case you are looking for in the 512-bit multiplication is the Chinese remainder expression in the C code I added above - I tried to write it in a way which 'looks' like EVM code.

...
uint8_t r0 = a * b; // i.e. mulmod(a,b, 256)
uint8_t r1 = mulmod(a,b, UINT8_MAX);
r1 = (r1 - r0) - (r1 < r0 ? 1 : 0);
...
charles-cooper commented 5 years ago

Another use case for bigdiv - calculating 1/x.

thibauld commented 5 years ago

Bigmath would be very useful for projects implementing bonding curves (linear and non-linear). In bonding curves, if the buy slope is very small, the token supply and the reserve can quickly become very large and the calculus required often involve to power 2 or power 3 these numbers. We are currently having the issue while implementing the continuous organization reference smart-contract.

jacqueswww commented 5 years ago

@thibaild checkout this file https://github.com/andytudhope/Recollections/blob/master/vyper/math/math.vy for a potential power function in vyper.