nim-lang / Nim

Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).
https://nim-lang.org

Add bigints to standard library #14696

Closed. uninhm closed this issue 1 year ago.

uninhm commented 4 years ago

Summary

Add arbitrary-precision integers to the standard library.

Solution

Add the bigints library (available via Nimble) to the standard library.

https://github.com/def-/nim-bigints
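
For reference, a minimal sketch of what using such a module could look like, based on the pkg/bigints API that also appears in the benchmarks later in this thread (the import would presumably become std/bigints if the proposal lands):

import pkg/bigints  # hypothetically std/bigints if adopted

# 2^100 does not fit in any built-in integer type, but is trivial with bigints
var x = 1.initBigInt
for _ in 1 .. 100:
  x *= 2.initBigInt
echo x  # 1267650600228229401496703205376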

Araq commented 4 years ago

Yeah, maybe we should really do that. The compiler itself would benefit.

andreaferretti commented 4 years ago

Before sanctioning a particular implementation of bigints into the stdlib, some performance improvements are in order. The fact is, there is also https://github.com/FedeOmoto/bignum, which is a wrapper around GMP, and if I recall correctly it is significantly faster (maybe things have improved lately). See also the discussion in https://forum.nim-lang.org/t/3677. The ideal would be to have a fast, pure Nim implementation of bigints.

timotheecour commented 4 years ago

The ideal would be to have a fast, pure Nim implementation of bigints.

For something like bigints, I'd say performance matters more than a pure Nim implementation if I had to choose between the two, unless the performance difference is small enough. But it also has to work at CT, which requires one of: CT FFI, a pure Nim implementation fallback for the VM, or integration via vmops (which is not ideal, as it ties the compiler to a bignum library).
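
To illustrate the "pure Nim implementation fallback for the VM" option, a minimal self-contained sketch (the proc name is made up; in a real library the runtime branch is where an FFI or assembly call would go):

proc sumLimbs(limbs: seq[uint64]): uint64 =
  when nimvm:
    # compile-time VM: pure Nim only
    for x in limbs: result += x
  else:
    # runtime: identical here so the sketch compiles on its own, but this is
    # where a call into an optimized C/asm implementation could live
    for x in limbs: result += x

static:
  doAssert sumLimbs(@[1'u64, 2, 3]) == 6  # evaluated in the compile-time VM
echo sumLimbs(@[1'u64, 2, 3])             # evaluated at run time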

Araq commented 4 years ago

For the Nim compiler I don't care much about the speed, it's much more important that it has no dependencies.

planetis-m commented 4 years ago

def-/nim-bigints is maintained by contributors. It is not a good candidate.

Araq commented 4 years ago

Why not? Sometimes software is finished.

bpr commented 4 years ago

Why not? Sometimes software is finished.

If you look at the TODO, it's clear that this isn't finished yet. It's a good start though.

I think it's important to have a pure Nim BigInt library, rather than calling out to some other, potentially faster, C-based library. IMO, Nim is a systems programming language, and if we can't write a Nim (maybe Nim + asm) library that's roughly as performant as a C + asm library, then Nim needs to be enhanced.

alaviss commented 4 years ago

Personally I'd prefer stint due to its maturity, but it appears that the library depends on parts of nim-stew...

Also, shouldn't something like this be added to fusion instead of the stdlib?

juancarlospaco commented 4 years ago

I think varints could be replaced by a new bigints in the stdlib, so the stdlib stays roughly constant in size.

Araq commented 4 years ago

I think varints could be replaced by a new bigints in the stdlib, so the stdlib stays roughly constant in size.

Er uh, I think that's not how it works. ;-)

juancarlospaco commented 4 years ago

I was thinking varints was supposed to be a bigint kind of thing but was never completely implemented, sorry.

mratsim commented 4 years ago

GMP is a big no-no due to the license and not working at compile-time

I don't mind contributing a bigint from scratch, it's not like I wrote those 3 times already:

Note that the Constantine backend and the current Stint refactor range from 2x faster than GMP on modular arithmetic to sometimes slower, for bigints below 1024 bits (I didn't test much beyond that, and didn't test non-modular arithmetic).

The main reasons behind stew dependencies are threefold:

In terms of BigInt usage, I would need to know more about the use-cases:

zah commented 4 years ago

If the compiler needs big integers, shouldn't it finally start to use Nimble packages? stint, as it exists right now, is a mature implementation of fixed-size integers. Duplicating it wouldn't be beneficial for anyone. The notion that the standard library is what the compiler needs to use is a damaging policy that prevents the Nim team from benefiting from collaboration with any other team. It wouldn't be that hard to add a bunch of git submodules and -p:<path> directives in the compiler nim.cfg file and solve the problem in an afternoon.
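
As a rough illustration of that setup (package choice and paths are purely illustrative, not an actual proposal for the compiler's configuration):

# sketch: vendor the needed packages as git submodules, e.g.
#   git submodule add https://github.com/status-im/nim-stint vendor/stint
#   git submodule add https://github.com/status-im/nim-stew  vendor/stew
# then add the search paths to the compiler's nim.cfg:
--path:"vendor/stint"
--path:"vendor/stew"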

Araq commented 4 years ago

@zah Agreed. However, I'm in no rush in adding big integer support for Nim and busy with other work considered more important.

uninhm commented 4 years ago

If the compiler needs big integers, shouldn't it finally start to use Nimble packages? stint, as it exists right now, is a mature implementation of fixed-size integers. Duplicating it wouldn't be beneficial for anyone. The notion that the standard library is what the compiler needs to use is a damaging policy that prevents the Nim team from benefiting from collaboration with any other team. It wouldn't be that hard to add a bunch of git submodules and -p:<path> directives in the compiler nim.cfg file and solve the problem in an afternoon.

But in competitive programming, bigints are important, and you can't use external libraries.

zah commented 4 years ago

What is competitive programming? Sites like HackerRank? IOI competitions? No libraries allowed there? Can't bring your own hard drive?

mratsim commented 4 years ago

If it helps competitive programming, that's a side effect, but I don't think Nim should fall into competitive-programming-driven development, unless the people doing competitive programming have demonstrated that they stick around in the Nim community, so we can rely on them to maintain the code in the standard library (which is the main blocker for any module there).

juancarlospaco commented 4 years ago

Maybe some inspiration can come from D's stdlib BigInt: https://dlang.org/phobos/std_bigint.html

I think competitive programming is an example use case, but maybe others benefit too, like Scientific Nim projects.

timotheecour commented 4 years ago

If the compiler needs big integers, shouldn't it finally start to use Nimble packages?

I agree; what this would require is:

zah commented 4 years ago

@timotheecour, tracking the dependencies as git submodules solves both problems. You only have to do a little bit of manual work in adding the transitive dependencies by yourself.

timotheecour commented 4 years ago

@zah Since you brought this up, can you please create either an issue or an RFC so it can be discussed separately in a dedicated location instead of here (it's more general than bigints)? I'll follow up there.

timotheecour commented 3 years ago

proposal

mini-gmp (subset of https://gmplib.org/) could be a good candidate to port/wrap:

The only potential concern is licensing; as I understand it, dynamic linking to GMP shouldn't cause issues under the LGPL. If that's a problem (e.g. static linking is needed), an alternative (slower but binary-compatible) implementation could be bundled with Nim, and users could opt into mini-gmp or GMP for performance as needed.

GMP is a big no-no due to the license and not working at compile-time

vmops should solve the problem of working at CT (that's how the new hashes work in the VM, for example); and there's also --experimental:compiletimeFFI.

benchmark: computing factorial(100) 1 million times

Obviously a very partial benchmark, but it's a start; feel free to suggest improvements/corrections.

(all tests with clang -O2 or similar; on a recent mac)

# times in seconds
gmp: 1.48
kaushalmodi/bignum (nimble install bignum): 1.50 (EDIT: added later)
mini-gmp: 2.24
python: 11
std.bigint (D): 12
nim using pkg/bigints: 14
nim using pkg/stint: 41
Big-Integer-C: 45
tiny-bignum-c: 2260

D:

for D's std.bigint (https://dlang.org/phobos/std_bigint.html) with:

ldmd2 -of/tmp/z05 -O -inline -release $timn_D/tests/bigints/z04.d
version   1.23.0 (DMD v2.093.1, LLVM 9.0.1)
import std.bigint;
import std.stdio;
void test1(bool isEcho){
  BigInt n = "100";
  BigInt ret = BigInt("1");
  while(n!=0){
    ret = ret * n;
    n = n-1;
  }
  if(isEcho) writeln(ret);
}

void main(){
  for(int i=0;i<1000000;i++){
    test1(false);
  }
  test1(true);
}

Big-Integer-C:

see test here: https://github.com/ilia3101/Big-Integer-C/pull/3

python

def test1():
  n = 100
  ret = 1
  while(n!=0):
    ret = ret * n
    n = n-1
  # print(ret)

def main():
  for i in range(0,1000000): test1()
main()

nim + pkg/bigints

import pkg/bigints

proc test1(isEcho: bool) =
  let n = 100
  var n2 = n.initBigInt
  var ret = 1.initBigInt
  while n2 != 0:
    # ret = ret * n2
    ret *= n2
    n2.dec
  if isEcho:
    echo ret

proc main=
  for i in 0..<1000000:
    test1(isEcho=false)
  test1(isEcho=true)

main()

nim + pkg/stint

https://github.com/status-im/nim-stint

import pkg/stint

proc test1(isEcho: bool)=
  const d = 1024
  var n = 100.stint(d)
  var ret = 1.stint(d)
  while n != 0.stint(d):
    ret = ret * n
    n = n-1.stint(d)
  if isEcho:
    echo ret

proc main=
  for i in 0..<1_000_000:
    test1(isEcho=false)
  test1(isEcho=true)
main()

EDIT: nim + pkg/bignum

This option (https://github.com/kaushalmodi/bignum) hadn't been considered in the above thread; it's a fork of https://github.com/FedeOmoto/bignum that got a few updates to make it work (unfortunately under the same nimble name, see https://github.com/FedeOmoto/bignum/issues/3). It (unsurprisingly) gives the same performance as gmp, via a high-level wrapper around Nim's gmp wrapper. Test file used in the above benchmark:

when true:
  import pkg/bignum
  proc test1(n,ret: var Int, isEcho: bool)=
    discard n.set 100
    discard ret.set 1
    while n != 0:
      ret *= n
      n.dec(1)
    if isEcho:
      echo ret

  proc main=
    var n = 100.newInt
    var ret = 1.newInt
    for i in 0..<1_000_000:
      test1(n, ret, isEcho=false)
    test1(n, ret, isEcho=true)
  main()

links

alternative not evaluated: openssl bn

see https://www.openssl.org/docs/man1.0.2/man3/bn.html, its license should not cause concern

mratsim commented 3 years ago

Hard no for GMP in the compiler. The license would be a nightmare.

Coincidentally, I've started to extract Constantine's bigint internals into a self-contained variable-size bigint library: https://github.com/SciNim/megalo

Regarding Stint perf, the blocker is this: https://github.com/status-im/nim-stint/issues/87, which leads to bad quadratic behavior. A complete rewrite of the backend is WIP: https://github.com/status-im/nim-stint/pull/104

arnetheduck commented 3 years ago

Considering that there are already 5 options for bigints in this thread and new ones keep popping up, and that the standard library is a graveyard for unmaintained code given how few people care to keep it up to date, perhaps it's wiser to let the dust settle before adding yet another library?

niekbouman commented 3 years ago

Dear Nim-devs,

About two years ago I wrote a C++ constexpr bigint library (https://github.com/niekbouman/ctbignum)

Recently, I gave Nim a try and translated a small part of the library to Nim. Maybe it could help as inspiration for a standard library bignum implementation.

kind regards, Niek

template doubleWidthType(x: type): type =
  when x is uint8:
    uint16
  elif x is uint16:
    uint32
  elif x is uint32:
    uint64

template bitWidth(x: type): int =
  when x is uint8:
    8
  elif x is uint16:
    16
  elif x is uint32:
    32
  elif x is uint64:
    64

assert doubleWidthType(uint8) is uint16  
assert bitWidth(uint8) == 8 

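# limb-wise addition; carries are recovered from unsigned wraparound and the
# final carry out is deliberately dropped (hence the name)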
func addIgnoreLastCarry[N: static int; T](a: array[N, T], b: array[N, T]): array[N, T] = 
  var carry = T(0)
  for i in 0..<N:
    let aa: T = a[i]
    let sum: T = aa + b[i]
    let res: T = sum + carry
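    # detect wraparound: an unsigned sum overflowed iff it is smaller than an operand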
    carry = T((sum < aa) or (res < sum))
    result[i] = res

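# schoolbook multiplication: each limb product is computed in the double-width
# type; the low half is stored and the high half is carried into the next limb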
func mul[M, N: static int; T](u: array[M, T], v: array[N, T]): array[M + N, T] =
  type TT = doubleWidthType(T)
  for j in 0..<N:
    var k = T(0)
    for i in 0..<M:
      var t : TT  = TT(u[i]) * TT(v[j]) + result[i + j] + k
      result[i + j] = cast[T](t)
      k = T(t shr bitWidth(T))
    result[j + M] = k

func partialMul[M, N: static int; T](L: static int, u: array[M, T], v: array[N, T]): array[L, T] =
  type TT = doubleWidthType(T)
  for j in 0..<N:
    var k = T(0)
    var m = min(M, L - j)
    for i in 0..<m:
      var t : TT  = TT(u[i]) * TT(v[j]) + result[i + j] + k
      result[i + j] = cast[T](t)
      k = T(t shr bitWidth(T))
    if j + M < L:  
      result[j + M] = k

func toDigitArray(t: type, str: static string): array[str.len(), t] =
  for i, c in str:
    let x = int(c) - 48
    assert 0 <= x and x < 10
    result[i] = t(x)

func tightLen[N: static int; T](x : array[N,T]) : int = 
  for i in countdown(x.len - 1 ,0):
    if x[i] != 0:
      return i + 1
  return 0

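# converts a decimal string into limbs; N = 1 + 10*l/(3*bitWidth(T)) limbs is
# enough because an l-digit decimal number needs about 3.32*l bits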
func parseBigintHelper(T: type, x: static string): auto =
  const l = x.len;
  const N = int(1 + (10 * l) / (3 * bitWidth(T)))
  let digits = toDigitArray(T,x)
  var num : array[N, T]
  var powOfTen : array[N, T]
  powOfTen[0] = T(1)
  for i in countdown(l - 1, 0):
    num = addIgnoreLastCarry(num, partialMul(N,[digits[i]], powOfTen))
    powOfTen = partialMul(N,[T(10)], powOfTen)
  return num

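# trims the over-sized limb array from parseBigintHelper down to its tight length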
func parseBigint(T: type, x: static string): auto =
  const num = parseBigintHelper(typeof T, x)
  const l = tightLen(num)
  var result : array[l, T]
  for i in 0..<l:
    result[i] = num[i]
  return result  

template bn32(x: static[string]): auto =  
  parseBigint(uint32, x)    

const example = bn32"4294967296"
echo example

timotheecour commented 3 years ago

@niekbouman judging from https://github.com/niekbouman/ctbignum/issues/39

With ctbignum, I am focusing on compile-time computations and run-time computations with integers whose limb size are known at compile time, and modular arithmetic with a modulus known at compile-time. I specifically target on operands with a few hundred bits (say, two to four 64-bit limbs), whereas mp++ seems to focus on implementing a "small-vector optimization" for big integers of arbitrary size.

In Nim, for CT, performance-sensitive code (like bigint/bignum) would best be handled via vmops instead of relying on the VM (e.g. see hashVmImplByte), so the CT aspect is not a primary concern; performance at RT is IMO the primary concern. Do you have links to benchmarks comparing ctbignum to mp++ (https://github.com/bluescarni/mppp)? Even better if they distinguish between use cases (e.g. bigints of fixed size vs bigints of arbitrary size).

Maybe it could help as inspiration for a standard library bignum implementation.

Help welcome; wrapping might be easier than porting, at least in the meantime.

@mratsim

Hard no for GMP in the compiler. The license would be a nightmare.

What about porting and/or wrapping https://github.com/bluescarni/mppp instead:

mratsim commented 3 years ago

mppp wraps GMP

Built on top of the GNU multiprecision stack (GMP, MPFR, MPC)

I looked into the implementation and it's only a wrapper, which means the same licensing woes apply.

Besides the implementations in cryptographic libraries (which don't support unsigned), the fastest runtime C++ library today is likely https://github.com/fabhaas/xenonis, with the following limitations:

Alternatively, you can lift LLVM's APInt: https://llvm.org/doxygen/classllvm_1_1APInt.html

So, my take is, it's fine to wrap any of those libraries but in the standard library we should have a pure Nim bigint library that doesn't have C++ gotchas.

Unfortunately, my main computer was down for about 40 days just after I started Megalo, and now I'm also trying to push CPS forward: https://github.com/nim-lang/RFCs/issues/295.

That said, Megalo should be in good enough shape that someone can pick up the code and finish the non-compile-time, non-assembly implementation:

So what's left?

I think this is less work than figuring out the intricacies of wrapping C++, and the last 3 items have to be done anyway for a wrapper.

Later, optimizing to reach decent performance at minimum requires:

niekbouman commented 3 years ago

I recently learned from Jonathan Protzenko (Microsoft Research) that one of his colleagues, Marina Polubelova, is working on a formally verified F* implementation of bignum arithmetic, which compiles to C. That could also serve as a good basis. (They release their work under Apache 2.0.)

https://github.com/project-everest/hacl-star/tree/polubelova_bignum

niekbouman commented 3 years ago

BTW, those benchmarks are a bit cherry-picked, as they only show 1- and 2-limb sizes.

As it was already concluded in this discussion that GMP is unsuitable for licensing reasons, it could still be interesting (for the sake of inspiration) to have a look at the 'generic' (non-asm) implementations of the mpn API (https://gmplib.org/manual/Low_002dlevel-Functions), and at this post by Fredrik Johansson: https://gmplib.org/list-archives/gmp-devel/2018-April/004853.html. He basically says in that post that mpn's mpn_mul is slow for few limbs because it goes through many (runtime) case distinctions. In Nim, such case distinctions could of course be handled at CT with when. The way out (when using the mpn API) is to use mpn_sec_mul (schoolbook only) or the internal (not part of the public API) mpn_mul_basecase.
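
A small sketch of what handling those case distinctions "at CT with when" could look like when the limb count is a static parameter (the threshold and proc names are made up; the multiplication bodies are left as stubs):

# illustrative only: pick the algorithm at compile time based on the static
# limb count, so the generated runtime code contains no dispatch branches
func mulSmall[N: static int](a, b: array[N, uint64]): array[2*N, uint64] =
  discard  # schoolbook multiplication would go here

func mulLarge[N: static int](a, b: array[N, uint64]): array[2*N, uint64] =
  discard  # Karatsuba / Toom-Cook would go here

func bigMul[N: static int](a, b: array[N, uint64]): array[2*N, uint64] =
  when N <= 4:   # hypothetical threshold
    result = mulSmall(a, b)
  else:
    result = mulLarge(a, b)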

juancarlospaco commented 3 years ago

Haskell comes with BigInt now.

juancarlospaco commented 3 years ago

BigInt stdlib https://nim-lang.github.io/Nim/jsbigints.html

:)

konsumlamm commented 3 years ago

What are the plans regarding a future std/bigints and std/jsbigints? Will the JS version still be necessary if we have std/bigints? If so, why?

Araq commented 3 years ago

We should only have std/bigints I guess.

arnetheduck commented 3 years ago

Like @zah mentioned, there's no practical reason for a bigint library to exist in the stdlib, since it's easy to develop one outside. If one were to gain traction as the obviously better approach, it could get included for convenience; but given that libraries in the stdlib make innovation difficult, it seems like a poor approach to prematurely include someone's pet approach without it actually being proven by real use in actual applications.

As an alternative, a nice approach would be to support arbitrary-size integers as part of the language. At the language level, the compiler could do a better job than a library, because it can optimize operations based on knowledge that is typically not available to libraries, just like a compiler typically does with "normal" integers (though these rules are sometimes a bit unintuitive, given wrapping etc.). http://blog.llvm.org/2020/04/the-new-clang-extint-feature-provides.html is an example of what this looks like.

Araq commented 3 years ago

I fail to see how we need "innovation" on a bignum's interface.

As an alternative, a nice approach would be to support arbitrary size integers as part of the language

I consider this to be a terrible solution. The current type zoo is problematic; with even more integer types it's a recipe for an unlimited amount of accidental complexity (how can I store my int14 in MariaDB?). But of course, it's hard to argue against what C will provide anyway, and we have to provide it too, just for compatibility.

juancarlospaco commented 3 years ago

Most people when you say "bigints" think of "arbitrary sized integers" anyway... :shrug:

juancarlospaco commented 2 years ago

C is adding Decimal128; I was wondering if Nim can use that? 🤔
https://en.cppreference.com/w/c/23

ghost commented 2 years ago

@juancarlospaco 128 bits isn't "bigint"; also, we already have a pure Nim library specifically for decimals, https://github.com/JohnAD/decimal128, so I don't see how this is relevant.

juancarlospaco commented 2 years ago

The user basically needs to pack a fat number into a type, so int128 and decimal128 are relevant.

bpr commented 2 years ago

This discussion in the Rust belt is relevant. There's an experimental API available on Rust nightly.

TL;DR: add four ops that provide carry to the built-in integer types, which could possibly be implemented as compiler intrinsics or in custom assembly.
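
As a hedged sketch, here is what some of those carry-propagating primitives could look like in plain Nim on top of uint64 intermediates (names loosely follow the Rust proposal; a real stdlib version might instead be a compiler intrinsic or inline assembly mapping to ADC/SBB/MUL):

func carryingAdd(a, b: uint32, carryIn: bool): (uint32, bool) =
  ## returns (sum, carryOut)
  let s = uint64(a) + uint64(b) + uint64(ord(carryIn))
  result = (uint32(s and 0xFFFF_FFFF'u64), s > uint64(uint32.high))

func borrowingSub(a, b: uint32, borrowIn: bool): (uint32, bool) =
  ## returns (difference, borrowOut)
  let d = uint64(a) - uint64(b) - uint64(ord(borrowIn))
  result = (uint32(d and 0xFFFF_FFFF'u64), d > uint64(uint32.high))

func wideningMul(a, b: uint32): (uint32, uint32) =
  ## returns (low, high) halves of the full 64-bit product
  let p = uint64(a) * uint64(b)
  result = (uint32(p and 0xFFFF_FFFF'u64), uint32(p shr 32))

doAssert carryingAdd(high(uint32), 1'u32, false) == (0'u32, true)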

juancarlospaco commented 2 years ago

add four ops that provide carry to the built-in integer types

That's really interesting. 🙂:+1:

Araq commented 2 years ago

What's wrong with https://github.com/nim-lang/bigints ?

juancarlospaco commented 2 years ago

LGTM, PR it to the stdlib; the title/OP is literally about the stdlib. 🙂

BTW, so far there have been no critical or showstopper bugs for std/jsbigints, so it looks like it is not too costly to maintain...

SolitudeSF commented 2 years ago

What's wrong with nim-lang/bigints ?

https://github.com/nim-lang/bigints#current-limitations-and-possible-enhancements

juancarlospaco commented 2 years ago

https://github.com/nim-lang/bigints#current-limitations-and-possible-enhancements

I think the readme is outdated(?); I see bitwise-ops commits in the git history.

bpr commented 2 years ago

My link to the Rust API extensions is more about making it easier to implement bignum libraries in general, since I think many of them do implement those basic functions. I know when I started to implement a bignum lib, I did, and I wish that I had access to such an API. It's not about picking a specific implementation to go into some batteries included std lib.

pietroppeter commented 2 years ago

https://github.com/nim-lang/bigints#current-limitations-and-possible-enhancements

I think the readme is outdated(?); I see bitwise-ops commits in the git history.

The readme is in fact outdated. I did fix the issue with self-multiplication, but it was not yet merged when @narimiran did the fork; it could easily be added. I only recently discovered that @narimiran forked it, added bitwise operators and "fixed" the BigInt type (now an object with an isNegative bool field).
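
For context, the type shape being referred to is roughly the following (paraphrased from memory of nim-lang/bigints; exact field names and visibility may differ):

type
  BigInt = object
    limbs: seq[uint32]   # magnitude stored as base 2^32 limbs
    isNegative: bool     # sign-magnitude representation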

The remaining issues would be:

To me (fwiw) it anyway looks good enough to be added to the stdlib. Becoming part of the stdlib will likely attract additional effort to improve it. The API looks very natural, and we should expect nimble packages to improve on it (there was a very good start by @mratsim with https://github.com/SciNim/megalo that could be picked up again).

narimiran commented 2 years ago

I did fix the issue with self multiplication but it was not yet merged when @narimiran did the fork; it could be easily added.

Self-multiplication is now supported.

The readme has been updated to better reflect reality.

juancarlospaco commented 2 years ago

I think that performance optimizations should not be a showstopper; the 32-bit limitation can then be documented, and it should be good to go...

arnetheduck commented 2 years ago

+1 for focusing on primitives (such as add-and-carry) in the stdlib that allow bigint libraries to be developed. Bigints come in many different flavors and tradeoffs depending on where and how they're used; the primitives are unlikely to change (given they're rooted in hardware) and are excellent stdlib candidates. A bigint library, much less so: there's no compelling reason for having it in the stdlib vs a standalone library.