Add bigints to standard library

uninhm commented 4 years ago

Summary

Add arbitrary precision integers to standard library

Solution

Add the bigints library (it is in nimble) to standard library

https://github.com/def-/nim-bigints

Araq commented 3 years ago

there's no compelling reason for having it in the std lib vs a standalone library

There is one good reason and that is "interop", BigInt is a "vocabulary type" in my view. But it's fine under nim-lang/bigints too, if only people knew about it.

But the primitives are nice to have indeed, regardless.

pietroppeter commented 3 years ago

if only people knew about it

well, since now https://github.com/nim-lang/bigints is an improved fork of https://github.com/def-/nim-bigints we should probably change the repo for bigints package in nimble packages.json.

I am not sure if @def- knows about this recent developments, tagging him so that he can express his feelings about it (and we might need then to add a note to def- repo that redirects to nim-lang fork)

def- commented 3 years ago

That's cool, I'm fine with using nim-lang/bigints and making mine link to it.

arnetheduck commented 3 years ago

well, bigints are complex beasts - there are many libraries out there because there's no trivial one-size-fits-all solution - for cryptography, one variation will be needed, for finance another and for casual use, a third, each coming with its own performance and convenience tradeoffs - that makes it less of a vocabulary type than one might think, also because it doesn't really have hardware or OS backing and what makes a good bigint changes over time, as platforms and environments evolve.

Adding one to the std lib paints the language into a corner of supporting an implementation with an ABI that's certain to be stale in 2-3 years time.

bpr commented 3 years ago

I agree that there's no one size fits all solution to bigints, but I don't think that should preclude providing a straightforwardly written version in the stdlib or one of the fusion libraries. @arnetheduck, I take it that you're not opposed to the latter, only to putting it in stdlib?

arnetheduck commented 3 years ago

not opposed to the latter,

not at all - what I like to see ideally is investments into features that lower the barrier of using things outside of the std library so as to allow separate release cycles, independent upgrades and a natural way for libraries that have lost maintainers and user interest to go away - putting things in the standard library puts a heavy burden on the core team to maintain it, but also makes it difficult to improve the bigint library itself as well as well as the compiler and language. fusion at least is disconnected from the library and compiler, in this way, and upgrades to fusion don't block upgrades to nim, and vice versa.

fusion is essentially a curated collection of random stuff, which is fine - by and large, it doesn't affect those that don't use it, and above all, it doesn't affect users of the compiler and std lib, allowing those users to update the two things independently. In a larger code base (such as happens when you start reusing code, because you have a vibrant ecosystem of useful libraries), it's very difficult already to upgrade Nim, and every additional component in there makes that process even harder. This is often not an iussue for devs following devel and making changes to their personal code together with Nim, but it is a significant issue for users of Nim that are working on production code and rely on the stability of the core Nim offering.

bpr commented 3 years ago

If this is going to be the reference implementation, it would be great if it could be written with Nim destructors in mind. After all, the motivating example in the reference section looks an awful lot like the implementation of the BigInt type.

Araq commented 3 years ago

What do you mean? seq is built on top of destructors with --gc:arc/orc and the other memory management modes are moribund.

bpr commented 3 years ago

Ha, you'd know best! When started hacking up my own bignum lib for fun I used that demo code on the destructors page for my limbs/digits, probably a waste of time.

pietroppeter commented 3 years ago

That's cool, I'm fine with using nim-lang/bigints and making mine link to it.

@narimiran should we proceed and do this, i.e. make nim-lang/bigints the officially maintained bigints library?

I see the following steps:

[x] change link in packages.json from def- to nim-lang
[ ] add a big disclaimer on def-/bigints to point at nim-lang/bigints
[ ] activate issues on nim-lang/bigints

it would probably would be good also to:

[x] port a recent fix to def-/bigints from contributor (see https://github.com/def-/nim-bigints/pull/38)
[ ] release a new version
[ ] maybe even post on the forum about this for more visibility (and maybe getting a few stars)?

I can help with some of the above if needed. This is independent of whether or not we decide to put this in standard library.

edit: the better plan now seems to be moving the repo and force push current nim-lang/bigints. the above plan is now obsolete

narimiran commented 3 years ago

change link in packages.json from def- to nim-lang

https://github.com/nim-lang/packages/pull/2059

port a recent fix to def-/bigints from contributor (see https://github.com/def-/nim-bigints/pull/38)

Done: https://github.com/nim-lang/bigints/commit/3c634ffcfa513783cfcf72f9b6eea1880b664a8c

pietroppeter commented 3 years ago

add a big disclaimer on def-/bigints to point at nim-lang/bigints

https://github.com/def-/nim-bigints/pull/39

Vindaar commented 3 years ago

add a big disclaimer on def-/bigints to point at nim-lang/bigints

https://github.com/def-/nim-bigints/pull/39

I agree with @def-. Why fork and redirect instead of moving over the repository? From the outside it looks like there wasn't even an attempt to include them in the discussion. Licensing aside, trying to talk to people first is imo the right way.

def- commented 3 years ago

nim-lang/bigints was forked some time ago and commit history diverged. I think the effort would be worth though.

After I move the repo, we can force-push the current nim-lang/bigints state onto it. @narimiran is that ok for you?

narimiran commented 3 years ago

we can force-push the current nim-lang/bigints state onto it

I thought you might, for example, want to keep operations on BigInt and int32, which we deliberately don't support.

I personally don't mind having both repos, especially now when nimble points bigints to the nim-lang version; but if y'all think it is better to just have one repo - fine by me.

def- commented 2 years ago

I'm not planning to maintain my own nim-bigints repo when there is an official one. Makes more sense to only have one.

@narimiran I get this error trying to move: "nim-lang already has a repository in the def-/nim-bigints network ", I guess you have to remove it first.

konsumlamm commented 2 years ago

@narimiran any update on this? Is it planned to move the repo? I'd like to contribute to nim-lang/bigints, but it still has no issues enabled and I'm not sure if I should wait for a move.

narimiran commented 2 years ago

it still has no issues enabled

I'm guessing github disables them automatically for forked repos? Whatever it might be, issues are now enabled.

any update on this?

No updates, sorry: it fell behind some other stuff. But it is time to finally solve this.

I'm not sure how to deal with:

"nim-lang already has a repository in the def-/nim-bigints network ", I guess you have to remove it first.

i.e. if we remove our repo, do we lose our commits? Can we just cherry-pick the missing commits from def-/nim-bigints?

ghost commented 2 years ago

@narimiran I think one way would be to create a PR from nim-lang/bigints to def-/bigints, merge it, then remove the nim-lang repo and move the def- one?

def- commented 2 years ago

i.e. if we remove our repo, do we lose our commits? Can we just cherry-pick the missing commits from def-/nim-bigints?

In that case we still have two repos, so nothing won.

@narimiran I think one way would be to create a PR from nim-lang/bigints to def-/bigints, merge it, then remove the nim-lang repo and move the def- one?

Yes, that would work.

narimiran commented 2 years ago

@narimiran I think one way would be to create a PR from nim-lang/bigints to def-/bigints, merge it, then remove the nim-lang repo and move the def- one?

2/3 done (PR and remove of nim-lang's version), now it is time to move def-'s one over.

def- commented 2 years ago

Done: https://github.com/nim-lang/nim-bigints I think I'm missing the permissions to move it from nim-bigints to bigints though (and I'm not 100% sure if Github will keep the redirects properly if we do two in a row)

narimiran commented 2 years ago

I think I'm missing the permissions

Speaking of permissions: I don't know if it is now up to you or @Araq, but I will need maintainer permissions for this new repo :)

def- commented 2 years ago

Not up to me, I don't have permissions to add maintainers either.

ringabout commented 2 years ago

Core team group should add more repos.

def- commented 2 years ago

The double redirect works btw, old link from https://github.com/def-/nim-bigints goes to https://github.com/nim-lang/bigints

konsumlamm commented 2 years ago

I think this can be closed now, since https://github.com/nim-lang/bigints exists.

dlesnoff commented 2 years ago

The library does not even appear in the docs https://nim-lang.org/docs/lib.html Maybe the issue might be renamed (or closed and open a new issue) for integration within the compiler. I would like to know what kind of critical feature for the integration is missing, before spending time into efficiency improvement (if I can actually find the time for it). @mratsim raised an important issue about the library : we should consider using uint64 instead of uint32 which gives better ~~asymptotic algorithms~~ constants for all operations, especially if we implement naive operations like multiplication in O(n^2). Half the number of limbs, that's four times less operations.

I know that @Araq does not care about performance, but the computations need to stay practically feasible. Benchmarking will enable us to detect regressions and/or improvements. For benchmarking and test purposes, we will need to generate random bigints.

There is still some discrepancies between bigints and int, see : https://github.com/nim-lang/bigints/issues/90 (div/mod behaviour) and : https://github.com/nim-lang/bigints/issues/51 (bitwise operators for negative numbers).

dlesnoff commented 2 years ago

@timotheecour proposed some benchmarking at the beginning of the thread :

benchmark: computing factorial(100) 1 million times obviously a very partial benchmark but it's a start, feel free to suggest improvements/corrections

This kind of benchmark is not a good indicator for at least three kind of reasons. I would like to list them

this is not the algorithm we want to use in practice to compute a factorial. The binary splitting algorithm is more practical than the naïve algorithm, and depending on the number of digits there is an even better algorithm. GMP comes with a pre-built factorial mpz_fac_ui see : https://gmplib.org/manual/Number-Theoretic-Functions I have done some tests with the factorial too at : https://forum.nim-lang.org/t/8967 We are on the order of several seconds for one factorial computation of the six digit number 123456! while it only takes 23ms for dedicated software for arithmetic in Pari/GP (like GMP). It gets worse when scaling, but isn't it already acceptable timings for what we want to do ?
Anyway, n! has about n*log(n) digits (Stirling's approximation) so 100! has of the order of 600/700 digits. If we divide by 64, we can see, it is only a number of 10 limbs. So we are only testing a very specific case of the library ! We are only testing the multiplication with the factorial computation ! When increasing the number of limbs, memory management and cache tuning becomes more crucial and needs to be taken into account.
we do the same computation of the exact same thing. Processors with predictive branching will optimize repetitive computations. So results may vary greatly between different processors models/manufacturers.
we should launch some unmeasured computation before benchmarking. It "warms up" the processor for the computation. These "blank" computations does not appear in @timotheecour's scripts.

I wonder which operations will be the most useful for the compiler. Which number of limbs interests us ? Aren't bit manipulation like setBit, logical operations and inc, dec operations more needed than costly arithmetic ops (multiplication, division essentially) ? I believe we should tune the library specifically for the needs of the compiler. (If you disagree, please consider explaining why rather than just 👎🏽-ing this post). The benefits of this library are very limited compared to the SciNim wrapper of GMP for other applications like scientific computation.

There are several other pitfalls in benchmarking, but the best I could think of is to actually test all the exported functions of the libraries on random bigints of given bit sizes and do these on batches of bigints. If we try to measure the timings of just one multiplication operation, it might not be noticeable. So if we want to generate random numbers, and measure multiple multiplications, we don't want to measure the generation of the random numbers (nor the computation of multiplication). But if we try to measure the timings of ten multiplication, we can average the timings. So I would generate a batch of 2*ten BigInts, and do the multiplication on each pair of operands.

P.S: I'll edit the post when I'll be home for simplification. Sorry in the meantime. I am writing this from my job.

konsumlamm commented 2 years ago

The library does not even appear in the docs https://nim-lang.org/docs/lib.html

Yes, but that's a different issue (and it also applies to the other official libraries, like https://github.com/nim-lang/threading). It would be nice to get some more exposure for it though, so that hopefully more people suggest features, report bugs, etc.

Maybe the issue might be renamed (or closed and open a new issue) for integration within the compiler.

Agreed, that's also a different issue.

There is still some discrepancies between bigints and int, see : nim-lang/bigints#90 (div/mod behaviour) and : nim-lang/bigints#51 (bitwise operators for negative numbers).

I'm well aware that there are still problems to be solved (heck, I opened those issues) and functions to be optimized and I appreciate your comments on that, but I don't think this issue is the right place to discuss that, https://github.com/nim-lang/bigints/issues is.

This issue is about adding bigints to std and we have an official library for bigints now, so it has been solved from my POV. @juancarlospaco can you explain why you don't think this issue should be closed?

pietroppeter commented 2 years ago

the issue is about adding a bigints library to stdlib. we have a good candidate (https://github.com/nim-lang/bigints), which is now managed by nim-lang organization, but it still is not inside stdlib, it is an external package. The difference is that you need to install bigints with nimble to access it (it does not ship with nim) and versioning is independent from nim (which has its pros and cons). The issue can be closed when:

either there is a decision that we will not add bigints to stdlib and we keep bigints as an external package
or we decide to use https://github.com/nim-lang/bigints (or another library) inside stdlib, which usually means the code of https://github.com/nim-lang/bigints will be moved in nim-lang/Nim repo and that repo will be archived/deprecated.

dlesnoff commented 2 years ago

I'm well aware that there are still problems to be solved (heck, I opened those issues) and functions to be optimized and I appreciate your comments on that, but I don't think this issue is the right place to discuss that, https://github.com/nim-lang/bigints/issues is.

I know that you are. Others aren't. I wasn't only addressing you. I wanted to summarize important issues that I felt necessary to tackle for integration in the stdlib (before or after v2). Yes people may check https://github.com/nim-lang/bigints/issues but actually nobody has the time to go through each of the ~15 currently opened issues, and the numerous technical posts of each of them, just to understand the state of the library. That is why I created a special issue to do a roadmap (not up to date) https://github.com/nim-lang/bigints/issues/76 as well as to discuss the versioning system.

I have been confused just like you by the difference between standard and official library. So:

Maybe the issue might be renamed (or closed and open a new issue) for integration within the compiler.

"Add bigints to standard library" is actually clear so no need to rename the issue.

mratsim commented 2 years ago

Aren't bit manipulation like setBit, logical operations and inc, dec operations more needed than costly arithmetic ops (multiplication, division essentially) ?

No, if bit manipulation was needed a BitSeq is very simple and easy to optimize. Multiplication is the backbone of bigints.

I believe we should tune the library specifically for the needs of the compiler. (If you disagree, please consider explaining why rather than just 👎🏽-ing this post).

The compiler needs are already covered by int128, which have been tested extensively after the past 3 years, but introduced many regressions at first

Furthermore, the current nim/bigint will be slower because of the limb size and also heap allocation. Combined that with huge regression risks (for example mod/remainder behavior) I only see downside to have the compiler use it.

Re benchmarking:

often bigint libraries I used for RSA demos so modular exponentiation should be a benchmark: https://forum.nim-lang.org/t/7276
similarly modular inversion is a key operation in many cryptographic protocol and it uses GCD so a GCD benchmark is useful
As a reference you can use Constantine benchmarks and just benchmark everything:
- https://github.com/mratsim/constantine/blob/7d29cb9/benchmarks/bench_fields_template.nim#L30-L61
- https://github.com/mratsim/constantine/blob/7d29cb9/benchmarks/bench_blueprint.nim

juancarlospaco commented 2 years ago

needs are already covered by int128, which have been tested extensively after the past 3 years

But then int128 needs to be made public and documented properly. 🤔

mratsim commented 2 years ago

No it doesn't, it's compiler only.

A public int128 should instead use the C compiler builtins like: https://github.com/rockcavera/nim-nint128/blob/f88047b/src/nint128/nint128_cint128.nim#L72-L77

juancarlospaco commented 2 years ago

I understand, then maybe a public and documented int128 backed by compiler is doable, just documenting its limitations should be enough.

dlesnoff commented 2 years ago

I post here an old discussion (31 january 2022) on Discord concerning the desire to change to bigints: 2022-08-03-231508_679x541_scrot

Araq commented 1 year ago

https://github.com/nim-lang/bigints/ is official.

nim-lang / Nim

Add bigints to standard library #14696

Summary

Solution