Closed uninhm closed 1 year ago
there's no compelling reason for having it in the std lib vs a standalone library
There is one good reason and that is "interop", BigInt is a "vocabulary type" in my view. But it's fine under nim-lang/bigints too, if only people knew about it.
But the primitives are nice to have indeed, regardless.
if only people knew about it
well, since now https://github.com/nim-lang/bigints is an improved fork of https://github.com/def-/nim-bigints we should probably change the repo for bigints package in nimble packages.json.
I am not sure if @def- knows about this recent developments, tagging him so that he can express his feelings about it (and we might need then to add a note to def- repo that redirects to nim-lang fork)
That's cool, I'm fine with using nim-lang/bigints and making mine link to it.
well, bigints are complex beasts - there are many libraries out there because there's no trivial one-size-fits-all solution - for cryptography, one variation will be needed, for finance another and for casual use, a third, each coming with its own performance and convenience tradeoffs - that makes it less of a vocabulary type than one might think, also because it doesn't really have hardware or OS backing and what makes a good bigint changes over time, as platforms and environments evolve.
Adding one to the std lib paints the language into a corner of supporting an implementation with an ABI that's certain to be stale in 2-3 years time.
I agree that there's no one size fits all solution to bigints, but I don't think that should preclude providing a straightforwardly written version in the stdlib or one of the fusion libraries. @arnetheduck, I take it that you're not opposed to the latter, only to putting it in stdlib?
not opposed to the latter,
not at all - what I like to see ideally is investments into features that lower the barrier of using things outside of the std library so as to allow separate release cycles, independent upgrades and a natural way for libraries that have lost maintainers and user interest to go away - putting things in the standard library puts a heavy burden on the core team to maintain it, but also makes it difficult to improve the bigint library itself as well as well as the compiler and language. fusion
at least is disconnected from the library and compiler, in this way, and upgrades to fusion don't block upgrades to nim, and vice versa.
fusion is essentially a curated collection of random stuff, which is fine - by and large, it doesn't affect those that don't use it, and above all, it doesn't affect users of the compiler and std lib, allowing those users to update the two things independently. In a larger code base (such as happens when you start reusing code, because you have a vibrant ecosystem of useful libraries), it's very difficult already to upgrade Nim, and every additional component in there makes that process even harder. This is often not an iussue for devs following devel
and making changes to their personal code together with Nim, but it is a significant issue for users of Nim that are working on production code and rely on the stability of the core Nim offering.
If this is going to be the reference implementation, it would be great if it could be written with Nim destructors in mind. After all, the motivating example in the reference section looks an awful lot like the implementation of the BigInt type.
What do you mean? seq
is built on top of destructors with --gc:arc/orc and the other memory management modes are moribund.
Ha, you'd know best! When started hacking up my own bignum lib for fun I used that demo code on the destructors page for my limbs/digits, probably a waste of time.
That's cool, I'm fine with using nim-lang/bigints and making mine link to it.
@narimiran should we proceed and do this, i.e. make nim-lang/bigints the officially maintained bigints library?
I see the following steps:
it would probably would be good also to:
I can help with some of the above if needed. This is independent of whether or not we decide to put this in standard library.
edit: the better plan now seems to be moving the repo and force push current nim-lang/bigints. the above plan is now obsolete
change link in packages.json from def- to nim-lang
https://github.com/nim-lang/packages/pull/2059
port a recent fix to def-/bigints from contributor (see https://github.com/def-/nim-bigints/pull/38)
Done: https://github.com/nim-lang/bigints/commit/3c634ffcfa513783cfcf72f9b6eea1880b664a8c
add a big disclaimer on def-/bigints to point at nim-lang/bigints
add a big disclaimer on def-/bigints to point at nim-lang/bigints
I agree with @def-. Why fork and redirect instead of moving over the repository? From the outside it looks like there wasn't even an attempt to include them in the discussion. Licensing aside, trying to talk to people first is imo the right way.
nim-lang/bigints was forked some time ago and commit history diverged. I think the effort would be worth though.
After I move the repo, we can force-push the current nim-lang/bigints state onto it. @narimiran is that ok for you?
we can force-push the current nim-lang/bigints state onto it
I thought you might, for example, want to keep operations on BigInt
and int32
, which we deliberately don't support.
I personally don't mind having both repos, especially now when nimble points bigints to the nim-lang version; but if y'all think it is better to just have one repo - fine by me.
I'm not planning to maintain my own nim-bigints repo when there is an official one. Makes more sense to only have one.
@narimiran I get this error trying to move: "nim-lang already has a repository in the def-/nim-bigints network ", I guess you have to remove it first.
@narimiran any update on this? Is it planned to move the repo? I'd like to contribute to nim-lang/bigints, but it still has no issues enabled and I'm not sure if I should wait for a move.
it still has no issues enabled
I'm guessing github disables them automatically for forked repos? Whatever it might be, issues are now enabled.
any update on this?
No updates, sorry: it fell behind some other stuff. But it is time to finally solve this.
I'm not sure how to deal with:
"nim-lang already has a repository in the def-/nim-bigints network ", I guess you have to remove it first.
i.e. if we remove our repo, do we lose our commits? Can we just cherry-pick the missing commits from def-/nim-bigints
?
@narimiran I think one way would be to create a PR from nim-lang/bigints to def-/bigints, merge it, then remove the nim-lang repo and move the def- one?
i.e. if we remove our repo, do we lose our commits? Can we just cherry-pick the missing commits from def-/nim-bigints?
In that case we still have two repos, so nothing won.
@narimiran I think one way would be to create a PR from nim-lang/bigints to def-/bigints, merge it, then remove the nim-lang repo and move the def- one?
Yes, that would work.
@narimiran I think one way would be to create a PR from nim-lang/bigints to def-/bigints, merge it, then remove the nim-lang repo and move the def- one?
2/3 done (PR and remove of nim-lang's version), now it is time to move def-'s one over.
Done: https://github.com/nim-lang/nim-bigints I think I'm missing the permissions to move it from nim-bigints to bigints though (and I'm not 100% sure if Github will keep the redirects properly if we do two in a row)
I think I'm missing the permissions
Speaking of permissions: I don't know if it is now up to you or @Araq, but I will need maintainer permissions for this new repo :)
Not up to me, I don't have permissions to add maintainers either.
Core team group should add more repos.
The double redirect works btw, old link from https://github.com/def-/nim-bigints goes to https://github.com/nim-lang/bigints
I think this can be closed now, since https://github.com/nim-lang/bigints exists.
The library does not even appear in the docs https://nim-lang.org/docs/lib.html
Maybe the issue might be renamed (or closed and open a new issue) for integration within the compiler.
I would like to know what kind of critical feature for the integration is missing, before spending time into efficiency improvement (if I can actually find the time for it).
@mratsim raised an important issue about the library : we should consider using uint64 instead of uint32 which gives better asymptotic algorithms constants for all operations, especially if we implement naive operations like multiplication in O(n^2).
Half the number of limbs, that's four times less operations.
I know that @Araq does not care about performance, but the computations need to stay practically feasible. Benchmarking will enable us to detect regressions and/or improvements. For benchmarking and test purposes, we will need to generate random bigints.
There is still some discrepancies between bigints and int, see : https://github.com/nim-lang/bigints/issues/90 (div/mod behaviour) and : https://github.com/nim-lang/bigints/issues/51 (bitwise operators for negative numbers).
@timotheecour proposed some benchmarking at the beginning of the thread :
benchmark: computing factorial(100) 1 million times obviously a very partial benchmark but it's a start, feel free to suggest improvements/corrections
This kind of benchmark is not a good indicator for at least three kind of reasons. I would like to list them
this is not the algorithm we want to use in practice to compute a factorial. The binary splitting algorithm is more practical than the naΓ―ve algorithm, and depending on the number of digits there is an even better algorithm. GMP comes with a pre-built factorial mpz_fac_ui
see : https://gmplib.org/manual/Number-Theoretic-Functions
I have done some tests with the factorial too at : https://forum.nim-lang.org/t/8967
We are on the order of several seconds for one factorial computation of the six digit number 123456! while it only takes 23ms for dedicated software for arithmetic in Pari/GP (like GMP).
It gets worse when scaling, but isn't it already acceptable timings for what we want to do ?
Anyway, n! has about n*log(n) digits (Stirling's approximation) so 100! has of the order of 600/700 digits. If we divide by 64, we can see, it is only a number of 10 limbs. So we are only testing a very specific case of the library ! We are only testing the multiplication with the factorial computation ! When increasing the number of limbs, memory management and cache tuning becomes more crucial and needs to be taken into account.
we do the same computation of the exact same thing. Processors with predictive branching will optimize repetitive computations. So results may vary greatly between different processors models/manufacturers.
we should launch some unmeasured computation before benchmarking. It "warms up" the processor for the computation. These "blank" computations does not appear in @timotheecour's scripts.
I wonder which operations will be the most useful for the compiler. Which number of limbs interests us ? Aren't bit manipulation like setBit, logical operations and inc, dec operations more needed than costly arithmetic ops (multiplication, division essentially) ? I believe we should tune the library specifically for the needs of the compiler. (If you disagree, please consider explaining why rather than just ππ½-ing this post). The benefits of this library are very limited compared to the SciNim wrapper of GMP for other applications like scientific computation.
There are several other pitfalls in benchmarking, but the best I could think of is to actually test all the exported functions of the libraries on random bigints of given bit sizes and do these on batches of bigints. If we try to measure the timings of just one multiplication operation, it might not be noticeable. So if we want to generate random numbers, and measure multiple multiplications, we don't want to measure the generation of the random numbers (nor the computation of multiplication). But if we try to measure the timings of ten multiplication, we can average the timings. So I would generate a batch of 2*ten BigInts, and do the multiplication on each pair of operands.
P.S: I'll edit the post when I'll be home for simplification. Sorry in the meantime. I am writing this from my job.
The library does not even appear in the docs https://nim-lang.org/docs/lib.html
Yes, but that's a different issue (and it also applies to the other official libraries, like https://github.com/nim-lang/threading). It would be nice to get some more exposure for it though, so that hopefully more people suggest features, report bugs, etc.
Maybe the issue might be renamed (or closed and open a new issue) for integration within the compiler.
Agreed, that's also a different issue.
There is still some discrepancies between bigints and int, see : nim-lang/bigints#90 (div/mod behaviour) and : nim-lang/bigints#51 (bitwise operators for negative numbers).
I'm well aware that there are still problems to be solved (heck, I opened those issues) and functions to be optimized and I appreciate your comments on that, but I don't think this issue is the right place to discuss that, https://github.com/nim-lang/bigints/issues is.
This issue is about adding bigints to std and we have an official library for bigints now, so it has been solved from my POV. @juancarlospaco can you explain why you don't think this issue should be closed?
the issue is about adding a bigints library to stdlib. we have a good candidate (https://github.com/nim-lang/bigints), which is now managed by nim-lang organization, but it still is not inside stdlib, it is an external package. The difference is that you need to install bigints with nimble to access it (it does not ship with nim) and versioning is independent from nim (which has its pros and cons). The issue can be closed when:
I'm well aware that there are still problems to be solved (heck, I opened those issues) and functions to be optimized and I appreciate your comments on that, but I don't think this issue is the right place to discuss that, https://github.com/nim-lang/bigints/issues is.
I know that you are. Others aren't. I wasn't only addressing you. I wanted to summarize important issues that I felt necessary to tackle for integration in the stdlib (before or after v2). Yes people may check https://github.com/nim-lang/bigints/issues but actually nobody has the time to go through each of the ~15 currently opened issues, and the numerous technical posts of each of them, just to understand the state of the library. That is why I created a special issue to do a roadmap (not up to date) https://github.com/nim-lang/bigints/issues/76 as well as to discuss the versioning system.
I have been confused just like you by the difference between standard and official library. So:
Maybe the issue might be renamed (or closed and open a new issue) for integration within the compiler.
"Add bigints to standard library" is actually clear so no need to rename the issue.
Aren't bit manipulation like setBit, logical operations and inc, dec operations more needed than costly arithmetic ops (multiplication, division essentially) ?
No, if bit manipulation was needed a BitSeq is very simple and easy to optimize. Multiplication is the backbone of bigints.
I believe we should tune the library specifically for the needs of the compiler. (If you disagree, please consider explaining why rather than just ππ½-ing this post).
The compiler needs are already covered by int128, which have been tested extensively after the past 3 years, but introduced many regressions at first
Furthermore, the current nim/bigint will be slower because of the limb size and also heap allocation. Combined that with huge regression risks (for example mod/remainder behavior) I only see downside to have the compiler use it.
Re benchmarking:
needs are already covered by int128, which have been tested extensively after the past 3 years
But then int128
needs to be made public and documented properly. π€
No it doesn't, it's compiler only.
A public int128 should instead use the C compiler builtins like: https://github.com/rockcavera/nim-nint128/blob/f88047b/src/nint128/nint128_cint128.nim#L72-L77
I understand, then maybe a public and documented int128
backed by compiler is doable,
just documenting its limitations should be enough.
I post here an old discussion (31 january 2022) on Discord concerning the desire to change to bigints:
https://github.com/nim-lang/bigints/ is official.
Summary
Add arbitrary precision integers to standard library
Solution
Add the bigints library (it is in nimble) to standard library
https://github.com/def-/nim-bigints