lukebhan / numlib

A completely self-dependent aribitrary precision numerical library.
0 stars 1 forks source link

Bignum internal representation #4

Closed timothycobrien closed 4 years ago

timothycobrien commented 4 years ago

Any reason why the internal representation is as decimals and not as binary? Just curious @luke-bhan

lukebhan commented 4 years ago

It should certainly should be binary, I was just super dumb when I wrote this.

timothycobrien commented 4 years ago

Haha no worries, let's just write the remaining functions for decimal for now I think and converting to binary should not be particularly difficult? Just change internal representation and then modify ctor and to_string?

lukebhan commented 4 years ago

sounds good. Does your div method keep the remainder as well? It would be nice because our modulo function is super slow as of right now.

timothycobrien commented 4 years ago

As of right now, no, it should be an exact division I think.

Obviously for integers, this is incredibly easy to convert to a modulo by truncating the non-integer part of the quotient, multiplying back through, and subtracting, but I'm curious if this is as performant as we'd like.

Doing some research on my method w.r.t modulo now, will update

timothycobrien commented 4 years ago

Ok, so there are definitely better algorithms, but I'm not sure if it's worth the time because this stuff is wack

https://eprint.iacr.org/2014/755.pdf

lukebhan commented 4 years ago

I think we can just use the simple form of the barrett reduction for mod honestly.

timothycobrien commented 4 years ago

Ok nevermind, seems like these all use bit-shift, so if we decide to implement them @luke-bhan we should hold off

ayazhafiz commented 4 years ago

My two cents:

What is a bignum?

There are a few reasons you might want to use a high-fidelity, large "number"-like container:

  1. You need to store a very big integer
  2. You need to store a very big float
  3. You need to store a float to arbitrary precision
  4. You need to store a float to a some precision N
  5. You need to store a rational number that is not well-formed in binary
  6. ...

Which of these use cases should a bignum container support? It's impossible to support all of these cases, at least not in a way that would be useful practically. For example, consider we want to compute

7/22 - \pi

On the LHS we have a well-defined rational, on the RHS we have an irrational. So the options are

  1. return an arbitrary-precision float
  2. return a low-fidelity N-precision float, whose error is now greater than that of \pi
  3. throw your hands up and say it can't be subtracted

Out of these choices, I think only (1) is reasonable. But arbitrary-precision floats are expensive (basically it just has to carry around a stack call with it), and maybe the user doesn't want to give up the small size of a rational against an irrational.

So what should be included in a bignum container? Left to a single container, rational people can disagree. Instead, consider supporting multiple containers. NPrecision, BigInt, Rational, etc. These containers can share a common interface that makes interaction easy, but crossing container boundaries is left to the discretion of the user.

Not saying that this all needs to be implemented right now, but it might be worth thinking about.

How to bignum?

I don't know the answer to this one; many papers have been written about this. But on intuition maybe there are a couple of ideas.

Decimal vs Binary representation

Binary is almost certainly a more efficient choice, but you have to be careful here. For example, in storing an arbitrary-precision float you cannot store the underlying number in decimal (tangent for encoding, see below) just due to fallible base 10 -> base 2 mapping. On the other hand, for an integer this works great. This is another way supporting multiple containers may excel -- you can optimize for different things in each one.

Binary encoding

So let's say you want to store some decimal number. Previously we did this by keeping a contiguous array of u8s where each u8 represents a digit. This is definitely inefficient because you're throwing 5 bits on a number that only needs 3.

A trivial improvement is to encode the decimals in binary (okay, pedantically this was done before too, but here we're being explicit about it). Maybe we want to store 12 -- this could be something like 1 << 3 + 2. Now you get to store 2 decimals per u8; use a u128 and you can store 46 (!!). This means for many cases, you might not even need to allocate memory at all!

lukebhan commented 4 years ago

The multiple container Idea is very smart. I think we should strongly consider that and start refactoring to just use a BigInt for now. Also, I think the bignum idea could seriously benefit from setting up some project structure (@ayazhafiz did tell me this was a real project, when I tried to hack my way through it naively) Now, for the real question, do you guys really have an interest in building this out. It seems as if rug is built almost exactly as Ayaz recommends from their docs and I personally don't see a dying need for bignum in the rust community. I wouldn't mind maybe finishing up division, exp and leaving this as a hacked together repo (maybe even get rid of it) and move more focus to slide. Thoughts @ayazhafiz @timothycobrien

timothycobrien commented 4 years ago

On the whole, makes a lot of sense Ayaz.

Agree with most of the points, what I will say is that while 7/22 - \pi is a subtraction we may want to support, I also see merit in letting a user decide whether or not an "implicit" conversion like this makes sense. Moreover, I think it makes sense that 7/22 and \pi would be different types of containers i.e. a rational and a transcendental(? symbolic maybe?).

timothycobrien commented 4 years ago

The multiple container Idea is very smart. I think we should strongly consider that and start refactoring to just use a BigInt for now. Also, I think the bignum idea could seriously benefit from setting up some project structure (@ayazhafiz did tell me this was a real project, when I tried to hack my way through it naively) Now, for the real question, do you guys really have an interest in building this out. It seems as if rug is built almost exactly as Ayaz recommends from their docs and I personally don't see a dying need for bignum in the rust community. I wouldn't mind maybe finishing up division, exp and leaving this as a hacked together repo (maybe even get rid of it) and move more focus to slide. Thoughts @ayazhafiz @timothycobrien

BigInt is nice, because an arbitrary precision float can be converted to a BigInt by multiplying by the base^power.

I think implementing it could be worthwhile, but there's definitely an argument to be made for not reinventing the wheel.

ayazhafiz commented 4 years ago

Now, for the real question, do you guys really have an interest in building this out. I personally don't see a dying need for bignum in the rust community. I wouldn't mind maybe finishing up division, exp and leaving this as a hacked together repo (maybe even get rid of it) and move more focus to slide.

Well it depends what the motivation here is. If it's to make a production-ready arbitrary-precision library, that goal will never be hit. If it's to learn, the story is different. Personally I am happy to learn, just hearing how you guys are implementing this thing is teaching me a lot.

Re slide: these two projects are orthogonal. The only blocker to contributing to both is time. I think it is reasonable to continue working on both, balancing the % time spent on what needs attention and is interesting.

I also see merit in letting a user decide whether or not an "implicit" conversion like this makes sense. Moreover, I think it makes sense that 7/22 and \pi would be different types of containers i.e. a rational and a transcendental(? symbolic maybe?)

Yeah, sadly implicit conversions are nearly impossible in Rust. But you can hack around it with something like type erasure and boxed downcasting..

Symbolic sounds like a great idea

timothycobrien commented 4 years ago

I'm having a lot of fun working on this, and it's helping me get up to speed with this unholy language (if I wanted a unique pointer I would make one). So I'm down to keep working on it.