libtom / libtommath

LibTomMath is a free open source portable number theoretic multiple-precision integer library written entirely in C.
https://www.libtom.net
Other
656 stars 194 forks source link

Improve mp_root_n code #532

Open ssinai1 opened 2 years ago

ssinai1 commented 2 years ago

Set a closer initial value for iteration and remove some unnecessary code. By the way, in the first for loop, one of the err=MP_OKAY branches (never reached) did not set c.

sjaeckel commented 2 years ago

@czurnieden could you maybe have a look if you have the time?

You've added parts of the removed code in 984d3ff67990209285b454c189d9a90181ee41ca and I'm not sure whether this should/can really be removed.

ssinai1 commented 2 years ago

I based it on the development branch. I thought it was more up to date.

ssinai1 commented 2 years ago

I don't know if it is how readable in this form:

Starting with x[i+1] = x[i] - f(x[i])/f'(x[i]), where f(x) = x^b - a

So x[i+1] = x[i] - (x[i] ^ b - a) / ( b * x[i] ^ (b-1) ) (this is in the code)

When x[i] ^ b > a, x[i+1] will be rounded up. And because of the convexity, 'x' cannot go below the root even after rounding.

Algebraically: r:=a^(1/n) x[i] - x[i+1] = (x[i] ^ b - a) / ( b x[i] ^ (b-1) ) = (x[i] - r) (x[i]^(b-1) + x[i]^(b-2)r + ... + x[i]r^(b-2) + r^(b-1)) / (n * x[i]^(b-1)) < (x[i] - r) So x[i+1] > r, when x[i] ^ b > a

Hence the code removal (unless I'm missing something...).

czurnieden commented 2 years ago

It's too warm too sleep, so here we go:

You've added parts of the removed code in https://github.com/libtom/libtommath/commit/984d3ff67990209285b454c189d9a90181ee41ca and I'm not sure whether this should/can really be removed.

That was three years ago! But I appreciate your optimism regarding my long-term memory ;-)

I am sure that I did no proper analysis, so there are some places with…let's call it "belt&girdle" code. But better late then never.

The start-value gets calculated based on the equality n^(1/b) = 2^((log_2(n)/b). The function mp_count_bits computes floor(log_2(n)) + 1 and the division by b is truncating so we can only compute 2^( floor( (floor(log_2(n)) + 1) / b ) ) which can be too small in certain circumstances. Simple example: with b=3 and n in {31,32,33} the results are 2, 4, 4 respectively.

With addition of unity we get the inequality 2^( floor( (floor(log_2(n)) + 1) / b ) + 1 ) >= floor(n^(1/b)) but we need x[0] ^ b > a as @ssinai1 has shown but with that correction we would get the results 3, 5, 5 with the example above and because 3^3 < 31 the condition x[0] ^ b > a does not hold anymore. But it also the correct result and the main loop does not change it. Is that always the case for 2^( floor( (floor(log_2(n)) + 1) / b ) + 1 ) == floor(n^(1/b))?

I'm not sure anymore why I added the loop but I also had some code to find a more exact start-value by doing some rounds of bisection (works well for larger bases and also a little bit for smaller bases but it adds a lot of code and nobody wanted it ;-) ), so it might be a leftover from that experiment. But no matter the exact reason: it is superfluous code in case of 2^( floor( (floor(log_2(n)) + 1) / b ) + 2 ) and most likely 2^( floor( (floor(log_2(n)) + 1) / b ) + 1 ), too.

One last question: are you, @ssinai1 sure that the main loop cannot oscillate?

ssinai1 commented 2 years ago

Thanks for the explanation.

My reasoning is as follows: log2(n)/b < (floor(log2(n)) + 1) / b <= ceil((floor(log2(n)) + 1) / b) = (floor(log2(n)) + 1 + b - 1) // b = floor(log2(n)) // b + 1 where // is the integer division (floor(x/y))

This means that not only is 2^( (floor(log_2(n)) + 1) // b + 1 ) > n^(1/b) but also 2^( (floor(log_2(n))) // b + 1 ) = 2^( (mp_count_bits - 1) // b ) + 1 ) > n^(1/b)

As for the second part of the question: using exact arithmetic, as long as x[i]^b > a, it is true that x[i+1] < x[i] and x[i+1] > a^(1/b), and since the code is rounding upwards, there is no possibility that x[i+1] ^ b < a (and so x[i+2] > x[i+1]) due to the imprecision of the integer division.

czurnieden commented 2 years ago

What is that in lines 110ff? Ouch! How could that have slipped through the tests for over three years? Thanks a lot for finding and repairing!

But while we are at it: Two loops with an expensive exponentiation each, that is a lot of overhead!

If we can assume that the error is at most one with a start-value of (floor(log_2(n)) + 1) // b (just mp_count_bits(a) // b for simplicity) the possible outcomes of the test-exponentiation t = r^b are only t == a for a perfect power (return b) and t > a (return b-1) and t < a (return b).

Is that correct?

ssinai1 commented 2 years ago

I rewrote it for downward rounding iteration. The mp_div seems to round towards 0, so it needed the remainder...

czurnieden commented 2 years ago

The mp_div seems to round towards 0

No, it is truncating! E.g.: both 3/2 and -3/2 get truncated to 1 and -1 respectively and not the 1 and -2 a rounding towards zero would give.

But I like where it is going to! Thanks for the work!

BTW: Libtommath is not as frugal as it looks from the outside, it has some useful macros in tommath.h. For example:

#define mp_iszero(a) ((a)->used == 0)
#define mp_isneg(a)  ((a)->sign == MP_NEG)

These are a bit cheaper than a full call to mp_cmp_d and, as I find (YMMV as always), better readable.

If I understood it correctly t3 is never negative and you added a check for t3 == 0 in the loop, so the do-while check can be reduced to simply check if t3 == 1. Something in the line of a macro like e.g.:

#define mp_isone(a) ( ((a)->used == 1u) && ((a)->dp[0] == 1u) )

That macro is not in tommath.h or tommath_private.h. Should it? I don't know, although I use it quite often.

ssinai1 commented 2 years ago

Anyway, in this case a division rounding to negative infinity would be more practical.

sjaeckel commented 2 years ago
#define mp_isone(a) ( ((a)->used == 1u) && ((a)->dp[0] == 1u) )

That macro is not in tommath.h or tommath_private.h. Should it? I don't know, although I use it quite often.

Do you use this often inside the library or in code using libtommath? IMO it'd be fine to go in either of tommath.h or tommath_private.h, depending on what makes sense. (My guess goes to tommath.h but asking just to be sure.)

czurnieden commented 2 years ago

Do you use this often inside the library or in code using libtommath?

Mainly externally, so tommath.h would be ideal. It might be tempting to generalize it but since the size of a mp_digit is not fixed over different architectures it would get too complex.

sjaeckel commented 2 years ago

Sorry @ssinai1 for highjacking your PR for this discussion :)

I've created #537 for that purpose now.

We should continue with this question https://github.com/libtom/libtommath/pull/532#issuecomment-1228932509 now :) (and by 'we' I mostly mean @ssinai1 and @czurnieden :D thanks)

czurnieden commented 1 year ago

October 2022. Oh, my, completely forgotten, sorry!

@ssinai1 if you are still out there: could you please rebase this PR to the current develop branch so I can give it my "seal of approval"? Thank you very much!

sjaeckel commented 8 months ago

@ssinai1 Can you please rebase on top of develop, squash those commits and write a bit of a commit message which summarizes what was done.