libtom / libtommath

LibTomMath is a free open source portable number theoretic multiple-precision integer library written entirely in C.
https://www.libtom.net
Other
650 stars 194 forks source link

API discussions/overhaul #243

Closed minad closed 5 years ago

minad commented 5 years ago

Starting from the discussion in #240 with @czurnieden, I thought a bit about the exposed API.

It might make sense to do a bit of an overhaul and make some functions private. In particular mp_digit is exposed to often in the public API and it would be more useful to provide functions taking signed ints or longs. I consider mp_digit to be more of an implementation detail.

Deprecation of the mp_*_d functions could be done gracefully since they can be replaced by the more general mp_*_int functions.

~What could also be discussed is going to defined bit sizes since we include stdint.h and provide functions like mp_add_i32 instead of mp_add_int/mp_add_long. This would make the API platform independent. However this might be a bit of a too large deviation from how things are now.~ (DONE)

~Furthermore mp_set is not needed anymore as soon as ~#221~ #253 has been implemented. Right now mp_set still makes sense since it is guaranteed to not allocate.~ (KEEP)

Then there are a few functions which seem to be internal and it might make sense to not expose them.

~Then there is the rand_digit function which is a remnant of before #236 and could be deprecated.~ (DEPRECATED)

~Concerning the bit manipulation functions, there is mp_get_bit but no mp_set_bit. Either add mp_set_bit or internalize mp_get_bit as s_mp_get_bit?~

~The one complement bitwise functions could also be internalized, since the two complement functions for positive ints are the equivalent.~

czurnieden commented 5 years ago
minad commented 5 years ago
sjaeckel commented 5 years ago
* concerning the fast parameters, do they signify non-constant-time. I would suggest to provide instead to be more explicit, mp_n_root_consttime or mp_expt_consttime

yeah, that would've been a better solution ...

* `mp_n_root_ex`  (what is this fast parameter for?)
  @sjaeckel added it once for heimdahl (see #66) don't know what happened to it since. i also remember that the theme "is your mp_n_root implementation constant time" came up later but can't find it quickly. Heimdahls problem was not with `mp_n_root` but with `mp_expt_d` and that's the reason `mp_expt_d_ex` has a `fast` flag.

... but I was young, heimdal was complaining ... and in the end never updated ltm to the version that introduced this ... :-\

mp_expt_d, mp_expt_d_ex -> mp_expt_int (not yet available)

TBH I don't know why it wouldn't make sense to provide a mp_expt() (and then also provide mp_expt_int(), mp_expt_long(), mp_expt_long_long() which could then simply be mp_set_X() + mp_expt())

While looking at the mp_expt_d_ex() implementation I realized that the 'supposed to be constant time' implementation also isn't ... or am I wrong there?

The rest of the to-be-deprecated/-internalized functions LGTM

* @czurnieden is your sieve as fast as this one and can really replace it as the first step of is_prime? Then I am all for replacing.

good question!

czurnieden commented 5 years ago

(I took one day off and all hell broke loose here! ;-) )

is your sieve as fast as this one and can really replace it as the first step of is_prime?

Is a small table, tiny enough to fit in the L0 cache of the bigger CPUs faster than a full sieve? Mmh… ;-)

But the sieve fits in the L1 cache, so the second time it is needed it runs faster. The only "benchmark" I made is the observation, that computing the primesum up to 2^32 needs as much time as PARI/GP but generating the primes is a very small part of computing the primesum, admitted.

The same with is_prime: the time needed to generate the primes/loading the table is negligible in contrast to the work needed to do the actual trial- divisions.

Ok, ok, ok, I'll do a benchmark this WE.

While looking at the mp_expt_d_ex() implementation I realized that the 'supposed to be constant time' implementation also isn't ... or am I wrong there?

Well, strictly spoken: it is not. Nothing in LTM is. You need more than the high-level functions run in constant-time, you need the basic arithmetics, too, even down to CPU-level. See e.g.: Bear-SSL line 713ff "Constant-time primitives" (example chosen for legibility, nothing overly special with Bear-SSL, although it can be made quite small). Yes, an extreme example, I know.

Uhm… where was I? Oh, yes. It is not constant-time as it is because you only multiply if the bit is set, You should always multiply, no matter what, either with g or with 1. But that is for the branching only, you would need constant time methods for squaring and multiplying, too. All "easy" with modular exponentiation, not so much with the normal one. But it is a quick hack to include the always-multiply-no-matter-what optimization and might even silence the conscience a bit.

Is normal exponentiation even used in cryptography? Evaluation of the nth-root?

czurnieden commented 5 years ago

is your sieve as fast as this one and can really replace it as the first step of is_prime?

A first quick test (no full benchmark!) WIth MP_DIGIT_BIT == 60 on a 64-bit machine I got for generating one 1024-bit prime: 37 sec with the sieve (first 1,000 primes for trial division and comparing) and 49 sec. with the table. I don't think that the error-bar is that large here, so for this(!) test the sieve is faster.

More interesting will be the results for low-mp, especially for MP_8BIT, which cannot hold most of the 1,000 primes in a mp_digit and must do bigint arithmetic instead. But it's gone late now, so that's for later in the evening.

minad commented 5 years ago

Cool. Then we can definitely use your sieve instead 😊

czurnieden commented 5 years ago

Cool. Then we can definitely use your sieve instead

Um…it is a bit more complicated than that, of course, the error bar is really that big.

Current benchmark, nothing fancy, just to get a hint for the direction it goes.

All tests with 1,000 primes for trial-division, 1,000 for comparing, and 3 runs if not noted otherwise.

MP_64BIT, table
time for i in {1..1000};do ./randprime 64;done       26 - 28 seconds     ( ~4 seconds sys.)
time for i in {1..100};do ./randprime 128;done       9 - 10 seconds      ( .75 - 1 second sys.)
time for i in {1..100};do ./randprime 256;done       48 - 65 seconds     ( 3 - 4 seconds sys.)
time for i in {1..100};do ./randprime 512;done       7.5 - 8 minutes     ( ~20 seconds sys.)
time for i in {1..10};do ./randprime 1024;done       7.5, 16.75, 24 minutes (10, 22, 31 seconds sys.)

MP_64BIT, sieve
time for i in {1..1000};do ./randprime 64;done       43 - 47 seconds     ( 3.5 - 4 seconds sys.)
time for i in {1..100};do ./randprime 128;done       16 - 19.5 seconds   ( .75 - 1 second sys.)
time for i in {1..100};do ./randprime 256;done       90 - 94 seconds     ( ~3.4 seconds sys.)
time for i in {1..100};do ./randprime 512;done       10 - 12 minutes     ( 20 - 24 seconds sys.)
time for i in {1..10};do ./randprime 1024;done       11, 13, 18 minutes  ( 14, 17, 22.5 seconds sys.)

MP_64BIT, sieve, 256 primes for trial-division and 256 for comparing

time for i in {1..1000};do ./randprime 64;done       28 - 29 seconds     ( ~4  seconds sys.)
time for i in {1..100};do ./randprime 128;done       9 - 12 seconds      (.75 - 1 second sys.)
time for i in {1..100};do ./randprime 256;done       58 - 68 seconds     ( 3.2 - 3.7 seconds sys.)
time for i in {1..100};do ./randprime 512;done       7 -  9 minutes      ( 20 - 24 seconds sys.)
time for i in {1..10};do ./randprime 1024;done       11, 11, 15 minutes  (14, 14, 18.5 seconds sys.)

MP_8BIT, table
31 primes for trial-division and 31 for comparing
time for i in {1..1000};do ./randprime 64;done       41 - 43 seconds     ( ~5 seconds sys.)
time for i in {1..100};do ./randprime 128;done       45 - 48 seconds     ( ~1 seconds sys.)
time for i in {1..100};do ./randprime 256;done       11 - 13 minutes     ( 3.5 - 4 seconds sys.)

MP_8BIT, sieve, bigint 
1,000 primes for trial-division and 31 for comparing
time for i in {1..1000};do ./randprime 64;done       3.5 - 4 minutes     ( ~5 seconds sys.)
time for i in {1..100};do ./randprime 128;done       2.5 - 3.5 minutes   ( 1 - 1.2 seconds sys.)

MP_8BIT, sieve, bigint
256 primes for trial-division and 31 for comparing
time for i in {1..1000};do ./randprime 64;done       91 - 94 seconds     ( ~4.5 seconds sys.)
time for i in {1..100};do ./randprime 128;done       72 - 89 seconds     ( 1 - 1.2 seconds sys.)
time for i in {1..100};do ./randprime 256;done       17 - 19 minutes     ( 3.9 - 4.4 seconds sys.)

256 primes is the size of the original table, 31 that for MP_8BIT. It is ok for MP_64BIT, the difference is negligible but the use of bigint ruins it for MP_8BIT. I can get it even with the table with 64 primes.

But is that worth the change of the API (the sieve needs to be initiated in mp_prime_rand and transported from there) without any of the other functions that sieve is supposed to support later?

I don't know *sigh*

minad commented 5 years ago

@czurnieden Thx for the numbers, I don't think MP_8BIT is the use case we should optimize for.

One general thing about the API overhaul: At some places it makes sense to modify the type signatures of functions, for example the return type of mp_set_int to void in #253. What could be done about such cases? Introduce another function mp_set_int_with_some_other_name? The goal should be that we don't end up with ugly names after the overhaul, e.g. mp_set_int_v2... Hmm...

minad commented 5 years ago

For init and set I would propose the following to get/set signed values. Deprecate all mp_set, mpset, mp_init_set_int, mpget. Furthermore add this:

Variant I:

int mp_init_setl(mp_int*, long);
void mp_setl(mp_int*, long);
void mp_setll(mp_int*, long long);
void mp_setull(mp_int*, unsigned long long);
long mp_getl(const mp_int*);
long long mp_getll(const mp_int*);
unsigned long long mp_getull(const mp_int*);

(int mp_init_seti(mp_int*, int);)
(int mp_geti(const mp_int*);)
(void mp_seti(mp_int*, int);)

Variant II (naming bikeshed):

int mp_init_set_l(mp_int*, long);
void mp_set_l(mp_int*, long);
void mp_set_ll(mp_int*, long long);
void mp_set_ull(mp_int*, unsigned long long);
long mp_get_l(const mp_int*);
long long mp_get_ll(const mp_int*);
unsigned long long mp_get_ull(const mp_int*);

Variant III (sized api):

int mp_init_set32(mp_int*, int32_t);
void mp_set32(mp_int*, int32_t);
void mp_set64(mp_int*, int64_t);
void mp_setu64(mp_int*, uint64_t);
int32_t mp_get32(const mp_int*);
int64_t mp_get64(const mp_int*);
uint64_t mp_getu64(const mp_int*);

This will be the first easy step. Then the add_d, sub_d, ... apis could follow in a similar fashion. Note that for those we would probably only add the long variant.

@sjaeckel @czurnieden Please tell me which variant you prefer.

czurnieden commented 5 years ago

I don't think MP_8BIT is the use case we should optimize for.

I do have some pride left! ;-)

At some places it makes sense to modify the type signatures of functions, for example the return type of mp_set_int to void in #253. What could be done about such cases?

Going from returning a type to returning nothing is not much of a problem, only the other way around.

Please tell me which variant you prefer.

Mmh…do you have a variant IV? ;-)

Var. III is of course the most precise one but one of the users of LTM does not like it if they have to include stdint.h.

What about keeping the old names and just add unsigned variants? That changes the behaviour instead of changing it all completely but it would be easier for the user to adapt their code, just do a global s/set_long_long/set_unsigned_long_long/g and so on. And then add Var. III on top of that. If some users don't want them they can remove them from the build. Main disadvantage: that's a lot of code to maintain.

minad commented 5 years ago

Var. III is of course the most precise one but one of the users of LTM does not like it if they have to include stdint.h.

Note that we already include stdint.h. I would prefer variant III for a new lib, since this is the most precise one. This gives some kind of bold statement that we encourage using precise type. However this is not what is done in tommath right now. Mostly unprecise types are used everywhere and I don't want to rewrite everything. Therefore variant I/II are probably a better fit here.

What about keeping the old names and just add unsigned variants? That changes the behaviour instead of changing it all completely but it would be easier for the user to adapt their code, just do a global s/set_long_long/set_unsigned_long_long/g and so on.

Not possible, since the old names already take unsigned types. We could however add signed variants. But then I would rather clean it up nicely for once. Note that currently mp_set_int returns int, in the proposed new API, they return void, since these functions will never fail due to #253. Basically there is no way around new functions.

And then add Var. III on top of that. If some users don't want them they can remove them from the build. Main disadvantage: that's a lot of code to maintain.

I don't like it if we accumulate just more stuff. I think we should decide on one variant. I would rather keep the deprecated APIs around for some longer time (until 2.0 or something) instead of keeping around all APIs forever.

czurnieden commented 5 years ago

Note that we already include stdint.h

It was the use of the content of stdint.h, not the inclusion itself. But I haven't heard anything to this regard for some time, maybe they gave up/in?

Not possible, since the old names already take unsigned types

That's what I wanted to change: to let e.g.: mp_set_long actually take a signed long, just as the label says and rename, no, correct the label on the old ones to reflect their function properly. It is easy for the users (a simple C&R) but nevertheless changes the API massively, it ruins backward compatibility. Hence: not good.

I still like Var. III but Var. I or II is probably the better choice for now. Var. II would fit a bit better into the naming style of LTM, so Var. II?

minad commented 5 years ago

It was the use of the content of stdint.h, not the inclusion itself. But I haven't heard anything to this regard for some time, maybe they gave up/in?

I don't know. There are uintx_t uses in tommath.h. But I also must say - even if you want to restrict yourself to a very restricted c subset, or legacy variant like c89, stdint.h is a very trivial header. Even in environments without libc, I always provide stdint.h just because I need the precise types. There is really no reason in avoiding it.

That's what I wanted to change: to let e.g.: mp_set_long actually take a signed long, just as the label says and rename, no, correct the label on the old ones to reflect their function properly. It is easy for the users (a simple C&R) but nevertheless changes the API massively, it ruins backward compatibility. Hence: not good.

Yes, I excluded that option right from the start due to incompatible changes.

I still like Var. III

I also like Var III for the precision. If we decide for Var I/II, then Var II is probaby better because of the naming scheme. I will play around a bit and prepare little prototypes then we can still decide.

czurnieden commented 5 years ago

There is really no reason in avoiding [stdint.h].

You're preaching to the choire here ;-)

minad commented 5 years ago

UPDATE: This is mostly done except for the addition of functions of the type mp_add_i32 (in contrast to the existing mp_add_d).

mp_cmp_d, mp_add_d, mp_sub_d, mp_div_d, mp_mod_d, mp_expt_d -> mp_*_int or mp_*_long

I am however not convinced anymore that it makes sense to deprecate the _d functions since their particular advantage is that they can rely on operating on single limbs in some cases, e.g., mp_add_d. However others don't rely on the digit size, like mp_expt_d, which means that this one can be replaced. But let's do #294 first.

minad commented 5 years ago

The deprecation is done more or less I think. mp_cmp_d, mp_add_d, mp_sub_d, mp_div_d, mp_mod_d should not be deprecated and should stay since they are optimized single digit variants. Maybe it makes sense to add specialised mp_add_i32 functions, but let's discuss this in a separate issue if someone is interested. #304 is the last PR from me concerning this overhaul.