Closed minad closed 5 years ago
mp_rshd
, mp_lshd
multiplying/dividing by multiples of bases is not uncommon and these functions allow for it without knowing the exact size of the base.
mp_prime_is_divisble
+ ltm_prime_tab
that's what the sieve should have replaced, for example. Here it would get rid of the harsh restrictions for low-mp. And the table has just 256 small primes in it, that is not really much.
Yes, I'm not ashamed at all to beat the big drum for my stuff ;-)
mp_n_root_ex
(what is this fast parameter for?)
@sjaeckel added it once for heimdahl (see #66) don't know what happened to it since. i also remember that the theme "is your mp_n_root implementation constant time" came up later but can't find it quickly. Heimdahls problem was not with mp_n_root
but with mp_expt_d
and that's the reason mp_expt_d_ex
has a fast
flag.
BTW: while searching for it I found this thesis did they contact anybody here or did I just forgot the name? No, I didn't read it, gave it a short glimpse only.
mp_get_bit
, mp_set_bit
I needed mp_get_bit
in two different functions and thought it might be useful for others, too, so I made it a non-static function. It is no problem to complete the set: mp_get_bit
, mp_set_bit
, and mp_toggle_bit
if it is deemed useful or put it behind the scenes if not.
* concerning the fast parameters, do they signify non-constant-time. I would suggest to provide instead to be more explicit, mp_n_root_consttime or mp_expt_consttime
yeah, that would've been a better solution ...
* `mp_n_root_ex` (what is this fast parameter for?) @sjaeckel added it once for heimdahl (see #66) don't know what happened to it since. i also remember that the theme "is your mp_n_root implementation constant time" came up later but can't find it quickly. Heimdahls problem was not with `mp_n_root` but with `mp_expt_d` and that's the reason `mp_expt_d_ex` has a `fast` flag.
... but I was young, heimdal was complaining ... and in the end never updated ltm to the version that introduced this ... :-\
mp_expt_d
,mp_expt_d_ex
->mp_expt_int
(not yet available)
TBH I don't know why it wouldn't make sense to provide a mp_expt()
(and then also provide mp_expt_int()
, mp_expt_long()
, mp_expt_long_long()
which could then simply be mp_set_X()
+ mp_expt()
)
While looking at the mp_expt_d_ex()
implementation I realized that the 'supposed to be constant time' implementation also isn't ... or am I wrong there?
The rest of the to-be-deprecated/-internalized functions LGTM
* @czurnieden is your sieve as fast as this one and can really replace it as the first step of is_prime? Then I am all for replacing.
good question!
(I took one day off and all hell broke loose here! ;-) )
is your sieve as fast as this one and can really replace it as the first step of is_prime?
Is a small table, tiny enough to fit in the L0 cache of the bigger CPUs faster than a full sieve? Mmh… ;-)
But the sieve fits in the L1 cache, so the second time it is needed it runs faster. The only "benchmark" I made is the observation, that computing the primesum up to 2^32 needs as much time as PARI/GP but generating the primes is a very small part of computing the primesum, admitted.
The same with is_prime
: the time needed to generate the primes/loading the table is negligible in contrast to the work needed to do the actual trial- divisions.
Ok, ok, ok, I'll do a benchmark this WE.
While looking at the
mp_expt_d_ex()
implementation I realized that the 'supposed to be constant time' implementation also isn't ... or am I wrong there?
Well, strictly spoken: it is not. Nothing in LTM is. You need more than the high-level functions run in constant-time, you need the basic arithmetics, too, even down to CPU-level. See e.g.: Bear-SSL line 713ff "Constant-time primitives" (example chosen for legibility, nothing overly special with Bear-SSL, although it can be made quite small). Yes, an extreme example, I know.
Uhm… where was I? Oh, yes. It is not constant-time as it is because you only multiply if the bit is set, You should always multiply, no matter what, either with g
or with 1
. But that is for the branching only, you would need constant time methods for squaring and multiplying, too. All "easy" with modular exponentiation, not so much with the normal one.
But it is a quick hack to include the always-multiply-no-matter-what optimization and might even silence the conscience a bit.
Is normal exponentiation even used in cryptography? Evaluation of the nth-root?
is your sieve as fast as this one and can really replace it as the first step of is_prime?
A first quick test (no full benchmark!) WIth MP_DIGIT_BIT == 60 on a 64-bit machine I got for generating one 1024-bit prime: 37 sec with the sieve (first 1,000 primes for trial division and comparing) and 49 sec. with the table. I don't think that the error-bar is that large here, so for this(!) test the sieve is faster.
More interesting will be the results for low-mp, especially for MP_8BIT, which cannot hold most of the 1,000 primes in a mp_digit and must do bigint arithmetic instead. But it's gone late now, so that's for later in the evening.
Cool. Then we can definitely use your sieve instead 😊
Cool. Then we can definitely use your sieve instead
Um…it is a bit more complicated than that, of course, the error bar is really that big.
Current benchmark, nothing fancy, just to get a hint for the direction it goes.
All tests with 1,000 primes for trial-division, 1,000 for comparing, and 3 runs if not noted otherwise.
MP_64BIT, table
time for i in {1..1000};do ./randprime 64;done 26 - 28 seconds ( ~4 seconds sys.)
time for i in {1..100};do ./randprime 128;done 9 - 10 seconds ( .75 - 1 second sys.)
time for i in {1..100};do ./randprime 256;done 48 - 65 seconds ( 3 - 4 seconds sys.)
time for i in {1..100};do ./randprime 512;done 7.5 - 8 minutes ( ~20 seconds sys.)
time for i in {1..10};do ./randprime 1024;done 7.5, 16.75, 24 minutes (10, 22, 31 seconds sys.)
MP_64BIT, sieve
time for i in {1..1000};do ./randprime 64;done 43 - 47 seconds ( 3.5 - 4 seconds sys.)
time for i in {1..100};do ./randprime 128;done 16 - 19.5 seconds ( .75 - 1 second sys.)
time for i in {1..100};do ./randprime 256;done 90 - 94 seconds ( ~3.4 seconds sys.)
time for i in {1..100};do ./randprime 512;done 10 - 12 minutes ( 20 - 24 seconds sys.)
time for i in {1..10};do ./randprime 1024;done 11, 13, 18 minutes ( 14, 17, 22.5 seconds sys.)
MP_64BIT, sieve, 256 primes for trial-division and 256 for comparing
time for i in {1..1000};do ./randprime 64;done 28 - 29 seconds ( ~4 seconds sys.)
time for i in {1..100};do ./randprime 128;done 9 - 12 seconds (.75 - 1 second sys.)
time for i in {1..100};do ./randprime 256;done 58 - 68 seconds ( 3.2 - 3.7 seconds sys.)
time for i in {1..100};do ./randprime 512;done 7 - 9 minutes ( 20 - 24 seconds sys.)
time for i in {1..10};do ./randprime 1024;done 11, 11, 15 minutes (14, 14, 18.5 seconds sys.)
MP_8BIT, table
31 primes for trial-division and 31 for comparing
time for i in {1..1000};do ./randprime 64;done 41 - 43 seconds ( ~5 seconds sys.)
time for i in {1..100};do ./randprime 128;done 45 - 48 seconds ( ~1 seconds sys.)
time for i in {1..100};do ./randprime 256;done 11 - 13 minutes ( 3.5 - 4 seconds sys.)
MP_8BIT, sieve, bigint
1,000 primes for trial-division and 31 for comparing
time for i in {1..1000};do ./randprime 64;done 3.5 - 4 minutes ( ~5 seconds sys.)
time for i in {1..100};do ./randprime 128;done 2.5 - 3.5 minutes ( 1 - 1.2 seconds sys.)
MP_8BIT, sieve, bigint
256 primes for trial-division and 31 for comparing
time for i in {1..1000};do ./randprime 64;done 91 - 94 seconds ( ~4.5 seconds sys.)
time for i in {1..100};do ./randprime 128;done 72 - 89 seconds ( 1 - 1.2 seconds sys.)
time for i in {1..100};do ./randprime 256;done 17 - 19 minutes ( 3.9 - 4.4 seconds sys.)
256 primes is the size of the original table, 31 that for MP_8BIT. It is ok for MP_64BIT, the difference is negligible but the use of bigint ruins it for MP_8BIT. I can get it even with the table with 64 primes.
But is that worth the change of the API (the sieve needs to be initiated in mp_prime_rand and transported from there) without any of the other functions that sieve is supposed to support later?
I don't know *sigh*
@czurnieden Thx for the numbers, I don't think MP_8BIT is the use case we should optimize for.
One general thing about the API overhaul: At some places it makes sense to modify the type signatures of functions, for example the return type of mp_set_int to void in #253. What could be done about such cases? Introduce another function mp_set_int_with_some_other_name? The goal should be that we don't end up with ugly names after the overhaul, e.g. mp_set_int_v2... Hmm...
For init and set I would propose the following to get/set signed values. Deprecate all mp_set, mpset, mp_init_set_int, mpget. Furthermore add this:
Variant I:
int mp_init_setl(mp_int*, long);
void mp_setl(mp_int*, long);
void mp_setll(mp_int*, long long);
void mp_setull(mp_int*, unsigned long long);
long mp_getl(const mp_int*);
long long mp_getll(const mp_int*);
unsigned long long mp_getull(const mp_int*);
(int mp_init_seti(mp_int*, int);)
(int mp_geti(const mp_int*);)
(void mp_seti(mp_int*, int);)
Variant II (naming bikeshed):
int mp_init_set_l(mp_int*, long);
void mp_set_l(mp_int*, long);
void mp_set_ll(mp_int*, long long);
void mp_set_ull(mp_int*, unsigned long long);
long mp_get_l(const mp_int*);
long long mp_get_ll(const mp_int*);
unsigned long long mp_get_ull(const mp_int*);
Variant III (sized api):
int mp_init_set32(mp_int*, int32_t);
void mp_set32(mp_int*, int32_t);
void mp_set64(mp_int*, int64_t);
void mp_setu64(mp_int*, uint64_t);
int32_t mp_get32(const mp_int*);
int64_t mp_get64(const mp_int*);
uint64_t mp_getu64(const mp_int*);
This will be the first easy step. Then the add_d, sub_d, ... apis could follow in a similar fashion. Note that for those we would probably only add the long variant.
@sjaeckel @czurnieden Please tell me which variant you prefer.
I don't think MP_8BIT is the use case we should optimize for.
I do have some pride left! ;-)
At some places it makes sense to modify the type signatures of functions, for example the return type of mp_set_int to void in #253. What could be done about such cases?
Going from returning a type to returning nothing is not much of a problem, only the other way around.
Please tell me which variant you prefer.
Mmh…do you have a variant IV? ;-)
Var. III is of course the most precise one but one of the users of LTM does not like it if they have to include stdint.h.
What about keeping the old names and just add unsigned
variants?
That changes the behaviour instead of changing it all completely but it would be easier for the user to adapt their code, just do a global s/set_long_long/set_unsigned_long_long/g
and so on.
And then add Var. III on top of that. If some users don't want them they can remove them from the build.
Main disadvantage: that's a lot of code to maintain.
Var. III is of course the most precise one but one of the users of LTM does not like it if they have to include stdint.h.
Note that we already include stdint.h. I would prefer variant III for a new lib, since this is the most precise one. This gives some kind of bold statement that we encourage using precise type. However this is not what is done in tommath right now. Mostly unprecise types are used everywhere and I don't want to rewrite everything. Therefore variant I/II are probably a better fit here.
What about keeping the old names and just add
unsigned
variants? That changes the behaviour instead of changing it all completely but it would be easier for the user to adapt their code, just do a globals/set_long_long/set_unsigned_long_long/g
and so on.
Not possible, since the old names already take unsigned types. We could however add signed variants. But then I would rather clean it up nicely for once. Note that currently mp_set_int
returns int, in the proposed new API, they return void, since these functions will never fail due to #253.
Basically there is no way around new functions.
And then add Var. III on top of that. If some users don't want them they can remove them from the build. Main disadvantage: that's a lot of code to maintain.
I don't like it if we accumulate just more stuff. I think we should decide on one variant. I would rather keep the deprecated APIs around for some longer time (until 2.0 or something) instead of keeping around all APIs forever.
Note that we already include stdint.h
It was the use of the content of stdint.h, not the inclusion itself. But I haven't heard anything to this regard for some time, maybe they gave up/in?
Not possible, since the old names already take unsigned types
That's what I wanted to change: to let e.g.: mp_set_long
actually take a signed long
, just as the label says and rename, no, correct the label on the old ones to reflect their function properly. It is easy for the users (a simple C&R) but nevertheless changes the API massively, it ruins backward compatibility.
Hence: not good.
I still like Var. III but Var. I or II is probably the better choice for now. Var. II would fit a bit better into the naming style of LTM, so Var. II?
It was the use of the content of stdint.h, not the inclusion itself. But I haven't heard anything to this regard for some time, maybe they gave up/in?
I don't know. There are uintx_t uses in tommath.h. But I also must say - even if you want to restrict yourself to a very restricted c subset, or legacy variant like c89, stdint.h is a very trivial header. Even in environments without libc, I always provide stdint.h just because I need the precise types. There is really no reason in avoiding it.
That's what I wanted to change: to let e.g.: mp_set_long actually take a signed long, just as the label says and rename, no, correct the label on the old ones to reflect their function properly. It is easy for the users (a simple C&R) but nevertheless changes the API massively, it ruins backward compatibility. Hence: not good.
Yes, I excluded that option right from the start due to incompatible changes.
I still like Var. III
I also like Var III for the precision. If we decide for Var I/II, then Var II is probaby better because of the naming scheme. I will play around a bit and prepare little prototypes then we can still decide.
There is really no reason in avoiding [stdint.h].
You're preaching to the choire here ;-)
UPDATE: This is mostly done except for the addition of functions of the type mp_add_i32
(in contrast to the existing mp_add_d
).
mp_cmp_d, mp_add_d, mp_sub_d, mp_div_d, mp_mod_d, mp_expt_d -> mp_*_int or mp_*_long
I am however not convinced anymore that it makes sense to deprecate the _d
functions since their particular advantage is that they can rely on operating on single limbs in some cases, e.g., mp_add_d. However others don't rely on the digit size, like mp_expt_d, which means that this one can be replaced.
But let's do #294 first.
The deprecation is done more or less I think. mp_cmp_d, mp_add_d, mp_sub_d, mp_div_d, mp_mod_d
should not be deprecated and should stay since they are optimized single digit variants. Maybe it makes sense to add specialised mp_add_i32 functions, but let's discuss this in a separate issue if someone is interested. #304 is the last PR from me concerning this overhaul.
Starting from the discussion in #240 with @czurnieden, I thought a bit about the exposed API.
It might make sense to do a bit of an overhaul and make some functions private. In particular
mp_digit
is exposed to often in the public API and it would be more useful to provide functions taking signed ints or longs. I considermp_digit
to be more of an implementation detail.mp_cmp_d
,mp_add_d
,mp_sub_d
,mp_div_d
,mp_mod_d
->mp_*_int
ormp_*_long
(not yet available)mp_expt_d
->mp_expt_int
~ (DEPRECATED in open #304)mp_set
->mp_set_int
,mp_set_long
,mp_set_long_long
(already there)~ (DONE)mp_init_set
->mp_init_set_int
(already there)~ (DONE)Deprecation of the
mp_*_d
functions could be done gracefully since they can be replaced by the more generalmp_*_int
functions.~What could also be discussed is going to defined bit sizes since we include
stdint.h
and provide functions likemp_add_i32
instead ofmp_add_int
/mp_add_long
. This would make the API platform independent. However this might be a bit of a too large deviation from how things are now.~ (DONE)~Furthermore
mp_set
is not needed anymore as soon as ~#221~ #253 has been implemented. Right nowmp_set
still makes sense since it is guaranteed to not allocate.~ (KEEP)Then there are a few functions which seem to be internal and it might make sense to not expose them.
mp_exch
~ (KEEP since it pretty much harmless)mp_rshd
,mp_lshd
(shifting by digits)~ (KEEP?)mp_prime_is_divisble
+ltm_prime_tab
(to special and dependent on MP_PRIME_SIZE, not generally useful?)~ (DEPRECATED in #288)mp_n_root_ex
(what is this fast parameter for?)~ (DEPRECATED in #294)mp_n_expt_d_ex
(what is this fast parameter for?)~ (DEPRECATED in #294)~Then there is the rand_digit function which is a remnant of before #236 and could be deprecated.~ (DEPRECATED)
~Concerning the bit manipulation functions, there is
mp_get_bit
but nomp_set_bit
. Either addmp_set_bit
or internalizemp_get_bit
ass_mp_get_bit
?~mp_get_bit
,mp_set_bit
~ (DEPRECATED)~The one complement bitwise functions could also be internalized, since the two complement functions for positive ints are the equivalent.~
mp_xor
,mp_or
,mp_and
->mp_tc_*
(already there)~ (DEPRECATED)mp_tc_div_2d
->mp_bit_rsh
~ (DEPRECATED)