add `MP_SMALL_STACK_SIZE` option

sjaeckel commented 2 years ago

This adds an option to use a heap-buffer for the usually stack-based MP_WARRAY-sized temporary buffers.

Per default it will reserve a single buffer, which can be modified

at compile-time via the MP_WARRAY_NUM define
at run-time by calling mp_warray_init()

The internal structure can only be created once. If one wants to modify the maximum number of elements, the entire structure has to be free'd by calling mp_warray_free().

In case one wants to use this option with multiple threads, one shall use the mp_warray_init() function and pass appropriate locking functions.

Timings

op / algo	ECC	RSA
decrypt_key
encrypt_key
sign_hash
verify_hash

Related to: #441 #447 #511

sjaeckel commented 1 year ago

@jlaitine could you maybe have a look and tell me whether that's an acceptable solution for you?

czurnieden commented 1 year ago

What happened to this PR? Did you change your mind? I find it quite interesting to have a choice between stack and heap for this. On the other side: it breaks our general thread-safety and needs locks. But I'm still supportive.

sjaeckel commented 7 months ago

What happened to this PR? Did you change your mind?

No, I just didn't have the passion to follow up :)

On the other side: it breaks our general thread-safety and needs locks.

That's indeed a thing. I'm not sure whether it'd make sense to replace the locks by taking care of that ourselves and using lock-free/atomic operations under the hood!?

sjaeckel commented 7 months ago

I'm not sure whether it'd make sense to replace the locks by taking care of that ourselves and using lock-free/atomic operations under the hood!?

What do you think? Better like that!?

czurnieden commented 7 months ago

Sorry, took a bit longer to check.

Better like that!?

Yes, looks reasonable.

Caveat: I cannot test the Windows stuff now: my last PC here with Windows (what is the plural of Windows?) let the magic smoke out, need a visit to Ebay before going on.

PS: I am aware of my TODO list, it's not forgotten!

sjaeckel commented 7 months ago

what is the plural of Windows?

My SO just explained to me that the plural only works with the complete name! Microsoft Windows -> Microsofts Windows

my TODO list

No need to hurry, we have time.

test the Windows stuff

I'll see what I can do by extending CI.

czurnieden commented 7 months ago

My SO just explained to me that the plural only works with the complete name! Microsoft Windows -> Microsofts Windows

Ah, ok, my thanks to your SO!

I'll see what I can do by extending CI.

That would be really nice!

No need to hurry, we have time.

You have, but I'm an overweight old fart already! ;-)

My main problem is the error I got with the base2->base10 conversion with a single round of NR to compute the next divisor at a very large (close to MP_MAX_DIGIT_COUNT with MP_64BIT ) number. Sadly a randomly generated large number and it hasn't been logged what it was (Full HD. Yes, I'm an idiot). Haven't been able to repeat the error and am trying to compute the accumulated error of the NR rounds now manually. Fiddly.

We need to be able to do at most 22 rounds with n_0 = 1000 and the individual absolute error is about 2^(-34). But it is not floating point, it is integer math, so what is the accumulated error at the end? Does it even accumulate?

Was the aformentioned single error just a random bitflip? An error elsewhere?

*hng*!

sjaeckel commented 7 months ago

Ah, ok, my thanks to your SO!

You're very welcome ;-)

I'll see what I can do by extending CI.

wow, what a ride ... In between I was not sure whether I'd simply disable the feature for MSVC ... but now it seems to work^TM ...

sjaeckel commented 7 months ago

@jlaitine can you please check if this works for you?

sjaeckel commented 6 months ago

In between I was not sure whether I'd simply disable the feature for MSVC ... but now it seems to work^TM ...

aaaand in the end I disabled it for MSVC since IMO the most elegant solution is via TLS and MSVC seems to have a problem with TLS in DLLs. I'm not 100% sure what's going wrong, but since it was going fine with the static library [0] I suspect that we'd have to implement what's described in [1] and I'm simply not willing to do that, nor would I accept a PR that implements it.

If I'm mistaken there and someone with more MSVC experience wants to chime in and provide an easy fix, I'll happily accept that :)

[0] https://ci.appveyor.com/project/libtom/libtommath/builds/49507598 [1] https://learn.microsoft.com/en-us/windows/win32/dlls/using-thread-local-storage-in-a-dynamic-link-library

czurnieden commented 6 months ago

https://learn.microsoft.com/en-us/windows/win32/dlls/using-thread-local-storage-in-a-dynamic-link-library

So that's the superlativ of "unnecessary complicated", isn't it?

jlaitine commented 6 months ago

@jlaitine can you please check if this works for you?

Hi sorry for the very long response time, I have been busy with some other things. I see that this already got merged, so I tested the development branch directly and this change works perfectly!

I still have a small compilation error, in PX4, which is compiled with "-Wall". Not related to this PR. I created anoter PR out of that: https://github.com/libtom/libtommath/pull/578

Thanks again, Jukka

sjaeckel commented 6 months ago

Thanks a lot for the feedback! Glad to hear that it works for you :)

libtom / libtommath

add `MP_SMALL_STACK_SIZE` option #538

Timings