randombit / botan

Cryptography Toolkit
https://botan.randombit.net
BSD 2-Clause "Simplified" License
2.52k stars 557 forks source link

Possibility to use a custom thread pool #4194

Open polarnis opened 1 month ago

polarnis commented 1 month ago

It seems that currently Botan uses a global instance of an internal thread pool implementation. It works pretty well, however some users might already depend on a different thread pool for other CPU heavy workloads. Has it been considered to expose an API to inject an external thread pool replacing the internal one?

Generally idling background threads are pretty cheap to keep alive but some platforms and users might want to limit that number either way (by having just one thread pool in an application). Some might want to just provide a highly optimized thread pool implementation that they use for other purposes.

I'm not sure how intrusive such a change would be, but - in advance - thanks for considering it!

randombit commented 1 month ago

Totally makes sense. If you're in a situation where the application has multiple threads and also calling various libraries that spawn threads, it's very easy to oversubscribe the CPU.

I don't think this would be too much work to support. There would be some refactoring behind the scenes but I imagine the interface for the application would look something like

class Thread_Pool_Impl {
public:
   /// This function is called with a series of std::function<void ()>.
   /// It promises to run these, eventually, in arbitrary order
   virtual void queue_work(std::function<void ()> work) = 0;
};

void set_thread_runner(std::shared_ptr<Thread_Pool_Impl> runner);

If the application calls this before calling any other Botan function (or at least before any that uses a thread), then all work is dispatched through that class.

Please comment on the API good or bad, this was literally the first thing I thought of that would probably work, not sure if it's ideal.

reneme commented 1 month ago

I also agree that it is worth looking into that. May I suggest that we take one or two steps back, though:

As it is now, the public API does not provide any cues that Botan might spin up long-lived threads when creating an RSA key pair, for instance. _The Thread_Pool is essentially hidden global state that a user has limited control over._

Instead, I suggest to extend the public API of the affected algorithms and allow the user to provide a callback to schedule async operations for any given algorithm. This way, they could even implement different things for different algorithms. E.g. RSA keygen could look like that:

asio::io_service ioc;
auto async_work_scheduler = [&](std::function<void()> work) {
   ioc.post(work);
};

auto rng = Botan::AutoSeeded_RNG();
auto keypair = Botan::create_private_key("RSA", rng, "2048", "base" /* provider */, async_work_scheduler);

... the default callback of create_private_key could just keep using the existing Thread_Pool for backward compatibility and API stability.

This way, we provide the required flexibility to the user, but also clearly communicate in the API that there is an async notion in RSA (or elsewhere) _and we open an avenue to replace the hidden global-state Thread_Pool in the long run._

For the record: Currently, the thread pool is used in Argon2, RSA (private operations) and XMSS (tree hashing). It might be useful in other places, LMS and SPHINCS+ are obvious candidates but there's likely more potential here.

randombit commented 1 month ago

I really don't like that approach. It's very cross cutting. Consider TLS. Sometimes, it performs RSA signatures. Are you really suggesting we overload everything in TLS with a work scheduler parameter so that, in case we perform an RSA signature, we can provide the work scheduler? That policy creates an enormous amount of work, and makes adding new uses of threading almost insurmountably expensive. Given that multithreading, where used appropriately, often gets us a 2-4x speedup, that seems an unfortunate outcome.

It also is not sufficient. Consider Thunderbird. It (hypothetically IDK) might have a thread pool, or otherwise desire to tweak how/where our work runs. Thunderbird uses RNP for PGP. RNP uses Botan (and does through using the C API!). How does the work scheduler argument make it all the way down to us?

reneme commented 1 month ago

I really don't like that approach. It's very cross cutting.

That's a downside indeed. And, in fact, I got the RSA example wrong in that I thought the parallelism was used in the key generation (where it could potentially be useful as well).

Let me re-sketch this for signature generation: The "work scheduler" would be needed in the PK_Signer object as well, like:

asio::io_service ioc;
auto async_work_scheduler = [&](std::function<void()> work) {
   ioc.post(work);
};

auto rng = Botan::AutoSeeded_RNG();

// assuming that the key generation is also taking advantage of multithreading in the future
auto keypair = Botan::create_private_key("RSA", rng, "2048", "base" /* provider */, async_work_scheduler);

PK_Signer signer(keypair, rng, "some padding", "base" /* provider */, async_work_scheduler);
auto sig = signer.sign_message(msg, rng);

The work scheduler is a cross-cutting concern for sure. But so is the RNG and also the "provider", both of which have to be passed into functions that may or may not use it. The PK_Signer is a good example for that, in fact.

Now, admittedly, the RNG is an obvious requirement for many cryptography things, but it's conceptually just as cross-cutting as an explicit work scheduler would be. With the advantage, that the work scheduler could default to the existing thread pool virtually everywhere. Or am I missing something?

Regarding TLS: I think there are options here. Just like with the RNG, the Policy, the Callbacks and more: one could provide a TLS-global work scheduler in the TLS::Client/Server constructor which then is passed around to whoever needs it internally. Users could also provide it explicitly by overriding Callbacks::tls_sign_message(), if they need to do so selectively.

All that to say: I do agree that it requires adding an (optional) parameter in several places, but I still think that it is manageable. It does become cumbersome when "new" places start requiring a work scheduler, as this always requires an API-extension and users that rely on their own scheduler must adapt their code if they want to take advantage. Many potential users are covered by abstract APIs though, so the change is often internal. For instance, the proposed PK_Signer above: if SPHINCS+ (SLH-DSA) would start being multi-threaded, it could just make use of the work scheduler that is already there.