openssl / project

Tracking of project related issues
2 stars 1 forks source link

Performance Improvements #185

Open arapov opened 1 year ago

arapov commented 1 year ago

We need to have a plan for further improvements in v3 performance.

re: https://github.com/openssl/openssl/issues/20286 https://github.com/openssl/openssl/issues/17064 https://github.com/openssl/openssl/issues/17627

### Tasks
- [ ] https://github.com/openssl/project/issues/453
- [ ] https://github.com/openssl/project/issues/353
- [ ] https://github.com/openssl/project/issues/569
- [ ] https://github.com/openssl/project/issues/566
- [ ] https://github.com/openssl/project/issues/358
- [ ] https://github.com/openssl/project/issues/354
- [ ] https://github.com/openssl/project/issues/441
- [ ] https://github.com/openssl/project/issues/214
- [ ] https://github.com/openssl/project/issues/100
- [ ] https://github.com/openssl/project/issues/357
- [ ] https://github.com/openssl/project/issues/356
- [ ] https://github.com/openssl/project/issues/102
- [ ] https://github.com/openssl/project/issues/54
- [ ] https://github.com/openssl/project/issues/359
- [ ] openssl/openssl#22729
- [ ] openssl/openssl#21833
- [ ] https://github.com/openssl/project/issues/571
- [ ] https://github.com/openssl/project/issues/572
- [ ] https://github.com/openssl/project/issues/573
mattcaswell commented 1 year ago

Also see the proposal from Nico in openssl/openssl#20715

nicowilliams commented 1 year ago

Also see the proposal from Nico in openssl/openssl#20715

Note that that really amounts to "shared-nothing-mutable threading". That's really the key: share nothing mutable, and for the immutable things use RCU or RCU-like interfaces.

Because SSL contexts are mutable, it creating other objects with them should probably copy the context, and maybe there should be an API to freeze a context so that downstream objects can have a reference counted reference to the context.

The point is that if you share no mutable state between threads -and leave to the application any locking around threaded access to the same objects- then there should be no need for mutually exclusive or read-write locks at all.

There are some necessarily-shared (because of the OpenSSL API design) globals, like the provider list, and this should either evolve with "versions" where the "current version" can be fetched with RCU, or you can do what Richard Salz says: allow the providers list to be frozen -- and once frozen, it must stay frozen in order to allow fast, unsynchronized access.

The performance problems in OpenSSL 3.x are very similar to those resulting from having a GIL (global interpreter lock) in some languages, like Python). Python threads should have been shared-nothing or shared-nothing-mutable. Shared-nothing-mutable in Python threading would have required language support for immutable values.

Shared-nothing threading is much easier to implement than shared-nothing-mutable threading, but for OpenSSL's API shared-nothing threading is not really possible, so the next best thing is shared-nothing-mutable. Shared-nothing-mutable can be achieved even when the APIs don't quite permit it by having writers to shared mutable objects maintain a set of immutable snapshot copies of the object inside the mutable object.

Anything other than the various shared-nothing threading options can only be an attempt to optimize the current mess. That may be serviceable to some degree, but a threading architecture known to scale and compose well is better overall if you can manage it.

There's ~50 files where read-write locks are used, and ~170 call sites to CRYPTO_THREAD_read_lock() and CRYPTO_THREAD_write_lock(), so a conversion to some shared-nothing-mutable flavor should be tractable.

The more nothing you share between threads, the better the scalability threads of the OpenSSL API with increasing thread counts because there should be no contention on the nothing that isn't shared. With RCU-ish schemes for shared-nothing-mutable you can still scale very well with thread counts -- see, for example, the Linux kernel! Conversely, schemes that share a lot of mutable state w/o versioning will have lots of contention, which accurately describes the current state of OpenSSL 3.x.

In any case, read-write locks are never really a good tool to use if one wants to scale performance with thread count.

mattcaswell commented 1 year ago

Came in privately: a description of a performance issue calling Python's "load_default_certs" function shows possible performance degradation from multiple threads. Possibly a locking issue?

nhorman commented 5 months ago

Note: In reviewing these performance issues, I feel like we need to take a pause on some of them, until such time as we've resolved our performance measurement suite. I've prioritized those tasks for refinement and execution so that we can have a more clear view of what our performance was on older implementations vs newer so that we can effectively prioritize the remaining tasks here

quarckster commented 1 month ago

Note: In reviewing these performance issues, I feel like we need to take a pause on some of them, until such time as we've resolved our performance measurement suite. I've prioritized those tasks for refinement and execution so that we can have a more clear view of what our performance was on older implementations vs newer so that we can effectively prioritize the remaining tasks here

I believe the performance measurement suite is ready at least for the internal evaluation. The automation runs benchmarks on nightly basis and gather results into the DB. The results are represented as time series charts in a Grafana dashboard.