openssl / project

Tracking of project related issues
2 stars 1 forks source link

Candidate read locks for conversion to RCU #428

Open mattcaswell opened 7 months ago

mattcaswell commented 7 months ago

I've performed some analysis of the read locks that we take during a TLS handshake. I used the handshake test in the tools/perf performance testing suite as a basis. The counts below represent the number of times during a complete combined (client and server) TLS handshake that a read lock is taken.

I have identified 13 different read lock locations that are called at least once per combined handshake. Some of those locations are very hot, with one one them being taken 161 times in a single combined handshake.

Many of these look like they could be good candidates for conversion to RCU since writes are infrequent.

The locations are:

Filename Function Number of times called
crypto/core_namemap.c ossl_namemap_name2num() 161
crypto/property/property.c ossl_property_read_lock() 133
crypto/ex_data.c get_and_lock() 86
crypto/engine/tb_asnmth.c ENGINE_pkey_asn1_find_str 17
crypto/rand/rand_lib.c RAND_get_rand_method 12
crypto/core_namemap.c ossl_namemap_doall_names() 5
crypto/bn/bn_mont.c BN_MONT_CTX_set_locked() 5
ssl/ssl_lib.c SSL_has_matching_session_id 4
ssl/ssl_sess.c ssl_generate_session_id 4 (2 * 2 locks)
crypto/x509/x509_lu.c x509_store_read_lock 3
crypto/rsa/rsa_ossl.c rsa_get_blinding 1
crypto/encode_decode/decoder_pkey.c OSSL_DECODER_CTX_new_for_pkey 1
crypto/objects/o_names.c OBJ_NAME_get 1
### Tasks
- [ ] https://github.com/openssl/openssl/pull/23671
- [ ] https://github.com/openssl/openssl/pull/23680
- [ ] Explore optimizing ossl_property_read_lock /converting to rcu_lock usage
- [ ] optimize use of ex_data get_and_lock
- [ ] https://github.com/openssl/project/issues/537
- [ ] https://github.com/openssl/project/issues/538
mattcaswell commented 7 months ago

@nhorman

nhorman commented 7 months ago

I've looked quite a bit at the namemap code, and yes, its iterated over alot. I think the best approach here is to start converting the hash table to use rcu locks (or preferably re-implement a private version of it), that can use rcu locks under the covers). Doing a re-implementation would also allow us to avoid the pointer conversion ubsan issues in the current implementation that require extra thunking callback layers

t8m commented 7 months ago

I suspect the granularity of the locks are wrong here. We should be able to read lock the provider plumbing in libctx once per call into a fetch function. A read-only operation in case the caches are warm should take very little time by itself so there should be no problem with contention even if there are some writers involved.