Open ghost opened 1 year ago
Well reading the documentation
Neither OBJ_create() nor OBJ_add_sigid() do any locking and are thus not thread safe. Moreover, none of the other functions should be called while concurrent calls to these two functions are possible.
seems to indicate this to be a well-known bug. And if it's documented it doesn't look like an easy one to fix for OpenSSL.
Thus, what about creating a mutex around the registration loop?
Well reading the documentation
Neither OBJ_create() nor OBJ_add_sigid() do any locking and are thus not thread safe. Moreover, none of the other functions should be called while concurrent calls to these two functions are possible.
seems to indicate this to be a well-known bug. And if it's documented it doesn't look like an easy one to fix for OpenSSL.
I feel stupid now… I haven't taken the time to read the doc of the what-I-called buggy function OBJ_create
…
Sorry for the noise here. Definitely not a bug in oqs-provider.
Thank you for your reply @baentsch
Definitely not a bug in oqs-provider.
No -- but one that hits it and IMO is worth while considering (read: providing a work-around). My suggestion is to keep it open and resolve it. If the Mutex idea isn't too good, what about asking the OpenSSL team for other suggestions?
if the Mutex idea isn't too good, what about asking the OpenSSL team for other suggestions?
I dont' have a opinion on this Mutex idea, but I do think it's worth asking the OpenSSL team!
Hmm, I was in the middle of writing an email to the OpenSSL mailing list when I saw something quite interesting: you mentioned the OpenSSl 3.0 man page of OBJ_create
, which indeed mentions the fact that OBJ_create
isn't thread safe:
BUGS
Neither OBJ_create() nor OBJ_add_sigid() do any locking and are thus not thread safe. Moreover, none of the other functions should be called while concurrent calls to these two functions are possible.
However, that BUG section in OpenSSL 3.1 man page of OBJ_create
was replaced by the following statement in the NOTE section:
These functions were not thread safe in OpenSSL 3.0 and before.
Since I'm using OpenSSL 3.1 (and/or 3.2 alpha), I shouldn't encounter this bug, right?
Since I'm using OpenSSL 3.1 (and/or 3.2 alpha), I shouldn't encounter this bug, right?
Good catch -- sorry I didn't check different doc versions :-( Conceptually, I'd agree, you shouldn't encounter the issue based on this doc improvement.
But I fail to see how this code https://github.com/openssl/openssl/blob/9c20f5db0feaddc4c9ea4c4b2b07e6d87d6701f1/crypto/provider_core.c#L2127-L2133 can be thread-safe:
return OBJ_txt2nid(oid) != NID_undef
|| OBJ_create(oid, sn, ln) != NID_undef;
Thread 1 calls OBJ_txt2nid
, gets suspended, thread 2 executes both statements (successfully) and thread 1 will be left with a (by now) incorrect result and fail (when calling OBJ_create
based on the wrong retval of its prior invocation of OBJ_txt2nid
).
In sum, I'd think this is a question to be asked to the OpenSSL community to help make the core better.
But whatever results from those discussions, this will not help with existing code (incl. OpenSSL3.0 installations), so we ought to also add your proposal (discarding the error message as per your patch -- possibly augmented by another call to OBJ_txt2nid
to ascertain that the object in fact has been created) to oqsprovider
. PR welcome :-)
Hi,
I'm currently testing oqs-provider (from commit
fab30c7
) with OpenSSL 3.1 (from commit9c20f5d
) in a multi-threaded application.In this multi-threaded application, I instantiate a
OSSL_LIB_CTX
per thread, and for each library context I load the default provider and the oqsprovider sequentially:(Note that I'm compiling oqs-provider as a static library, but it doesn't seem to be involved in this issue)
However, it sometimes fails with the following error:
I've tracked down the problem, and here is what I found: the oqs-provider init function calls the function pointer
c_obj_create
which happens to becrypto/provider_core.c:core_obj_create
:https://github.com/open-quantum-safe/oqs-provider/blob/fab30c7f4bb40898aab2a967cdd6b0f990091778/oqsprov/oqsprov.c#L748-L754
By looking at the
c_obj_create
from OpenSSL, it makes a call toOBJ_create
:https://github.com/openssl/openssl/blob/9c20f5db0feaddc4c9ea4c4b2b07e6d87d6701f1/crypto/provider_core.c#L2127-L2133
OBJ_create
verifies certain things, and eventually inserts the new NID/OID to a global list:https://github.com/openssl/openssl/blob/831602922f19a8f39d0c0fae425b81e9ab402c69/crypto/objects/obj_dat.c#L774-L794
It seems that two threads may compete between the check in
c_obj_create
and the actual call toOBJ_create
:leading to the obvious error displayed above, because
OBJ_create
is going to complain that the OID already exists.In this specific case,
ERR_LIB_OBJ … OBJ_R_OID_EXISTS
is thrown. I'm wondering if we could check for this error and ignore it if returned byc_obj_create
. I've attached a patch fix_race_condition_obj_oid_exists.patch that may temporary solve the issue in oqsprov.c, and the following is a small PoC to reproduce the error:PoC
```c #includeHere is the environment I used to build the PoC:
fab30c7
9c20f5d
-DCMAKE_BUILD_TYPE=Release
and-DOQS_PROVIDER_BUILD_STATIC=ON
--debug
(it also works with--release
).Note that I also encountered this bug with OpenSSL 3.2.
The question is: is it a bug from oqs-provider (is something wrongly done?) or from OpenSSL? From my point of view, OpenSSL maintains a global (i.e. accessible from within all threads) list of
OBJ
, and this list should be self-contained in aOSSL_LIB_CTX
object.What do you think about this?