open-quantum-safe / oqs-provider

OpenSSL 3 provider containing post-quantum algorithms
https://openquantumsafe.org
MIT License
229 stars 88 forks source link

Race condition with `c_obj_create`. #272

Open ghost opened 1 year ago

ghost commented 1 year ago

Hi,

I'm currently testing oqs-provider (from commit fab30c7) with OpenSSL 3.1 (from commit 9c20f5d) in a multi-threaded application.

In this multi-threaded application, I instantiate a OSSL_LIB_CTX per thread, and for each library context I load the default provider and the oqsprovider sequentially:

extern OSSL_provider_init_fn oqs_provider_init;

static int initialize_lib_ctx(OSSL_LIB_CTX *lib_ctx) {
  if (OSSL_PROVIDER_load(lib_ctx, "default") == NULL) {
    goto err;
  }
  if (OSS_PROVIDER_add_builtin(lib_ctx, "oqsprovider", oqs_provider_init) != 1) {
    goto err;
  }
  if (OSS_PROVIDER_load(lib_ctx, "oqsprovider") == NULL) {
    goto err;
  }
  …
  return OK;

err:
  …
}

(Note that I'm compiling oqs-provider as a static library, but it doesn't seem to be involved in this issue)

However, it sometimes fails with the following error:

error registering NID for dilithium2

I've tracked down the problem, and here is what I found: the oqs-provider init function calls the function pointer c_obj_create which happens to be crypto/provider_core.c:core_obj_create:

https://github.com/open-quantum-safe/oqs-provider/blob/fab30c7f4bb40898aab2a967cdd6b0f990091778/oqsprov/oqsprov.c#L748-L754

By looking at the c_obj_create from OpenSSL, it makes a call to OBJ_create:

https://github.com/openssl/openssl/blob/9c20f5db0feaddc4c9ea4c4b2b07e6d87d6701f1/crypto/provider_core.c#L2127-L2133

OBJ_create verifies certain things, and eventually inserts the new NID/OID to a global list:

https://github.com/openssl/openssl/blob/831602922f19a8f39d0c0fae425b81e9ab402c69/crypto/objects/obj_dat.c#L774-L794

It seems that two threads may compete between the check in c_obj_create and the actual call to OBJ_create:

static int core_obj_create(const OSSL_CORE_HANDLE *prov, const char *oid,
                           const char *sn, const char *ln)
{
    /* Check if it already exists and create it if not */
    return OBJ_txt2nid(oid) != NID_undef             // Race condition here?
           || OBJ_create(oid, sn, ln) != NID_undef;
}

leading to the obvious error displayed above, because OBJ_create is going to complain that the OID already exists.

In this specific case, ERR_LIB_OBJ … OBJ_R_OID_EXISTS is thrown. I'm wondering if we could check for this error and ignore it if returned by c_obj_create. I've attached a patch fix_race_condition_obj_oid_exists.patch that may temporary solve the issue in oqsprov.c, and the following is a small PoC to reproduce the error:

PoC ```c #include #include #include static const size_t N_THREADS = 32; extern OSSL_provider_init_fn oqs_provider_init; static void load_oqs_provider(OSSL_LIB_CTX* lib_ctx) { if (OSSL_PROVIDER_load(lib_ctx, "default") != NULL) { if (OSSL_PROVIDER_add_builtin(lib_ctx, "oqsprovider", oqs_provider_init) == 1) { if (OSSL_PROVIDER_load(lib_ctx, "oqsprovider") != NULL) { } else { putchar('-'); } } else { putchar('+'); } } else { putchar('@'); } } static void *thread_create_ossl_lib_ctx(void *) { OSSL_LIB_CTX *lib_ctx = OSSL_LIB_CTX_new(); load_oqs_provider(lib_ctx); OSSL_LIB_CTX_free(lib_ctx); return NULL; } int main() { pthread_t threads[N_THREADS]; start: for (size_t i = 0; i < N_THREADS; ++i) { pthread_create(threads + i, NULL, thread_create_ossl_lib_ctx, NULL); } for (size_t i = 0; i < N_THREADS; ++i) { pthread_join(threads[i], NULL); } goto start; } ```

Here is the environment I used to build the PoC:

Note that I also encountered this bug with OpenSSL 3.2.

The question is: is it a bug from oqs-provider (is something wrongly done?) or from OpenSSL? From my point of view, OpenSSL maintains a global (i.e. accessible from within all threads) list of OBJ, and this list should be self-contained in a OSSL_LIB_CTX object.

What do you think about this?

baentsch commented 1 year ago

Well reading the documentation

Neither OBJ_create() nor OBJ_add_sigid() do any locking and are thus not thread safe. Moreover, none of the other functions should be called while concurrent calls to these two functions are possible.

seems to indicate this to be a well-known bug. And if it's documented it doesn't look like an easy one to fix for OpenSSL.

Thus, what about creating a mutex around the registration loop?

ghost commented 1 year ago

Well reading the documentation

Neither OBJ_create() nor OBJ_add_sigid() do any locking and are thus not thread safe. Moreover, none of the other functions should be called while concurrent calls to these two functions are possible.

seems to indicate this to be a well-known bug. And if it's documented it doesn't look like an easy one to fix for OpenSSL.

I feel stupid now… I haven't taken the time to read the doc of the what-I-called buggy function OBJ_create

Sorry for the noise here. Definitely not a bug in oqs-provider.

Thank you for your reply @baentsch

baentsch commented 1 year ago

Definitely not a bug in oqs-provider.

No -- but one that hits it and IMO is worth while considering (read: providing a work-around). My suggestion is to keep it open and resolve it. If the Mutex idea isn't too good, what about asking the OpenSSL team for other suggestions?

ghost commented 1 year ago

if the Mutex idea isn't too good, what about asking the OpenSSL team for other suggestions?

I dont' have a opinion on this Mutex idea, but I do think it's worth asking the OpenSSL team!

ghost commented 1 year ago

Hmm, I was in the middle of writing an email to the OpenSSL mailing list when I saw something quite interesting: you mentioned the OpenSSl 3.0 man page of OBJ_create, which indeed mentions the fact that OBJ_create isn't thread safe:

BUGS

Neither OBJ_create() nor OBJ_add_sigid() do any locking and are thus not thread safe. Moreover, none of the other functions should be called while concurrent calls to these two functions are possible.

However, that BUG section in OpenSSL 3.1 man page of OBJ_create was replaced by the following statement in the NOTE section:

These functions were not thread safe in OpenSSL 3.0 and before.

Since I'm using OpenSSL 3.1 (and/or 3.2 alpha), I shouldn't encounter this bug, right?

baentsch commented 1 year ago

Since I'm using OpenSSL 3.1 (and/or 3.2 alpha), I shouldn't encounter this bug, right?

Good catch -- sorry I didn't check different doc versions :-( Conceptually, I'd agree, you shouldn't encounter the issue based on this doc improvement.

But I fail to see how this code https://github.com/openssl/openssl/blob/9c20f5db0feaddc4c9ea4c4b2b07e6d87d6701f1/crypto/provider_core.c#L2127-L2133 can be thread-safe:

return OBJ_txt2nid(oid) != NID_undef
           || OBJ_create(oid, sn, ln) != NID_undef;

Thread 1 calls OBJ_txt2nid, gets suspended, thread 2 executes both statements (successfully) and thread 1 will be left with a (by now) incorrect result and fail (when calling OBJ_create based on the wrong retval of its prior invocation of OBJ_txt2nid).

In sum, I'd think this is a question to be asked to the OpenSSL community to help make the core better.

But whatever results from those discussions, this will not help with existing code (incl. OpenSSL3.0 installations), so we ought to also add your proposal (discarding the error message as per your patch -- possibly augmented by another call to OBJ_txt2nid to ascertain that the object in fact has been created) to oqsprovider. PR welcome :-)