intel / QAT_Engine

Intel QuickAssist Technology( QAT) OpenSSL Engine (an OpenSSL Plug-In Engine) which provides cryptographic acceleration for both hardware and optimized software using Intel QuickAssist Technology enabled Intel platforms. https://developer.intel.com/quickassist
BSD 3-Clause "New" or "Revised" License
410 stars 128 forks source link

bssl: Could not obtain wait context for job #287

Open sharksforarms opened 1 year ago

sharksforarms commented 1 year ago

Hello, I've observed a sigsegv under the described conditions: an error path which does not clear the current job/tlv.

I added some instrumentation to my application with the intel qat engine:

// rsa decrypt error
h2o[151965]: DEBUG [bssl_qat_async_start_job:479] saving job ctx:0x6030002489b8 job:0x6060001e3208 waitctx:0x6060001e3268
h2o[151965]: DEBUG [bssl_offload_create_request:1362] nb job:0x6030002489b8
h2o[151965]: [/build/deps/neverbleed/neverbleed.c] bssl_offload_decrypt:RSA decrypt failure:-:error:04000070:RSA routines:OPENSSL_internal:DATA_LEN_NOT_EQUAL_TO_MOD_LEN ossl err  qat_hw_rsa.c 1664
h2o[151965]: [openssl-privsep] RSA decrypt failure
h2o[151965]: DEBUG [bssl_offload_free_request:698] nb finish job:0x6030002489b8
h2o[151965]: DEBUG [bssl_qat_async_finish_job:500] clearing ctx:0x6030002489b8 job:0x6060001e3208 waitctx:0x6060001e3268

// rsa sign
h2o[151965]: DEBUG [bssl_qat_async_start_job:479] saving job ctx:0x6030002489e8 job:0x6060001e32c8 waitctx:0x6060001e3328
h2o[151965]: DEBUG [bssl_offload_create_request:1362] nb job:0x6030002489e8
h2o[151965]: DEBUG [bssl_offload_digestsign:1418] rsa:0x611000059c88
h2o[151965]: DEBUG [qat_rsa_priv_enc:1021] here
h2o[151965]: DEBUG [qat_rsa_priv_enc:1106] here error
h2o[151965]: DEBUG [qat_rsa_priv_sign:1635] len:0, ASYNC_get_current_job(): 0x6060001e3208, (nil)

h2o[151965]: received SIGSEGV: si_code=1 (SEGV_MAPERR: Address not mapped to object) si_addr=(nil)
h2o[151965]: received SIGSEGV (cont): rip=(nil) rsp=0x7fe1bab6b688 rbp=(nil)

We can see that ASYNC_get_current_job(): 0x6060001e3208 which is the previous job, the failed rsa decryption.

Here's how the API is being used at a high level which causes the issue

// rsa decrypt
async_ctx = bssl_qat_async_start_job()
    bssl_qat_async_save_current_job // --> set's the current job
meth = bssl_engine_get_rsa_method();
meth->decrypt(...)
    // in_len != rsa_size --> OPENSSL_PUT_ERROR, return error
bssl_qat_async_finish_job(async_ctx)
// note: the current job is still set here...

// rsa sign
async_ctx = bssl_qat_async_start_job()
meth = bssl_engine_get_rsa_method();
meth->sign_raw(...) 
    qat_rsa_priv_sign
        qat_rsa_priv_enc
          qat_rsa_decrypt
              qat_init_op_done
                  job = ASYNC_get_current_job() // this is the previous job, which has been freed/zero'd
              qat_setup_async_event_notification(job)
                  ASYNC_get_wait_ctx(job) is NULL --> WARN("Could not obtain wait context for job\n");
        ASYNC_current_job_last_check_and_get // this is the previous job, which has been freed/zero'd
            job->tlv_destructor // --> SIGSEGV

Engine Logs:

[DEBUG][4211829.817191] PID [304707] Thread [7ff7a2c5e700][qat_hw_rsa.c:198:qat_rsa_decrypt()] - Started
[WARN][4211829.817195] PID [304707] Thread [7ff7a2c5e700][qat_events.c:129:qat_setup_async_event_notification()] Could not obtain wait context for job
[WARN][4211829.817200] PID [304707] Thread [7ff7a2c5e700][qat_hw_rsa.c:210:qat_rsa_decrypt()] Failed to setup async event notifications
[WARN][4211829.817206] PID [304707] Thread [7ff7a2c5e700][qat_hw_rsa.c:1017:qat_rsa_priv_enc()] Failure in qat_rsa_decrypt  fallback = 0

I can see two potential solutions:

  1. (diff) one would be to call _ret = ASYNC_current_job_last_check_and_get(); before those earlier returns
  2. call tlv_destructor (if set) in bssl_qat_async_finish_job

Any thoughts?

Yogaraj-Alamenda commented 11 months ago

@sharksforarms Looks out to be an issue in the scenario. Can you please raise PR for the changes mentioned in the diff. We will evaluate and let you know the feedback.

sharksforarms commented 11 months ago

@sharksforarms Looks out to be an issue in the scenario. Can you please raise PR for the changes mentioned in the diff. We will evaluate and let you know the feedback.

PR is here: https://github.com/intel/QAT_Engine/pull/291/files

Thanks!

Yogaraj-Alamenda commented 11 months ago

@sharksforarms Looks out to be an issue in the scenario. Can you please raise PR for the changes mentioned in the diff. We will evaluate and let you know the feedback.

PR is here: https://github.com/intel/QAT_Engine/pull/291/files

Thanks!

Thanks We will check and let you know.

krithikx commented 10 months ago

@sharksforarms , We followed the steps shared by you and we are not able to see the issue in our reproduction. Do you have any test application to reproduce the scenario? Also what is the boringSSL version you have used?