intel / QAT_Engine

Intel QuickAssist Technology( QAT) OpenSSL Engine (an OpenSSL Plug-In Engine) which provides cryptographic acceleration for both hardware and optimized software using Intel QuickAssist Technology enabled Intel platforms. https://developer.intel.com/quickassist
BSD 3-Clause "New" or "Revised" License
398 stars 127 forks source link

possible bug in optimization of qat wake signals #306

Open sharksforarms opened 4 months ago

sharksforarms commented 4 months ago

Hi all,

I believe I have encountered a bug with the optimization around the reduction of wake signals for qat offload which causes unneeded latency. The issue seems to be around the usage of thread-local.

Commit which added this optimization: https://github.com/intel/QAT_Engine/commit/32f37108333f712f7a0debe1a5dac9e6d79cdfb7

This optimization seems to make the assumption that a offload request will be resumed on the same thread that started it, see this pseudo-example:

This example considers a single epoll instance, with multiple worker threads waiting for events.

int global_epoll_inst = epoll_create()

Thread 1:
- app: bssl_qat_async_start_job -> ASYNC_WAIT_CTX_get_all_fds -> epollctl(global_epoll_inst. EPOLL_CTL_ADD, fd)
- engine: tlv->localOpsInFlight = 0
- engine: qat_rsa_decrypt -> QAT_INC_IN_FLIGHT_REQS -> sem_post
- engine: tlv->localOpsInFlight = 1

Intel QAT Polling Thread:
- sem_timedwait -> WAKE
- poll

Thread 2:
- app: epoll_wait(global_epoll_inst) --> resumed
- app: bssl_qat_async_ctx_copy_result(ctx)
- tlv->localOpsInFlight = 0
- engine: QAT_DEC_IN_FLIGHT_REQS
- tlv->localOpsInFlight = -1

Thread 1:
- app: bssl_qat_async_start_job -> ASYNC_WAIT_CTX_get_all_fds -> epollctl(global_epoll_inst. EPOLL_CTL_ADD, fd)
- engine: tlv->localOpsInFlight = 1
- engine: qat_rsa_decrypt -> QAT_INC_IN_FLIGHT_REQS (*bug* no call to sem_post!)
- engine: tlv->localOpsInFlight = 2

Intel QAT Polling Thread:
- sem_timedwait -> TIMEOUT
- poll

...

The side-effect is that the TLV get's into a state where localOpsInFlight it will never == 1 and so sem_post never get's called anymore and then the polling thread's sem times out.

The relevant code section seems to be: https://github.com/intel/QAT_Engine/blob/ba2035c04e826018478fdb458ce51f545de076eb/qat_hw_rsa.c#L324-L327

I'm wondering if it would be possible to use num_requests_in_flight == 1 (atomic) in the if ? I did not test this, but the use of a TLV here seems like it could be problematic to consumers

venkatesh6911 commented 4 months ago

I will check this and get back to you soon. Thanks for reporting.

venkatesh6911 commented 4 months ago

We have created an internal defect to track and will keep you updated on the progress.

sharksforarms commented 4 months ago

Thanks - for my use case I was able to work around the issue by changing the threading model from a single epoll instance shared across threads, to having an epoll instance per thread. This allows the qat transactions to be started / resumed on the same thread.