sharksforarms opened 7 months ago
I will check this and get back to you soon. Thanks for reporting.
We have created an internal defect to track and will keep you updated on the progress.
Thanks - for my use case I was able to work around the issue by changing the threading model from a single epoll instance shared across threads to an epoll instance per thread. This allows the QAT transactions to be started and resumed on the same thread.
Hi all,
I believe I have encountered a bug in the optimization that reduces wake signals for QAT offload, which causes unneeded latency. The issue seems to stem from the use of a thread-local variable (TLV).
Commit which added this optimization: https://github.com/intel/QAT_Engine/commit/32f37108333f712f7a0debe1a5dac9e6d79cdfb7
This optimization seems to assume that an offload request will be resumed on the same thread that started it; see this pseudo-example:
This example considers a single epoll instance, with multiple worker threads waiting for events.
The side-effect is that the TLV gets into a state where `localOpsInFlight` will never `== 1`, so `sem_post` never gets called anymore and the polling thread's sem times out.

The relevant code section seems to be: https://github.com/intel/QAT_Engine/blob/ba2035c04e826018478fdb458ce51f545de076eb/qat_hw_rsa.c#L324-L327
I'm wondering if it would be possible to use `num_requests_in_flight == 1` (atomic) in the `if` instead? I did not test this, but the use of a TLV here seems like it could be problematic for consumers.