Closed SergeyKopienko closed 2 weeks ago
@danhoeflinger, @julianmi how do you think, are we ready to merge this PR ?
LGTM. However, I'd like to understand why this is faster than break statements. There might be potential performance benefits in other code parts where we use break statements in case this is a general issue.
Just now I found one place with break pattern in Kernel code and going to investigate this place separatelly.
It's inside struct multiple_match_pred
My guess about the break statements is that it is more difficult for thread divergence within a subgroup. It does seem like the compiler should be able to generate code for a break; which is equivalently performant as a loop exit flag though.
It would be interesting to boil down to a very simple example and examine the resulting generated code.
In this PR we implement break of for-loop inside
__early_exit_find_or
after first found element. The approach from the PR https://github.com/oneapi-src/oneDPL/pull/1624 is applicable for the first/last element search. This approach gives us a good performance boost. Technically thisbreak
call implemented thought additional local variable with check inside of continue-condition infor
-loop due performance results.Also in this PR, we analyze the result of
compare_exchange_strong()
call and break loop if the value of atomic has been changed correctly to avoid extra atomic valueload()
calls.