sinsemellan closed 3 years ago
Sorry, `across_seg_boundary` doesn't work like that; you can't use it to set a flag at exactly the segment boundary. In a genuinely parallel scan (unlike the debug version), the scan expression is also evaluated for pairs of array elements that are not neighbors. See this slide for an idea:
https://andreask.cs.illinois.edu/cs598apk-f18/notes.pdf#page=206
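To make the "not neighbors" point concrete, here is a minimal pure-Python sketch of a generic log-step (Hillis-Steele style) inclusive scan. It is only an illustration of parallel scans in general and is not necessarily the strategy `GenericScanKernel` uses; the function name `hillis_steele_scan` and the `pairs` bookkeeping are hypothetical helpers introduced for this example.

```python
def hillis_steele_scan(values, combine):
    """Inclusive scan in about log2(n) passes.

    Returns (result, pairs), where pairs lists the (left, right) index
    positions whose partial results were handed to combine().
    """
    result = list(values)
    pairs = []
    offset = 1
    while offset < len(result):
        prev = list(result)  # each pass reads only the previous pass's output
        for i in range(offset, len(result)):
            pairs.append((i - offset, i))
            result[i] = combine(prev[i - offset], prev[i])
        offset *= 2
    return result, pairs


scanned, pairs = hillis_steele_scan([1, 2, 3, 4, 5, 6, 7, 8], lambda a, b: a + b)

# Operand pairs that sit more than one position apart.
non_neighbors = [(left, right) for left, right in pairs if right - left > 1]

print(scanned)        # prefix sums of the input
print(non_neighbors)  # e.g. contains (0, 4): positions 0 and 4 are combined directly
```

After the first pass, every call combines partial results that are 2, 4, ... positions apart, so the expression cannot assume that `a` and `b` are adjacent input elements.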
When `across_seg_boundary` is true, the scan expression must act as if the `a` input to the expression were not supplied. Yours does not meet this criterion. I'd be grateful for a doc contribution explaining this.
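The criterion can be illustrated with a small pure-Python model; this is a sketch under stated assumptions, not PyOpenCL's implementation. It assumes that each partial result carries a flag saying whether its span contains a segment start, and that `across_seg_boundary` corresponds to that flag on the right-hand operand. The names `combine`, `sequential_scan`, `tree_scan`, `segmented_sum` and `constant_flag` are hypothetical. `segmented_sum` mirrors an expression of the form `across_seg_boundary ? b : a + b`, while `constant_flag` mirrors `across_seg_boundary ? 10 : 5`.

```python
def combine(expr, left, right):
    """Combine two partial results, each a (value, contains_segment_start) pair."""
    (a, a_flag), (b, b_flag) = left, right
    # Assumption: a boundary counts as crossed when the right operand's
    # span contains a segment start.
    return expr(a, b, b_flag), a_flag or b_flag


def sequential_scan(expr, values, starts):
    """Left-to-right scan: the right operand is always a single input element."""
    items = list(zip(values, starts))
    acc = items[0]
    out = [acc[0]]
    for item in items[1:]:
        acc = combine(expr, acc, item)
        out.append(acc[0])
    return out


def tree_scan(expr, values, starts):
    """Log-step scan: both operands may be partial results of earlier passes."""
    items = list(zip(values, starts))
    offset = 1
    while offset < len(items):
        prev = list(items)
        for i in range(offset, len(items)):
            items[i] = combine(expr, prev[i - offset], prev[i])
        offset *= 2
    return [value for value, _ in items]


def segmented_sum(a, b, across_seg_boundary):
    # Ignores a entirely once a boundary is crossed.
    return b if across_seg_boundary else a + b


def constant_flag(a, b, across_seg_boundary):
    # Discards b as well, so earlier partial results are lost.
    return 10 if across_seg_boundary else 5


values = [1, 2, 3, 4, 5, 6, 7, 8]
starts = [True, False, False, True, False, True, False, False]

good_seq = sequential_scan(segmented_sum, values, starts)
good_tree = tree_scan(segmented_sum, values, starts)
bad_seq = sequential_scan(constant_flag, values, starts)
bad_tree = tree_scan(constant_flag, values, starts)

print(good_seq == good_tree)  # evaluation order does not matter
print(bad_seq, bad_tree)      # the two evaluation orders disagree
```

In this model the segmented sum gives the same answer in both evaluation orders, whereas the constant expression yields 10 only at the boundaries in the sequential scan but 10 for every entry from the first boundary onward in the log-step scan, which matches the behavior described in the report.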
When using a scan expression containing a constant value, e.g. `scan_expr="across_seg_boundary ? 10 : 5"`, the first value (10) gets selected for all remaining entries after the first crossed segment boundary. A kernel generated by `GenericDebugScanKernel` produces correct results. Tested with version 2020.3.1 on oclgrind, an Intel CPU, and an NVIDIA device. Failing example test case: the first assertion passes (using `GenericDebugScanKernel`), the second one fails (using `GenericScanKernel`):