Open uditagarwal97 opened 7 months ago
Tag @againull for awareness. Could this be due to the known timing approximation issues?
I'm observing a similar problem with Basic/submit_time.cpp on linux/CL. I've found you need a bit of system load and a lot of runs to reproduce but it's consistently do-able within 20 or so iterations. An interesting data point would be whether this reproduces on cuda/hip.
On l0 and cl this could be explained by discrepancies between the timers used for the common DeviceAndHostTimer implementation they both share, which is used to cache the event's submit time here, and the separate mechanisms both adapters have for retrospectively querying out an event's start time (l0, cl).
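To make the hypothesis above concrete, here is a minimal sketch of how a submit time taken from one timer can land after a start time taken from another. This is a simplified model, not the adapters' actual code: `SyncPoint`, `estimate_submit_ns`, and `ordering_holds` are hypothetical names, and the assumption that the submit time is extrapolated from a cached device/host timestamp pair is only an illustration of the kind of discrepancy described.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a (device, host) timestamp pair captured together at
// some earlier synchronization point. Not taken from the real adapters.
struct SyncPoint {
  std::uint64_t device_ns;
  std::uint64_t host_ns;
};

// Estimate a device-domain submit time from a later host clock reading,
// assuming both clocks advanced by exactly the same amount since the sync.
std::uint64_t estimate_submit_ns(const SyncPoint &sync,
                                 std::uint64_t host_now_ns) {
  assert(host_now_ns >= sync.host_ns && "host clock must not go backwards");
  return sync.device_ns + (host_now_ns - sync.host_ns);
}

// The ordering a submit-time check would expect: submit no later than start.
bool ordering_holds(std::uint64_t submit_ns, std::uint64_t start_ns) {
  return submit_ns <= start_ns;
}
```

With made-up numbers: for a sync point of `{1000000, 5000000}` and a host reading of `5100000` at submission, the estimated submit time is `1100000`. If the device's own counter, read back separately, reports a start time of `1099995` because it advanced slightly less than the host clock over that interval, `ordering_holds` returns false even though the command really was submitted before it started.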
Yes, I observed a similar flaky failure in Basic/submit_time.cpp: https://github.com/intel/llvm/actions/runs/9406901188/job/25911860208?pr=14002
Describe the bug
Failed run: https://github.com/intel/llvm/actions/runs/8886566095/job/24401423571?pr=13588
Successful run: https://github.com/intel/llvm/actions/runs/8886566095/job/24406513670
I observed this behavior on an L0 GPU on Windows, but I'm not sure whether this flaky behavior also reproduces on Linux or on other devices.
To reproduce
DPC++ commit: c2cc3a1327f668795881a7b157388ad516bdd472
Environment
OS: Windows
Device: L0 Gen12
sycl-ls --verbose
Additional context
No response