nirvedhmeshram opened 3 months ago
I suspect that one of the semaphores failed, which in turn caused `hal.fence.await` to fail. In this case we need a better error message, one that surfaces the original failure that caused the wait to fail.
Maybe inside the wait function we could query all the semaphores and chain/annotate the ABORTED status with the statuses of the ones that failed. I don't think we should be too concerned about performance on the error path.
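The annotation idea above could look roughly like the following. This is a standalone C sketch, not IREE runtime code: `semaphore_t`, `status_code_t`, and `annotate_aborted_wait` are hypothetical stand-ins (IREE's real `iree_status_t` and semaphore APIs are not used), and it assumes each semaphore keeps the code and message of the failure that first put it into a failed state.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical status codes (not IREE's iree_status_t). */
typedef enum { STATUS_OK = 0, STATUS_ABORTED, STATUS_INTERNAL } status_code_t;

/* Hypothetical semaphore record: assumes the original failure is retained. */
typedef struct {
  const char* name;
  status_code_t failure_code;  /* STATUS_OK if this semaphore has not failed */
  const char* failure_message; /* message recorded when it first failed */
} semaphore_t;

/* Error path only: after a wait has come back ABORTED, walk every semaphore
 * the wait covered and append the original failure of each failed one to
 * the ABORTED message. Output is truncated safely if `out` is too small.
 * Returns the number of failed semaphores found. */
static size_t annotate_aborted_wait(const semaphore_t* sems, size_t count,
                                    char* out, size_t out_size) {
  size_t failing = 0;
  int n = snprintf(out, out_size, "ABORTED: fence wait failed");
  size_t used = (n < 0) ? 0 : (size_t)n;
  for (size_t i = 0; i < count; ++i) {
    if (sems[i].failure_code == STATUS_OK) continue;
    ++failing;
    /* Only append while there is still room; snprintf reports the length it
     * wanted to write, so `used` can exceed out_size after truncation. */
    if (used < out_size) {
      n = snprintf(out + used, out_size - used,
                   "; caused by semaphore '%s': %s", sems[i].name,
                   sems[i].failure_message);
      if (n > 0) used += (size_t)n;
    }
  }
  return failing;
}
```

With one failed "gpu" semaphore and one healthy "cpu" semaphore, the caller would get a single message naming the GPU-side failure instead of a bare ABORTED. Scanning every semaphore is linear in the number of semaphores, which matches the point that cost on the error path is not a concern.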
I tried running an example similar to the checked-in test here, except that I want to run across both a GPU and the CPU. Here is the sample MLIR test I used:
Here is the iree-compile command I use to target a ROCm GPU and the CPU:
Here is the run command:
And here is the error: