oneapi-src / level-zero

oneAPI Level Zero Specification Headers and Loader
https://spec.oneapi.com/versions/latest/elements/l0/source/index.html
MIT License

[Feature Request] Add equivalent of cudaDeviceSynchronize() #118

Open jbrodman opened 1 year ago

jbrodman commented 1 year ago

Many applications written in CUDA rely on the whole-device synchronization behavior of cudaDeviceSynchronize(). Migrating applications that depend on it to SYCL, for example, is not really possible.

Question: Why does this need to be in L0? Why can't you solve it at a higher level?

Answer: The higher-level layer may not have full visibility into how Level Zero is being used. If the L0 plugin in DPC++ tried to add something like this, programs could behave incorrectly when the user application ALSO uses L0 directly, because the plugin has no knowledge of any queues created in the user application or libraries.
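The visibility problem described above can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: `MockQueue`, `MockPlugin`, `Result`, and `runScenario` are stand-ins invented for illustration and are not Level Zero or DPC++ APIs. It only assumes that a layer above the driver can synchronize the queues it created itself.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a command queue; NOT a Level Zero type.
struct MockQueue {
    int pending = 0;                     // tasks submitted but not yet completed
    void submit() { ++pending; }
    void synchronize() { pending = 0; }  // pretend to block until the queue drains
};

// Hypothetical higher-level layer (for example a runtime plugin). It only
// knows about queues that were created through its own createQueue().
class MockPlugin {
public:
    MockQueue& createQueue() {
        queues_.push_back(std::make_unique<MockQueue>());
        return *queues_.back();
    }

    // "Device synchronize" implemented above the driver: it can only walk
    // the queues this layer has recorded.
    void deviceSynchronize() {
        for (auto& q : queues_) {
            q->synchronize();
        }
    }

private:
    std::vector<std::unique_ptr<MockQueue>> queues_;
};

struct Result {
    int trackedPending;  // work left on the queue the plugin created
    int directPending;   // work left on the queue the plugin never saw
};

Result runScenario() {
    MockPlugin plugin;
    MockQueue& tracked = plugin.createQueue();
    MockQueue direct;  // created by the application or a library, bypassing the plugin

    tracked.submit();
    direct.submit();
    assert(tracked.pending == 1 && direct.pending == 1);

    plugin.deviceSynchronize();
    return Result{tracked.pending, direct.pending};
}
```

In this sketch, `runScenario()` leaves one task pending on the queue created outside the plugin, which is the gap a driver-level synchronization call would be able to close.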

jandres742 commented 1 year ago

I don't see a problem with adding this. cudaDeviceSynchronize "Blocks until the device has completed all preceding requested tasks." https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g10e20b05a95f638a4071a655503df25d So it means we would just wait for all tasks, submitted to all engines, to complete.

@jbrodman:

Does it need to be per context, or for all contexts running on the device? The CUDA call takes no arguments (it is just void), so I guess it is the latter for them.

jandres742 commented 1 year ago

BTW: @jbrodman, this is the repo for the loader. Please open this request in the repo for the Level Zero spec.

eero-t commented 10 months ago

@jbrodman If you added a ticket to the spec repo, could you close this one? (A link to the spec bug you filed would be nice too.)