oneapi-src / oneCCL

oneAPI Collective Communications Library (oneCCL)
https://oneapi-src.github.io/oneCCL

[bug] sycl_allreduce_test failed on Intel(R) Arc(TM) A770 Graphics #123

Open ClarkChin08 opened 1 month ago

ClarkChin08 commented 1 month ago

oneCCL commit: 5e7c7b7e33f5f679cb82547c4f7e49623ff0ab09
build: cmake .. -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp
run command: examples/sycl$ mpirun -n 2 ./sycl_allreduce_test gpu

Log:
preferred platform: Intel(R) Level-Zero, found: 8 GPU device(s)
preferred platform: Intel(R) Level-Zero, found: 8 GPU device(s)
Created context from devices of type: gpu
Devices [1]:
[0]: [Intel(R) Arc(TM) A770 Graphics]
Created context from devices of type: gpu
Devices [1]:
[0]: [Intel(R) Arc(TM) A770 Graphics]
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error)
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error)

garzaran commented 1 month ago

Could you try running an allgather test and see whether it passes? Allgather involves no computation.

ClarkChin08 commented 1 month ago

@garzaran I ran this command: mpirun -n 2 ./sycl_allgatherv_test gpu
It fails with the same error.

nikitaxgusev commented 1 month ago

@ClarkChin08 I see that you are hitting a SYCL runtime error. Does SYCL work in your environment? First, run a simple SYCL example to check whether SYCL works without oneCCL; you can also check whether sycl-ls reports the correct information.

ClarkChin08 commented 1 month ago

SYCL works well on my machine. Here is the sycl-ls report: [image]

nikitaxgusev commented 3 weeks ago

@ClarkChin08 Based on the error you provided, I assume you never reached oneCCL initialization, since the failure comes from SYCL. Is that correct?

ClarkChin08 commented 2 weeks ago

I checked out the tag 2021.11 and it worked.

nikitaxgusev commented 2 weeks ago

@ClarkChin08 Can you switch to the tag that works for you? If so, please close the issue.