oneapi-src / unified-runtime

https://oneapi-src.github.io/unified-runtime/
Other
39 stars 117 forks source link

Inconsistent behavior between UR_DEVICE_INFO_UUID vs UR_DEVICE_INFO_PCI_ADDRESS #1177

Open al42and opened 11 months ago

al42and commented 11 months ago

Both UR_DEVICE_INFO_UUID and UR_DEVICE_INFO_PCI_ADDRESS return char[]. However, for PCI address, it is a pretty-printed string:

$ stdbuf -i0 -o0 -e0 ./bin/urinfo --verbose 2>&1 | grep --binary-files=text PCI_ADDRESS
      PCI_ADDRESS: 0000:03:00.0
      PCI_ADDRESS: 0000:00:02.0
      PCI_ADDRESS: 0000:0a:00.0
      PCI_ADDRESS: 0000:0C:00.0

But for UUID, it's the raw 16 bytes of the UUID itself (and the reason grep needs extra flags, because they don't fit ASCII when treated as a string):

$ stdbuf -i0 -o0 -e0 ./bin/urinfo --verbose 2>&1 | grep --binary-files=text UUID
      UUID: ���V
      UUID: ���F

      UUID: XX
      UUID: sa����"��%~�]�

Both of these parameters are containing an object that is, at its core, a fixed-length sequence of bytes, but which also has a well-esteblished string representation. I don't have any preferences, but it would be nice to have a uniform treatment, or at least make the docs clear what exactly is returned in each case. char[] return type, as opposed to unsigned char[] suggests a formatted string.

P.S.: It's not very relevant for GROMACS project (although, we're considering querying PCI Addresses for CPU-GPU affinity checks); just reporting a counterintuitive behavior.

alycm commented 10 months ago

Thank you for the report!

Clearly urinfo should not be printing raw output like that, perhaps the type of UR_DEVICE_INFO_UUID should be uint8_t[] (this is what it is in Level Zero).

al42and commented 10 months ago

Would it make sense to have PCI_ADDRESS also return raw data as uint8_t[] then? Just for consistency sake.

kbenzie commented 10 months ago

To implemement the UR_DEVICE_INFO_PCI_ADDRESS query:

I think from a practical viewpoint as an adapter maintainer I'd be inclinded to keep it as a string.

I could see the utility of this being a struct instead, as Level Zero does it, if a user wants to make logical choices based on those attributes. This would be extra work for CUDA/HIP. I don't think it will ever be implemented in all adapters though.

alycm commented 10 months ago

OpenCL has the cl_khr_pci_bus_info extension, but OpenCL can be implemented on non-PCI devices, e.g. CPUs, so there will always be situations where this extension is not provided. Same situation as Native CPU adapter.

kbenzie commented 10 months ago

Thanks for putting me on to cl_khr_pci_bus_info, I wasn't aware.

al42and commented 10 months ago

CUDA adapter uses cuDeviceGetPCIBusId which returns a string.

One could get bus/device/function as integers from cudaGetDeviceProperties: https://docs.nvidia.com/cuda/archive/9.0/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_1c5adc2ef8c6b89fb139b4684175db54a

Same for ROCm: https://docs.amd.com/projects/HIP/en/docs-5.2.0/doxygen/html/structhip_device_prop__t.html#aa0f598c0196ff1f429290a43c1fa4038

kbenzie commented 10 months ago

Seems like switching to returning a struct for PCI bus info should be doable without parsing strings then.

As for how to print the UUID, seems that this comes from the NVIDIA drivers GPU UUID:

$ cat /proc/driver/nvidia/gpus/*/information
...
GPU UUID:        GPU-2ebb02f8-b748-9eed-a6d5-4d29cf68c055
...

So I think I'll replicate this foramt minus the GPU- prefix.

Testing with this change locally doesn't fix the fact its printing in binary file mode, I've not got to the bottom of that yet.

kbenzie commented 10 months ago

https://github.com/oneapi-src/unified-runtime/pull/1243 fixes the UUID printing.

al42and commented 10 months ago

Testing with this change locally doesn't fix the fact its printing in binary file mode, I've not got to the bottom of that yet.

See #1281

kbenzie commented 10 months ago

Thanks @al42and!

kbenzie commented 10 months ago

https://github.com/oneapi-src/unified-runtime/pull/1313 still requires more validation & intel/llvm changes but pushing before going on holiday for a week.