Open al42and opened 11 months ago
Thank you for the report!
Clearly urinfo should not be printing raw output like that, perhaps the type of UR_DEVICE_INFO_UUID
should be uint8_t[]
(this is what it is in Level Zero).
Would it make sense to have PCI_ADDRESS also return raw data as uint8_t[]
then? Just for consistency sake.
To implemement the UR_DEVICE_INFO_PCI_ADDRESS
query:
ze_pci_address_ext_t
cuDeviceGetPCIBusId
which returns a string.hipDeviceGetPCIBusId
which returns a string.I think from a practical viewpoint as an adapter maintainer I'd be inclinded to keep it as a string.
I could see the utility of this being a struct instead, as Level Zero does it, if a user wants to make logical choices based on those attributes. This would be extra work for CUDA/HIP. I don't think it will ever be implemented in all adapters though.
OpenCL has the cl_khr_pci_bus_info
extension, but OpenCL can be implemented on non-PCI devices, e.g. CPUs, so there will always be situations where this extension is not provided. Same situation as Native CPU adapter.
Thanks for putting me on to cl_khr_pci_bus_info, I wasn't aware.
CUDA adapter uses cuDeviceGetPCIBusId which returns a string.
One could get bus/device/function as integers from cudaGetDeviceProperties
: https://docs.nvidia.com/cuda/archive/9.0/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_1c5adc2ef8c6b89fb139b4684175db54a
Same for ROCm: https://docs.amd.com/projects/HIP/en/docs-5.2.0/doxygen/html/structhip_device_prop__t.html#aa0f598c0196ff1f429290a43c1fa4038
Seems like switching to returning a struct for PCI bus info should be doable without parsing strings then.
As for how to print the UUID, seems that this comes from the NVIDIA drivers GPU UUID:
$ cat /proc/driver/nvidia/gpus/*/information
...
GPU UUID: GPU-2ebb02f8-b748-9eed-a6d5-4d29cf68c055
...
So I think I'll replicate this foramt minus the GPU-
prefix.
Testing with this change locally doesn't fix the fact its printing in binary file mode, I've not got to the bottom of that yet.
https://github.com/oneapi-src/unified-runtime/pull/1243 fixes the UUID printing.
Testing with this change locally doesn't fix the fact its printing in binary file mode, I've not got to the bottom of that yet.
See #1281
Thanks @al42and!
https://github.com/oneapi-src/unified-runtime/pull/1313 still requires more validation & intel/llvm changes but pushing before going on holiday for a week.
Both
UR_DEVICE_INFO_UUID
andUR_DEVICE_INFO_PCI_ADDRESS
returnchar[]
. However, for PCI address, it is a pretty-printed string:But for UUID, it's the raw 16 bytes of the UUID itself (and the reason
grep
needs extra flags, because they don't fit ASCII when treated as a string):Both of these parameters are containing an object that is, at its core, a fixed-length sequence of bytes, but which also has a well-esteblished string representation. I don't have any preferences, but it would be nice to have a uniform treatment, or at least make the docs clear what exactly is returned in each case.
char[]
return type, as opposed tounsigned char[]
suggests a formatted string.P.S.: It's not very relevant for GROMACS project (although, we're considering querying PCI Addresses for CPU-GPU affinity checks); just reporting a counterintuitive behavior.