Open bertwesarg opened 1 year ago
Thank you for reporting this. If I understand it correctly, HIP_VISIBLE_DEVICES is a list of the physical devices visible to the process (set by the resource manager when the job is scheduled to run on the compute node). Let's assume devices 4,5,6,7 are set. Should papi_native_avail then show events for those devices as device=0,1,2,3 and remap them internally to 4,5,6,7? Would the behaviour be similar for CUDA?
I'm currently not able to check CUDA. Your understanding is correct. But for ROCm it actually depends on the runtime used. It looks like rocprofiler is on the ROCm level, i.e., the same as ROCm SMI, but each higher-level runtime has its own GPU isolation mechanism. Because PAPI does not know which runtime the application uses, I think the only solution is to document that the PAPI ROCm component expects ROCm SMI device indices, and that the application using PAPI needs to take care of the mapping.
In case you are interested, here is how Score-P does this mapping via the device UUID:
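For illustration only, here is a minimal sketch of what such a UUID-based remapping looks like in principle. This is not Score-P's actual code; the UUID strings and the helper function are purely hypothetical, assuming both the runtime and ROCm SMI can report a per-device UUID (as the rocminfo output further down shows):

```python
def remap_by_uuid(runtime_uuids, smi_uuids):
    """Map runtime device indices to SMI/kernel device indices via UUID.

    runtime_uuids: device UUIDs in runtime (e.g. HIP) enumeration order.
    smi_uuids:     device UUIDs in ROCm SMI enumeration order.
    Returns {runtime_index: smi_index}.
    """
    smi_index = {uuid: i for i, uuid in enumerate(smi_uuids)}
    return {i: smi_index[uuid] for i, uuid in enumerate(runtime_uuids)}


# Hypothetical node with 4 GPUs; an isolation setting that exposes only the
# last two makes the runtime enumerate them as devices 0 and 1.
smi = ["GPU-aaaa", "GPU-bbbb", "GPU-cccc", "GPU-dddd"]
hip = ["GPU-cccc", "GPU-dddd"]
print(remap_by_uuid(hip, smi))  # {0: 2, 1: 3}
```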
Is ROCR_VISIBLE_DEVICES ever used? It looks like this is the right isolation mechanism for the GPU runtime (including rocprofiler, which relies on HSA for detecting agents). My assumption when I wrote the rocm component was that rocprofiler would only see the GPU agents in the current partition and number them from 0 to N…
ROCR_VISIBLE_DEVICES does not work on my side:
$ ROCR_VISIBLE_DEVICES=0 rocm-smi
========================= ROCm System Management Interface =========================
=================================== Concise Info ===================================
GPU  Temp (DieEdge)  AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
0    41.0c           43.0W   800Mhz  1600Mhz  0%   auto  300.0W  0%     0%
1    43.0c           42.0W   800Mhz  1600Mhz  0%   auto  300.0W  0%     0%
====================================================================================
=============================== End of ROCm SMI Log ================================
It works for rocminfo, though:
$ ROCR_VISIBLE_DEVICES=0 rocminfo | grep -A 1 '^ Name:'
Name: AMD EPYC 7702 64-Core Processor
Uuid: CPU-XX
--
Name: AMD EPYC 7702 64-Core Processor
Uuid: CPU-XX
--
Name: AMD EPYC 7702 64-Core Processor
Uuid: CPU-XX
--
Name: AMD EPYC 7702 64-Core Processor
Uuid: CPU-XX
--
Name: AMD EPYC 7702 64-Core Processor
Uuid: CPU-XX
--
Name: AMD EPYC 7702 64-Core Processor
Uuid: CPU-XX
--
Name: AMD EPYC 7702 64-Core Processor
Uuid: CPU-XX
--
Name: AMD EPYC 7702 64-Core Processor
Uuid: CPU-XX
--
Name: gfx90a
Uuid: GPU-f43096f78d390147
$ ROCR_VISIBLE_DEVICES=1 rocminfo | grep -A 1 '^ Name:'
Name: AMD EPYC 7702 64-Core Processor
Uuid: CPU-XX
--
Name: AMD EPYC 7702 64-Core Processor
Uuid: CPU-XX
--
Name: AMD EPYC 7702 64-Core Processor
Uuid: CPU-XX
--
Name: AMD EPYC 7702 64-Core Processor
Uuid: CPU-XX
--
Name: AMD EPYC 7702 64-Core Processor
Uuid: CPU-XX
--
Name: AMD EPYC 7702 64-Core Processor
Uuid: CPU-XX
--
Name: AMD EPYC 7702 64-Core Processor
Uuid: CPU-XX
--
Name: AMD EPYC 7702 64-Core Processor
Uuid: CPU-XX
--
Name: gfx90a
Uuid: GPU-5310b8602059ef91
So I think PAPI does not need to do anything in the code at the moment; it just needs to be made clear that the component expects the HSA-level device index: neither the HIP/HCA/OpenCL/OpenMP Target device index, nor the SMI/kernel-level device index.
Agree. What would be the right way of making this clear, in your opinion? Add a comment to the component README?
Yeah, probably the best place. It looks like the PAPI device index is derived from hsa_iterate_agents, so just mention this too.
When setting HIP_VISIBLE_DEVICES, the id in the :device=%d event name suffix is still the hardware device index, not the HIP device index. The ./sample_multi_kernel_monitoring test always uses :device=0, so starting it with different HIP_VISIBLE_DEVICES values will result in 0-value results:
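Until the documentation makes this clear, an application can translate its HIP device index back to the physical index itself. A minimal sketch of that translation, assuming HIP_VISIBLE_DEVICES holds a comma-separated list of physical indices (it may also hold device UUIDs, which this sketch deliberately does not handle):

```python
import os


def hip_to_physical(hip_index, env=None):
    """Translate a HIP runtime device index to the physical device index,
    i.e. the value an application would put into a :device=%d qualifier.

    If HIP_VISIBLE_DEVICES is unset or empty, the two numberings coincide.
    """
    environ = env if env is not None else os.environ
    visible = environ.get("HIP_VISIBLE_DEVICES")
    if not visible:
        return hip_index
    return int(visible.split(",")[hip_index])


# With HIP_VISIBLE_DEVICES=4,5,6,7, HIP device 0 is physical device 4.
print(hip_to_physical(0, {"HIP_VISIBLE_DEVICES": "4,5,6,7"}))  # 4
```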