nikolai-franke opened this issue 1 year ago
You may try passing offscreen_only=True to the SapienRenderer constructor. This behavior will be changed in the future (to make the CUDA device take higher priority than on-screen rendering).
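Roughly, that looks like the following sketch (the Engine/renderer wiring below is the usual SAPIEN 2.x setup, shown here only for illustration):

```python
import sapien.core as sapien

# Standard SAPIEN 2.x setup, with the renderer created offscreen-only so no
# on-screen surface is requested (this is the suggestion above, as a sketch).
engine = sapien.Engine()
renderer = sapien.SapienRenderer(offscreen_only=True)
engine.set_renderer(renderer)
```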
Passing offscreen_only=True doesn't make a difference.
I cannot figure out what is causing the issue. I think you should set the PCI id of the device you want to use directly. This method requires a bit of setup but should never fail. First, before creating anything with SAPIEN, run sapien.SapienRenderer.set_log_level("info"). Next, run your code. You will see a table listing the devices visible to Vulkan, and each of your GPUs will have a PciBus field. The PciBus id is unique to each physical GPU. Then, when you create SapienRenderer, pass device="pci:x", where x is the PciBus id shown in the log. This should bypass all other checks.
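Putting both steps together, something like this (the "pci:41" value is only a placeholder; use the PciBus id printed in your own log):

```python
import sapien.core as sapien

# Step 1: enable info-level logging so the renderer prints the table of
# Vulkan-visible devices, including each GPU's PciBus id, at creation time.
sapien.SapienRenderer.set_log_level("info")

# Step 2: pin the renderer to one physical GPU via its PCI bus id.
# "pci:41" is only a placeholder; replace it with the PciBus value from the log.
engine = sapien.Engine()
renderer = sapien.SapienRenderer(offscreen_only=True, device="pci:41")
engine.set_renderer(renderer)
```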
Thank you very much for your answer! Sadly the result is still exactly the same. GPU 0 always gets used, even when selecting another GPU via PCI address.
Are you using sapien==2.2.2? I have verified that the GPU selection feature is working. You can run sapien.SapienRenderer.set_log_level("info") before creating the renderer; it will list all available GPUs on the console and tell you which GPU is selected for rendering. Since an incorrect PCI id would result in an error, my guess is that some other program, not the SAPIEN renderer, is using your GPU 0.
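To rule that out, you can list the compute processes on each GPU, e.g. with a small helper like this (it just shells out to nvidia-smi and is not part of SAPIEN; the query fields may vary slightly across driver versions):

```python
import subprocess

# List the compute processes per GPU to see whether something other than the
# SAPIEN renderer is occupying GPU 0. Plain `nvidia-smi` works as well.
result = subprocess.run(
    ["nvidia-smi", "--query-compute-apps=gpu_bus_id,pid,process_name", "--format=csv"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```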
I'm actually having the same issue.
System:
Describe the bug
SAPIEN always uses GPU 0 in a multi-GPU setup, in addition to the GPU specified by CUDA_VISIBLE_DEVICES.

To Reproduce
Run a rendering script (for example, along the lines of the sketch below) once with CUDA_VISIBLE_DEVICES=0 and once with CUDA_VISIBLE_DEVICES=1, and check GPU usage in each case.
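For example, with a script along these lines (the environment id and obs_mode are only illustrative ManiSkill2 choices, not necessarily the exact setup):

```python
# Run this script twice and watch nvidia-smi on the host:
#   CUDA_VISIBLE_DEVICES=0 python repro.py
#   CUDA_VISIBLE_DEVICES=1 python repro.py
# Depending on your ManiSkill2 version, the import may need to be
# gymnasium instead of gym; the environment id below is only an example.
import gym
import mani_skill2.envs  # noqa: F401  (registers the ManiSkill2 environments)

env = gym.make("PickCube-v0", obs_mode="rgbd")  # an image obs_mode exercises the renderer
env.reset()
for _ in range(100):
    env.step(env.action_space.sample())
env.close()
```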
Expected behavior
Checking the GPU usage, only the selected GPU should be used. For CUDA_VISIBLE_DEVICES=0 that is the case; for CUDA_VISIBLE_DEVICES=1, both GPU 0 and GPU 1 get used.

Screenshots
GPU usage with CUDA_VISIBLE_DEVICES=0 and with CUDA_VISIBLE_DEVICES=1 (screenshots not reproduced here).
Additional context
Even though GPU 0 only gets used a bit when CUDA_VISIBLE_DEVICES=1, this usage quickly adds up when running many parallel simulations. I am using ManiSkill2 for reinforcement learning on an HPC node with 4 NVIDIA A100 GPUs, and this bug severely limits the number of parallel environments I can run. Additionally, running many parallel environments becomes slow, since GPU 0 is used by every single simulation environment instead of just a quarter of them.