haosulab / ManiSkill

SAPIEN Manipulation Skill Framework, a GPU parallelized robotics simulator and benchmark
https://maniskill.ai/
Apache License 2.0
930 stars · 168 forks

[Question] GPU mem not available when trying the gpu_sim example #650

Closed · kevinqyh0827 closed this issue 4 weeks ago

kevinqyh0827 commented 1 month ago

Hi, I was trying to run the "mani_skill.examples.benchmarking.gpu_sim" example from the documentation website and got the error below: [error screenshot]

I added two lines to print cpu_mem and gpu_mem, and gpu_mem comes back as NoneType. I am wondering whether this is because my GPU is not working or some other issue.
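For reference, what I added is roughly equivalent to the sketch below (this uses psutil/pynvml directly; the benchmarking script's own profiler may gather these numbers differently):

```python
import os
import psutil
import pynvml

# CPU-side memory of this process (this value prints fine for me)
cpu_mem = psutil.Process(os.getpid()).memory_info().rss

# GPU-side memory that NVML attributes to this process
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
gpu_mem = None
for p in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
    if p.pid == os.getpid():
        gpu_mem = p.usedGpuMemory
pynvml.nvmlShutdown()

print("cpu_mem:", cpu_mem, "gpu_mem:", gpu_mem)  # gpu_mem comes back as None here
```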

I am running ManiSkill in Docker; the server runs Ubuntu 22.04 with an RTX 3090 GPU. The output of nvidia-smi inside the container is: [nvidia-smi screenshot] If this is an Ubuntu version issue, I am stuck for now, since I do not have permission to downgrade it.

Thanks so much!

StoneT2000 commented 1 month ago

Is your Docker container correctly set up for the NVIDIA drivers? See https://maniskill.readthedocs.io/en/latest/user_guide/getting_started/docker.html for setting up NVIDIA containers.
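A quick sanity check that the container runtime can see the GPU at all (any CUDA base image works; the tag below is just an example) is something like:

```bash
# should print the same GPU table as running nvidia-smi on the host
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```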

kevinqyh0827 commented 1 month ago

Hi, thanks for your reply. I noticed the doc suggests installing "nvidia-docker v2", but search results and the official instructions both indicate that wrapper is no longer supported; they suggest using "nvidia-container-toolkit" instead. Here are the versions of "nvidia-container-toolkit" and related tools installed on the server: [package version screenshot]

I think it is set up correctly, since I have already run other GPU workloads inside Docker containers on this server. Is this version suitable for ManiSkill? Is there any other way, or other example code in this repo, I could use to test this issue?

Thanks!

StoneT2000 commented 1 month ago

Unfortunately I am not really sure why nvidia-smi is not showing the GPU usage. You are most likely using the GPU, though, since the gpu sim script did run (it just couldn't report GPU usage), and the 22k FPS seems about right with the default settings.

kevinqyh0827 commented 1 month ago

Hi, thanks for your suggestions. Sorry for the misleading nvidia-smi screenshot; in the initial problem description I only wanted to show that the GPU driver is working. Here is the GPU usage while the "gpu_sim" code is running: [nvidia-smi screenshot]

The lower section shows 0 MiB after the code crashed. Besides, I did start the container with the "-it" options. Could this be the reason why the gpu_mem is not shown inside the code?

Thanks!

StoneT2000 commented 1 month ago

I'm not really sure; I don't know how to debug the NVIDIA Docker container setup. It does seem like the simulator itself works fine, but the pynvml package can't read the GPU memory. It would be great if you could help make a PR fixing the benchmarking code to check whether the gpu sim metrics are None or not.
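If it helps narrow things down, a standalone probe of what pynvml can actually see inside the container could look roughly like this (just a sketch, not the exact code used by the benchmarking script):

```python
import os
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Device-level totals are usually readable even inside a container
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"device memory used/total: {mem.used} / {mem.total} bytes")

# Per-process usage is keyed by the PIDs the driver sees (i.e. host PIDs)
procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
print("processes NVML reports:", [(p.pid, p.usedGpuMemory) for p in procs])
print("this process's PID inside the container:", os.getpid())

pynvml.nvmlShutdown()
```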

kevinqyh0827 commented 1 month ago

Hi, I found that this is caused by the separate process ID namespaces of the Docker container and the host: NVML reports host PIDs, so the per-process GPU memory lookup inside the container finds nothing. The most direct fix on my server is to add the "--pid=host" option when starting the container, so all PIDs inside the container are mapped to the host.
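Concretely, the container on my side is now started with something like this (the image name is just a placeholder for whichever ManiSkill image you use), and the gpu_sim benchmark then reports gpu_mem correctly:

```bash
# --gpus all exposes the GPUs; --pid=host shares the host PID namespace,
# so NVML's per-process PIDs match the PIDs seen inside the container
docker run -it --rm --gpus all --pid=host <your-maniskill-image> bash
```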

I am not sure this warrants a PR to the code; maybe it could be added to the docs page?

Thanks!

StoneT2000 commented 4 weeks ago

Thanks for the details, we can add it to the docker docs