kevinqyh0827 closed this 4 weeks ago
Is your Docker set up correctly for NVIDIA drivers? See https://maniskill.readthedocs.io/en/latest/user_guide/getting_started/docker.html for NVIDIA containers.
Hi, thanks for your reply. I noticed the doc suggests installing "nvidia-docker v2", but search results and the official instructions both indicate this wrapper is no longer supported. They suggest using "nvidia-container-toolkit" instead. Here is the "nvidia-container-toolkit" version and the other tools installed on the server:
I think it is set up correctly, since I have finished other tasks inside a Docker container that required access to GPUs. Is this version suitable for ManiSkill? Is there any other way, or other example code in this repo, to test this issue?
Thanks!
Unfortunately I am not really sure why nvidia-smi is not showing the GPU usage. The thing is, you are most likely using it, since the GPU sim script did run (it just couldn't report GPU usage). The 22k FPS seems about right with the default settings.
Hi, thanks for your suggestions. Sorry for the misleading nvidia-smi image. In the initial problem description I only wanted to show that the GPU driver is working. Here is the GPU usage when running the "gpu_sim" code:
The lower section shows 0 MiB after the code crashed. Also, I did run the Docker container with the "-it" option. Could this be the reason why it does not show the gpu_mem inside the code?
Thanks!
I'm not really sure, I don't know how to debug the NVIDIA Docker container. It does seem like at least the simulator works fine, but the pynvml package can't seem to read the GPU memory. It would be great if you could help make a PR fixing the benchmarking code so that it checks whether the GPU sim metrics are None or not.
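A minimal sketch of the kind of guard such a PR might add, assuming the benchmark prints memory readings given in bytes and that the GPU reading can be `None`, as reported in this thread. The helper name `format_mem` is hypothetical and is not part of ManiSkill.

```python
from typing import Optional


def format_mem(num_bytes: Optional[int]) -> str:
    """Format a memory reading in MiB, tolerating a missing (None) value."""
    if num_bytes is None:
        # The reading was unavailable, e.g. the process was not found
        # in the NVML process list.
        return "N/A"
    return f"{num_bytes / (1024 ** 2):.1f} MiB"


print("CPU mem:", format_mem(512 * 1024 ** 2))
print("GPU mem:", format_mem(None))
```

With a guard like this the benchmark could still report FPS and CPU memory when the GPU memory reading is missing, instead of failing on a `NoneType` value.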
Hi, I found that this is caused by the separate process ID namespaces of the Docker container and the host PC. The most direct solution (on my server) is to add the "--pid=host" option when running the Docker container, so that the container shares the host's PID namespace.
Not sure if this could be a PR. Maybe add it to the doc page?
Thanks!
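A rough sketch of why a PID namespace mismatch would produce `None`, assuming the benchmark finds its own GPU memory by matching `os.getpid()` against the NVML process list (the thread does not show that code, so this is an assumption). The function names are hypothetical. The pynvml calls are only invoked inside `nvml_processes`, so the simulated part runs without a GPU.

```python
import os
from typing import Iterable, List, Optional, Tuple

ProcEntry = Tuple[int, Optional[int]]  # (pid, used GPU memory in bytes)


def find_gpu_mem(procs: Iterable[ProcEntry], pid: int) -> Optional[int]:
    """Return the GPU memory used by `pid`, or None if it is not listed."""
    for proc_pid, used_bytes in procs:
        if proc_pid == pid:
            return used_bytes
    return None


def nvml_processes(device_index: int = 0) -> List[ProcEntry]:
    """Query NVML for compute processes on one GPU (requires pynvml)."""
    import pynvml

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
        return [(p.pid, p.usedGpuMemory) for p in procs]
    finally:
        pynvml.nvmlShutdown()


def current_process_gpu_mem() -> Optional[int]:
    """Look up this process in the NVML list by its own PID."""
    return find_gpu_mem(nvml_processes(), os.getpid())


# Simulated data: NVML lists the process under its host PID, while the
# process sees a different, small PID inside the container.
nvml_view = [(41872, 2 * 1024 ** 3)]
print(find_gpu_mem(nvml_view, 41872))  # lookup with the host PID
print(find_gpu_mem(nvml_view, 57))     # lookup with the container PID
```

In the simulated data the first lookup finds the entry and the second does not, which matches the behavior described above: with "--pid=host" both sides see the same PID, so the lookup succeeds.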
Thanks for the details, we can add it to the docker docs
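For reference, a sketch of how the full command might look with that option, assembled in Python so the flags are explicit. The image name `my-maniskill-image` is a placeholder and `build_docker_cmd` is a hypothetical helper. `--gpus all` assumes the NVIDIA Container Toolkit is installed, and the other flags should be adapted to your setup.

```python
import shlex
from typing import List


def build_docker_cmd(image: str) -> List[str]:
    """Build a `docker run` argv that shares the host PID namespace."""
    return [
        "docker", "run", "--rm", "-it",
        "--gpus", "all",   # expose the GPUs to the container
        "--pid=host",      # use the host's PID namespace
        image,             # placeholder image name supplied by the caller
        "python", "-m", "mani_skill.examples.benchmarking.gpu_sim",
    ]


# Print the command as a single shell-quoted line.
print(shlex.join(build_docker_cmd("my-maniskill-image")))
```

Note that "--pid=host" makes the host's processes visible from inside the container, which may not be acceptable on shared servers.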
Hi, I was trying to run the "mani_skill.examples.benchmarking.gpu_sim" example from the documentation website and got the error below:
I added two lines to print cpu_mem and gpu_mem. They show that "gpu_mem" is "NoneType". I am wondering whether this is because my GPU is not working or because of some other issue.
I am running ManiSkill in Docker; my server runs Ubuntu 22.04 and the GPU is an RTX 3090. The output of "nvidia-smi" inside the Docker container is:
If it is an Ubuntu version issue, I am stuck for now, since I do not have permission to downgrade it.
Thanks so much!