Open AronCao49 opened 1 year ago
Thanks for bringing this to our attention and for all your supplied information. We will have a look into it. I hope in the meantime that the 3070 is adequate for you to make some headway in the challenge.
Thanks for your prompt reply. Currently, I can run BenchBot on the 4080 by hiding the Viewport window while Isaac Sim initializes, and then restoring the Viewport window once the GPU crash error has passed. I think the controller instability is caused by this GPU crash error at startup, and it does not affect the controller afterwards (i.e., my guess is that as long as no rendering work is in progress when the GPU crash error occurs, Isaac Sim operates normally).
Anyway, I hope this observation helps to further improve the project. Looking forward to your great work in the future!
Hmmm, the current advice we have been given for Omniverse is that the newer graphics drivers might actually be causing the issue. Have you tried downgrading your graphics driver to 525 (you would also need to downgrade CUDA to 12.0)? There is some advice on how to do this for BenchBot here: https://github.com/qcr/benchbot/issues/92#issuecomment-1505011875
Yes, I have. The modification suggested in #92 does not work in my case. Although it installs driver 525 and CUDA 12 at first, the subsequent cuda-driver installation forces the driver to update to 530. The validation step for cuda-driver does not accept any version other than the newest one. A possible workaround is to change the cuda-driver/ in ben/benchbot_install to cuda-driver-*. However, even with all validation checks passing, the installation still failed; unfortunately, I did not keep any log or output from this step...
Based on my experience, the 3070 works perfectly with the newest versions of CUDA (12.1) and the driver (530), but the 4080 does not. Maybe an incompatibility between 40xx GPUs and CUDA or the driver causes this issue.
Even if I follow the guide and install cuda=12.0.0-1, it upgrades nvidia-driver to the latest one, such as
I found a way to downgrade the nvidia-driver and cuda packages, and successfully launched BenchBot without the GPU crash on a 4080 card. My solution is:
Uninstall cuda/nvidia-driver and install the packages below manually before running the BenchBot install script.
# Install nvidia-driver 525
$ sudo apt install nvidia-driver-525
# Check the latest cuda-drivers for 525 (e.g. 525.125.06-1) at
# https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ and install it.
$ sudo apt install cuda-drivers=525.125.06-1
# Run the BenchBot installer
$ ./install   # or: benchbot_install
I hope this helps others who use 40xx graphics cards.
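The version lookup in the steps above can also be done from the command line instead of browsing the repository index. The following is a minimal sketch, assuming the NVIDIA CUDA apt repository is already configured so that `apt-cache madison cuda-drivers` lists the available versions. The helper name `latest_for_branch` is made up for illustration and is not part of BenchBot.

```shell
# Hypothetical helper (not part of BenchBot): read `apt-cache madison <pkg>`
# output on stdin and print the newest version in the given driver branch.
latest_for_branch() {
    # madison lines look like: "cuda-drivers | 525.125.06-1 | <repo> Packages"
    awk -F'|' '{ gsub(/ /, "", $2); print $2 }' \
        | grep -E "^${1}\." \
        | sort -V \
        | tail -n 1
}

# Only query apt when it is available on this machine.
if command -v apt-cache >/dev/null 2>&1; then
    version="$(apt-cache madison cuda-drivers 2>/dev/null | latest_for_branch 525 || true)"
    echo "Newest 525-branch cuda-drivers: ${version:-none found}"
    # Then, as in the steps above: sudo apt install cuda-drivers="$version"
fi
```

The sketch relies on `sort -V` (GNU coreutils) for version ordering, which is available on Ubuntu 20.04.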
Thanks @darkain84 for pinging this issue again. I have just pushed an update to benchbot_install to try and fix the cuda driver/cuda version issues that people have been encountering.
Could someone with hardware known to cause the crashes please try a fresh install (without pre-existing cuda and nvidia drivers) and confirm whether this problem has been resolved?
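When reporting back, it may help to include the GPU model and the driver branch that the fresh install ended up with. A minimal sketch, assuming `nvidia-smi` is on the PATH after installation; the helper name `driver_major` is hypothetical and only used here for illustration.

```shell
# Hypothetical helper: extract the major branch from a driver version string,
# e.g. "525.125.06" -> "525".
driver_major() {
    printf '%s\n' "${1%%.*}"
}

# nvidia-smi is only present once a driver is installed, so guard the query.
if command -v nvidia-smi >/dev/null 2>&1; then
    driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    echo "GPU:    $(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)"
    echo "Driver: ${driver} (branch $(driver_major "$driver"))"
fi
```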
Hi, I recently tried to install BenchBot on two host machines. On the first one, with a 3070, BenchBot works smoothly. However, when I run the same command on the second machine, with a 4080, a controller crash similar to #92 happens. Specifically, the Isaac Sim window pops up, stays for about 10 seconds, then vanishes as soon as the robot controller start-up reports "Ready".
I spent some time trying to fix this issue with the potential solutions from #92, but none of them worked. Fortunately, when I happened to make the Isaac Sim window full screen and maximise the Console window to check its output... Isaac Sim survived!
Although it sounds a bit silly, I ran quite a few test rounds and the Isaac Sim window survived every time, and it could then be used to run the provided example demos such as "hello_passive". In short, the key is to hide the Viewport window while Isaac Sim initializes, as in the screenshot below:
Comparing the console outputs from the 3070 and the 4080, the issue may lie in two error messages:
which I can only find on the 4080 but not on the 3070, possibly causing the controller crash.
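A comparison like this can be repeated mechanically if the console output of each machine is saved to a file. The sketch below is one possible approach, not the method used above: the helper name `errors_only_in_second` and the log file names are placeholders, and it assumes the relevant lines contain the word "error" in some capitalisation.

```shell
# Hypothetical helper: print lines mentioning "error" (case-insensitive) that
# occur in the second log ($2) but not in the first log ($1).
errors_only_in_second() {
    first_sorted="$(mktemp)"
    second_sorted="$(mktemp)"
    # comm needs sorted input; "|| true" tolerates logs with no matching lines.
    { grep -i 'error' "$1" || true; } | sort -u > "$first_sorted"
    { grep -i 'error' "$2" || true; } | sort -u > "$second_sorted"
    # -13 suppresses lines unique to the first file and lines common to both.
    comm -13 "$first_sorted" "$second_sorted"
    rm -f "$first_sorted" "$second_sorted"
}

# Example usage with placeholder file names:
# errors_only_in_second console_3070.log console_4080.log
```

If the log lines carry timestamps, otherwise identical messages will show up as differences, so the timestamps would need to be stripped first.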
There are also some other differences in the console output. I will attach screenshots for further discussion. The specifications of my host machines can also be found below.
Running Nvidia related system checks:
    NVIDIA GPU available: Found card of type '10de:2484'
    NVIDIA driver is running: Found
    NVIDIA driver version valid: Valid (530.30.02)
    NVIDIA driver from a standard PPA: PPA is valid
    CUDA drivers installed: Drivers found
    CUDA drivers version valid: Valid (530.30.02-1)
    CUDA drivers from the NVIDIA PPA: PPA is valid
    CUDA is installed: CUDA found
    CUDA version valid: Valid (12.1)
    CUDA is from the NVIDIA PPA: PPA is valid
Running Docker related system checks:
    Docker is available: Found
    Docker version valid: Valid (20.10.12)
    NVIDIA Container Toolkit installed: Found (1.13.1)
    Docker runs without root: Passed
Running checks of filesystem used for Docker:
    /var/lib/docker on ext4 filesystem: Yes (/dev/sdb6)
    /var/lib/docker supports suid: Enabled
    /var/lib/docker drive space check: Sufficient space (203G)
Miscellaneous requirements:
    Pip python package manager available: Found (21.3.1)
    Tkinter for Python installed: Found
    PIL (with ImageTk) for Python install Found
Manual installation steps for Omniverse-powered Isaac Sim:
    License accepted for Omniverse: Yes
    Access to nvcr.io Docker registry: Yes

Core host system checks:
    Ubuntu version >= 20.04: Passed (20.04)
Running Nvidia related system checks:
    NVIDIA GPU available: Found card of type '10de:2704'
    NVIDIA driver is running: Found
    NVIDIA driver version valid: Valid (530.30.02)
    NVIDIA driver from a standard PPA: PPA is valid
    CUDA drivers installed: Drivers found
    CUDA drivers version valid: Valid (530.30.02-1)
    CUDA drivers from the NVIDIA PPA: PPA is valid
    CUDA is installed: CUDA found
    CUDA version valid: Valid (12.1)
    CUDA is from the NVIDIA PPA: PPA is valid
Running Docker related system checks:
    Docker is available: Found
    Docker version valid: Valid (23.0.6)
    NVIDIA Container Toolkit installed: Found (1.13.1)
    Docker runs without root: Passed
Running checks of filesystem used for Docker:
    /var/lib/docker on ext4 filesystem: Yes (/dev/sda1)
    /var/lib/docker supports suid: Enabled
    /var/lib/docker drive space check: Sufficient space (785G)
Miscellaneous requirements:
    Pip python package manager available: Found (23.1.2)
    Tkinter for Python installed: Found
    PIL (with ImageTk) for Python install Found
Manual installation steps for Omniverse-powered Isaac Sim:
    License accepted for Omniverse: Yes
    Access to nvcr.io Docker registry: