RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

rasrab1992 closed 1 month ago

rasrab1992 commented 10 months ago


I am a researcher from IMEC involved in the SPEAR project. I'm eager to explore your simulation tool before the kickoff meeting so that I can formulate questions and concerns for WP2. However, I encountered an error while trying to install aerial_gym_simulator, as shown in the image below: [Screenshot from 2023-11-07 10-51-02] I searched the NVIDIA forums for a solution and found that others have experienced a similar issue with the RTX 4090 GPU, as documented here. I've tried both conda and Docker, but the problem persists.

Could you please provide your suggestions on how to proceed with resolving this issue? Maybe one solution is changing the driver version to 525?


mihirk284 commented 10 months ago

Hi, this seems to be an issue with PyTorch and nvrtc for the new RTX 4090 GPUs. Please check this issue, and more specifically, this comment.

As suggested in that thread, upgrading PyTorch to a nightly build may resolve the issue, as detailed in this comment.
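
For context, here is a minimal sketch of why an older PyTorch build can trigger this error. NVRTC only accepts the architectures that its CUDA release knows about, and Ada GPUs such as the RTX 4090 (compute capability 8.9) appear to be supported only from CUDA 11.8 onward. The lookup table and the function name `nvrtc_supports` are illustrative assumptions, not part of PyTorch or of this repository; only the `torch.cuda` calls in the last block are real PyTorch API.

```python
# Minimum CUDA toolkit release whose NVRTC accepts a given compute capability.
# Assumed values taken from NVIDIA release notes; extend the table as needed.
MIN_CUDA_FOR_CAPABILITY = {
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (RTX 30xx)
    (8, 9): (11, 8),  # Ada (RTX 40xx)
    (9, 0): (11, 8),  # Hopper (H100)
}


def nvrtc_supports(capability, cuda_version):
    """Return True if NVRTC from `cuda_version` should accept `capability`.

    capability: (major, minor) tuple, e.g. (8, 9) for an RTX 4090.
    cuda_version: string of the form "major.minor", e.g. "11.1" or "12.1".
    Returns None when the capability is not listed in the table.
    """
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        return None
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    return (major, minor) >= required


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        if torch.cuda.is_available() and torch.version.cuda:
            cap = torch.cuda.get_device_capability(0)
            print(cap, torch.version.cuda, nvrtc_supports(cap, torch.version.cuda))
```

Under these assumptions, a PyTorch build that bundles CUDA 11.1 would fail on an 8.9 device, while a CUDA 12.1 build would not.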

rasrab1992 commented 10 months ago

Hello, Thank you for your help. I will test it and let you know about the result.

pompomO commented 9 months ago

Hi, have you solved this problem?

pompomO commented 9 months ago

Here are my NVIDIA driver version and CUDA version: [image] [image]

EtorArza commented 7 months ago

I also had this issue, and I was able to solve it with

pip3 install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/test/cu117/torch_test.html

and restarting the computer (as suggested in the comment https://github.com/pytorch/pytorch/issues/87595#issuecomment-1289832110).

I'm running Pop!_OS 22.04 on an RTX 4080 with cuda_11.5.r11.5/compiler.30672275_0

mihirk284 commented 7 months ago

@pompomO @rasrab1992 Can you please try the above solution here to see if it works?

If not, please try out the fix provided in #3.

MJavadZallaghi commented 4 months ago

I'm facing the same issue with an RTX 4060 GPU on Ubuntu 20.04, and I'm going to test the solutions provided here.

I will post the results here after testing your solutions.

MJavadZallaghi commented 4 months ago

I have tested this solution, but the issue is not solved.

Screenshot of my terminal: [image]

EtorArza commented 4 months ago

@MJavadZallaghi what version of torch, torchvision and torchaudio are you using?

I'm on torch==2.3.0, torchvision==0.14.1+cu117, torchaudio==0.13.1+cu117.

https://github.com/pytorch/pytorch/issues/87595#issuecomment-1865391649 says that updating these to newer versions can solve the issue. You might also need to update the CUDA version. I'm on CUDA Version: 12.4
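
To answer the version question without importing the packages, something like the sketch below could be run inside the environment. It relies only on the standard-library `importlib.metadata` module, which requires Python 3.8 or newer (on Python 3.7, `pip list` gives the same information). The function name `installed_versions` is a hypothetical helper, and a conda-installed package may report its version under a slightly different distribution name.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision", "torchaudio")):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in the active environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```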

MJavadZallaghi commented 4 months ago

@EtorArza This is the version list of all the packages installed inside the rlgpu conda environment (installed by Isaac Gym Preview 4):

The PyTorch version is 1.8.1. How can I upgrade it inside the rlgpu env? I'm not sure about the dependencies.

EtorArza commented 4 months ago

@MJavadZallaghi upgrading your PyTorch might solve the issue. Activate the environment with conda activate rlgpu, then follow the instructions at https://pytorch.org/ in the section "Install PyTorch". The installation command is

conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

However, for the latest PyTorch version, 2.3.0, the website says that Python 3.8 is required, and you seem to be running Python 3.7, so I don't know if it will work. I think @mihirk284 is running Python 3.7, so perhaps he can tell us which PyTorch version he is using?

MJavadZallaghi commented 4 months ago

@EtorArza I have tried this command, see the result:

mjavadzallaghi@mjavadzallaghi-Legion-Pro-5-16IRX9:~/aerial_gym_ws/aerial_gym_reference_governor/aerial_gym/scripts$ conda activate rlgpu
(rlgpu) mjavadzallaghi@mjavadzallaghi-Legion-Pro-5-16IRX9:~/aerial_gym_ws/aerial_gym_reference_governor/aerial_gym/scripts$ conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
Retrieving notices: ...working... done
Channels:
 - pytorch
 - nvidia
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

# All requested packages already installed.

(rlgpu) mjavadzallaghi@mjavadzallaghi-Legion-Pro-5-16IRX9:~/aerial_gym_ws/aerial_gym_reference_governor/aerial_gym/scripts$ 

As you mentioned, the Python version in rlgpu is 3.7, while new versions of PyTorch need 3.8. This is my blocking point: I need to find out how to update the PyTorch version inside rlgpu. @EtorArza Did you face this issue? @mihirk284 Any hints?

EtorArza commented 4 months ago

@MJavadZallaghi I'm on Python 3.8, which is why I can install the latest PyTorch version. I'm on 22.04, so I get Python 3.8 by default. It should be possible to install Python 3.8 on 20.04 as well. You need to create the rlgpu environment from scratch. You can change the isaacgym/python/rlgpu_conda_env.yml file before running ./create_conda_env_rlgpu.sh. The isaacgym/python/rlgpu_conda_env.yml file should look like this:

name: rlgpu
channels:
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - python=3.8
  - pyyaml>=5.3.1
  - scipy>=1.5.0
  - tensorboard>=2.2.1

After you edit this file and run ./create_conda_env_rlgpu.sh you need to install everything again. You can check your python version inside the environment on reinstall to see if you are actually in 3.8.
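
The version check mentioned above can be sketched as a small script run inside the activated environment. It uses only the standard library; the helper name `python_is_at_least` is hypothetical, and the (3, 8) threshold is the requirement quoted in this thread.

```python
import sys


def python_is_at_least(minimum=(3, 8), version_info=None):
    """Return True if the interpreter version is >= `minimum`.

    `version_info` defaults to the running interpreter; pass a tuple such as
    (3, 7, 12) to check some other version.
    """
    current = tuple(version_info or sys.version_info)[:2]
    return current >= tuple(minimum)


if __name__ == "__main__":
    # The executable path shows which environment's interpreter is running.
    print(sys.executable)
    print(sys.version.split()[0], "OK" if python_is_at_least() else "too old")
```

If the printed path does not point inside the rlgpu environment, the environment is probably not activated.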

MJavadZallaghi commented 4 months ago

@EtorArza Thank you! I have solved the issue with your help.

For the next person who faces this issue, these are the steps I followed:

  1. Downloaded Isaac Gym SDK (version 4) and extracted the SDK in the home (~/isaacgym)
  2. Modified the file ~/isaacgym/python/rlgpu_conda_env.yml to update python version to 3.8:
    name: rlgpu
    channels:
      - pytorch
      - conda-forge
      - defaults
    dependencies:
      - python=3.8 #3.7
      #- pytorch=2.3.0 #1.8.1
      #- torchvision=0.9.1
      #- cudatoolkit=12.1 #11.1
      - pyyaml>=5.3.1
      - scipy>=1.5.0
      - tensorboard>=2.2.1
  3. In the directory ~/isaacgym, run the command ./create_conda_env_rlgpu.sh and wait for the rlgpu conda env installation to finish.
  4. Activate the rlgpu env with conda activate rlgpu.
  5. Run the command conda install -c fvcore -c iopath -c conda-forge fvcore iopath. DO NOT run the command conda install -c pytorch3d pytorch3d, as it will downgrade some torch-related libraries. pytorch3d will be installed in the next step.
  6. Go to the aerial gym directory aerial_gym_simulator/ and run pip3 install -e . This step will install pytorch3d for you.
  7. Go to the example directory of aerial gym and run python example.py. You should now see this beautiful scene: [image]

For more information:

- list of the installed packages and their version in the rlgpu env:
# packages in environment at /home/mjavadzallaghi/anaconda3/envs/rlgpu:
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
absl-py                   2.1.0              pyhd8ed1ab_0    conda-forge
aerial-gym                1.0.0                     dev_0    <develop>
brotli-python             1.1.0            py38h17151c0_1    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
c-ares                    1.28.1               hd590300_0    conda-forge
ca-certificates           2024.2.2             hbcca054_0    conda-forge
certifi                   2024.2.2           pyhd8ed1ab_0    conda-forge
charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
contourpy                 1.1.1                    pypi_0    pypi
cycler                    0.12.1                   pypi_0    pypi
filelock                  3.14.0                   pypi_0    pypi
fonttools                 4.51.0                   pypi_0    pypi
freetype                  2.12.1               h267a509_2    conda-forge
fsspec                    2024.3.1                 pypi_0    pypi
fvcore                    0.1.5.post20210915            py38    fvcore
grpcio                    1.62.2           py38h94a1851_0    conda-forge
idna                      3.7                pyhd8ed1ab_0    conda-forge
imageio                   2.34.1                   pypi_0    pypi
importlib-metadata        7.1.0              pyha770c72_0    conda-forge
importlib-resources       6.4.0                    pypi_0    pypi
iopath                    0.1.9                      py38    iopath
isaacgym                  1.0rc4                    dev_0    <develop>
jinja2                    3.1.4                    pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
lcms2                     2.16                 hb7c19ff_0    conda-forge
ld_impl_linux-64          2.40                 h55db66e_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libabseil                 20240116.2      cxx17_h59595ed_0    conda-forge
libblas                   3.9.0           20_linux64_openblas    conda-forge
libcblas                  3.9.0           20_linux64_openblas    conda-forge
libdeflate                1.20                 hd590300_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0               h77fa898_7    conda-forge
libgfortran-ng            13.2.0               h69a702a_7    conda-forge
libgfortran5              13.2.0               hca663fb_7    conda-forge
libgomp                   13.2.0               h77fa898_7    conda-forge
libgrpc                   1.62.2               h15f2491_0    conda-forge
libjpeg-turbo             3.0.0                hd590300_1    conda-forge
liblapack                 3.9.0           20_linux64_openblas    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libopenblas               0.3.25          pthreads_h413a1c8_0    conda-forge
libpng                    1.6.43               h2797004_0    conda-forge
libprotobuf               4.25.3               h08a7969_0    conda-forge
libre2-11                 2023.09.01           h5a48ba9_2    conda-forge
libsqlite                 3.45.3               h2797004_0    conda-forge
libstdcxx-ng              13.2.0               hc0a3c3a_7    conda-forge
libtiff                   4.6.0                h1dd3fc0_3    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libwebp-base              1.4.0                hd590300_0    conda-forge
libxcb                    1.15                 h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
markdown                  3.6                pyhd8ed1ab_0    conda-forge
markupsafe                2.1.5            py38h01eb140_0    conda-forge
matplotlib                3.7.5                    pypi_0    pypi
mpmath                    1.3.0                    pypi_0    pypi
ncurses                   6.4.20240210         h59595ed_0    conda-forge
networkx                  3.1                      pypi_0    pypi
ninja                            pypi_0    pypi
numpy                     1.24.4           py38h59b608b_0    conda-forge
nvidia-cublas-cu12                 pypi_0    pypi
nvidia-cuda-cupti-cu12    12.1.105                 pypi_0    pypi
nvidia-cuda-nvrtc-cu12    12.1.105                 pypi_0    pypi
nvidia-cuda-runtime-cu12  12.1.105                 pypi_0    pypi
nvidia-cudnn-cu12                 pypi_0    pypi
nvidia-cufft-cu12                pypi_0    pypi
nvidia-curand-cu12               pypi_0    pypi
nvidia-cusolver-cu12               pypi_0    pypi
nvidia-cusparse-cu12               pypi_0    pypi
nvidia-nccl-cu12          2.20.5                   pypi_0    pypi
nvidia-nvjitlink-cu12     12.4.127                 pypi_0    pypi
nvidia-nvtx-cu12          12.1.105                 pypi_0    pypi
openjpeg                  2.5.2                h488ebb8_0    conda-forge
openssl                   3.3.0                hd590300_0    conda-forge
packaging                 24.0               pyhd8ed1ab_0    conda-forge
pillow                    10.3.0           py38h9e66945_0    conda-forge
pip                       24.0               pyhd8ed1ab_0    conda-forge
platformdirs              4.2.1              pyhd8ed1ab_0    conda-forge
pooch                     1.8.1              pyhd8ed1ab_0    conda-forge
portalocker               2.8.2            py38h578d9bd_1    conda-forge
protobuf                  4.25.3           py38hb5c7596_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pyparsing                 3.1.2                    pypi_0    pypi
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.8.19          hd12c33a_0_cpython    conda-forge
python-dateutil           2.9.0.post0              pypi_0    pypi
python_abi                3.8                      4_cp38    conda-forge
pytorch3d                 0.3.0                    pypi_0    pypi
pyyaml                    6.0.1            py38h01eb140_1    conda-forge
re2                       2023.09.01           h7f4b329_2    conda-forge
readline                  8.2                  h8228510_1    conda-forge
requests                  2.31.0             pyhd8ed1ab_0    conda-forge
scipy                     1.10.1           py38h59b608b_3    conda-forge
setuptools                69.5.1             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
sympy                     1.12                     pypi_0    pypi
tabulate                  0.9.0              pyhd8ed1ab_1    conda-forge
tensorboard               2.16.2             pyhd8ed1ab_0    conda-forge
tensorboard-data-server   0.7.0            py38hcdda232_1    conda-forge
termcolor                 2.4.0              pyhd8ed1ab_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
torch                     2.3.0                    pypi_0    pypi
torchvision               0.18.0                   pypi_0    pypi
tqdm                      4.66.4             pyhd8ed1ab_0    conda-forge
transitions               0.9.0                    pypi_0    pypi
triton                    2.3.0                    pypi_0    pypi
typing-extensions         4.11.0                   pypi_0    pypi
urllib3                   2.2.1              pyhd8ed1ab_0    conda-forge
werkzeug                  3.0.3              pyhd8ed1ab_0    conda-forge
wheel                     0.43.0             pyhd8ed1ab_1    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yacs                      0.1.8              pyhd8ed1ab_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
zipp                      3.17.0             pyhd8ed1ab_0    conda-forge
zstd                      1.5.6                ha6fb4c9_0    conda-forge

piratax007 commented 3 months ago

Hi, I followed the steps described by @MJavadZallaghi, but now I get this error:

Importing module 'gym_38' (/home/r2d2/isaacgym/python/isaacgym/_bindings/linux-x86_64/gym_38.so)
Setting GYM_USD_PLUG_INFO_PATH to /home/r2d2/isaacgym/python/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
AERIAL_GYM_ROOT_DIR /home/r2d2/isaacgym/aerial_gym_simulator
PyTorch version 2.3.0+cu121
Device count 1
Using /home/r2d2/.cache/torch_extensions/py38_cu121 as PyTorch extensions root...
Creating extension directory /home/r2d2/.cache/torch_extensions/py38_cu121/gymtorch...
Emitting ninja build file /home/r2d2/.cache/torch_extensions/py38_cu121/gymtorch/build.ninja...
Building extension module gymtorch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF gymtorch.o.d -DTORCH_EXTENSION_NAME=gymtorch -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/r2d2/anaconda3/envs/rlgpu/lib/python3.8/site-packages/torch/include -isystem /home/r2d2/anaconda3/envs/rlgpu/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/r2d2/anaconda3/envs/rlgpu/lib/python3.8/site-packages/torch/include/TH -isystem /home/r2d2/anaconda3/envs/rlgpu/lib/python3.8/site-packages/torch/include/THC -isystem /home/r2d2/anaconda3/envs/rlgpu/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -DTORCH_MAJOR=2 -DTORCH_MINOR=3 -c /home/r2d2/isaacgym/python/isaacgym/_bindings/src/gymtorch/gymtorch.cpp -o gymtorch.o 
[2/2] c++ gymtorch.o -shared -L/home/r2d2/anaconda3/envs/rlgpu/lib/python3.8/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o gymtorch.so
Loading extension module gymtorch...
Traceback (most recent call last):
  File "example.py", line 12, in <module>
    from aerial_gym.envs import *
  File "/home/r2d2/isaacgym/aerial_gym_simulator/aerial_gym/envs/__init__.py", line 9, in <module>
    from .base.aerial_robot  import AerialRobot
  File "/home/r2d2/isaacgym/aerial_gym_simulator/aerial_gym/envs/base/aerial_robot.py", line 17, in <module>
    from isaacgym.torch_utils import *
  File "/home/r2d2/isaacgym/python/isaacgym/torch_utils.py", line 135, in <module>
    def get_axis_params(value, axis_idx, x_value=0., dtype=np.float, n_dims=3):
  File "/home/r2d2/anaconda3/envs/rlgpu/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:

For more information

- list of installed packages and versions

piratax007 commented 3 months ago

Solved by changing the numpy version requirement in setup.py in the aerial-gym directory to numpy<=1.19.5
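
An alternative to pinning numpy, following the hint in the NumPy error message above, would be to replace the removed `np.float` alias with the builtin `float` in isaacgym/python/isaacgym/torch_utils.py. The sketch below shows the substitution on a source string; the helper name `replace_np_float` is hypothetical, and this approach was not tested in this thread, so back up the file before applying it.

```python
import re

# Match the removed alias `np.float` but leave `np.float32`, `np.float64`
# and `np.float_` alone: the trailing \b fails when a digit or underscore
# follows directly.
_NP_FLOAT_ALIAS = re.compile(r"\bnp\.float\b")


def replace_np_float(source):
    """Replace every `np.float` alias in `source` with the builtin `float`."""
    return _NP_FLOAT_ALIAS.sub("float", source)


if __name__ == "__main__":
    # The line reported in the traceback above.
    line = "def get_axis_params(value, axis_idx, x_value=0., dtype=np.float, n_dims=3):"
    print(replace_np_float(line))
```

Editing the source keeps a recent numpy in the environment, whereas the pin in setup.py leaves the Isaac Gym files untouched. Which trade-off is preferable depends on the other packages installed.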