ministry-of-silly-code / experiment_buddy

rocm4.2 sucks #81

Open manuel-delverme opened 3 years ago

manuel-delverme commented 3 years ago

Olexa Bilaniuk 6 minutes ago @Manuel DV HIP is the CUDA-to-ROCm (AMD) translation layer. You should not have gotten this error unless you have a very special build of CUDA. How did you install this PyTorch?

Manuel Del Verme 4 minutes ago
[=== Module python/3.7 loaded ===]
[=== Module cudatoolkit/11.0 loaded ===]
[=== Module cuda/11.0/cudnn/8.0 loaded ===]
[=== Module cudatoolkit/11.0 loaded ===]
[=== Module cuda/11.0/nccl/2.7 loaded ===]
[=== Module openmpi/4.0.4 loaded ===]
[=== Module pytorch/1.7.0 loaded ===]
but then
Collecting torch
  Using cached https://download.pytorch.org/whl/rocm4.2/torch-1.9.0%2Brocm4.2-cp37-cp37m-linux_x86_64.whl (995.4 MB)
torch-1.9.0+rocm4.2

Manuel Del Verme 3 minutes ago is this it? a torch/CUDA mismatch?

Olexa Bilaniuk 3 minutes ago Yeah. ROCm. Not supposed to use this!

Olexa Bilaniuk 3 minutes ago ROCm is the AMD version of CUDA. HIP is the transpiler that converts CUDA code to ROCm.

Olexa Bilaniuk 2 minutes ago I don't know why you installed this wheel in particular but it's the wrong choice. Wrong wrong wheel.

Manuel Del Verme 2 minutes ago why does pip install torch install rocm :disappointed:

Olexa Bilaniuk < 1 minute ago Recommend you specifically install the CUDA build +cu110.
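
A quick way to spot this kind of mismatch is to look at the local build tag after the "+" in the torch version string, as in torch-1.9.0+rocm4.2 above. The sketch below is only an illustration: build_flavor is a hypothetical helper name, and the tag patterns (rocm..., cu<digits>, cpu) are assumed from the wheel names seen in this thread, not taken from any official list.

```python
import re


def build_flavor(version: str) -> str:
    """Classify a torch version string by its local build tag (hypothetical helper)."""
    # Everything after the first "+" is the local tag, e.g. "rocm4.2" or "cu110".
    _, _, local = version.partition("+")
    if local.startswith("rocm"):
        return "rocm"
    if re.fullmatch(r"cu\d+", local):
        return "cuda"
    if local == "cpu":
        return "cpu"
    # No tag or an unrecognized one: the version string alone does not tell us.
    return "unknown"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        version = str(torch.__version__)
        print(version, "->", build_flavor(version))
        # torch.version.cuda is None on non-CUDA builds; torch.version.hip is
        # set on ROCm builds (getattr guards against builds that lack it).
        print("torch.version.cuda =", torch.version.cuda)
        print("torch.version.hip =", getattr(torch.version, "hip", None))
```

If this reports "rocm" on a machine with the CUDA modules loaded, the wheel is the wrong one. Pinning the version explicitly, e.g. torch==1.7.0+cu110 to match the loaded pytorch/1.7.0 module, is probably the simplest fix, assuming pip is pointed at an index that actually hosts the +cu110 wheels.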