pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Using torchvision on AMD GPU #4044

Open abedidev opened 3 years ago

abedidev commented 3 years ago

Hi,

I have installed and used the AMD version of PyTorch without any problem. I have also installed the AMD version of torchvision. However, I cannot use torchvision with an AMD GPU. Is there a way to use torchvision on an AMD GPU?

Thanks

NicolasHug commented 3 years ago

Hi @abedicodes, could you please describe what you tried and what error message you're getting?

fmassa commented 3 years ago

Hi @abedicodes,

We have ROCm CI in torchvision, and we provide nightlies for Linux with pip (see https://download.pytorch.org/whl/nightly/rocm4.0.1/torch_nightly.html).

Installation instructions can be found at https://pytorch.org/get-started/locally/

What issues are you having when trying to use torchvision on an AMD GPU?
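When answering a question like this, it can help to attach a short environment report. The following is a minimal sketch, not an official torchvision tool: the function name `describe_environment` and the report keys are made up for illustration. It relies on the fact that ROCm builds of PyTorch expose AMD GPUs through the `torch.cuda` API and set `torch.version.hip`.

```python
import importlib
import platform


def describe_environment():
    """Collect version and GPU visibility info (hypothetical helper)."""
    report = {"python": platform.python_version()}
    modules = {}
    for name in ("torch", "torchvision"):
        try:
            modules[name] = importlib.import_module(name)
            report[name] = getattr(modules[name], "__version__", "unknown")
        except Exception as exc:  # missing package or broken native library
            report[name] = f"not importable ({exc})"

    torch = modules.get("torch")
    if torch is not None:
        # torch.version.hip is a version string on ROCm builds, None otherwise.
        report["hip"] = getattr(torch.version, "hip", None)
        # AMD GPUs are reported through the CUDA API on ROCm builds.
        report["gpu_visible"] = torch.cuda.is_available()
        if report["gpu_visible"]:
            report["gpu_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `hip` is `None` or `gpu_visible` is `False`, the installed torch wheel is probably not a ROCm build, or the GPU is not visible to it, and that is worth ruling out before looking at torchvision.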

martinezhermes commented 3 years ago

@abedicodes might be referring to the error I've been experiencing while trying to install the latest stable release, 1.9.0. torch works without any issue, but torchvision fails with `RuntimeError: Error compiling objects for extension` followed by `ERROR: Failed building wheel for torchvision`. The Preview (Nightly) build does install and seems to work correctly, though.

NicolasHug commented 3 years ago

@martinezhermes the issue reported here is likely unrelated to the latest release, since it was submitted before the release.

Are you experiencing an issue with AMD on the latest release as well? If so, could you please provide the commands that you ran, along with the error messages?

martinezhermes commented 3 years ago

> @martinezhermes the issue reported here is likely unrelated to the latest release, since it was submitted before the release.
>
> Are you experiencing an issue with AMD on the latest release as well? If so, could you please provide the commands that you ran, along with the error messages?

Sure thing, currently running ROCm 4.2 with this device:


```
Number of devices:                               1
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    Vega 10 XL/XT [Radeon RX Vega 56/64]
  Device Topology:                               PCI[ B#3, D#0, F#0 ]
  Max compute units:                             56
  Max work items dimensions:                     3
```

torch installs flawlessly with:
`pip3 install torch -f https://download.pytorch.org/whl/rocm4.2/torch_stable.html`

torchvision fails to install with:
`pip3 install ninja && pip3 install 'git+https://github.com/pytorch/vision.git@v0.10.0'`

Error message:
    Installing collected packages: pillow, numpy, torchvision
    Running setup.py install for torchvision ... error
    ERROR: Command errored out with exit status 1:
     command: /home/kipp/.pyenv/versions/3.8.10/envs/rocm_demo/bin/python3.8 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-o2jpn4eq/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-o2jpn4eq/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-ntb5z461/install-record.txt --single-version-externally-managed --compile --install-headers /home/kipp/.pyenv/versions/3.8.10/envs/rocm_demo/include/site/python3.8/torchvision
         cwd: /tmp/pip-req-build-o2jpn4eq/
    Complete output (788 lines):
    Building wheel torchvision-0.10.0a0+300a8a4

... 

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-o2jpn4eq/setup.py", line 464, in <module>
        setup(
      File "/home/kipp/.pyenv/versions/3.8.10/envs/rocm_demo/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/home/kipp/.pyenv/versions/3.8.10/lib/python3.8/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/home/kipp/.pyenv/versions/3.8.10/lib/python3.8/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/home/kipp/.pyenv/versions/3.8.10/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/kipp/.pyenv/versions/3.8.10/envs/rocm_demo/lib/python3.8/site-packages/setuptools/command/install.py", line 61, in run
        return orig.install.run(self)
      File "/home/kipp/.pyenv/versions/3.8.10/lib/python3.8/distutils/command/install.py", line 545, in run
        self.run_command('build')
      File "/home/kipp/.pyenv/versions/3.8.10/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/kipp/.pyenv/versions/3.8.10/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/kipp/.pyenv/versions/3.8.10/lib/python3.8/distutils/command/build.py", line 135, in run
        self.run_command(cmd_name)
      File "/home/kipp/.pyenv/versions/3.8.10/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/kipp/.pyenv/versions/3.8.10/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/kipp/.pyenv/versions/3.8.10/envs/rocm_demo/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
        _build_ext.run(self)
      File "/home/kipp/.pyenv/versions/3.8.10/lib/python3.8/distutils/command/build_ext.py", line 340, in run
        self.build_extensions()
      File "/home/kipp/.pyenv/versions/3.8.10/envs/rocm_demo/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 709, in build_extensions
        build_ext.build_extensions(self)
      File "/home/kipp/.pyenv/versions/3.8.10/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
        self._build_extensions_serial()
      File "/home/kipp/.pyenv/versions/3.8.10/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
        self.build_extension(ext)
      File "/home/kipp/.pyenv/versions/3.8.10/envs/rocm_demo/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
        _build_ext.build_extension(self, ext)
      File "/home/kipp/.pyenv/versions/3.8.10/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
        objects = self.compiler.compile(sources,
      File "/home/kipp/.pyenv/versions/3.8.10/envs/rocm_demo/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 530, in unix_wrap_ninja_compile
        _write_ninja_file_and_compile_objects(
      File "/home/kipp/.pyenv/versions/3.8.10/envs/rocm_demo/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1355, in _write_ninja_file_and_compile_objects
        _run_ninja_build(
      File "/home/kipp/.pyenv/versions/3.8.10/envs/rocm_demo/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1682, in _run_ninja_build
        raise RuntimeError(message) from e
    RuntimeError: Error compiling objects for extension
    ----------------------------------------
```
ERROR: Command errored out with exit status 1: /home/kipp/.pyenv/versions/3.8.10/envs/rocm_demo/bin/python3.8 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-o2jpn4eq/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-o2jpn4eq/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-ntb5z461/install-record.txt --single-version-externally-managed --compile --install-headers /home/kipp/.pyenv/versions/3.8.10/envs/rocm_demo/include/site/python3.8/torchvision Check the logs for full command output.
```
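
The traceback above only shows that the ninja build failed; the compiler message that caused it sits somewhere in the elided 788 lines of output. As a rough sketch (the helper name `first_build_errors` and the sample log lines are hypothetical, not taken from this build), the full log can be saved to a file and scanned for ninja's `FAILED:` markers and compiler-style `error:` lines:

```python
import re

# ninja prefixes failed targets with "FAILED:"; compilers usually emit "error:".
# The match is case-sensitive so pip's own "ERROR:" summary lines are skipped.
ERROR_PATTERN = re.compile(r"^\s*FAILED:|\berror:")


def first_build_errors(log_text, limit=5):
    """Return up to `limit` (line_number, text) pairs that look like build errors."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if ERROR_PATTERN.search(line):
            hits.append((number, line.strip()))
            if len(hits) == limit:
                break
    return hits


if __name__ == "__main__":
    # Hypothetical excerpt, for illustration only.
    sample = "\n".join([
        "    Building wheel torchvision-0.10.0a0+300a8a4",
        "    FAILED: /tmp/build/example.o",
        "    /tmp/src/example.cpp:10:5: error: hypothetical compiler message",
        "    RuntimeError: Error compiling objects for extension",
        "ERROR: Command errored out with exit status 1",
    ])
    for number, text in first_build_errors(sample):
        print(f"{number}: {text}")
```

Posting the first few matching lines is usually more useful to maintainers than the Python traceback, which is the same for every failed extension build.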
abedidev commented 3 years ago

Unfortunately, it doesn't work. I tried both the stable and nightly versions, installed with pip3. The stable version of torchvision cannot see the GPU. The nightly version of torchvision can see the GPU, but I get various errors during training. I cannot use any pre-trained model, and I also get errors like this:

    MIOpen Error: /MIOpen/src/sqlite_db.cpp:107: open memvfs: unable to open database file
    Traceback (most recent call last):
      File "/home/abediee/TCN/main.py", line 137, in <module>
        main()
      File "/home/abediee/TCN/main.py", line 128, in main
        train(args, model, device, train_loader, optimizer, epoch)
      File "/home/abediee/TCN/main.py", line 44, in train
        loss.backward()
      File "/home/abediee/anaconda3/envs/abedi/lib/python3.9/site-packages/torch/_tensor.py", line 256, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/home/abediee/anaconda3/envs/abedi/lib/python3.9/site-packages/torch/autograd/__init__.py", line 147, in backward
        Variable._execution_engine.run_backward(
    RuntimeError: miopenStatusInternalError

I installed the nightly version in a Conda virtual environment using pip.

abedidev commented 3 years ago

Here is another error:

    MIOpen Error: /MIOpen/src/sqlite_db.cpp:98: Unknown database: /opt/rocm/miopen/share/miopen/db/gfx90660.kdb in internal file cache
    Traceback (most recent call last):
      File "main.py", line 133, in <module>
        outputs = model(inputs).squeeze()
      File "/home/abediee/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/abediee/driver/DAD/C3D.py", line 41, in forward
        h = self.relu(self.conv1(x))
      File "/home/abediee/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/abediee/anaconda3/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 587, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "/home/abediee/anaconda3/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 582, in _conv_forward
        return F.conv3d(
    RuntimeError: miopenStatusInternalError
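
Both tracebacks end inside PyTorch itself (`loss.backward()` and `F.conv3d`), not in torchvision code, so one way to narrow this down is to run a bare convolution without torchvision or the training script. The sketch below is only an illustration under that assumption: the function name `conv3d_smoke_test` and the tensor sizes are arbitrary choices, and it skips cleanly when torch is not installed.

```python
import importlib.util


def conv3d_smoke_test(device_name=None):
    """Run one Conv3d forward/backward pass and report the outcome as a string."""
    if importlib.util.find_spec("torch") is None:
        return "skipped: torch is not installed"
    import torch

    # ROCm builds expose AMD GPUs under the "cuda" device name.
    if device_name is None:
        device_name = "cuda" if torch.cuda.is_available() else "cpu"
    device = torch.device(device_name)

    conv = torch.nn.Conv3d(3, 4, kernel_size=3, padding=1).to(device)
    x = torch.randn(1, 3, 8, 16, 16, device=device)  # arbitrary small input
    try:
        out = conv(x)
        out.sum().backward()
    except RuntimeError as exc:
        return f"failed on {device}: {exc}"
    return f"ok on {device}: output shape {tuple(out.shape)}"


if __name__ == "__main__":
    print(conv3d_smoke_test("cpu"))
    print(conv3d_smoke_test())  # uses the GPU when one is visible
```

If the GPU run fails with the same `miopenStatusInternalError` while the CPU run succeeds, the problem would appear to lie in the ROCm/MIOpen installation or the torch build rather than in torchvision.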