wuxxin / aur-packages

archlinux AUR packages i maintain

python-torchvision-rocm "Error compiling objects for extension" #13

Closed semilin closed 2 months ago

semilin commented 3 months ago

This error occurs when running makepkg -si.

ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/torch/utils/cpp_extension.py", line 2107, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/semi/code/src/python-torchvision-rocm/src/vision-0.18.0/setup.py", line 545, in <module>
    setup(
  File "/usr/lib/python3.12/site-packages/setuptools/__init__.py", line 103, in setup
    return distutils.core.setup(**attrs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python3.12/site-packages/setuptools/dist.py", line 963, in run_command
    super().run_command(command)
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/command/build.py", line 131, in run
    self.run_command(cmd_name)
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/usr/lib/python3.12/site-packages/setuptools/dist.py", line 963, in run_command
    super().run_command(command)
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/usr/lib/python3.12/site-packages/setuptools/command/build_ext.py", line 88, in run
    _build_ext.run(self)
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
    self.build_extensions()
  File "/usr/lib/python3.12/site-packages/torch/utils/cpp_extension.py", line 870, in build_extensions
    build_ext.build_extensions(self)
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
    self._build_extensions_serial()
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
    self.build_extension(ext)
  File "/usr/lib/python3.12/site-packages/setuptools/command/build_ext.py", line 249, in build_extension
    _build_ext.build_extension(self, ext)
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
    objects = self.compiler.compile(
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/torch/utils/cpp_extension.py", line 683, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/usr/lib/python3.12/site-packages/torch/utils/cpp_extension.py", line 1783, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/usr/lib/python3.12/site-packages/torch/utils/cpp_extension.py", line 2123, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
==> ERROR: A failure occurred in build().
    Aborting...

I would try to debug this myself, but honestly this is so cryptic I'm not sure where to begin. System info:

/opt/rocm/bin/rocminfo | grep -E "(Name|ID):"
export | grep -E \
  "(GPU_TARGETS|AMDGPU_TARGETS|PYTORCH_ROCM_ARCH|HSA_OVERRIDE_GFX_VERSION|ROCR_VISIBLE_DEVICES)"
python -c 'import torch.version as v; \
  print("torch: {}\nrocm: {}\n".format(v.git_version, v.hip))'

  Name:                    AMD Ryzen 7 5700G with Radeon Graphics
  Marketing Name:          AMD Ryzen 7 5700G with Radeon Graphics
  Vendor Name:             CPU                                
  Chip ID:                 0(0x0)                             
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Name:                    gfx1030                            
  Marketing Name:          AMD Radeon RX 6700 XT              
  Vendor Name:             AMD                                
  Chip ID:                 29663(0x73df)                      
  BDFID:                   768                                
  Internal Node ID:        1                                  
      Name:                    amdgcn-amd-amdhsa--gfx1030         
HSA_OVERRIDE_GFX_VERSION 10.3.0
PYTORCH_ROCM_ARCH gfx1030
torch: 63d5e9221bedd1546b7d364b5ce4171547db12a9
rocm: 6.0.32831-
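For readers comparing reports, the part of the `rocminfo` output that matters most for the build is the gfx architecture name, which is also the value `PYTORCH_ROCM_ARCH` is set to above. A minimal sketch of pulling it out of the filtered output follows; the function name `gfx_targets` and the shortened sample text are illustrative assumptions, not part of the package or of ROCm.

```python
import re

# Matches "Name:" lines whose value is a gfx architecture, with or without
# the "amdgcn-amd-amdhsa--" ISA prefix. "Marketing Name:" and "Vendor Name:"
# lines do not match because "Name:" must be the first word on the line.
GFX_NAME = re.compile(
    r"^\s*Name:\s+(?:amdgcn-amd-amdhsa--)?(gfx[0-9a-f]+)\s*$",
    re.MULTILINE,
)


def gfx_targets(rocminfo_text):
    """Return the unique gfx architecture names in first-seen order."""
    seen = []
    for match in GFX_NAME.finditer(rocminfo_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


# Shortened sample modelled on the output in this report.
SAMPLE = """\
  Name:                    AMD Ryzen 7 5700G with Radeon Graphics
  Marketing Name:          AMD Ryzen 7 5700G with Radeon Graphics
  Name:                    gfx1030
  Marketing Name:          AMD Radeon RX 6700 XT
      Name:                    amdgcn-amd-amdhsa--gfx1030
"""

if __name__ == "__main__":
    print(gfx_targets(SAMPLE))
```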
xenedium commented 3 months ago

Hello, sadly I got the same issue. Here's my system info:

  Name:                    AMD Ryzen 5 3600 6-Core Processor
  Marketing Name:          AMD Ryzen 5 3600 6-Core Processor
  Vendor Name:             CPU
  Chip ID:                 0(0x0)
  BDFID:                   0
  Internal Node ID:        0
  Name:                    gfx1102
  Marketing Name:          AMD Radeon RX 7600
  Vendor Name:             AMD
  Chip ID:                 29824(0x7480)
  BDFID:                   2304
  Internal Node ID:        1
      Name:                    amdgcn-amd-amdhsa--gfx1102
Exploder98 commented 3 months ago

I ran the build with paru and set MAX_JOBS=1 to get more legible output. I got the following error message:

FAILED: /home/username/.cache/paru/clone/python-torchvision-rocm/src/vision-0.18.0/build/temp.linux-x86_64-cpython-312/home/username/.cache/paru/clone/python-torchvision-rocm/src/vision-0.18.0/torchvision/csrc/io/decoder/audio_sampler.o 
c++ -MMD -MF /home/username/.cache/paru/clone/python-torchvision-rocm/src/vision-0.18.0/build/temp.linux-x86_64-cpython-312/home/username/.cache/paru/clone/python-torchvision-rocm/src/vision-0.18.0/torchvision/csrc/io/decoder/audio_sampler.o.d -fno-strict-overflow -DNDEBUG -g -O3 -Wall -march=x86-64 -mtune=generic -O3 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -g -ffile-prefix-map=/build/python/src=/usr/src/debug/python -flto=auto -ffat-lto-objects -march=x86-64 -mtune=generic -O3 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -g -ffile-prefix-map=/build/python/src=/usr/src/debug/python -flto=auto -march=x86-64 -mtune=generic -O3 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -g -ffile-prefix-map=/build/python/src=/usr/src/debug/python -flto=auto -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -fPIC -DGLOG_USE_GLOG_EXPORT -I/home/username/.cache/paru/clone/python-torchvision-rocm/src/vision-0.18.0/torchvision/csrc/io/decoder -I/home/username/.cache/paru/clone/python-torchvision-rocm/src/vision-0.18.0/torchvision/csrc/io/video_reader -I/home/username/.cache/paru/clone/python-torchvision-rocm/src/vision-0.18.0/torchvision/csrc/io/video -I/home/username/.cache/paru/clone/python-torchvision-rocm/src/vision-0.18.0/torchvision/csrc -I/usr/include -I/home/username/.cache/paru/clone/python-torchvision-rocm/src/vision-0.18.0/torchvision/csrc -I/usr/lib/python3.12/site-packages/torch/include 
-I/usr/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/lib/python3.12/site-packages/torch/include/TH -I/usr/lib/python3.12/site-packages/torch/include/THC -I/usr/lib/python3.12/site-packages/torch/include/THH -I/opt/rocm/include -I/home/username/.cache/paru/clone/python-torchvision-rocm/src -I/usr/lib/python3.12/site-packages/torch/include -I/usr/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/usr/lib/python3.12/site-packages/torch/include/TH -I/usr/lib/python3.12/site-packages/torch/include/THC -I/usr/include/python3.12 -c -c /home/username/.cache/paru/clone/python-torchvision-rocm/src/vision-0.18.0/torchvision/csrc/io/decoder/audio_sampler.cpp -o /home/username/.cache/paru/clone/python-torchvision-rocm/src/vision-0.18.0/build/temp.linux-x86_64-cpython-312/home/username/.cache/paru/clone/python-torchvision-rocm/src/vision-0.18.0/torchvision/csrc/io/decoder/audio_sampler.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1018"' -DTORCH_EXTENSION_NAME=video_reader -D_GLIBCXX_USE_CXX11_ABI=1
/home/username/.cache/paru/clone/python-torchvision-rocm/src/vision-0.18.0/torchvision/csrc/io/decoder/audio_sampler.cpp: In member function ‘virtual bool ffmpeg::AudioSampler::init(const ffmpeg::SamplerParameters&)’:
/home/username/.cache/paru/clone/python-torchvision-rocm/src/vision-0.18.0/torchvision/csrc/io/decoder/audio_sampler.cpp:53:7: error: ‘av_get_default_channel_layout’ was not declared in this scope
   53 |       av_get_default_channel_layout(params.out.audio.channels),
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/username/.cache/paru/clone/python-torchvision-rocm/src/vision-0.18.0/torchvision/csrc/io/decoder/audio_sampler.cpp:51:17: error: ‘swr_alloc_set_opts’ was not declared in this scope; did you mean ‘swr_alloc_set_opts2’?
   51 |   swrContext_ = swr_alloc_set_opts(
      |                 ^~~~~~~~~~~~~~~~~~
      |                 swr_alloc_set_opts2
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/torch/utils/cpp_extension.py", line 2107, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '1']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/username/.cache/paru/clone/python-torchvision-rocm/src/vision-0.18.0/setup.py", line 545, in <module>
    setup(
  File "/usr/lib/python3.12/site-packages/setuptools/__init__.py", line 103, in setup
    return distutils.core.setup(**attrs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python3.12/site-packages/setuptools/dist.py", line 963, in run_command
    super().run_command(command)
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/command/build.py", line 131, in run
    self.run_command(cmd_name)
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/usr/lib/python3.12/site-packages/setuptools/dist.py", line 963, in run_command
    super().run_command(command)
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/usr/lib/python3.12/site-packages/setuptools/command/build_ext.py", line 88, in run
    _build_ext.run(self)
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
    self.build_extensions()
  File "/usr/lib/python3.12/site-packages/torch/utils/cpp_extension.py", line 870, in build_extensions
    build_ext.build_extensions(self)
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
    self._build_extensions_serial()
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
    self.build_extension(ext)
  File "/usr/lib/python3.12/site-packages/setuptools/command/build_ext.py", line 249, in build_extension
    _build_ext.build_extension(self, ext)
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
    objects = self.compiler.compile(
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/torch/utils/cpp_extension.py", line 683, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/usr/lib/python3.12/site-packages/torch/utils/cpp_extension.py", line 1783, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/usr/lib/python3.12/site-packages/torch/utils/cpp_extension.py", line 2123, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

I also tried to build a clean clone (from AUR) with makepkg -si, and got essentially the same error.

I guess this is due to API changes in FFmpeg 7.0: the two functions the compiler reports as undeclared, av_get_default_channel_layout and swr_alloc_set_opts, belong to the old channel-layout API. Looking at the torchvision commit history, it seems the build has been fixed in main (https://github.com/pytorch/vision/commit/375cfdf5675d0b6dbebe4d49ffcd0a6d50081715), but the fix is not included in any released version yet.
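The useful part of the log above is the pair of `error:` lines; the Python traceback only says that ninja failed. A minimal sketch of a filter that keeps just the compiler diagnostics from a saved build log is shown here. The function name `compiler_errors` and the shortened `/src/...` paths in the sample are illustrative assumptions, and the pattern only covers GCC/Clang-style `file:line:col: error:` lines.

```python
import re

# GCC/Clang-style diagnostic: path:line:col: error: message
DIAGNOSTIC = re.compile(
    r"^(?P<file>[^\s:][^:]*):(?P<line>\d+):(?P<col>\d+): error: (?P<msg>.*)$"
)


def compiler_errors(log_text):
    """Return (file, line, message) for each compiler error in a build log."""
    found = []
    for raw in log_text.splitlines():
        match = DIAGNOSTIC.match(raw.strip())
        if match:
            found.append((match["file"], int(match["line"]), match["msg"]))
    return found


# Shortened sample modelled on the log in this comment (paths are hypothetical).
SAMPLE_LOG = """\
/src/audio_sampler.cpp: In member function 'virtual bool ffmpeg::AudioSampler::init(const ffmpeg::SamplerParameters&)':
/src/audio_sampler.cpp:53:7: error: 'av_get_default_channel_layout' was not declared in this scope
/src/audio_sampler.cpp:51:17: error: 'swr_alloc_set_opts' was not declared in this scope; did you mean 'swr_alloc_set_opts2'?
ninja: build stopped: subcommand failed.
subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '1']' returned non-zero exit status 1.
RuntimeError: Error compiling objects for extension
"""

if __name__ == "__main__":
    for path, line, message in compiler_errors(SAMPLE_LOG):
        print(f"{path}:{line}: {message}")
```

Running the build with a single job, as described above, keeps each diagnostic next to the command that produced it, which makes a filter like this more reliable.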

Pi000tr commented 3 months ago

Same problem

  Name:                    AMD Ryzen 7 5700X 8-Core Processor
  Marketing Name:          AMD Ryzen 7 5700X 8-Core Processor
  Vendor Name:             CPU
  Chip ID:                 0(0x0)
  BDFID:                   0
  Internal Node ID:        0
  Name:                    gfx1100
  Marketing Name:          AMD Radeon RX 7900 GRE
  Vendor Name:             AMD
  Chip ID:                 29772(0x744c)
  BDFID:                   11520
  Internal Node ID:        1
      Name:                    amdgcn-amd-amdhsa--gfx1100
grep: Unmatched ( or \(
torch: 63d5e9221bedd1546b7d364b5ce4171547db12a9
rocm: 6.0.32831-
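The `grep: Unmatched ( or \(` line in the output above suggests the environment check did not run as intended, possibly because the pattern was pasted with a line break inside it. A minimal sketch of the same check in Python, which avoids shell quoting altogether, is given below. The variable list is copied from the command in the original report; the function name `rocm_env` is an illustrative assumption.

```python
import os

# Variables the shell one-liner in the original report greps for.
ROCM_VARS = (
    "GPU_TARGETS",
    "AMDGPU_TARGETS",
    "PYTORCH_ROCM_ARCH",
    "HSA_OVERRIDE_GFX_VERSION",
    "ROCR_VISIBLE_DEVICES",
)


def rocm_env(environ=None):
    """Return the ROCm-related variables that are set, in ROCM_VARS order."""
    env = os.environ if environ is None else environ
    return {name: env[name] for name in ROCM_VARS if name in env}


if __name__ == "__main__":
    # Prints one "NAME value" pair per line for the variables that are set.
    for name, value in rocm_env().items():
        print(name, value)
```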
wuxxin commented 3 months ago

Thanks for the reports, and thanks @Exploder98 for tracking down the FFmpeg compatibility issue. I bumped the package to the newer minor version and added an FFmpeg patch. It builds on my machine, YMMV.

semilin commented 3 months ago

Perfect, it now builds and runs on my machine as well. Thanks for taking care of this.

xenedium commented 2 months ago

Hello, it builds and runs perfectly now, thank you.