lamikr / rocm_sdk_builder

build failed: aotriton #178

Open colertyui opened 1 week ago

colertyui commented 1 week ago

The build runs normally up to a certain point, where it freezes and then either crashes the terminal or reboots the entire system. It happens just by running ./babs.sh -b on my machine.

[  0%] Generating venv/lib/python3.11/site-packages/triton.egg-link
cd /media/phlq/HDD/rocm_sdk_builder/src_projects/aotriton/third_party/triton/python && /usr/bin/cmake -E env VIRTUAL_ENV=/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv TRITON_USE_ROCM=ON ROCM_DEFAULT_DIR=/opt/rocm_sdk_612 MLIR_ENABLE_DUMP=1 LLVM_IR_ENABLE_DUMP=1 AMDGCN_ENABLE_DUMP=1 TRITON_BUILD_DIR=/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/triton_build /media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/bin/python setup.py develop
running develop
/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running egg_info
writing triton.egg-info/PKG-INFO
writing dependency_links to triton.egg-info/dependency_links.txt
writing entry points to triton.egg-info/entry_points.txt
writing requirements to triton.egg-info/requires.txt
writing top-level names to triton.egg-info/top_level.txt
reading manifest file 'triton.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'triton.egg-info/SOURCES.txt'
running build_ext
Re-run cmake no build system arguments
CMake Deprecation Warning at CMakeLists.txt:6 (cmake_policy):
  The OLD behavior for policy CMP0116 will be removed from a future version
  of CMake.

  The cmake-policies(7) manual explains that the OLD behaviors of all
  policies are deprecated and that a policy should be set to OLD only under
  specific short-term circumstances.  Projects should be ported to the NEW
  behavior and not rely on setting a policy to OLD.

-- TRITON_USE_ROCM: ON
-- ROCM_DEFAULT_DIR: /opt/rocm_sdk_612
-- MLIR_ENABLE_DUMP: 1
-- LLVM_IR_ENABLE_DUMP: 1
-- AMDGCN_ENABLE_DUMP: 1
-- Adding Python module
-- Triton backends tuple: amd
-- Configuring done (0.1s)
-- Generating done (0.1s)
-- Build files have been written to: /media/phlq/HDD/rocm_sdk_builder/src_projects/aotriton/third_party/triton/python/build/cmake.linux-x86_64-cpython-3.11
Change Dir: '/media/phlq/HDD/rocm_sdk_builder/src_projects/aotriton/third_party/triton/python/build/cmake.linux-x86_64-cpython-3.11'

Run Build Command(s): /usr/bin/ninja -v -j 16
ninja: error: stat(/root/.triton/llvm/llvm-657ec732-ubuntu-x64/include/mlir/IR/AttrTypeBase.td): Permission denied

Traceback (most recent call last):
  File "/media/phlq/HDD/rocm_sdk_builder/src_projects/aotriton/third_party/triton/python/setup.py", line 612, in <module>
    setup(
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
           ^^^^^^^^^^^^^^^^^^
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 968, in run_commands
    self.run_command(cmd)
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/dist.py", line 1217, in run_command
    super().run_command(command)
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
    cmd_obj.run()
  File "/media/phlq/HDD/rocm_sdk_builder/src_projects/aotriton/third_party/triton/python/setup.py", line 555, in run
    develop.run(self)
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/command/develop.py", line 34, in run
    self.install_for_development()
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/command/develop.py", line 114, in install_for_development
    self.run_command('build_ext')
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/dist.py", line 1217, in run_command
    super().run_command(command)
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
    cmd_obj.run()
  File "/media/phlq/HDD/rocm_sdk_builder/src_projects/aotriton/third_party/triton/python/setup.py", line 341, in run
    self.build_extension(ext)
  File "/media/phlq/HDD/rocm_sdk_builder/src_projects/aotriton/third_party/triton/python/setup.py", line 456, in build_extension
    subprocess.check_call(["cmake", "--build", "."] + build_args, cwd=cmake_dir)
  File "/opt/rocm_sdk_612/lib/python3.11/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--config', 'TritonRelBuildWithAsserts', '-j16']' returned non-zero exit status 1.
make[2]: *** [CMakeFiles/aotriton_venv_triton.dir/build.make:73: venv/lib/python3.11/site-packages/triton.egg-link] Error 1
make[2]: Leaving directory '/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton'
make[1]: *** [CMakeFiles/Makefile2:139: CMakeFiles/aotriton_venv_triton.dir/all] Error 2
make[1]: Leaving directory '/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton'
make: *** [Makefile:136: all] Error 2
build failed: aotriton

I'm on Ubuntu 24.04.1 LTS. My hardware is: ASUS TUF GAMING B550M-PLUS; AMD Ryzen™ 7 5800X3D × 16; AMD Radeon™ RX 5700. I apologize in advance if I'm being an idiot about something. I'm really new to coding in general, so I'm trying my best to learn from my mistakes. If the information provided is insufficient, please just tell me what else I need to provide and I will do so to the best of my abilities. I did try apt install libzstd-dev; it didn't change the outcome.

lamikr commented 4 days ago

Hi, sorry for the delay and thank you for your report. I am wondering whether your system is running out of memory. Do you have only gfx1010 selected in your build_cfg.user file? Can you also tell me how much RAM your system has?
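One way to answer the memory question is to read /proc/meminfo. The snippet below is only a minimal sketch, assuming a Linux system; the helper names parse_meminfo_kib and summarize_memory are made up for illustration and are not part of rocm_sdk_builder. It reports total RAM, currently available RAM and total swap in GiB.

```python
def parse_meminfo_kib(text):
    """Parse /proc/meminfo-style text into a {field: value_in_KiB} dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that do not look like "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarize_memory(text):
    """Return total RAM, available RAM and total swap in GiB."""
    fields = parse_meminfo_kib(text)
    kib_per_gib = 1024 * 1024
    return {
        "ram_total_gib": fields.get("MemTotal", 0) / kib_per_gib,
        "ram_available_gib": fields.get("MemAvailable", 0) / kib_per_gib,
        "swap_total_gib": fields.get("SwapTotal", 0) / kib_per_gib,
    }


if __name__ == "__main__":
    # Linux only: /proc/meminfo does not exist on other systems.
    with open("/proc/meminfo") as meminfo:
        for key, value in summarize_memory(meminfo.read()).items():
            print(f"{key}: {value:.1f}")
```

Running it once before the build and once in a second terminal while the build is close to the point where it freezes would show whether the available RAM drops towards zero.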

colertyui commented 4 days ago

Hello, I only have gfx1010 selected in my file, and I have one 16 GB DDR4 stick of RAM (3000 MHz, CL16). I hoped that would be enough, but if absolutely necessary I can buy another one and run 32 GB; I'm just working on a relatively tight budget. Also, in case it is relevant, I'm storing all the code on a 1 TB HDD. Thank you very much for the assistance.

lamikr commented 23 hours ago

Can you check whether it helps to set the MAX_JOBS variable to 8? Aotriton/triton checks whether that variable is set and should then limit the build to 8 of your 16 CPU threads. Try running these commands:

rm -rf builddir/038_aotriton
export MAX_JOBS=8
./babs.sh -b

I am not fully sure whether this helps, but at least in theory the code in src_projects/aotriton/third_party/triton/python/setup.py checks whether that variable is defined and should then limit the CPU usage to that number...
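To illustrate the idea, here is a minimal sketch of how a build script could honour MAX_JOBS. It is not copied from triton's setup.py: the function names resolve_build_jobs and cmake_build_command are hypothetical, and the fallback to the CPU count when the variable is unset or invalid is an assumption.

```python
import os


def resolve_build_jobs(environ=None):
    """Return the number of parallel build jobs to request.

    Hypothetical helper: uses MAX_JOBS when it is a positive integer,
    otherwise falls back to the CPU count (an assumed default).
    """
    env = os.environ if environ is None else environ
    default_jobs = os.cpu_count() or 1
    raw = env.get("MAX_JOBS", "").strip()
    if not raw:
        return default_jobs
    try:
        jobs = int(raw)
    except ValueError:
        return default_jobs  # ignore values such as "abc"
    return jobs if jobs > 0 else default_jobs


def cmake_build_command(environ=None):
    """Build a cmake invocation whose -j flag carries the job limit."""
    jobs = resolve_build_jobs(environ)
    return ["cmake", "--build", ".", f"-j{jobs}"]


if __name__ == "__main__":
    # After "export MAX_JOBS=8" this prints a command ending in "-j8".
    print(cmake_build_command())
```

If the real script works along these lines, the "-j16" seen in the failing command of the log should change to "-j8" once MAX_JOBS=8 is exported, which is an easy way to confirm that the variable was picked up.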