Closed mritunjaymusale closed 1 month ago
I think I found the cause and where it comes from, but I don't know how to patch it. The file in the screenshot seems to start twice as many build jobs as there are CPU threads, which makes RAM usage spike and leaves the system unusable.
The fix for me was to remove the 2*
and run the build again; the aotriton build then continued without problems.
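For illustration, here is a minimal Python sketch of the kind of change described above. It is not copied from the real setup.py: the function name `default_build_jobs`, the `multiplier` parameter, and the `MAX_JOBS` override are assumptions made for this example.

```python
import os


def default_build_jobs(cpu_count=None, multiplier=1, env=None):
    """Return how many parallel build jobs to request.

    multiplier=2 mimics the doubled thread count described above;
    multiplier=1 corresponds to removing the "2*".
    All names here are hypothetical, for illustration only.
    """
    env = os.environ if env is None else env
    # os.cpu_count() can return None, so fall back to a single job.
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    # Assumed override: an explicit MAX_JOBS value wins over the default.
    override = env.get("MAX_JOBS")
    if override:
        return max(1, int(override))
    return max(1, multiplier * cpus)


if __name__ == "__main__":
    # On a 12-thread machine: 24 jobs with the doubling, 12 without it.
    print(default_build_jobs(12, multiplier=2, env={}))
    print(default_build_jobs(12, multiplier=1, env={}))
```

Each extra compile job holds its own working set in memory, so halving the job count roughly halves peak RAM at the cost of a longer build.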
Thanks for the finding, I think that would be a good patch to have. If you want, you could create a patch from that change. The procedure would be something like the following:
git add third_party/triton/python/setup.py
git commit -s -m "your commit message"
git format-patch upstream/master
mv 00* ../../patches/rocm-6.1.1/aotriton/
cd rocm_sdk_builder
git add patches/rocm-6.1.1/aotriton/*
git commit -s -m "added aotriton patch for cpu fix"
Then push the change to a branch on your GitHub fork and create a merge request.
Another thing I would like to point out: libaio-dev
is the incorrect name for the Fedora dependency; it's named libaio-devel.
I'll add this to the patch as well, later today.
Despite adding this package, the DeepSpeed build fails, but that would be a topic for a separate issue.
Thank you for the aotriton patch, it's now applied.
About libaio: on Ubuntu the package is called "libaio-dev" and on Fedora it's "libaio-devel", and I think that is handled correctly in the SDK builder's install_debs.sh script. Are you referring to some other location?
And for DeepSpeed, could it be a read-write access problem with the /dev/kfd driver? See issue:
Ignore the libaio-devel
issue, that was my bad. As for DeepSpeed, I am opening another issue where this can be discussed.
The aotriton build fails again, this time with a different error:
make[3]: Entering directory '/mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src'
make[3]: warning: -j12 forced in submake: resetting jobserver mode.
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 1 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 1, False, False, False, 0'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 2 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 0, False, False, False, 0'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 1 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 0, False, False, False, 0'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 2 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 1, False, False, False, 0'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 0 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 0, False, False, False, 0'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 0 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 1, False, False, False, 0'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 1 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 1, False, False, False, 1'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 2 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 0, False, False, False, 1'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 1 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 0, False, False, False, 1'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 2 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 1, False, False, False, 1'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 0 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 0, False, False, False, 1'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 0 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 1, False, False, False, 1'
/opt/rocm_sdk_611/bin/llvm-objcopy/opt/rocm_sdk_611/bin/llvm-objcopy/opt/rocm_sdk_611/bin/llvm-objcopy/opt/rocm_sdk_611/bin/llvm-objcopy/opt/rocm_sdk_611/bin/llvm-objcopy: /opt/rocm_sdk_611/bin/llvm-objcopy: : : : /opt/rocm_sdk_611/bin/llvm-objcopy/opt/rocm_sdk_611/bin/llvm-objcopy/opt/rocm_sdk_611/bin/llvm-objcopy: error: : : : error: error: error: error: error: 'error: '/tmp/amd_triton_kernel-452ee7/amd_triton_kernel-452ee7.hsaco''/tmp/amd_triton_kernel-d218bc/amd_triton_kernel-d218bc.hsacoerror: 'error: '/tmp/amd_triton_kernel-23eb90/amd_triton_kernel-23eb90.hsaco': /tmp/amd_triton_kernel-b31384/amd_triton_kernel-b31384.hsaco': /tmp/amd_triton_kernel-b39d22/amd_triton_kernel-b39d22.hsaco/tmp/amd_triton_kernel-7f2db2/amd_triton_kernel-7f2db2.hsaco'': No such file or directory': '': No such file or directory'': /tmp/amd_triton_kernel-a21919/amd_triton_kernel-a21919.hsacoNo such file or directory
/tmp/amd_triton_kernel-4ab872/amd_triton_kernel-4ab872.hsacoNo such file or directory
No such file or directory/tmp/amd_triton_kernel-e0146e/amd_triton_kernel-e0146e.hsaco': No such file or directory
':
':
No such file or directoryNo such file or directory
No such file or directory
/opt/rocm_sdk_611/bin/llvm-objcopy: error: '/tmp/amd_triton_kernel-c2efdf/amd_triton_kernel-c2efdf.hsaco': No such file or directory
/opt/rocm_sdk_611/bin/llvm-objcopy: error: '/tmp/amd_triton_kernel-3bd7a6/amd_triton_kernel-3bd7a6.hsaco': No such file or directory
/opt/rocm_sdk_611/bin/llvm-objcopy: error: '/tmp/amd_triton_kernel-9f7912/amd_triton_kernel-9f7912.hsaco': No such file or directory
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco
zstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernzstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco : Noel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco : No such file or directory -- ignored
such file or directory -- ignored
zstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernzstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco : Nozstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernzstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco : No such file or directory -- ignored
el_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco : Noel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco : No such file or directory -- ignored
such file or directory -- ignored
such file or directory -- ignored
zstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco : No such file or directory -- ignored
make[3]: *** [Makefile.compile:17: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco] Error 1
make[3]: *** Waiting for unfinished jobs....
make[3]: *** [Makefile.compile:47: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco] Error 1
make[3]: *** [Makefile.compile:23: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco] Error 1
make[3]: *** [Makefile.compile:29: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco] Error 1
make[3]: *** [Makefile.compile:35: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco] Error 1
make[3]: *** [Makefile.compile:59: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco] Error 1
make[3]: *** [Makefile.compile:71: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco] Error 1
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco
zstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco : No such file or directory -- ignored
make[3]: *** [Makefile.compile:41: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco] Error 1
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco
zstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco : No such file or directory -- ignored
make[3]: *** [Makefile.compile:11: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco] Error 1
zstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco : No such file or directory -- ignored
make[3]: *** [Makefile.compile:53: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco] Error 1
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco
zstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco : No such file or directory -- ignored
zstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco : No such file or directory -- ignored
make[3]: *** [Makefile.compile:65: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco] Error 1
make[3]: *** [Makefile.compile:77: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco] Error 1
make[3]: Leaving directory '/mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src'
make[2]: *** [v2src/CMakeFiles/aotriton_v2_compile.dir/build.make:70: v2src/CMakeFiles/aotriton_v2_compile] Error 2
make[2]: Leaving directory '/mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton'
make[1]: *** [CMakeFiles/Makefile2:217: v2src/CMakeFiles/aotriton_v2_compile.dir/all] Error 2
make[1]: Leaving directory '/mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton'
make: *** [Makefile:136: all] Error 2
build failed: aotriton
Build failed
I used the master branch from this commit
Distro : Fedora 40
I had similar problems with gfx1035 2 days ago, but they went away after I removed one patch from aotriton that was not yet ready. Can you check whether you still have this patch in the aotriton folder? It is named "work in progress to add more gpus" and I think it should be taken away. I removed it from the patches folder, so after you have run
git pull
./babs.sh -co
./babs.sh -ap
rm -rf builddir/038_02_aotriton/
it should not be there anymore. If that does not help, try also removing aotriton from src_projects and running babs.sh -i to sync it back.
I am currently AFK, once I'm back I'll try and let you know
That's fine, I am also doing some experiments with upstream triton at the moment.
On my gfx1035 I encountered exactly the same error you are seeing (and I have sometimes seen it earlier):
F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco : No such file or directory -- ignored
such file or directory -- ignored
I then started debugging it by adding print lines to the "def compile(fn, **kwargs):" method in third_party/triton/python/triton/compiler/compiler.py to see what's going on, and the error went away.
It seems that changing even a single line fixes the problem. I do not know whether this is some kind of timing issue that gets fixed by forcing the Python code to recompile, but that way it works... On another machine I did not need to do this.
Are you able to test if this fixes your problem?
I am still afk for a week or so, I'll test it once I'm back, sorry.
I believe the real reason for sometimes seeing the "No such file or directory/tmp/amd_triton_kernel-e0146e/amd_triton_kernel-e0146e.hsaco': No such file or directory" error is that the triton cache in the directory
~/.triton/cache somehow gets corrupted during the build. Deleting that directory seems to help.
Hello again, I have just tried to compile using the master branch and it compiled successfully without any changes to the code.
As for the ~/.triton/cache
corruption: what if one of the build shell scripts cleared that directory every time someone builds triton? They're only cache files anyway, so cleaning them wouldn't break anything.
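As a rough sketch of that idea, the cleanup could look like the following in Python. The helper name `clear_triton_cache` and its `home` parameter are hypothetical and exist only to make the example testable; the real build scripts are shell and may do this differently.

```python
import shutil
from pathlib import Path


def clear_triton_cache(home=None):
    """Delete <home>/.triton/cache if it exists.

    Returns True when a directory was removed, False when there was
    nothing to remove. `home` defaults to the current user's home.
    """
    base = Path(home) if home is not None else Path.home()
    cache_dir = base / ".triton" / "cache"
    if cache_dir.is_dir():
        # Only the cache subdirectory is removed, not the rest of ~/.triton.
        shutil.rmtree(cache_dir)
        return True
    return False
```

Calling `clear_triton_cache()` before the build starts would ensure a stale or corrupted cache entry cannot be picked up.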
You can set custom cache directory with TRITON_CACHE_DIR
environment variable, by the way.
Triton will still store some post-build garbage in ~/.triton/ due to https://github.com/ROCm/triton/blob/5e6e8227686334e7342db17bcdd0f8f5b3d6fcd7/python/triton/runtime/cache.py#L53-L62 , but at least some of the files will not end up there.
The .triton/cache directory is now cleaned in the pre-config phase of the aotriton build as a workaround. I also copied the error we are seeing into an open upstream bug that looked like a similar problem.
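A minimal sketch of using that variable from Python, assuming the build is launched as a subprocess: the helper `build_env_with_private_cache` is a made-up name, and only the `TRITON_CACHE_DIR` variable itself comes from the comment above.

```python
import os
import tempfile


def build_env_with_private_cache(base_env=None):
    """Return a copy of the environment whose TRITON_CACHE_DIR points
    at a freshly created temporary directory."""
    env = dict(os.environ if base_env is None else base_env)
    # A new, empty directory per build avoids reusing stale cache entries.
    env["TRITON_CACHE_DIR"] = tempfile.mkdtemp(prefix="triton_cache_")
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when starting the build, so the user's regular cache directory is left untouched.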
Closing this.
When trying to build aotriton, the RAM usage skyrockets and the build even starts using swap memory. Is there a way to fix this? The system becomes unusable during that stage. I used the
rocm_sdk_builder_611_bg12_amdmigraphx
branch with these instructions to avoid the previous amdmigraphx build failed error