lamikr / rocm_sdk_builder


aotriton build issue #28

Closed mritunjaymusale closed 1 month ago

mritunjaymusale commented 4 months ago

When trying to build aotriton, the RAM usage skyrockets and the system even starts using swap memory. Is there a way to fix this? The system becomes unusable during that stage. I used the rocm_sdk_builder_611_bg12_amdmigraphx branch with these instructions to avoid the previous "amdmigraphx build failed" error.

mritunjaymusale commented 4 months ago

I think I found the reason and the source, but I don't know how to patch it. The file in the screenshot seems to be allocating double the number of CPU threads, which causes the RAM usage to spike and makes the system unusable. The fix for me was to remove the 2* and run the build again; the aotriton build then continued without problems.
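Since the screenshot is not preserved here, the change can be sketched roughly as follows. This is a minimal illustration and not the actual contents of third_party/triton/python/setup.py: the function name `pick_max_jobs` is hypothetical, and honoring a `MAX_JOBS` environment variable is an assumption about how such build scripts are commonly written.

```python
import os


def pick_max_jobs(cpu_count=None, env=None, multiplier=1):
    """Return the number of parallel build jobs to run.

    multiplier=2 mimics the memory-hungry behaviour described above;
    multiplier=1 corresponds to removing the "2*".
    """
    env = os.environ if env is None else env
    cpu_count = cpu_count or os.cpu_count() or 1
    # An explicit MAX_JOBS override (assumed convention) wins over the default.
    override = env.get("MAX_JOBS")
    if override:
        return max(1, int(override))
    return max(1, multiplier * cpu_count)


if __name__ == "__main__":
    print("with 2*:   ", pick_max_jobs(multiplier=2))
    print("without 2*:", pick_max_jobs(multiplier=1))
```

On a 12-thread machine the first call would ask for 24 jobs and the second for 12. Each compile job holds its own memory, so halving the job count roughly halves the peak RAM demand.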

lamikr commented 4 months ago

Thanks for the finding, I think that would be a good patch to have. If you want, you could create a patch from that change. The procedure would be something like the following:

git add third_party/triton/python/setup.py
git commit -s -m "your commit message"
git format-patch upstream/master
mv 00* ../../patches/rocm-6.1.1/aotriton/
cd rocm_sdk_builder
git add patches/rocm-6.1.1/aotriton/*
git commit -s -m "added aotriton patch for cpu fix"

Then push the change to a branch on your GitHub fork and create a merge request.

mritunjaymusale commented 4 months ago

Another thing I would like to point out: libaio-dev is the incorrect name for the Fedora dependency, it's named libaio-devel. I'll add this to the patch as well, later today. Despite adding this package, the build for DeepSpeed fails, but that would be a topic for a separate issue.

lamikr commented 3 months ago

Thank you for the aotriton patch, it's now applied.

About libaio: on Ubuntu the package is called "libaio-dev" and on Fedora it's "libaio-devel", and I think that is already handled correctly in the SDK builder's install_debs.sh script. Are you referring to some other location?

And for DeepSpeed, could it be a read-write access problem with the /dev/kfd driver? See issue:

https://github.com/lamikr/rocm_sdk_builder/issues/8

mritunjaymusale commented 3 months ago

Ignore the libaio-devel issue, that was my bad. As for DeepSpeed, I am opening another issue where it can be discussed.

mritunjaymusale commented 3 months ago

The aotriton build fails again, this time with a different error:

make[3]: Entering directory '/mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src'
make[3]: warning: -j12 forced in submake: resetting jobserver mode.
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 1 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 1, False, False, False, 0'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 2 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 0, False, False, False, 0'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 1 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 0, False, False, False, 0'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 2 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 1, False, False, False, 0'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 0 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 0, False, False, False, 0'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 0 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 1, False, False, False, 0'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 1 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 1, False, False, False, 1'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 2 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 0, False, False, False, 1'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 1 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 0, False, False, False, 1'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 2 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 1, False, False, False, 1'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 0 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 0, False, False, False, 1'
LD_PRELOAD=/opt/rocm_sdk_611/lib64/libhsa-runtime64.so /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/v2python/compile.py /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/src_projects/aotriton/tritonsrc/flash.py --kernel_name attn_fwd -o /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco -g 1,1,1 --num_warps 4 --num_stages 1 --waves_per_eu 0 --target 'MI300X' --signature '*fp16:16, *fp16:16, *fp16:16, *fp16:16, fp32, *fp32:16, *fp16:16, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, u64:16, u64:16, u64:16, 1, i32, i32, u64, fp32, u64, u32, *fp16:16, False, 128, 16, 64, 1, False, False, False, 1'
/opt/rocm_sdk_611/bin/llvm-objcopy/opt/rocm_sdk_611/bin/llvm-objcopy/opt/rocm_sdk_611/bin/llvm-objcopy/opt/rocm_sdk_611/bin/llvm-objcopy/opt/rocm_sdk_611/bin/llvm-objcopy: /opt/rocm_sdk_611/bin/llvm-objcopy: : : : /opt/rocm_sdk_611/bin/llvm-objcopy/opt/rocm_sdk_611/bin/llvm-objcopy/opt/rocm_sdk_611/bin/llvm-objcopy: error: : : : error: error: error: error: error: 'error: '/tmp/amd_triton_kernel-452ee7/amd_triton_kernel-452ee7.hsaco''/tmp/amd_triton_kernel-d218bc/amd_triton_kernel-d218bc.hsacoerror: 'error: '/tmp/amd_triton_kernel-23eb90/amd_triton_kernel-23eb90.hsaco': /tmp/amd_triton_kernel-b31384/amd_triton_kernel-b31384.hsaco': /tmp/amd_triton_kernel-b39d22/amd_triton_kernel-b39d22.hsaco/tmp/amd_triton_kernel-7f2db2/amd_triton_kernel-7f2db2.hsaco'': No such file or directory': '': No such file or directory'': /tmp/amd_triton_kernel-a21919/amd_triton_kernel-a21919.hsacoNo such file or directory
/tmp/amd_triton_kernel-4ab872/amd_triton_kernel-4ab872.hsacoNo such file or directory
No such file or directory/tmp/amd_triton_kernel-e0146e/amd_triton_kernel-e0146e.hsaco': No such file or directory
': 

': 
No such file or directoryNo such file or directory

No such file or directory
/opt/rocm_sdk_611/bin/llvm-objcopy: error: '/tmp/amd_triton_kernel-c2efdf/amd_triton_kernel-c2efdf.hsaco': No such file or directory
/opt/rocm_sdk_611/bin/llvm-objcopy: error: '/tmp/amd_triton_kernel-3bd7a6/amd_triton_kernel-3bd7a6.hsaco': No such file or directory
/opt/rocm_sdk_611/bin/llvm-objcopy: error: '/tmp/amd_triton_kernel-9f7912/amd_triton_kernel-9f7912.hsaco': No such file or directory
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco 
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco 
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco 
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco 
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco 
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco 
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco 
zstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernzstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco : Noel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco : No such file or directory -- ignored 
 such file or directory -- ignored 
zstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernzstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco : Nozstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernzstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco : No such file or directory -- ignored 
el_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco : Noel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco : No such file or directory -- ignored 
 such file or directory -- ignored 
 such file or directory -- ignored 
zstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco : No such file or directory -- ignored 
make[3]: *** [Makefile.compile:17: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco] Error 1
make[3]: *** Waiting for unfinished jobs....
make[3]: *** [Makefile.compile:47: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco] Error 1
make[3]: *** [Makefile.compile:23: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco] Error 1
make[3]: *** [Makefile.compile:29: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco] Error 1
make[3]: *** [Makefile.compile:35: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_0__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco] Error 1
make[3]: *** [Makefile.compile:59: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco] Error 1
make[3]: *** [Makefile.compile:71: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco] Error 1
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco 
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco 
zstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco : No such file or directory -- ignored 
make[3]: *** [Makefile.compile:41: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco] Error 1
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco 
zstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco : No such file or directory -- ignored 
make[3]: *** [Makefile.compile:11: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave1-Gpu-MI300X.hsaco] Error 1
zstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco : No such file or directory -- ignored 
make[3]: *** [Makefile.compile:53: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_0__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco] Error 1
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco 
/opt/rocm_sdk_611/bin/zstd -f /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco 
zstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco : No such file or directory -- ignored 
zstd: can't stat /mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src/flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco : No such file or directory -- ignored 
make[3]: *** [Makefile.compile:65: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco] Error 1
make[3]: *** [Makefile.compile:77: flash/gpu_kernel_image.attn_fwd/attn_fwd-Sig-F__^fp16@16_False_16_False_False_False_1__P__128_64_1__CO__warp4_stg1_wave0-Gpu-MI300X.hsaco] Error 1
make[3]: Leaving directory '/mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton/v2src'
make[2]: *** [v2src/CMakeFiles/aotriton_v2_compile.dir/build.make:70: v2src/CMakeFiles/aotriton_v2_compile] Error 2
make[2]: Leaving directory '/mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton'
make[1]: *** [CMakeFiles/Makefile2:217: v2src/CMakeFiles/aotriton_v2_compile.dir/all] Error 2
make[1]: Leaving directory '/mnt/Data/programming-stuff/python/Deep_learning/rocm_sdk_builder/builddir/038_02_aotriton'
make: *** [Makefile:136: all] Error 2
build failed: aotriton
Build failed

I used the master branch from this commit

Distro : Fedora 40

lamikr commented 3 months ago

I had similar problems with gfx1035 two days ago, but they went away after I removed one patch from aotriton that was not yet ready. Can you check whether you still have this patch in the aotriton folder? It is titled "work in progress to add more gpus", and I think it should be taken away. I removed it from the patches folder, so if you have run


git pull
./babs.sh -co
./babs.sh -ap
rm -rf builddir/038_02_aotriton/

it should not be there anymore... If that does not help, try also removing the aotriton directory under src_projects and running babs.sh -i to sync it back.

mritunjaymusale commented 3 months ago

I am currently AFK; once I'm back I'll try it and let you know.

lamikr commented 3 months ago

That's fine, I am also doing some experiments at the moment with the upstream triton.

lamikr commented 3 months ago

On my gfx1035 I encountered exactly the same error you are seeing (and I have seen it sometimes earlier):

F__^fp16@16_False_16_False_False_False_0__P__128_64_1__CO__warp4_stg1_wave2-Gpu-MI300X.hsaco : No such file or directory -- ignored 
 such file or directory -- ignored 

I then started debugging it by adding print lines to the "def compile(fn, **kwargs):" function in third_party/triton/python/triton/compiler/compiler.py to see what's going on, and the error went away.

It seems that adding even a single line to that file fixes the problem. I do not know whether this is some kind of timing issue that gets fixed by forcing the Python code to recompile, but that way it works... On another machine I did not need to do this.

Are you able to test if this fixes your problem?

mritunjaymusale commented 3 months ago

I am still AFK for a week or so; I'll test it once I'm back, sorry.

lamikr commented 3 months ago

I believe the real reason for sometimes seeing the "No such file or directory/tmp/amd_triton_kernel-e0146e/amd_triton_kernel-e0146e.hsaco': No such file or directory" error is that the triton cache in the directory

~/.triton/cache somehow gets corrupted during the build. Deleting that directory seems to help.

mritunjaymusale commented 3 months ago

Hello again, I have just tried to compile using the master branch and it compiled successfully without any changes to the code. As for the ~/.triton/cache corruption: what if one of the build shell files cleared that directory every time someone builds triton? They are cache files anyway, so cleaning them wouldn't break anything.
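The cleanup suggested above could look roughly like the sketch below. It is only an illustration: the helper name `clear_triton_cache` is hypothetical, the default path is taken from this thread, and the project's build scripts may do this differently.

```python
import shutil
from pathlib import Path


def clear_triton_cache(cache_dir=None):
    """Delete the Triton cache directory; return True if something was removed."""
    # Default location as discussed in this thread.
    cache_dir = Path(cache_dir) if cache_dir else Path.home() / ".triton" / "cache"
    if not cache_dir.is_dir():
        # Nothing to clean, e.g. on a fresh machine.
        return False
    shutil.rmtree(cache_dir)
    return True


if __name__ == "__main__":
    removed = clear_triton_cache()
    print("cache removed" if removed else "no cache directory found")
```

Running this before the aotriton build step would force Triton to regenerate its cached artifacts. That costs some build time but avoids reusing stale entries.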

AngryLoki commented 3 months ago

You can set a custom cache directory with the TRITON_CACHE_DIR environment variable, by the way. Triton will still store some post-build garbage in ~/.triton/ due to https://github.com/ROCm/triton/blob/5e6e8227686334e7342db17bcdd0f8f5b3d6fcd7/python/triton/runtime/cache.py#L53-L62 , but at least some files will not end up there.
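One way to use the environment variable mentioned above is to give each build its own throwaway cache directory. The sketch below assumes nothing beyond the variable name from this comment; the helper `run_with_private_triton_cache` is hypothetical, and the command it runs here is a stand-in that only prints the variable, not a real build step.

```python
import os
import subprocess
import sys
import tempfile


def run_with_private_triton_cache(cmd):
    """Run cmd with TRITON_CACHE_DIR pointing at a temporary directory."""
    with tempfile.TemporaryDirectory(prefix="triton-cache-") as cache_dir:
        # Copy the current environment and override only the cache location.
        env = dict(os.environ, TRITON_CACHE_DIR=cache_dir)
        result = subprocess.run(cmd, env=env, capture_output=True, text=True)
        return result.returncode, result.stdout


if __name__ == "__main__":
    # Stand-in command; a real build command would go here instead.
    code, out = run_with_private_triton_cache(
        [sys.executable, "-c", "import os; print(os.environ['TRITON_CACHE_DIR'])"]
    )
    print(code, out.strip())
```

The temporary directory is deleted when the command finishes, so nothing cached by one build can leak into the next one.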

lamikr commented 2 months ago

The .triton/cache directory is now cleaned in the pre-config phase of the aotriton build as a workaround. I also copy-pasted the error we are seeing into an open upstream bug that looked like a similar problem.

lamikr commented 1 month ago

Closing this.