[2.5] Triton Wheel build failed due to LLVM pin commit does not exist on centOS

Stonepia commented 2 months ago

It seems that the OpenAI server does not host the pre-built LLVM package for CentOS at the pinned commit, so the current CI fails with the following:

# Build Triton Wheel
downloading and extracting https://github.com/pybind/pybind11/archive/refs/tags/v2.13.1.tar.gz ...
downloading and extracting https://oaitriton.blob.core.windows.net/public/llvm-builds/llvm-ce80c80d-centos-x64.tar.gz ...
error: HTTP Error 404: The specified blob does not exist.

Traceback (most recent call last):
  File "/pytorch/.github/scripts/build_triton_wheel.py", line 225, in <module>
    main()
  File "/pytorch/.github/scripts/build_triton_wheel.py", line 212, in main
    build_triton(
  File "/pytorch/.github/scripts/build_triton_wheel.py", line 182, in build_triton
    check_call(
  File "/opt/python/cp38-cp38/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/python/cp38-cp38/bin/python', 'setup.py', 'bdist_wheel']' returned non-zero exit status

This is because llvm-hash.txt pins LLVM to ce80c80dca45c7b4636a3e143973e2c6cbdb2884.

The pinned commit has an Ubuntu build but no CentOS build, i.e.:

# CentOS Fail due to no blob
https://oaitriton.blob.core.windows.net/public/llvm-builds/llvm-ce80c80d-centos-x64.tar.gz
# Ubuntu Pass
https://oaitriton.blob.core.windows.net/public/llvm-builds/llvm-ce80c80d-ubuntu-x64.tar.gz
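
To check a pin before CI hits the 404, something like the following might help. It is only a rough sketch: the URL pattern (first eight characters of the pinned hash, then the OS name, then `x64`) is inferred from the two URLs above, not taken from Triton's build scripts, and the helper names `llvm_build_url` and `blob_exists` are made up for illustration. It also assumes the blob server answers HEAD requests.

```python
import urllib.error
import urllib.request

# Base location copied from the URLs in the CI log above.
BASE_URL = "https://oaitriton.blob.core.windows.net/public/llvm-builds"


def llvm_build_url(full_hash: str, system_suffix: str, arch: str = "x64") -> str:
    """Build the tarball URL for a pinned LLVM commit (hypothetical helper).

    Assumes the file name uses the first 8 characters of the hash,
    as the URLs in this issue suggest.
    """
    short_hash = full_hash.strip()[:8]
    return f"{BASE_URL}/llvm-{short_hash}-{system_suffix}-{arch}.tar.gz"


def blob_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers a HEAD request, False on HTTP 404."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other HTTP errors are unexpected; surface them
```

Calling `blob_exists(llvm_build_url(pin, "centos"))` and the same with `"ubuntu"`, where `pin` is the content of llvm-hash.txt, would show which builds are available for that pin.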

For reference, CUDA/ROCm currently pin commit 10dc3a8e916d73291269e5e2b82dd22681489aa1, so they can download the LLVM package correctly. We still need the CentOS build in order to follow PyTorch's CI convention.

For the full log, please refer to the CI page.

Stonepia commented 2 months ago

Related issue on triton-lang/triton: https://github.com/triton-lang/triton/issues/4550

Stonepia commented 2 months ago

Thanks for the quick fix! We have verified it and are closing this issue.

intel / intel-xpu-backend-for-triton

[2.5] Triton Wheel build failed due to LLVM pin commit does not exist on centOS #2095