triton-lang / triton

Development repository for the Triton language and compiler
https://triton-lang.org/
MIT License

Assertion `parentOp->getNumRegions() == 1 && parentOp->getRegion(0).getBlocks().size() == 1' failed #87089 #3513

Closed: NavinKumarMNK closed this issue 7 months ago

NavinKumarMNK commented 7 months ago

Your current environment

root@0fca177ad2d4:/workspace# python3 collect_env.py 
Collecting environment information...
PyTorch version: 2.1.2
Is debug build: False
CUDA used to build PyTorch: 12.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (ppc64le)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.29.0
Libc version: glibc-2.35

Python version: 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 16:04:32) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-5.15.0-100-generic-ppc64le-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.2.91
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB

Nvidia driver version: 535.161.07
cuDNN version: Probably one of the following:
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn.so.8.9.5
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_adv_infer.so.8.9.5
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_adv_train.so.8.9.5
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_cnn_infer.so.8.9.5
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_cnn_train.so.8.9.5
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_ops_infer.so.8.9.5
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_ops_train.so.8.9.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: False

CPU:
Architecture:                       ppc64le
Byte Order:                         Little Endian
CPU(s):                             128
On-line CPU(s) list:                0-127
Model name:                         POWER9, altivec supported
Model:                              2.2 (pvr 004e 1202)
Thread(s) per core:                 4
Core(s) per socket:                 16
Socket(s):                          2
Frequency boost:                    enabled
CPU max MHz:                        3800.0000
CPU min MHz:                        2300.0000
L1d cache:                          1 MiB (32 instances)
L1i cache:                          1 MiB (32 instances)
L2 cache:                           8 MiB (16 instances)
L3 cache:                           160 MiB (16 instances)
NUMA node(s):                       6
NUMA node0 CPU(s):                  0-63
NUMA node8 CPU(s):                  64-127
NUMA node252 CPU(s):                
NUMA node253 CPU(s):                
NUMA node254 CPU(s):                
NUMA node255 CPU(s):                
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Mitigation; RFI Flush, L1D private per thread
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Mitigation; RFI Flush, L1D private per thread
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Kernel entry/exit barrier (eieio)
Vulnerability Spectre v1:           Mitigation; __user pointer sanitization, ori31 speculation barrier enabled
Vulnerability Spectre v2:           Mitigation; Indirect branch serialisation (kernel only)
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==2.1.2
[conda] cudatoolkit               11.8.0              hedcfb66_13    conda-forge
[conda] libmagma                  2.7.2                he288b6c_2    conda-forge
[conda] libmagma_sparse           2.7.2                h5b5c57a_3    conda-forge
[conda] magma                     2.7.2                h097a1ca_3    conda-forge
[conda] numpy                     1.24.3          py310h87cc683_0  
[conda] numpy-base                1.24.3          py310hac71eb6_0  
[conda] torch                     2.1.2                     dev_0    <develop>
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.3.3
vLLM Build Flags:
CUDA Archs: 7.0; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 GPU1 GPU2 GPU3 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0  X  NV3 SYS SYS 0-63 0  N/A
GPU1 NV3  X  SYS SYS 0-63 0  N/A
GPU2 SYS SYS  X  NV3 64-127 8   N/A
GPU3 SYS SYS NV3  X  64-127 8   N/A

Legend:
  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
$ llvm-config --version
17.0.0git

llvm build commit : c5dede880d175f7229c9b2923f4753e12702305d

build command

cmake -G Ninja ../llvm \
   -DLLVM_ENABLE_PROJECTS="mlir;llvm" \
   -DLLVM_BUILD_EXAMPLES=ON \
   -DLLVM_TARGETS_TO_BUILD="PowerPC;NVPTX;X86;AMDGPU;RISCV" \
   -DMLIR_ENABLE_CUDA_RUNNER=ON \
   -DCMAKE_BUILD_TYPE=Release \
   -DLLVM_ENABLE_ASSERTIONS=ON \
   -DCMAKE_C_COMPILER=clang \
   -DCMAKE_CXX_COMPILER=clang++ \
   -DLLVM_ENABLE_RTTI=ON \
   -DLLVM_INSTALL_UTILS=ON \
   -DMLIR_INCLUDE_INTEGRATION_TESTS=ON

Bug

example.py (I loaded the mixtral-8x7b-instruct fp16 model):

from vllm import LLM, SamplingParams
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(
    model="./models", 
    dtype="float16", 
    tensor_parallel_size=4, 
    enforce_eager=True, 
    trust_remote_code=True, 
    load_format='safetensors',
    # quantization="AWQ",
)
root@0fca177ad2d4:/workspace# python3 example.py 
WARNING 03-29 15:24:46 config.py:686] Casting torch.bfloat16 to torch.float16.
2024-03-29 15:24:48,678 INFO worker.py:1612 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
INFO 03-29 15:24:52 llm_engine.py:68] Initializing an LLM engine (v0.3.3) with config: model='./models', tokenizer='./models', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=safetensors, tensor_parallel_size=4, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, seed=0)
INFO 03-29 15:25:07 attention.py:67] flash_attn is not found. Using xformers backend.
(RayWorkerVllm pid=37294) INFO 03-29 15:25:07 attention.py:67] flash_attn is not found. Using xformers backend.
INFO 03-29 15:25:27 model_runner.py:97] Loading model weights took 21.7573 GB
(RayWorkerVllm pid=37294) INFO 03-29 15:25:39 model_runner.py:97] Loading model weights took 21.7573 GB
(RayWorkerVllm pid=37345) INFO 03-29 15:25:07 attention.py:67] flash_attn is not found. Using xformers backend. [repeated 2x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
python3: /root/llvm-project/mlir/lib/Analysis/SliceAnalysis.cpp:106: void getBackwardSliceImpl(mlir::Operation *, SetVector<mlir::Operation *> *, mlir::TransitiveFilter): Assertion `parentOp->getNumRegions() == 1 && parentOp->getRegion(0).getBlocks().size() == 1' failed.
*** SIGABRT received at time=1711725941 on cpu 45 ***
PC: @     0x7e79d800866c  (unknown)  pthread_kill
    @     0x7e7143545984  613083184  absl::lts_20220623::AbslFailureSignalHandler()
    @     0x7e79d800870c        224  pthread_kill
    @     0x7e79d7fa1dfc         48  raise
    @     0x7e79d7f7d260        336  abort
    @     0x7e79d7f94ef0        192  (unknown)
    @     0x7e79d7f94f94         64  __assert_fail
    @     0x7e755cc4c8b8        112  getBackwardSliceImpl()
    @     0x7e755cc4c6f0        112  getBackwardSliceImpl()
    @     0x7e755cc4c5a8         64  mlir::getBackwardSlice()
    @     0x7e755c78bc10        384  mlir::multiRootGetSlice()
    @     0x7e755b235e7c        608  CoalescePass::getCoalescedEncoding()
    @     0x7e755b2375d8        256  CoalescePass::runOnOperation()::{lambda()#1}::operator()()
    @     0x7e755b238be0        480  mlir::detail::walk<>()
    @     0x7e755b238eac        320  CoalescePass::runOnOperation()
    @     0x7e755bd83c54        416  mlir::detail::OpToOpPassAdaptor::run()
    @     0x7e755bd84650        160  mlir::detail::OpToOpPassAdaptor::runPipeline()
    @     0x7e755bd878b0        368  mlir::PassManager::run()
    @     0x7e75599cf9cc        128  pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()
    @     0x7e75599bacd4        848  pybind11::cpp_function::dispatcher()
    @      0x18e02ca5e40        112  cfunction_call
    @      0x18e02a3bc2c        160  _PyObject_MakeTpCall
    @      0x18e02c80134        160  method_vectorcall
    @      0x18e02a25738        480  _PyEval_EvalFrameDefault
    @      0x18e02b2a974         64  _PyEval_Vector
    @      0x18e02a3b9a0         32  _PyFunction_Vectorcall
    @      0x18e02a22b6c        480  _PyEval_EvalFrameDefault
    @      0x18e02b2a974         64  _PyEval_Vector
    @      0x18e02a3b9a0         32  _PyFunction_Vectorcall
    @      0x18e02a22b6c        480  _PyEval_EvalFrameDefault
    @      0x18e02b2a974         64  _PyEval_Vector
    @      0x18e02a3b9a0         32  _PyFunction_Vectorcall
    @      0x18e02a22224        480  _PyEval_EvalFrameDefault
    @ ... and at least 196 more frames
[2024-03-29 15:25:41,577 E 30164 30164] logging.cc:361: *** SIGABRT received at time=1711725941 on cpu 45 ***
[2024-03-29 15:25:41,577 E 30164 30164] logging.cc:361: PC: @     0x7e79d800866c  (unknown)  pthread_kill
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e71435459b8  613083184  absl::lts_20220623::AbslFailureSignalHandler()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e79d800870c        224  pthread_kill
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e79d7fa1dfc         48  raise
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e79d7f7d260        336  abort
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e79d7f94ef0        192  (unknown)
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e79d7f94f94         64  __assert_fail
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755cc4c8b8        112  getBackwardSliceImpl()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755cc4c6f0        112  getBackwardSliceImpl()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755cc4c5a8         64  mlir::getBackwardSlice()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755c78bc10        384  mlir::multiRootGetSlice()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755b235e7c        608  CoalescePass::getCoalescedEncoding()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755b2375d8        256  CoalescePass::runOnOperation()::{lambda()#1}::operator()()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755b238be0        480  mlir::detail::walk<>()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755b238eac        320  CoalescePass::runOnOperation()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755bd83c54        416  mlir::detail::OpToOpPassAdaptor::run()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755bd84650        160  mlir::detail::OpToOpPassAdaptor::runPipeline()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755bd878b0        368  mlir::PassManager::run()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e75599cf9cc        128  pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e75599bacd4        848  pybind11::cpp_function::dispatcher()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @      0x18e02ca5e40        112  cfunction_call
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @      0x18e02a3bc2c        160  _PyObject_MakeTpCall
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @      0x18e02c80134        160  method_vectorcall
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @      0x18e02a25738        480  _PyEval_EvalFrameDefault
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @      0x18e02b2a974         64  _PyEval_Vector
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @      0x18e02a3b9a0         32  _PyFunction_Vectorcall
[2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361:     @      0x18e02a22b6c        480  _PyEval_EvalFrameDefault
[2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361:     @      0x18e02b2a974         64  _PyEval_Vector
[2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361:     @      0x18e02a3b9a0         32  _PyFunction_Vectorcall
[2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361:     @      0x18e02a22b6c        480  _PyEval_EvalFrameDefault
[2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361:     @      0x18e02b2a974         64  _PyEval_Vector
[2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361:     @      0x18e02a3b9a0         32  _PyFunction_Vectorcall
[2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361:     @      0x18e02a22224        480  _PyEval_EvalFrameDefault
[2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361:     @ ... and at least 196 more frames
Fatal Python error: Aborted

Stack (most recent call first):
  File "/root/triton/python/triton/compiler/compiler.py", line 91 in optimize_ttgir
  File "/root/triton/python/triton/compiler/compiler.py", line 383 in <lambda>
  File "/root/triton/python/triton/compiler/compiler.py", line 476 in compile
  File "<string>", line 63 in fused_moe_kernel
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/model_executor/layers/fused_moe/fused_moe.py", line 222 in invoke_fused_moe_kernel
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/model_executor/layers/fused_moe/fused_moe.py", line 397 in fused_moe
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/model_executor/models/mixtral.py", line 131 in forward
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/model_executor/models/mixtral.py", line 278 in forward
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/model_executor/models/mixtral.py", line 319 in forward
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/model_executor/models/mixtral.py", line 383 in forward
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/worker/model_runner.py", line 606 in execute_model
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115 in decorate_context
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/worker/model_runner.py", line 677 in profile_run
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115 in decorate_context
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/worker/worker.py", line 122 in profile_num_available_blocks
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115 in decorate_context
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/executor/ray_gpu_executor.py", line 318 in _run_workers
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/executor/ray_gpu_executor.py", line 221 in _init_cache
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/executor/ray_gpu_executor.py", line 63 in __init__
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/engine/llm_engine.py", line 103 in __init__
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/engine/llm_engine.py", line 146 in from_engine_args
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/entrypoints/llm.py", line 109 in __init__
  File "/workspace/example.py", line 10 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, _brotli, yaml._yaml, sentencepiece._sentencepiece, psutil._psutil_linux, psutil._psutil_posix, msgpack._cmsgpack, google.protobuf.pyext._message, setproctitle, uvloop.loop, ray._raylet, grpc._cython.cygrpc, multidict._multidict, yarl._quoting_c, aiohttp._helpers, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket, frozenlist._frozenlist, pydantic.typing, pydantic.errors, pydantic.version, pydantic.utils, pydantic.class_validators, pydantic.config, pydantic.color, pydantic.datetime_parse, pydantic.validators, pydantic.networks, pydantic.types, pydantic.json, pydantic.error_wrappers, pydantic.fields, pydantic.parse, pydantic.schema, pydantic.main, pydantic.dataclasses, pydantic.annotated_types, pydantic.decorator, pydantic.env_settings, pydantic.tools, pydantic, cupy_backends.cuda.api._runtime_enum, cupy_backends.cuda.api.runtime, cupy_backends.cuda.stream, cupy_backends.cuda.libs.cublas, cupy_backends.cuda.libs.cusolver, cupy_backends.cuda._softlink, cupy_backends.cuda.libs.cusparse, cupy._util, cupy.cuda.device, fastrlock.rlock, cupy.cuda.memory_hook, cupy.cuda.graph, cupy.cuda.stream, cupy_backends.cuda.api._driver_enum, cupy_backends.cuda.api.driver, cupy.cuda.memory, cupy._core.internal, cupy._core._carray, cupy.cuda.texture, cupy.cuda.function, cupy_backends.cuda.libs.nvrtc, cupy.cuda.jitify, cupy.cuda.pinned_memory, cupy_backends.cuda.libs.curand, cupy_backends.cuda.libs.profiler, cupy.cuda.common, cupy.cuda.cub, cupy_backends.cuda.libs.nvtx, cupy.cuda.thrust, cupy._core._dtype, 
cupy._core._scalar, cupy._core._accelerator, cupy._core._memory_range, cupy._core._fusion_thread_local, cupy._core._kernel, cupy._core._routines_manipulation, cupy._core._optimize_config, cupy._core._cub_reduction, cupy._core._reduction, cupy._core._routines_binary, cupy._core._routines_math, cupy._core._routines_indexing, cupy._core._routines_linalg, cupy._core._routines_logic, cupy._core._routines_sorting, cupy._core._routines_statistics, cupy._core.dlpack, cupy._core.flags, cupy._core.core, cupy._core._fusion_variable, cupy._core._fusion_trace, cupy._core._fusion_kernel, cupy._core.new_fusion, cupy._core.fusion, cupy._core.raw, cupyx.cusolver, scipy._lib._ccallback_c, numpy.linalg.lapack_lite, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._flinalg, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, cupy.cuda.cufft, cupy.fft._cache, cupy.fft._callback, cupy.random._generator_api, cupy.random._bit_generator, scipy._lib._uarray._uarray, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, cupy.lib._polynomial, cupy_backends.cuda.libs.nccl, zstandard.backend_c, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, 
scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._direct (total: 182)
Aborted (core dumped)

I built triton==2.1.0 from source, using my fork of Triton: https://github.com/NavinKumarMNK/triton

llvm installation

RUN git checkout c5dede880d175f7229c9b2923f4753e12702305d
RUN mkdir build
WORKDIR ./build

RUN cmake -G Ninja ../llvm \
   -DLLVM_ENABLE_PROJECTS="mlir;llvm" \
   -DLLVM_BUILD_EXAMPLES=ON \
   -DLLVM_TARGETS_TO_BUILD="PowerPC;NVPTX;X86;AMDGPU;RISCV" \
   -DMLIR_ENABLE_CUDA_RUNNER=ON \
   -DCMAKE_BUILD_TYPE=Release \
   -DLLVM_ENABLE_ASSERTIONS=ON \
   -DCMAKE_C_COMPILER=clang \
   -DCMAKE_CXX_COMPILER=clang++ \
   -DLLVM_ENABLE_RTTI=ON \
   -DLLVM_INSTALL_UTILS=ON \
   -DMLIR_INCLUDE_INTEGRATION_TESTS=ON

RUN ninja && ninja install

triton installation from my fork

RUN cmake .. && make
RUN mv libtriton.so /root/triton/python/triton/_C/
WORKDIR /root/triton/python
RUN python3 download_ptxas.py

I get this error only when running Mixtral MoE models; other models work. I raised this issue in both vLLM and LLVM: https://github.com/vllm-project/vllm/issues/3732 https://github.com/llvm/llvm-project/issues/87089

Thank you. Let me know if I can provide any more details.

jlebar commented 7 months ago

This branch is 3 commits ahead of, 860 commits behind openai/triton:main.

Please reproduce with Triton at HEAD; we can't debug something that's months old and could well be fixed at HEAD.
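For anyone who wants to check how far a fork has drifted before filing a report, the ahead/behind counts quoted above can be computed locally. This is a minimal sketch, assuming the upstream repository has been added as a remote named upstream and fetched; that remote name and the helper names are illustrative and do not come from this thread.

```python
import subprocess


def parse_left_right(output):
    """Turn `git rev-list --left-right --count A...B` output into (ahead, behind) for B."""
    # git prints "<only in A>\t<only in B>": the first number is how far B is behind A.
    behind, ahead = (int(field) for field in output.split())
    return ahead, behind


def ahead_behind(upstream_ref="upstream/main", local_ref="HEAD", repo="."):
    """Count commits that local_ref is ahead of and behind upstream_ref."""
    # "upstream/main" is an assumed remote-tracking ref; adjust it to your setup.
    result = subprocess.run(
        ["git", "-C", repo, "rev-list", "--left-right", "--count",
         f"{upstream_ref}...{local_ref}"],
        capture_output=True, text=True, check=True,
    )
    return parse_left_right(result.stdout)
```

Calling ahead_behind() inside a checkout returns a pair such as (3, 860) for a branch that is 3 commits ahead of and 860 commits behind the upstream ref.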

Also please include the output of running Triton with MLIR_ENABLE_DUMP=1 so that we can reproduce this without installing Mixtral.

I would suggest closing the bugs you opened against Mixtral and LLVM, this is extremely unlikely to be a bug in those projects.
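One way to collect the requested dump is to launch the failing script from a small wrapper that sets MLIR_ENABLE_DUMP=1 and saves everything the process prints. This is only a sketch: the helper names are hypothetical, and because the thread does not say whether the dump goes to stdout or stderr, both streams are captured into one file.

```python
import os
import subprocess
import sys


def env_with_mlir_dump(base=None):
    """Return a copy of the environment with MLIR_ENABLE_DUMP=1 added."""
    env = dict(os.environ if base is None else base)
    env["MLIR_ENABLE_DUMP"] = "1"
    return env


def run_and_capture(script, log_path):
    """Run `script` with the dump enabled, writing stdout and stderr to `log_path`."""
    with open(log_path, "w") as log:
        proc = subprocess.run(
            [sys.executable, script],
            env=env_with_mlir_dump(),
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    return proc.returncode
```

For example, run_and_capture("example.py", "mlir_output.txt") would produce a single text file that can be attached to the issue.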

NavinKumarMNK commented 7 months ago

Thanks! Alright, so this might be a bug in Triton. I will close those issues and update here with the results as soon as possible.

NavinKumarMNK commented 7 months ago

The output with MLIR_ENABLE_DUMP=1 is very large, so I am attaching it as a text file: mlir_output.txt (sha256: 39fd4760333ba94d4fd7af76bf9598c15065f476d5efef5168dbe7a67dcc11ad). Let me know if anything else is needed.
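Anyone downloading the attachment can compare it against the SHA-256 digest quoted above. A minimal sketch using only the Python standard library follows; the function names are illustrative, and the file is read in chunks because the dump is described as large.

```python
import hashlib


def sha256_of_chunks(chunks):
    """Return the hex SHA-256 digest of an iterable of bytes objects."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so a large dump is never fully in memory."""
    with open(path, "rb") as handle:
        return sha256_of_chunks(iter(lambda: handle.read(chunk_size), b""))
```

Comparing sha256_of_file("mlir_output.txt") with the digest in the comment confirms that the downloaded file is the one that was attached.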

NavinKumarMNK commented 7 months ago

I get the following error while building the Triton HEAD commit against the pinned LLVM commit:

-- Found Python: /root/miniconda3/bin/python3.10 (found version "3.10.13") found components: Interpreter 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Configuring done (8.2s)
-- Generating done (0.3s)
-- Build files have been written to: /root/triton/build
[  0%] Building AttrInterfaces.cpp.inc...
[  1%] Building AttrInterfaces.h.inc...
[  1%] Building Ops.h.inc...
[  2%] Building Ops.cpp.inc...
[  2%] Building OpsEnums.h.inc...
[  3%] Building OpsEnums.cpp.inc...
[  3%] Building Dialect.h.inc...
[  4%] Building Dialect.cpp.inc...
[  4%] Building Types.h.inc...
[  6%] Building Types.cpp.inc...
[  6%] Building TritonTypeInterfaces.h.inc...
[  7%] Building TritonTypeInterfaces.cpp.inc...
[  7%] Built target TritonTableGen
[  7%] Building CXX object lib/Dialect/Triton/IR/CMakeFiles/TritonIR.dir/Dialect.cpp.o
[  8%] Building CXX object lib/Dialect/Triton/IR/CMakeFiles/TritonIR.dir/Ops.cpp.o
In file included from /root/triton/include/triton/Dialect/TritonGPU/IR/Attributes.h:4,
                 from /root/triton/lib/Dialect/Triton/IR/Ops.cpp:10:
/root/triton/include/triton/Dialect/TritonGPU/IR/TritonGPUInterfaces.h:4:10: fatal error: triton/Dialect/TritonGPU/IR/TritonGPUAttrInterfaces.h.inc: No such file or directory
    4 | #include "triton/Dialect/TritonGPU/IR/TritonGPUAttrInterfaces.h.inc"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[2]: *** [lib/Dialect/Triton/IR/CMakeFiles/TritonIR.dir/build.make:90: lib/Dialect/Triton/IR/CMakeFiles/TritonIR.dir/Ops.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:2587: lib/Dialect/Triton/IR/CMakeFiles/TritonIR.dir/all] Error 2
make: *** [Makefile:146: all] Error 2
root@9666068f9a2d:~/triton/build#

jlebar commented 7 months ago

The README explains why this does not work.

NavinKumarMNK commented 7 months ago

I don't follow. How is this related to anything mentioned in README.md? Can you clarify?

jlebar commented 7 months ago

Please see the third sentence in the README section which explains how to build with a custom LLVM. I assume you read this section because you figured out how to build Triton with a custom LLVM. It will explain why you're getting an error when you use a pinned LLVM version.

jlebar commented 7 months ago

Actually something else may be wrong.

/root/triton/include/triton/Dialect/TritonGPU/IR/TritonGPUInterfaces.h:4:10: fatal error: triton/Dialect/TritonGPU/IR/TritonGPUAttrInterfaces.h.inc: No such file or directory
    4 | #include "triton/Dialect/TritonGPU/IR/TritonGPUAttrInterfaces.h.inc"

I don't know why you're getting this error. Perhaps you need to do a clean build.

NavinKumarMNK commented 7 months ago

Can you explain what a clean build means, and how a build can end up not being clean in normal cases?

I have done a clean build:

FROM pytorch AS pytorch-triton
WORKDIR /root/

RUN git clone https://github.com/llvm/llvm-project
RUN apt-get install -y clang lld

WORKDIR /root/llvm-project
RUN git checkout 6f44bb7717897191be25aa01161831c67cdf5b84

RUN mkdir build
WORKDIR ./build

RUN rm -rf /root/miniconda3/lib/libz.so.1 && \
    rm -rf /root/miniconda3/lib/libxml2.so.2

RUN cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON  ../llvm -DLLVM_ENABLE_PROJECTS="mlir;llvm"

RUN ninja && ninja install

RUN rm -rf /usr/lib/llvm-14/
ENV LLVM_EXTERNAL_LIT=/root/llvm-project/build/bin/llvm-lit
ENV LLVM_BUILD_DIR=/root/llvm-project/build/

WORKDIR /root/
RUN git clone https://github.com/pybind/pybind11.git
RUN pip install pytest
WORKDIR /root/pybind11
RUN mkdir build
WORKDIR /root/pybind11/build
RUN cmake .. && make check && make install

# install triton
WORKDIR /root/
RUN git clone https://github.com/NavinKumarMNK/triton.git  # (current HEAD)
WORKDIR /root/triton/
RUN mkdir build
WORKDIR ./build
RUN cmake .. && make

jlebar commented 7 months ago

RUN git clone https://github.com/NavinKumarMNK/triton.git  # (current HEAD)
WORKDIR /root/triton/
RUN mkdir build
WORKDIR ./build
RUN cmake .. && make

This does not match the instructions for building Triton from source https://github.com/openai/triton?tab=readme-ov-file#install-from-source.

I have limited time and I need to go assist other users with their issues. I recommend you visit the Triton Slack, which is a good place to go for people who need extra assistance.

NavinKumarMNK commented 7 months ago

Alright, thanks. Installing through setup.py has some problems since there is no ppc64le support. I am fixing that and will get back to you once everything is done.

NavinKumarMNK commented 7 months ago

I successfully installed Triton (HEAD commit) with its pinned LLVM using the same procedure. I now end up hitting the error described in this issue: https://github.com/openai/triton/issues/3448

joker-eph commented 7 months ago

/root/triton/include/triton/Dialect/TritonGPU/IR/TritonGPUInterfaces.h:4:10: fatal error: triton/Dialect/TritonGPU/IR/TritonGPUAttrInterfaces.h.inc: No such file or directory
    4 | #include "triton/Dialect/TritonGPU/IR/TritonGPUAttrInterfaces.h.inc"

This is typically a missing CMake dependency. This file is generated by TritonGPUAttrDefsIncGen, and every file that transitively includes it should also have a DEPENDS on this target (or on TritonGPUIR).

Unfortunately, you're not providing enough context for this error for me to easily pinpoint where it is missing.

joker-eph commented 7 months ago

Seems like TritonIR in lib/Dialect/Triton/IR/Ops.cpp has #include "triton/Dialect/TritonGPU/IR/Attributes.h" but does not specify a DEPENDS on anything from GPUIR I believe.

joker-eph commented 7 months ago

By the way, recent ninja versions can detect this with ninja -t missingdeps after a successful clean build.
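The report from ninja -t missingdeps can be long, so it may help to group it by object file before deciding which targets need a DEPENDS. The sketch below is hypothetical tooling, not part of Triton or ninja: it assumes each finding has the shape "Missing dep: <object> uses <header> (generated by <rule>)" and that ninja writes the report to standard output.

```python
import re
import subprocess
from collections import defaultdict

# Assumed line shape:
#   Missing dep: <object file> uses <generated header> (generated by <RULE>)
_MISSING = re.compile(r"^Missing dep: (\S+) uses (\S+) \(generated by ([^)]+)\)$")


def group_missing_deps(report):
    """Map each object file to the generated headers it uses without a declared dependency."""
    grouped = defaultdict(list)
    for line in report.splitlines():
        match = _MISSING.match(line.strip())
        if match:
            obj, header, _rule = match.groups()
            grouped[obj].append(header)
    return dict(grouped)


def missing_deps_in(build_dir):
    """Run the ninja tool in `build_dir` and group whatever it reports on stdout."""
    result = subprocess.run(
        ["ninja", "-C", build_dir, "-t", "missingdeps"],
        capture_output=True, text=True,
    )
    return group_missing_deps(result.stdout)
```

Each key in the result is an object file whose owning CMake target is a candidate for an added dependency on the target that generates the listed headers.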

NavinKumarMNK commented 7 months ago

This is the first time I am working in this area, and I am not clear on what is happening. That is why I couldn't provide many details. Let me know how I can help.

I hope the log below is what you need. I am not using AMDGPU, though.

root@0be44c460d94:~/triton/python/build/cmake.linux-ppc64le-cpython-3.10# ninja -t missingdeps
Missing dep: lib/Dialect/Triton/IR/CMakeFiles/TritonIR.dir/Ops.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)
Missing dep: lib/Dialect/Triton/IR/CMakeFiles/TritonIR.dir/Ops.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrInterfaces.h.inc (generated by CUSTOM_COMMAND)
Missing dep: lib/Dialect/Triton/IR/CMakeFiles/TritonIR.dir/Traits.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: lib/Dialect/Triton/IR/CMakeFiles/TritonIR.dir/Traits.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: lib/Dialect/Triton/IR/CMakeFiles/TritonIR.dir/Traits.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: lib/Dialect/Triton/IR/CMakeFiles/TritonIR.dir/Traits.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)
Missing dep: lib/Dialect/Triton/IR/CMakeFiles/TritonIR.dir/Traits.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrInterfaces.h.inc (generated by CUSTOM_COMMAND)
Missing dep: lib/Dialect/Triton/Transforms/CMakeFiles/TritonTransforms.dir/Combine.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: lib/Dialect/Triton/Transforms/CMakeFiles/TritonTransforms.dir/Combine.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: lib/Dialect/Triton/Transforms/CMakeFiles/TritonTransforms.dir/Combine.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: lib/Dialect/Triton/Transforms/CMakeFiles/TritonTransforms.dir/Combine.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)
Missing dep: lib/Dialect/Triton/Transforms/CMakeFiles/TritonTransforms.dir/Combine.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrInterfaces.h.inc (generated by CUSTOM_COMMAND)
Missing dep: lib/Dialect/Triton/Transforms/CMakeFiles/TritonTransforms.dir/RewriteTensorPointer.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: lib/Dialect/Triton/Transforms/CMakeFiles/TritonTransforms.dir/RewriteTensorPointer.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: lib/Dialect/Triton/Transforms/CMakeFiles/TritonTransforms.dir/RewriteTensorPointer.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: lib/Dialect/Triton/Transforms/CMakeFiles/TritonTransforms.dir/RewriteTensorPointer.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)
Missing dep: lib/Dialect/Triton/Transforms/CMakeFiles/TritonTransforms.dir/RewriteTensorPointer.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrInterfaces.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/nvidia/lib/NVGPUToLLVM/CMakeFiles/NVGPUToLLVM.dir/NVGPUToLLVMPass.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/nvidia/lib/NVGPUToLLVM/CMakeFiles/NVGPUToLLVM.dir/NVGPUToLLVMPass.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/nvidia/lib/NVGPUToLLVM/CMakeFiles/NVGPUToLLVM.dir/NVGPUToLLVMPass.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/OpsEnums.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/nvidia/lib/NVGPUToLLVM/CMakeFiles/NVGPUToLLVM.dir/NVGPUToLLVMPass.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/TritonTypeInterfaces.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/nvidia/lib/NVGPUToLLVM/CMakeFiles/NVGPUToLLVM.dir/NVGPUToLLVMPass.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/nvidia/lib/NVGPUToLLVM/CMakeFiles/NVGPUToLLVM.dir/NVGPUToLLVMPass.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/nvidia/lib/NVGPUToLLVM/CMakeFiles/NVGPUToLLVM.dir/NVGPUToLLVMPass.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/nvidia/lib/NVGPUToLLVM/CMakeFiles/NVGPUToLLVM.dir/NVGPUToLLVMPass.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/nvidia/lib/NVGPUToLLVM/CMakeFiles/NVGPUToLLVM.dir/NVGPUToLLVMPass.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/nvidia/lib/NVGPUToLLVM/CMakeFiles/NVGPUToLLVM.dir/NVGPUToLLVMPass.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrInterfaces.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/nvidia/lib/NVGPUToLLVM/CMakeFiles/NVGPUToLLVM.dir/NVGPUToLLVMPass.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/nvidia/lib/NVGPUToLLVM/CMakeFiles/NVGPUToLLVM.dir/NVGPUToLLVMPass.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/nvidia/lib/NVGPUToLLVM/CMakeFiles/NVGPUToLLVM.dir/NVGPUToLLVMPass.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/nvidia/lib/NVGPUToLLVM/CMakeFiles/NVGPUToLLVM.dir/NVGPUToLLVMPass.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/TritonNvidiaGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/nvidia/lib/NVGPUToLLVM/CMakeFiles/NVGPUToLLVM.dir/NVGPUToLLVMPass.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/NVGPU/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/nvidia/lib/NVGPUToLLVM/CMakeFiles/NVGPUToLLVM.dir/NVGPUToLLVMPass.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/NVGPU/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/nvidia/lib/NVGPUToLLVM/CMakeFiles/NVGPUToLLVM.dir/NVGPUToLLVMPass.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/NVGPU/IR/OpsEnums.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/nvidia/lib/NVGPUToLLVM/CMakeFiles/NVGPUToLLVM.dir/NVGPUToLLVMPass.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/NVGPU/IR/NVGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/AccelerateAMDMatmul.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/AccelerateAMDMatmul.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/AccelerateAMDMatmul.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/OpsEnums.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/AccelerateAMDMatmul.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/TritonTypeInterfaces.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/AccelerateAMDMatmul.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/AccelerateAMDMatmul.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/AccelerateAMDMatmul.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/AccelerateAMDMatmul.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/AccelerateAMDMatmul.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/AccelerateAMDMatmul.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrInterfaces.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/AccelerateAMDMatmul.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/Transforms/Passes.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/AccelerateAMDMatmul.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/AccelerateAMDMatmul.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/AccelerateAMDMatmul.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/AccelerateAMDMatmul.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/TritonNvidiaGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/DecomposeConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/DecomposeConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/DecomposeConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/OpsEnums.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/DecomposeConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/TritonTypeInterfaces.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/DecomposeConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/DecomposeConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/DecomposeConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/DecomposeConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/DecomposeConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/DecomposeConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrInterfaces.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/DecomposeConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/Transforms/Passes.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/DecomposeConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/DecomposeConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/DecomposeConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/DecomposeConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/TritonNvidiaGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/OptimizeEpilogue.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/OptimizeEpilogue.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/OptimizeEpilogue.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/OpsEnums.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/OptimizeEpilogue.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/TritonTypeInterfaces.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/OptimizeEpilogue.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/OptimizeEpilogue.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/OptimizeEpilogue.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/OptimizeEpilogue.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/OptimizeEpilogue.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/OptimizeEpilogue.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrInterfaces.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/OptimizeEpilogue.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/Transforms/Passes.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/OptimizeEpilogue.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/OptimizeEpilogue.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/OptimizeEpilogue.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/OptimizeEpilogue.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/TritonNvidiaGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/RemoveLayoutConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/RemoveLayoutConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/RemoveLayoutConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/OpsEnums.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/RemoveLayoutConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/TritonTypeInterfaces.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/RemoveLayoutConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/RemoveLayoutConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/RemoveLayoutConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/RemoveLayoutConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/RemoveLayoutConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/RemoveLayoutConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrInterfaces.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/RemoveLayoutConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/RemoveLayoutConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/RemoveLayoutConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/RemoveLayoutConversions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/TritonNvidiaGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/ReorderInstructions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/ReorderInstructions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/ReorderInstructions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/OpsEnums.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/ReorderInstructions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/TritonTypeInterfaces.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/ReorderInstructions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/ReorderInstructions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/ReorderInstructions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/ReorderInstructions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/ReorderInstructions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/ReorderInstructions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrInterfaces.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/ReorderInstructions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/Transforms/Passes.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/ReorderInstructions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/ReorderInstructions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/ReorderInstructions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/ReorderInstructions.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/TritonNvidiaGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/StreamPipeline.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/StreamPipeline.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/StreamPipeline.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/OpsEnums.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/StreamPipeline.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/TritonTypeInterfaces.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/StreamPipeline.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/Triton/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/StreamPipeline.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/StreamPipeline.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/StreamPipeline.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/StreamPipeline.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/StreamPipeline.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonGPU/IR/TritonGPUAttrInterfaces.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/StreamPipeline.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Dialect.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/StreamPipeline.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Ops.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/StreamPipeline.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/Types.h.inc (generated by CUSTOM_COMMAND)
Missing dep: third_party/amd/lib/TritonAMDGPUTransforms/CMakeFiles/TritonAMDGPUTransforms.dir/StreamPipeline.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/TritonNvidiaGPU/IR/TritonNvidiaGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)
Processed 346 nodes.
Error: There are 11 missing dependency paths.
11 targets had depfile dependencies on 19 distinct generated inputs (from 1 rules)  without a non-depfile dep path to the generator.
There might be build flakiness if any of the targets listed above are built alone, or not late enough, in a clean output directory.
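
A log this long is easier to read when grouped by object file. A small sketch, assuming the "Missing dep: <object> uses <header> ..." line format shown above; the function name summarize_missingdeps is made up for illustration:

```shell
# Hypothetical helper: read `ninja -t missingdeps` output on stdin and print
# how many undeclared generated inputs each object file uses.
summarize_missingdeps() {
  # Field 3 of every "Missing dep:" line is the object file; other lines are ignored.
  awk '$1 == "Missing" && $2 == "dep:" { count[$3]++ }
       END { for (obj in count) printf "%d %s\n", count[obj], obj }' |
    sort -k2
}
```

For example, ninja -t missingdeps | summarize_missingdeps would print one line per object file, which makes it easier to see which CMake targets need a DEPENDS entry.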

NavinKumarMNK commented 7 months ago

As you noted, there were missing dependencies. What can be done to solve this?

joker-eph commented 7 months ago

The missing dependencies need to be added to CMakeLists.txt.

For example: Missing dep: third_party/nvidia/lib/NVGPUToLLVM/CMakeFiles/NVGPUToLLVM.dir/NVGPUToLLVMPass.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/NVGPU/IR/NVGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)

This indicates that the target that builds NVGPUToLLVMPass.cpp is missing a dependency on the target that generates NVGPUAttrDefs.h.inc (I would use TritonGPUIR).

That would be here: https://github.com/openai/triton/blob/main/third_party/nvidia/lib/NVGPUToLLVM/CMakeLists.txt#L5

NavinKumarMNK commented 7 months ago

I have almost understood what's happening, but I am not sure which code changes need to be made. It would be helpful if you could spell out the changes that are needed. I will then try building again with them.

I think I need to add add_dependencies(NVGPUToLLVM TritonGPUIR) to CMakeLists.txt. Please correct me if I am wrong.

NavinKumarMNK commented 7 months ago

Just to mention, I tried the steps from https://github.com/intel/opencl-clang/issues/188 today, since that issue looks similar to mine. After following them, importing triton dumped core with the error below, which indicates that LLVM is linked in two places, and python3 exited. This might be helpful to you; let me know if I should try anything else with this procedure.

root@93024fd18a48:/opt# python3
Python 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 16:04:32) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import triton
: CommandLine Error: Option 'print-pipeline-passes' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
Aborted (core dumped)
joker-eph commented 7 months ago

> I think i need to add this add_dependencies(NVGPUToLLVM TritonGPUIR) to CMakeLists.txt. please correct me if i am wrong

Almost: I provided a link to the CMakeLists where you have:

add_triton_library(NVGPUToLLVM
    NVGPUToLLVMPass.cpp

    DEPENDS
    NVGPUConversionPassIncGen
)

The "DEPENDS" list here is where to append the missing library, so the fix for this missing dependency would be:

add_triton_library(NVGPUToLLVM
    NVGPUToLLVMPass.cpp

    DEPENDS
    NVGPUConversionPassIncGen
    TritonGPUIR
)
NavinKumarMNK commented 6 months ago

I get the same missing dependency logs:

...
Missing dep: third_party/nvidia/lib/NVGPUToLLVM/CMakeFiles/NVGPUToLLVM.dir/NVGPUToLLVMPass.cpp.o uses /root/triton/python/build/cmake.linux-ppc64le-cpython-3.10/include/triton/Dialect/NVGPU/IR/NVGPUAttrDefs.h.inc (generated by CUSTOM_COMMAND)
...
Processed 346 nodes.
Error: There are 11 missing dependency paths.
11 targets had depfile dependencies on 19 distinct generated inputs (from 1 rules)  without a non-depfile dep path to the generator.
There might be build flakiness if any of the targets listed above are built alone, or not late enough, in a clean output directory.
root@4e8d4724af01:~/triton/python/build/cmake.linux-ppc64le-cpython-3.10# cat ~/triton/third_party/nvidia/lib/NVGPUToLLVM/CMakeLists.txt 
add_triton_library(NVGPUToLLVM
    NVGPUToLLVMPass.cpp

    DEPENDS
    NVGPUConversionPassIncGen
    TritonGPUIR
)
root@4e8d4724af01:~/triton/python/build/cmake.linux-ppc64le-cpython-3.10# python3
Python 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 16:04:32) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import triton
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/triton/python/triton/__init__.py", line 8, in <module>
    from .runtime import (
  File "/root/triton/python/triton/runtime/__init__.py", line 1, in <module>
    from .autotuner import (Autotuner, Config, Heuristics, autotune, heuristics)
  File "/root/triton/python/triton/runtime/autotuner.py", line 7, in <module>
    from ..testing import do_bench
  File "/root/triton/python/triton/testing.py", line 7, in <module>
    from . import language as tl
  File "/root/triton/python/triton/language/__init__.py", line 4, in <module>
    from . import math
  File "/root/triton/python/triton/language/math.py", line 1, in <module>
    from . import core
  File "/root/triton/python/triton/language/core.py", line 9, in <module>
    from ..runtime.jit import jit
  File "/root/triton/python/triton/runtime/jit.py", line 11, in <module>
    from ..runtime.driver import driver
  File "/root/triton/python/triton/runtime/driver.py", line 1, in <module>
    from ..backends import backends
  File "/root/triton/python/triton/backends/__init__.py", line 50, in <module>
    backends = _discover_backends()
  File "/root/triton/python/triton/backends/__init__.py", line 43, in _discover_backends
    compiler = _load_module(name, os.path.join(root, name, 'compiler.py'))
  File "/root/triton/python/triton/backends/__init__.py", line 12, in _load_module
    spec.loader.exec_module(module)
  File "/root/triton/python/triton/backends/nvidia/compiler.py", line 2, in <module>
    from triton._C.libtriton import ir, passes, llvm, nvidia
ImportError: /root/triton/python/triton/_C/libtriton.so: undefined symbol: LLVMInitializeSparcTarget
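An `undefined symbol` error like the one above means the shared library refers to a symbol that nothing provides at load time. One way to list such symbols without importing the module is to read the output of GNU `nm -D --undefined-only`. The sketch below is a hedged illustration: the helper names are hypothetical, it assumes GNU binutils `nm` is installed, and it only reports what the library expects, not why the symbol is missing.

```python
import subprocess


def undefined_symbols(nm_output):
    """Parse `nm -D --undefined-only` output into a list of symbol names."""
    symbols = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined entries have no address column: "<type> <name>".
        if len(parts) == 2 and parts[0] in ("U", "w", "v"):
            # Drop a version suffix such as "@GLIBC_2.17" if present.
            symbols.append(parts[1].split("@")[0])
    return symbols


def llvm_initializers(symbols):
    """Keep only the LLVM target initializer entry points."""
    return [name for name in symbols if name.startswith("LLVMInitialize")]


def read_undefined(path):
    """Run nm on a shared library (assumes GNU nm is on PATH)."""
    result = subprocess.run(
        ["nm", "-D", "--undefined-only", path],
        capture_output=True, text=True, check=True,
    )
    return undefined_symbols(result.stdout)
```

For example, `llvm_initializers(read_undefined("/root/triton/python/triton/_C/libtriton.so"))` would show which LLVM target initializers the library still expects to find elsewhere.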
joker-eph commented 6 months ago

Did you build first before checking the missing deps? I believe ninja needs a clean build for this, since it will use its build log.
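The order suggested here (clean, build, then check) could be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, `ninja` is assumed to be on PATH, and `ninja -t clean` is used as one way to get a clean state; deleting the build directory and reconfiguring would be another.

```python
import subprocess


def missingdeps_commands(build_dir):
    """Return the ninja invocations, in order, for a check after a clean build."""
    return [
        ["ninja", "-C", build_dir, "-t", "clean"],        # remove previously built outputs
        ["ninja", "-C", build_dir],                        # full build, which records dependency info
        ["ninja", "-C", build_dir, "-t", "missingdeps"],   # then look for missing dependency paths
    ]


def run_check(build_dir):
    """Run the steps in order, stopping at the first failure; return the last exit code."""
    code = 0
    for cmd in missingdeps_commands(build_dir):
        code = subprocess.run(cmd).returncode
        if code != 0:
            break
    return code
```

In this thread the build directory would be `/root/triton/python/build/cmake.linux-ppc64le-cpython-3.10`, e.g. `run_check("/root/triton/python/build/cmake.linux-ppc64le-cpython-3.10")`.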