llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Assertion `parentOp->getNumRegions() == 1 && parentOp->getRegion(0).getBlocks().size() == 1' failed #87089

Closed NavinKumarMNK closed 6 months ago

NavinKumarMNK commented 6 months ago

Your current environment

root@0fca177ad2d4:/workspace# python3 collect_env.py 
Collecting environment information...
PyTorch version: 2.1.2
Is debug build: False
CUDA used to build PyTorch: 12.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (ppc64le)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.29.0
Libc version: glibc-2.35

Python version: 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 16:04:32) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-5.15.0-100-generic-ppc64le-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.2.91
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB

Nvidia driver version: 535.161.07
cuDNN version: Probably one of the following:
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn.so.8.9.5
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_adv_infer.so.8.9.5
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_adv_train.so.8.9.5
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_cnn_infer.so.8.9.5
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_cnn_train.so.8.9.5
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_ops_infer.so.8.9.5
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_ops_train.so.8.9.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: False

CPU:
Architecture:                       ppc64le
Byte Order:                         Little Endian
CPU(s):                             128
On-line CPU(s) list:                0-127
Model name:                         POWER9, altivec supported
Model:                              2.2 (pvr 004e 1202)
Thread(s) per core:                 4
Core(s) per socket:                 16
Socket(s):                          2
Frequency boost:                    enabled
CPU max MHz:                        3800.0000
CPU min MHz:                        2300.0000
L1d cache:                          1 MiB (32 instances)
L1i cache:                          1 MiB (32 instances)
L2 cache:                           8 MiB (16 instances)
L3 cache:                           160 MiB (16 instances)
NUMA node(s):                       6
NUMA node0 CPU(s):                  0-63
NUMA node8 CPU(s):                  64-127
NUMA node252 CPU(s):                
NUMA node253 CPU(s):                
NUMA node254 CPU(s):                
NUMA node255 CPU(s):                
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Mitigation; RFI Flush, L1D private per thread
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Mitigation; RFI Flush, L1D private per thread
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Kernel entry/exit barrier (eieio)
Vulnerability Spectre v1:           Mitigation; __user pointer sanitization, ori31 speculation barrier enabled
Vulnerability Spectre v2:           Mitigation; Indirect branch serialisation (kernel only)
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==2.1.2
[conda] cudatoolkit               11.8.0              hedcfb66_13    conda-forge
[conda] libmagma                  2.7.2                he288b6c_2    conda-forge
[conda] libmagma_sparse           2.7.2                h5b5c57a_3    conda-forge
[conda] magma                     2.7.2                h097a1ca_3    conda-forge
[conda] numpy                     1.24.3          py310h87cc683_0  
[conda] numpy-base                1.24.3          py310hac71eb6_0  
[conda] torch                     2.1.2                     dev_0    <develop>
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.3.3
vLLM Build Flags:
CUDA Archs: 7.0; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 GPU1 GPU2 GPU3 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0  X  NV3 SYS SYS 0-63 0  N/A
GPU1 NV3  X  SYS SYS 0-63 0  N/A
GPU2 SYS SYS  X  NV3 64-127 8   N/A
GPU3 SYS SYS NV3  X  64-127 8   N/A

Legend:
  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
$ llvm-config --version
17.0.0git

LLVM build commit: c5dede880d175f7229c9b2923f4753e12702305d

Build command:

cmake -G Ninja ../llvm \
   -DLLVM_ENABLE_PROJECTS="mlir;llvm" \
   -DLLVM_BUILD_EXAMPLES=ON \
   -DLLVM_TARGETS_TO_BUILD="PowerPC;NVPTX;X86;AMDGPU;RISCV" \
   -DMLIR_ENABLE_CUDA_RUNNER=ON \
   -DCMAKE_BUILD_TYPE=Release \
   -DLLVM_ENABLE_ASSERTIONS=ON \
   -DCMAKE_C_COMPILER=clang \
   -DCMAKE_CXX_COMPILER=clang++ \
   -DLLVM_ENABLE_RTTI=ON \
   -DLLVM_INSTALL_UTILS=ON \
   -DMLIR_INCLUDE_INTEGRATION_TESTS=ON

Bug

example.py (I loaded the Mixtral-8x7B-Instruct FP16 model):

from vllm import LLM, SamplingParams
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(
    model="./models", 
    dtype="float16", 
    tensor_parallel_size=4, 
    enforce_eager=True, 
    trust_remote_code=True, 
    load_format='safetensors',
    # quantization="AWQ",
)
root@0fca177ad2d4:/workspace# python3 example.py 
WARNING 03-29 15:24:46 config.py:686] Casting torch.bfloat16 to torch.float16.
2024-03-29 15:24:48,678 INFO worker.py:1612 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
INFO 03-29 15:24:52 llm_engine.py:68] Initializing an LLM engine (v0.3.3) with config: model='./models', tokenizer='./models', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=safetensors, tensor_parallel_size=4, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, seed=0)
INFO 03-29 15:25:07 attention.py:67] flash_attn is not found. Using xformers backend.
(RayWorkerVllm pid=37294) INFO 03-29 15:25:07 attention.py:67] flash_attn is not found. Using xformers backend.
INFO 03-29 15:25:27 model_runner.py:97] Loading model weights took 21.7573 GB
(RayWorkerVllm pid=37294) INFO 03-29 15:25:39 model_runner.py:97] Loading model weights took 21.7573 GB
(RayWorkerVllm pid=37345) INFO 03-29 15:25:07 attention.py:67] flash_attn is not found. Using xformers backend. [repeated 2x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
python3: /root/llvm-project/mlir/lib/Analysis/SliceAnalysis.cpp:106: void getBackwardSliceImpl(mlir::Operation *, SetVector<mlir::Operation *> *, mlir::TransitiveFilter): Assertion `parentOp->getNumRegions() == 1 && parentOp->getRegion(0).getBlocks().size() == 1' failed.
*** SIGABRT received at time=1711725941 on cpu 45 ***
PC: @     0x7e79d800866c  (unknown)  pthread_kill
    @     0x7e7143545984  613083184  absl::lts_20220623::AbslFailureSignalHandler()
    @     0x7e79d800870c        224  pthread_kill
    @     0x7e79d7fa1dfc         48  raise
    @     0x7e79d7f7d260        336  abort
    @     0x7e79d7f94ef0        192  (unknown)
    @     0x7e79d7f94f94         64  __assert_fail
    @     0x7e755cc4c8b8        112  getBackwardSliceImpl()
    @     0x7e755cc4c6f0        112  getBackwardSliceImpl()
    @     0x7e755cc4c5a8         64  mlir::getBackwardSlice()
    @     0x7e755c78bc10        384  mlir::multiRootGetSlice()
    @     0x7e755b235e7c        608  CoalescePass::getCoalescedEncoding()
    @     0x7e755b2375d8        256  CoalescePass::runOnOperation()::{lambda()#1}::operator()()
    @     0x7e755b238be0        480  mlir::detail::walk<>()
    @     0x7e755b238eac        320  CoalescePass::runOnOperation()
    @     0x7e755bd83c54        416  mlir::detail::OpToOpPassAdaptor::run()
    @     0x7e755bd84650        160  mlir::detail::OpToOpPassAdaptor::runPipeline()
    @     0x7e755bd878b0        368  mlir::PassManager::run()
    @     0x7e75599cf9cc        128  pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()
    @     0x7e75599bacd4        848  pybind11::cpp_function::dispatcher()
    @      0x18e02ca5e40        112  cfunction_call
    @      0x18e02a3bc2c        160  _PyObject_MakeTpCall
    @      0x18e02c80134        160  method_vectorcall
    @      0x18e02a25738        480  _PyEval_EvalFrameDefault
    @      0x18e02b2a974         64  _PyEval_Vector
    @      0x18e02a3b9a0         32  _PyFunction_Vectorcall
    @      0x18e02a22b6c        480  _PyEval_EvalFrameDefault
    @      0x18e02b2a974         64  _PyEval_Vector
    @      0x18e02a3b9a0         32  _PyFunction_Vectorcall
    @      0x18e02a22b6c        480  _PyEval_EvalFrameDefault
    @      0x18e02b2a974         64  _PyEval_Vector
    @      0x18e02a3b9a0         32  _PyFunction_Vectorcall
    @      0x18e02a22224        480  _PyEval_EvalFrameDefault
    @ ... and at least 196 more frames
[2024-03-29 15:25:41,577 E 30164 30164] logging.cc:361: *** SIGABRT received at time=1711725941 on cpu 45 ***
[2024-03-29 15:25:41,577 E 30164 30164] logging.cc:361: PC: @     0x7e79d800866c  (unknown)  pthread_kill
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e71435459b8  613083184  absl::lts_20220623::AbslFailureSignalHandler()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e79d800870c        224  pthread_kill
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e79d7fa1dfc         48  raise
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e79d7f7d260        336  abort
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e79d7f94ef0        192  (unknown)
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e79d7f94f94         64  __assert_fail
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755cc4c8b8        112  getBackwardSliceImpl()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755cc4c6f0        112  getBackwardSliceImpl()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755cc4c5a8         64  mlir::getBackwardSlice()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755c78bc10        384  mlir::multiRootGetSlice()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755b235e7c        608  CoalescePass::getCoalescedEncoding()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755b2375d8        256  CoalescePass::runOnOperation()::{lambda()#1}::operator()()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755b238be0        480  mlir::detail::walk<>()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755b238eac        320  CoalescePass::runOnOperation()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755bd83c54        416  mlir::detail::OpToOpPassAdaptor::run()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755bd84650        160  mlir::detail::OpToOpPassAdaptor::runPipeline()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e755bd878b0        368  mlir::PassManager::run()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e75599cf9cc        128  pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @     0x7e75599bacd4        848  pybind11::cpp_function::dispatcher()
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @      0x18e02ca5e40        112  cfunction_call
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @      0x18e02a3bc2c        160  _PyObject_MakeTpCall
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @      0x18e02c80134        160  method_vectorcall
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @      0x18e02a25738        480  _PyEval_EvalFrameDefault
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @      0x18e02b2a974         64  _PyEval_Vector
[2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361:     @      0x18e02a3b9a0         32  _PyFunction_Vectorcall
[2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361:     @      0x18e02a22b6c        480  _PyEval_EvalFrameDefault
[2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361:     @      0x18e02b2a974         64  _PyEval_Vector
[2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361:     @      0x18e02a3b9a0         32  _PyFunction_Vectorcall
[2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361:     @      0x18e02a22b6c        480  _PyEval_EvalFrameDefault
[2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361:     @      0x18e02b2a974         64  _PyEval_Vector
[2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361:     @      0x18e02a3b9a0         32  _PyFunction_Vectorcall
[2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361:     @      0x18e02a22224        480  _PyEval_EvalFrameDefault
[2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361:     @ ... and at least 196 more frames
Fatal Python error: Aborted

Stack (most recent call first):
  File "/root/triton/python/triton/compiler/compiler.py", line 91 in optimize_ttgir
  File "/root/triton/python/triton/compiler/compiler.py", line 383 in <lambda>
  File "/root/triton/python/triton/compiler/compiler.py", line 476 in compile
  File "<string>", line 63 in fused_moe_kernel
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/model_executor/layers/fused_moe/fused_moe.py", line 222 in invoke_fused_moe_kernel
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/model_executor/layers/fused_moe/fused_moe.py", line 397 in fused_moe
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/model_executor/models/mixtral.py", line 131 in forward
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/model_executor/models/mixtral.py", line 278 in forward
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/model_executor/models/mixtral.py", line 319 in forward
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/model_executor/models/mixtral.py", line 383 in forward
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/worker/model_runner.py", line 606 in execute_model
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115 in decorate_context
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/worker/model_runner.py", line 677 in profile_run
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115 in decorate_context
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/worker/worker.py", line 122 in profile_num_available_blocks
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115 in decorate_context
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/executor/ray_gpu_executor.py", line 318 in _run_workers
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/executor/ray_gpu_executor.py", line 221 in _init_cache
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/executor/ray_gpu_executor.py", line 63 in __init__
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/engine/llm_engine.py", line 103 in __init__
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/engine/llm_engine.py", line 146 in from_engine_args
  File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/entrypoints/llm.py", line 109 in __init__
  File "/workspace/example.py", line 10 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, _brotli, yaml._yaml, sentencepiece._sentencepiece, psutil._psutil_linux, psutil._psutil_posix, msgpack._cmsgpack, google.protobuf.pyext._message, setproctitle, uvloop.loop, ray._raylet, grpc._cython.cygrpc, multidict._multidict, yarl._quoting_c, aiohttp._helpers, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket, frozenlist._frozenlist, pydantic.typing, pydantic.errors, pydantic.version, pydantic.utils, pydantic.class_validators, pydantic.config, pydantic.color, pydantic.datetime_parse, pydantic.validators, pydantic.networks, pydantic.types, pydantic.json, pydantic.error_wrappers, pydantic.fields, pydantic.parse, pydantic.schema, pydantic.main, pydantic.dataclasses, pydantic.annotated_types, pydantic.decorator, pydantic.env_settings, pydantic.tools, pydantic, cupy_backends.cuda.api._runtime_enum, cupy_backends.cuda.api.runtime, cupy_backends.cuda.stream, cupy_backends.cuda.libs.cublas, cupy_backends.cuda.libs.cusolver, cupy_backends.cuda._softlink, cupy_backends.cuda.libs.cusparse, cupy._util, cupy.cuda.device, fastrlock.rlock, cupy.cuda.memory_hook, cupy.cuda.graph, cupy.cuda.stream, cupy_backends.cuda.api._driver_enum, cupy_backends.cuda.api.driver, cupy.cuda.memory, cupy._core.internal, cupy._core._carray, cupy.cuda.texture, cupy.cuda.function, cupy_backends.cuda.libs.nvrtc, cupy.cuda.jitify, cupy.cuda.pinned_memory, cupy_backends.cuda.libs.curand, cupy_backends.cuda.libs.profiler, cupy.cuda.common, cupy.cuda.cub, cupy_backends.cuda.libs.nvtx, cupy.cuda.thrust, cupy._core._dtype, cupy._core._scalar, cupy._core._accelerator, cupy._core._memory_range, cupy._core._fusion_thread_local, cupy._core._kernel, cupy._core._routines_manipulation, cupy._core._optimize_config, cupy._core._cub_reduction, cupy._core._reduction, cupy._core._routines_binary, cupy._core._routines_math, cupy._core._routines_indexing, cupy._core._routines_linalg, cupy._core._routines_logic, cupy._core._routines_sorting, cupy._core._routines_statistics, cupy._core.dlpack, cupy._core.flags, cupy._core.core, cupy._core._fusion_variable, cupy._core._fusion_trace, cupy._core._fusion_kernel, cupy._core.new_fusion, cupy._core.fusion, cupy._core.raw, cupyx.cusolver, scipy._lib._ccallback_c, numpy.linalg.lapack_lite, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._flinalg, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, cupy.cuda.cufft, cupy.fft._cache, cupy.fft._callback, cupy.random._generator_api, cupy.random._bit_generator, scipy._lib._uarray._uarray, 
scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, cupy.lib._polynomial, cupy_backends.cuda.libs.nccl, zstandard.backend_c, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._direct (total: 182)
Aborted (core dumped)

Thank you. Let me know if I can provide any more details.

llvmbot commented 6 months ago

@llvm/issue-subscribers-mlir

Author: Navin Kumar M (NavinKumarMNK)

### Your current environment ```text root@0fca177ad2d4:/workspace# python3 collect_env.py Collecting environment information... PyTorch version: 2.1.2 Is debug build: False CUDA used to build PyTorch: 12.2 ROCM used to build PyTorch: N/A OS: Ubuntu 22.04.4 LTS (ppc64le) GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 Clang version: Could not collect CMake version: version 3.29.0 Libc version: glibc-2.35 Python version: 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 16:04:32) [GCC 12.3.0] (64-bit runtime) Python platform: Linux-5.15.0-100-generic-ppc64le-with-glibc2.35 Is CUDA available: True CUDA runtime version: 12.2.91 CUDA_MODULE_LOADING set to: LAZY GPU models and configuration: GPU 0: Tesla V100-SXM2-32GB GPU 1: Tesla V100-SXM2-32GB GPU 2: Tesla V100-SXM2-32GB GPU 3: Tesla V100-SXM2-32GB Nvidia driver version: 535.161.07 cuDNN version: Probably one of the following: /usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn.so.8.9.5 /usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_adv_infer.so.8.9.5 /usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_adv_train.so.8.9.5 /usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_cnn_infer.so.8.9.5 /usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_cnn_train.so.8.9.5 /usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_ops_infer.so.8.9.5 /usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_ops_train.so.8.9.5 HIP runtime version: N/A MIOpen runtime version: N/A Is XNNPACK available: False CPU: Architecture: ppc64le Byte Order: Little Endian CPU(s): 128 On-line CPU(s) list: 0-127 Model name: POWER9, altivec supported Model: 2.2 (pvr 004e 1202) Thread(s) per core: 4 Core(s) per socket: 16 Socket(s): 2 Frequency boost: enabled CPU max MHz: 3800.0000 CPU min MHz: 2300.0000 L1d cache: 1 MiB (32 instances) L1i cache: 1 MiB (32 instances) L2 cache: 8 MiB (16 instances) L3 cache: 160 MiB (16 instances) NUMA node(s): 6 NUMA node0 CPU(s): 0-63 NUMA node8 CPU(s): 64-127 NUMA node252 CPU(s): NUMA node253 CPU(s): NUMA node254 CPU(s): NUMA node255 CPU(s): Vulnerability Gather data sampling: Not affected Vulnerability Itlb multihit: Not affected Vulnerability L1tf: Mitigation; RFI Flush, L1D private per thread Vulnerability Mds: Not affected Vulnerability Meltdown: Mitigation; RFI Flush, L1D private per thread Vulnerability Mmio stale data: Not affected Vulnerability Retbleed: Not affected Vulnerability Spec rstack overflow: Not affected Vulnerability Spec store bypass: Mitigation; Kernel entry/exit barrier (eieio) Vulnerability Spectre v1: Mitigation; __user pointer sanitization, ori31 speculation barrier enabled Vulnerability Spectre v2: Mitigation; Indirect branch serialisation (kernel only) Vulnerability Srbds: Not affected Vulnerability Tsx async abort: Not affected Versions of relevant libraries: [pip3] numpy==1.24.3 [pip3] torch==2.1.2 [conda] cudatoolkit 11.8.0 hedcfb66_13 conda-forge [conda] libmagma 2.7.2 he288b6c_2 conda-forge [conda] libmagma_sparse 2.7.2 h5b5c57a_3 conda-forge [conda] magma 2.7.2 h097a1ca_3 conda-forge [conda] numpy 1.24.3 py310h87cc683_0 [conda] numpy-base 1.24.3 py310hac71eb6_0 [conda] torch 2.1.2 dev_0 <develop>ROCM Version: Could not collect Neuron SDK Version: N/A vLLM Version: 0.3.3 vLLM Build Flags: CUDA Archs: 7.0; ROCm: Disabled; Neuron: Disabled GPU Topology: GPU0 GPU1 GPU2 GPU3 CPU Affinity NUMA Affinity GPU NUMA ID GPU0 X NV3 SYS SYS 0-63 0 N/A GPU1 NV3 X SYS SYS 0-63 0 N/A GPU2 SYS SYS X NV3 64-127 8 N/A GPU3 SYS SYS NV3 X 64-127 8 N/A Legend: X = Self SYS = Connection traversing 
PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI) NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU) PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge) PIX = Connection traversing at most a single PCIe bridge NV# = Connection traversing a bonded set of # NVLinks ``` ```bash $ llvm-config --version 17.0.0git ``` llvm build commit : c5dede880d175f7229c9b2923f4753e12702305d build command ```bash cmake -G Ninja ../llvm \ -DLLVM_ENABLE_PROJECTS="mlir;llvm" \ -DLLVM_BUILD_EXAMPLES=ON \ -DLLVM_TARGETS_TO_BUILD="PowerPC;NVPTX;X86;AMDGPU;RISCV" \ -DMLIR_ENABLE_CUDA_RUNNER=ON \ -DCMAKE_BUILD_TYPE=Release \ -DLLVM_ENABLE_ASSERTIONS=ON \ -DCMAKE_C_COMPILER=clang \ -DCMAKE_CXX_COMPILER=clang++ \ -DLLVM_ENABLE_RTTI=ON \ -DLLVM_INSTALL_UTILS=ON \ -DMLIR_INCLUDE_INTEGRATION_TESTS=ON ``` ### Bug example.py. - i loaded the mixtral-8x7b-instruct fp16 model ```python from vllm import LLM, SamplingParams prompts = [ "Hello, my name is", "The president of the United States is", "The capital of France is", "The future of AI is", ] sampling_params = SamplingParams(temperature=0.8, top_p=0.95) llm = LLM( model="./models", dtype="float16", tensor_parallel_size=4, enforce_eager=True, trust_remote_code=True, load_format='safetensors', # quantization="AWQ", ) ``` ```bash root@0fca177ad2d4:/workspace# python3 example.py WARNING 03-29 15:24:46 config.py:686] Casting torch.bfloat16 to torch.float16. 2024-03-29 15:24:48,678 INFO worker.py:1612 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 INFO 03-29 15:24:52 llm_engine.py:68] Initializing an LLM engine (v0.3.3) with config: model='./models', tokenizer='./models', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=safetensors, tensor_parallel_size=4, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, seed=0) INFO 03-29 15:25:07 attention.py:67] flash_attn is not found. Using xformers backend. (RayWorkerVllm pid=37294) INFO 03-29 15:25:07 attention.py:67] flash_attn is not found. Using xformers backend. INFO 03-29 15:25:27 model_runner.py:97] Loading model weights took 21.7573 GB (RayWorkerVllm pid=37294) INFO 03-29 15:25:39 model_runner.py:97] Loading model weights took 21.7573 GB (RayWorkerVllm pid=37345) INFO 03-29 15:25:07 attention.py:67] flash_attn is not found. Using xformers backend. [repeated 2x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.) python3: /root/llvm-project/mlir/lib/Analysis/SliceAnalysis.cpp:106: void getBackwardSliceImpl(mlir::Operation *, SetVector<mlir::Operation *> *, mlir::TransitiveFilter): Assertion `parentOp->getNumRegions() == 1 && parentOp->getRegion(0).getBlocks().size() == 1' failed. 
*** SIGABRT received at time=1711725941 on cpu 45 *** PC: @ 0x7e79d800866c (unknown) pthread_kill @ 0x7e7143545984 613083184 absl::lts_20220623::AbslFailureSignalHandler() @ 0x7e79d800870c 224 pthread_kill @ 0x7e79d7fa1dfc 48 raise @ 0x7e79d7f7d260 336 abort @ 0x7e79d7f94ef0 192 (unknown) @ 0x7e79d7f94f94 64 __assert_fail @ 0x7e755cc4c8b8 112 getBackwardSliceImpl() @ 0x7e755cc4c6f0 112 getBackwardSliceImpl() @ 0x7e755cc4c5a8 64 mlir::getBackwardSlice() @ 0x7e755c78bc10 384 mlir::multiRootGetSlice() @ 0x7e755b235e7c 608 CoalescePass::getCoalescedEncoding() @ 0x7e755b2375d8 256 CoalescePass::runOnOperation()::{lambda()#1}::operator()() @ 0x7e755b238be0 480 mlir::detail::walk<>() @ 0x7e755b238eac 320 CoalescePass::runOnOperation() @ 0x7e755bd83c54 416 mlir::detail::OpToOpPassAdaptor::run() @ 0x7e755bd84650 160 mlir::detail::OpToOpPassAdaptor::runPipeline() @ 0x7e755bd878b0 368 mlir::PassManager::run() @ 0x7e75599cf9cc 128 pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN() @ 0x7e75599bacd4 848 pybind11::cpp_function::dispatcher() @ 0x18e02ca5e40 112 cfunction_call @ 0x18e02a3bc2c 160 _PyObject_MakeTpCall @ 0x18e02c80134 160 method_vectorcall @ 0x18e02a25738 480 _PyEval_EvalFrameDefault @ 0x18e02b2a974 64 _PyEval_Vector @ 0x18e02a3b9a0 32 _PyFunction_Vectorcall @ 0x18e02a22b6c 480 _PyEval_EvalFrameDefault @ 0x18e02b2a974 64 _PyEval_Vector @ 0x18e02a3b9a0 32 _PyFunction_Vectorcall @ 0x18e02a22b6c 480 _PyEval_EvalFrameDefault @ 0x18e02b2a974 64 _PyEval_Vector @ 0x18e02a3b9a0 32 _PyFunction_Vectorcall @ 0x18e02a22224 480 _PyEval_EvalFrameDefault @ ... and at least 196 more frames [2024-03-29 15:25:41,577 E 30164 30164] logging.cc:361: *** SIGABRT received at time=1711725941 on cpu 45 *** [2024-03-29 15:25:41,577 E 30164 30164] logging.cc:361: PC: @ 0x7e79d800866c (unknown) pthread_kill [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x7e71435459b8 613083184 absl::lts_20220623::AbslFailureSignalHandler() [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x7e79d800870c 224 pthread_kill [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x7e79d7fa1dfc 48 raise [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x7e79d7f7d260 336 abort [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x7e79d7f94ef0 192 (unknown) [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x7e79d7f94f94 64 __assert_fail [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x7e755cc4c8b8 112 getBackwardSliceImpl() [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x7e755cc4c6f0 112 getBackwardSliceImpl() [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x7e755cc4c5a8 64 mlir::getBackwardSlice() [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x7e755c78bc10 384 mlir::multiRootGetSlice() [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x7e755b235e7c 608 CoalescePass::getCoalescedEncoding() [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x7e755b2375d8 256 CoalescePass::runOnOperation()::{lambda()#1}::operator()() [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x7e755b238be0 480 mlir::detail::walk<>() [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x7e755b238eac 320 CoalescePass::runOnOperation() [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x7e755bd83c54 416 mlir::detail::OpToOpPassAdaptor::run() [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x7e755bd84650 160 mlir::detail::OpToOpPassAdaptor::runPipeline() [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 
0x7e755bd878b0 368 mlir::PassManager::run() [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x7e75599cf9cc 128 pybind11::cpp_function::initialize<>()::{lambda()#3}::_FUN() [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x7e75599bacd4 848 pybind11::cpp_function::dispatcher() [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x18e02ca5e40 112 cfunction_call [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x18e02a3bc2c 160 _PyObject_MakeTpCall [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x18e02c80134 160 method_vectorcall [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x18e02a25738 480 _PyEval_EvalFrameDefault [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x18e02b2a974 64 _PyEval_Vector [2024-03-29 15:25:41,581 E 30164 30164] logging.cc:361: @ 0x18e02a3b9a0 32 _PyFunction_Vectorcall [2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361: @ 0x18e02a22b6c 480 _PyEval_EvalFrameDefault [2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361: @ 0x18e02b2a974 64 _PyEval_Vector [2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361: @ 0x18e02a3b9a0 32 _PyFunction_Vectorcall [2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361: @ 0x18e02a22b6c 480 _PyEval_EvalFrameDefault [2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361: @ 0x18e02b2a974 64 _PyEval_Vector [2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361: @ 0x18e02a3b9a0 32 _PyFunction_Vectorcall [2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361: @ 0x18e02a22224 480 _PyEval_EvalFrameDefault [2024-03-29 15:25:41,582 E 30164 30164] logging.cc:361: @ ... and at least 196 more frames Fatal Python error: Aborted Stack (most recent call first): File "/root/triton/python/triton/compiler/compiler.py", line 91 in optimize_ttgir File "/root/triton/python/triton/compiler/compiler.py", line 383 in <lambda> File "/root/triton/python/triton/compiler/compiler.py", line 476 in compile File "<string>", line 63 in fused_moe_kernel File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/model_executor/layers/fused_moe/fused_moe.py", line 222 in invoke_fused_moe_kernel File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/model_executor/layers/fused_moe/fused_moe.py", line 397 in fused_moe File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/model_executor/models/mixtral.py", line 131 in forward File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/model_executor/models/mixtral.py", line 278 in forward File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/model_executor/models/mixtral.py", line 319 in forward File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl File 
"/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/model_executor/models/mixtral.py", line 383 in forward File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/worker/model_runner.py", line 606 in execute_model File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115 in decorate_context File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/worker/model_runner.py", line 677 in profile_run File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115 in decorate_context File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/worker/worker.py", line 122 in profile_num_available_blocks File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115 in decorate_context File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/executor/ray_gpu_executor.py", line 318 in _run_workers File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/executor/ray_gpu_executor.py", line 221 in _init_cache File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/executor/ray_gpu_executor.py", line 63 in __init__ File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/engine/llm_engine.py", line 103 in __init__ File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/engine/llm_engine.py", line 146 in from_engine_args File "/root/miniconda3/lib/python3.10/site-packages/vllm-0.3.3+cu122-py3.10-linux-ppc64le.egg/vllm/entrypoints/llm.py", line 109 in __init__ File "/workspace/example.py", line 10 in <module> Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, _brotli, yaml._yaml, sentencepiece._sentencepiece, psutil._psutil_linux, psutil._psutil_posix, msgpack._cmsgpack, google.protobuf.pyext._message, setproctitle, uvloop.loop, ray._raylet, grpc._cython.cygrpc, multidict._multidict, yarl._quoting_c, aiohttp._helpers, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket, frozenlist._frozenlist, pydantic.typing, pydantic.errors, pydantic.version, pydantic.utils, pydantic.class_validators, pydantic.config, pydantic.color, pydantic.datetime_parse, pydantic.validators, pydantic.networks, pydantic.types, pydantic.json, pydantic.error_wrappers, pydantic.fields, pydantic.parse, pydantic.schema, pydantic.main, pydantic.dataclasses, pydantic.annotated_types, pydantic.decorator, pydantic.env_settings, pydantic.tools, pydantic, cupy_backends.cuda.api._runtime_enum, cupy_backends.cuda.api.runtime, cupy_backends.cuda.stream, cupy_backends.cuda.libs.cublas, cupy_backends.cuda.libs.cusolver, cupy_backends.cuda._softlink, 
cupy_backends.cuda.libs.cusparse, cupy._util, cupy.cuda.device, fastrlock.rlock, cupy.cuda.memory_hook, cupy.cuda.graph, cupy.cuda.stream, cupy_backends.cuda.api._driver_enum, cupy_backends.cuda.api.driver, cupy.cuda.memory, cupy._core.internal, cupy._core._carray, cupy.cuda.texture, cupy.cuda.function, cupy_backends.cuda.libs.nvrtc, cupy.cuda.jitify, cupy.cuda.pinned_memory, cupy_backends.cuda.libs.curand, cupy_backends.cuda.libs.profiler, cupy.cuda.common, cupy.cuda.cub, cupy_backends.cuda.libs.nvtx, cupy.cuda.thrust, cupy._core._dtype, cupy._core._scalar, cupy._core._accelerator, cupy._core._memory_range, cupy._core._fusion_thread_local, cupy._core._kernel, cupy._core._routines_manipulation, cupy._core._optimize_config, cupy._core._cub_reduction, cupy._core._reduction, cupy._core._routines_binary, cupy._core._routines_math, cupy._core._routines_indexing, cupy._core._routines_linalg, cupy._core._routines_logic, cupy._core._routines_sorting, cupy._core._routines_statistics, cupy._core.dlpack, cupy._core.flags, cupy._core.core, cupy._core._fusion_variable, cupy._core._fusion_trace, cupy._core._fusion_kernel, cupy._core.new_fusion, cupy._core.fusion, cupy._core.raw, cupyx.cusolver, scipy._lib._ccallback_c, numpy.linalg.lapack_lite, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._flinalg, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, cupy.cuda.cufft, cupy.fft._cache, cupy.fft._callback, cupy.random._generator_api, cupy.random._bit_generator, scipy._lib._uarray._uarray, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, cupy.lib._polynomial, cupy_backends.cuda.libs.nccl, zstandard.backend_c, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._direct (total: 182) Aborted (core dumped) ``` Thank you. let me know if i can give anymore details.
Sirraide commented 6 months ago

Looks like you’re using LLVM 17. Have you tried using a more up-to-date version of LLVM (18 or 19), if that’s possible for you?

NavinKumarMNK commented 6 months ago

Currently triton==2.1.0 has pinned this commit. If this issue is fixed in a newer version, I could try installing from source. Could I get an overview of why this happens and of possible solutions?

NavinKumarMNK commented 6 months ago

For more context, here is the same issue I raised in vLLM: https://github.com/vllm-project/vllm/issues/3732

joker-eph commented 6 months ago

Have you reported this to Triton?

NavinKumarMNK commented 6 months ago

Not yet. If it would help you, I can do that.

(I ran some experiments in vLLM and found that the error occurs in the Mixture-of-Experts kernel used by vLLM. Just mentioning it in case it is useful.)

jlebar commented 6 months ago

This is not a bug in LLVM; it's a bug in Triton.
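
For reference, here is a minimal sketch (a simplified paraphrase for explanation, not the exact upstream code) of the assumption that getBackwardSliceImpl in mlir/lib/Analysis/SliceAnalysis.cpp makes in this LLVM 17 snapshot when the slice walk reaches a block argument: the argument's parent op must have exactly one region containing exactly one block, which is what the failing assertion checks. The helper name satisfiesSliceAssumption is hypothetical; the real code asserts instead of returning a bool.

// Simplified illustration of the single-region / single-block assumption in
// SliceAnalysis.cpp. For explanation only; the upstream code asserts here.
#include "mlir/IR/Block.h"
#include "mlir/IR/Operation.h"
#include "mlir/IR/Value.h"

using namespace mlir;

// Hypothetical helper: true when `value` satisfies the assumption encoded by
// the failing assertion; the real pass aborts (as in this report) otherwise.
static bool satisfiesSliceAssumption(Value value) {
  auto blockArg = dyn_cast<BlockArgument>(value);
  if (!blockArg)
    return true; // OpResults are traversed via getDefiningOp() instead.

  Operation *parentOp = blockArg.getOwner()->getParentOp();
  if (!parentOp)
    return true; // Argument of a top-level block; nothing to walk into.

  // The condition from the assertion, written as a predicate: the parent op
  // must own exactly one region that contains exactly one block.
  return parentOp->getNumRegions() == 1 &&
         parentOp->getRegion(0).getBlocks().size() == 1;
}

In other words, any op reached through one of its block arguments during the backward-slice walk must have a single region with a single block; IR that violates this (apparently the Triton-generated IR for the fused MoE kernel here) trips the assertion, which is consistent with this being a Triton-side issue rather than an LLVM or vLLM one.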