triton-lang / triton

Development repository for the Triton language and compiler
https://triton-lang.org/
MIT License

Segfault on mixed mm #3882

Open williamwen42 opened 5 months ago

Repro command (you may need a debug build of Python 3.12 to reproduce):

gdb --args python -m pytest test/inductor/test_torchinductor.py::CpuTests::test_multi_gpu_recompile_on_index_cpu test/inductor/test_torchinductor.py::GPUTests::test_mixed_mm_cuda
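The per-thread Python stacks in the "Segfault output" section have the shape that Python's standard `faulthandler` module prints on a fatal signal. The sketch below shows one way to turn that handler on and to capture the same kind of all-threads dump on demand, for anyone reproducing this without gdb. It is a minimal illustration, not part of the original report: the helper names `enable_crash_dumps` and `dump_all_threads` are hypothetical, and how the handler was enabled in the run above is not stated in the report.

```python
import faulthandler
import sys
import tempfile


def enable_crash_dumps() -> None:
    """Install handlers that print every thread's Python stack on SIGSEGV etc.

    Hypothetical helper. Equivalent switches that need no code change are
    `python -X faulthandler ...` and the PYTHONFAULTHANDLER=1 environment variable.
    """
    faulthandler.enable(file=sys.stderr, all_threads=True)


def dump_all_threads() -> str:
    """Return the current Python stack of every thread as text (hypothetical helper)."""
    # faulthandler writes straight to a file descriptor, so a real file is needed.
    with tempfile.TemporaryFile(mode="w+") as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read()
```

Calling `enable_crash_dumps()` early in the test process and then running the repro command should print one "most recent call first" block per thread when the crash happens, which makes it easier to see which worker thread was inside the compiler when it happened.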

gdb backtrace:

0x00007ff8b4c5b51f in mlir::detail::OperandStorage::OperandStorage(mlir::Operation*, mlir::OpOperand*, mlir::ValueRange) () from /data/users/williamwen/triton/python/triton/_C/libtriton.so
(gdb) bt
#0  0x00007ff8b4c5b51f in mlir::detail::OperandStorage::OperandStorage(mlir::Operation*, mlir::OpOperand*, mlir::ValueRange) () from /data/users/williamwen/triton/python/triton/_C/libtriton.so
#1  0x00007ff8b4c4e867 in mlir::Operation::create(mlir::Location, mlir::OperationName, mlir::TypeRange, mlir::ValueRange, mlir::DictionaryAttr, mlir::OpaqueProperties, mlir::BlockRange, unsigned int) () from /data/users/williamwen/triton/python/triton/_C/libtriton.so
#2  0x00007ff8b4c4e087 in mlir::Operation::create(mlir::Location, mlir::OperationName, mlir::TypeRange, mlir::ValueRange, mlir::NamedAttrList&&, mlir::OpaqueProperties, mlir::BlockRange, mlir::RegionRange) () from /data/users/williamwen/triton/python/triton/_C/libtriton.so
#3  0x00007ff8b4c4df04 in mlir::Operation::create(mlir::OperationState const&) () from /data/users/williamwen/triton/python/triton/_C/libtriton.so
#4  0x00007ff8b4b9a170 in mlir::OpBuilder::create(mlir::OperationState const&) () from /data/users/williamwen/triton/python/triton/_C/libtriton.so
#5  0x00007ff8b48910c5 in mlir::OpBuilder::create<mlir::LLVM::InsertElementOp, mlir::Value&, mlir::LLVM::ExtractElementOp, mlir::Value> (location=..., this=0x7ff842dfbea0) at /home/williamwen/.triton/llvm/llvm-ed4e505c-ubuntu-x64/include/mlir/IR/Value.h:615
#6  MMA16816SmemLoader::loadX4 (this=this@entry=0x7ff842dfaea0, batch=batch@entry=0, mat0=0, mat1=0, ptrs=..., matTy=..., matTy@entry=..., shemTy=...) at /data/users/williamwen/triton/third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/ConvertLayoutOpToLLVM/SharedToDotOperandMMAv2.cpp:418
#7  0x00007ff8b489223a in operator() (__closure=0x7ff75c0c3980, batch=0, a=0, b=0) at /home/williamwen/.triton/llvm/llvm-ed4e505c-ubuntu-x64/include/llvm/ADT/SmallVector.h:91
#8  0x00007ff8b488f5b3 in std::function<void (int, int, int)>::operator()(int, int, int) const (__args#2=0, __args#1=0, __args#0=0, this=0x7ff842dfb140) at /usr/include/c++/11/bits/std_function.h:586
#9  loadArg (rewriter=..., loc=..., loc@entry=..., descTy=descTy@entry=..., encoding=..., encoding@entry=..., smemObj=..., typeConverter=typeConverter@entry=0x7ff842dfc9b0, thread=..., isA=true)
    at /data/users/williamwen/triton/third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/ConvertLayoutOpToLLVM/SharedToDotOperandMMAv2.cpp:647
#10 0x00007ff8b488fb15 in SharedToDotOperandMMAv2::convertLayout (opIdx=0, rewriter=..., loc=loc@entry=..., tensor=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, encoding=..., smemObj=..., typeConverter=0x7ff842dfc9b0, thread=...)
    at /data/users/williamwen/triton/third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/ConvertLayoutOpToLLVM/SharedToDotOperandMMAv2.cpp:779
#11 0x00007ff8b489e3a7 in (anonymous namespace)::LocalLoadOpConversion::lowerSharedToDotOperandMMA (this=0x7ff842dfb6d0, isOuter=false, dotOperandLayout=..., mmaLayout=..., rewriter=..., typeConverter=0x7ff842dfc9b0, adaptor=..., op=...)
    at /data/users/williamwen/triton/third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/ConvertLayoutOpToLLVM.cpp:89
#12 (anonymous namespace)::LocalLoadOpConversion::lowerSharedToDotOperand (rewriter=..., typeConverter=0x7ff842dfc9b0, adaptor=..., op=..., this=0x7ff842dfb6d0) at /data/users/williamwen/triton/third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/ConvertLayoutOpToLLVM.cpp:131
#13 (anonymous namespace)::LocalLoadOpConversion::matchAndRewrite (this=this@entry=0x7ff75c0c3230, op=op@entry=..., adaptor=..., rewriter=...) at /data/users/williamwen/triton/third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/ConvertLayoutOpToLLVM.cpp:64
#14 0x00007ff8b482986e in mlir::ConvertOpToLLVMPattern<mlir::triton::gpu::LocalLoadOp>::matchAndRewrite (this=0x7ff75c0c3230, op=0x7ff75c0a8cb0, operands=..., rewriter=...) at /home/williamwen/.triton/llvm/llvm-ed4e505c-ubuntu-x64/include/mlir/IR/OpDefinition.h:1716
#15 0x00007ff8b64295d0 in mlir::ConversionPattern::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&) const () from /data/users/williamwen/triton/python/triton/_C/libtriton.so
#16 0x00007ff8b6471a3b in mlir::PatternApplicator::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&, llvm::function_ref<bool (mlir::Pattern const&)>, llvm::function_ref<void (mlir::Pattern const&)>, llvm::function_ref<mlir::LogicalResult (mlir::Pattern const&)>)::$_2::operator()() const () from /data/users/williamwen/triton/python/triton/_C/libtriton.so
#17 0x00007ff8b646e55f in mlir::PatternApplicator::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&, llvm::function_ref<bool (mlir::Pattern const&)>, llvm::function_ref<void (mlir::Pattern const&)>, llvm::function_ref<mlir::LogicalResult (mlir::Pattern const&)>) ()
   from /data/users/williamwen/triton/python/triton/_C/libtriton.so
#18 0x00007ff8b642a4d9 in (anonymous namespace)::OperationLegalizer::legalize(mlir::Operation*, mlir::ConversionPatternRewriter&) () from /data/users/williamwen/triton/python/triton/_C/libtriton.so
#19 0x00007ff8b6429674 in mlir::OperationConverter::convert(mlir::ConversionPatternRewriter&, mlir::Operation*) () from /data/users/williamwen/triton/python/triton/_C/libtriton.so
#20 0x00007ff8b642a8d1 in mlir::OperationConverter::convertOperations(llvm::ArrayRef<mlir::Operation*>) () from /data/users/williamwen/triton/python/triton/_C/libtriton.so
#21 0x00007ff8b643238b in mlir::applyPartialConversion(mlir::Operation*, mlir::ConversionTarget const&, mlir::FrozenRewritePatternSet const&, mlir::ConversionConfig) () from /data/users/williamwen/triton/python/triton/_C/libtriton.so
#22 0x00007ff8b48ef47a in (anonymous namespace)::ConvertTritonGPUToLLVM::runOnOperation (this=0x7ff75c0038a0) at /home/williamwen/.triton/llvm/llvm-ed4e505c-ubuntu-x64/include/llvm/ADT/ArrayRef.h:70
#23 0x00007ff8b4c99966 in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) () from /data/users/williamwen/triton/python/triton/_C/libtriton.so
#24 0x00007ff8b4c9a2b1 in mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) ()
   from /data/users/williamwen/triton/python/triton/_C/libtriton.so
#25 0x00007ff8b4c9ca7b in mlir::PassManager::run(mlir::Operation*) () from /data/users/williamwen/triton/python/triton/_C/libtriton.so
#26 0x00007ff8b49b2695 in operator() (self=..., mod=..., __closure=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /home/williamwen/.triton/llvm/llvm-ed4e505c-ubuntu-x64/include/mlir/IR/OpDefinition.h:108
#27 0x00007ff8b49be0d8 in pybind11::detail::argument_loader<mlir::PassManager&, mlir::ModuleOp&>::call_impl<void, init_triton_ir(pybind11::module&&)::<lambda(mlir::PassManager&, mlir::ModuleOp&)>&, 0, 1, pybind11::detail::void_type> (f=..., this=0x7ff842dfd840)
    at /home/williamwen/.triton/pybind11/pybind11-2.11.1/include/pybind11/detail/../detail/type_caster_base.h:1136
#28 pybind11::detail::argument_loader<mlir::PassManager&, mlir::ModuleOp&>::call<void, pybind11::detail::void_type, init_triton_ir(pybind11::module&&)::<lambda(mlir::PassManager&, mlir::ModuleOp&)>&> (f=..., this=0x7ff842dfd840)
    at /home/williamwen/.triton/pybind11/pybind11-2.11.1/include/pybind11/detail/../cast.h:1454
#29 operator() (__closure=0x0, __closure@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, call=..., call@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /home/williamwen/.triton/pybind11/pybind11-2.11.1/include/pybind11/pybind11.h:254
#30 _FUN () at /home/williamwen/.triton/pybind11/pybind11-2.11.1/include/pybind11/pybind11.h:224
#31 0x00007ff8b499fb5f in pybind11::cpp_function::dispatcher (self=<optimized out>, args_in=args_in@entry=0x7fff70e386e0, kwargs_in=kwargs_in@entry=0x0) at /home/williamwen/.triton/pybind11/pybind11-2.11.1/include/pybind11/pybind11.h:946
#32 0x00000000005259c1 in cfunction_call (func=func@entry=0x7fffb2f3f710, args=args@entry=0x7fff70e386e0, kwargs=kwargs@entry=0x0) at Objects/methodobject.c:537
#33 0x00000000004c1327 in _PyObject_MakeTpCall (tstate=tstate@entry=0xf2a0e5d0, callable=callable@entry=0x7fffb2f3f710, args=args@entry=0x7fffea8c0a00, nargs=<optimized out>, keywords=keywords@entry=0x0) at Objects/call.c:240
#34 0x00000000004c1708 in _PyObject_VectorcallTstate (tstate=0xf2a0e5d0, callable=0x7fffb2f3f710, args=0x7fffea8c0a00, nargsf=<optimized out>, kwnames=0x0) at ./Include/internal/pycore_call.h:90
#35 0x00000000004c1730 in PyObject_Vectorcall (callable=callable@entry=0x7fffb2f3f710, args=args@entry=0x7fffea8c0a00, nargsf=<optimized out>, kwnames=kwnames@entry=0x0) at Objects/call.c:325
#36 0x000000000060cf62 in _PyEval_EvalFrameDefault (tstate=0xf2a0e5d0, frame=0x7fffea8c0948, throwflag=0) at Python/bytecodes.c:2706
#37 0x000000000061421d in _PyEval_EvalFrame (throwflag=0, frame=<optimized out>, tstate=0xf2a0e5d0) at ./Include/internal/pycore_ceval.h:91
#38 _PyEval_Vector (tstate=0xf2a0e5d0, func=0x7fffb031c710, locals=locals@entry=0x0, args=0x7fffea8c0928, argcount=4, kwnames=0x0) at Python/ceval.c:1683
#39 0x00000000004c1136 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:419
#40 0x00000000004c1633 in _PyObject_VectorcallTstate (tstate=0xf2a0e5d0, callable=0x7fffb031c710, args=0x7fffea8c0928, nargsf=9223372036854775812, kwnames=0x0) at ./Include/internal/pycore_call.h:92
#41 0x00000000004c1730 in PyObject_Vectorcall (callable=callable@entry=0x7fffb031c710, args=args@entry=0x7fffea8c0928, nargsf=<optimized out>, kwnames=kwnames@entry=0x0) at Objects/call.c:325
#42 0x000000000060cf62 in _PyEval_EvalFrameDefault (tstate=0xf2a0e5d0, frame=0x7fffea8c08b0, throwflag=0) at Python/bytecodes.c:2706
#43 0x000000000061421d in _PyEval_EvalFrame (throwflag=0, frame=<optimized out>, tstate=0xf2a0e5d0) at ./Include/internal/pycore_ceval.h:91
...

Segfault output:

Fatal Python error: Segmentation fault

Thread 0x00007ff837e00640 (most recent call first):
  <no Python frame>

Thread 0x00007ff83aa00640 (most recent call first):
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/collections/__init__.py", line 447 in _make
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/tokenize.py", line 538 in _generate_tokens_from_c_tokenizer
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/inspect.py", line 1241 in getblock
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/inspect.py", line 1274 in getsourcelines
  File "/data/users/williamwen/triton/python/triton/compiler/code_generator.py", line 84 in _get_fn_file_line
  File "/data/users/williamwen/triton/python/triton/compiler/code_generator.py", line 1288 in ast_to_ttir
  File "/data/users/williamwen/triton/python/triton/compiler/compiler.py", line 113 in make_ir
  File "/data/users/williamwen/triton/python/triton/compiler/compiler.py", line 276 in compile
  File "/data/users/williamwen/pytorch/torch/_inductor/runtime/triton_heuristics.py", line 397 in _precompile_config
  File "/data/users/williamwen/pytorch/torch/_inductor/runtime/triton_heuristics.py", line 203 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/autotune_process.py", line 643 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 781 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 1031 in precompile_with_captured_stdout
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 1035 in <lambda>
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/concurrent/futures/thread.py", line 58 in run
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/concurrent/futures/thread.py", line 92 in _worker
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1010 in run
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1030 in _bootstrap

Thread 0x00007ff83b400640 (most recent call first):
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/ast.py", line 52 in parse
  File "/data/users/williamwen/triton/python/triton/runtime/jit.py", line 794 in parse
  File "/data/users/williamwen/triton/python/triton/compiler/code_generator.py", line 1294 in ast_to_ttir
  File "/data/users/williamwen/triton/python/triton/compiler/compiler.py", line 113 in make_ir
  File "/data/users/williamwen/triton/python/triton/compiler/compiler.py", line 276 in compile
  File "/data/users/williamwen/pytorch/torch/_inductor/runtime/triton_heuristics.py", line 397 in _precompile_config
  File "/data/users/williamwen/pytorch/torch/_inductor/runtime/triton_heuristics.py", line 203 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/autotune_process.py", line 643 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 781 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 1031 in precompile_with_captured_stdout
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 1035 in <lambda>
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/concurrent/futures/thread.py", line 58 in run
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/concurrent/futures/thread.py", line 92 in _worker
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1010 in run
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1030 in _bootstrap

Thread 0x00007ff83be00640 (most recent call first):
  File "/data/users/williamwen/triton/python/triton/compiler/code_generator.py", line 1198 in visit
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/ast.py", line 417 in generic_visit
  File "/data/users/williamwen/triton/python/triton/compiler/code_generator.py", line 1163 in visit_Expr
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/ast.py", line 407 in visit
  File "/data/users/williamwen/triton/python/triton/compiler/code_generator.py", line 1201 in visit
  File "/data/users/williamwen/triton/python/triton/compiler/code_generator.py", line 351 in visit_compound_statement
  File "/data/users/williamwen/triton/python/triton/compiler/code_generator.py", line 443 in visit_FunctionDef
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/ast.py", line 407 in visit
  File "/data/users/williamwen/triton/python/triton/compiler/code_generator.py", line 1201 in visit
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/ast.py", line 415 in generic_visit
  File "/data/users/williamwen/triton/python/triton/compiler/code_generator.py", line 359 in visit_Module
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/ast.py", line 407 in visit
  File "/data/users/williamwen/triton/python/triton/compiler/code_generator.py", line 1201 in visit
  File "/data/users/williamwen/triton/python/triton/compiler/code_generator.py", line 1294 in ast_to_ttir
  File "/data/users/williamwen/triton/python/triton/compiler/compiler.py", line 113 in make_ir
  File "/data/users/williamwen/triton/python/triton/compiler/compiler.py", line 276 in compile
  File "/data/users/williamwen/pytorch/torch/_inductor/runtime/triton_heuristics.py", line 397 in _precompile_config
  File "/data/users/williamwen/pytorch/torch/_inductor/runtime/triton_heuristics.py", line 203 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/autotune_process.py", line 643 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 781 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 1031 in precompile_with_captured_stdout
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 1035 in <lambda>
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/concurrent/futures/thread.py", line 58 in run
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/concurrent/futures/thread.py", line 92 in _worker
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1010 in run
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1030 in _bootstrap

Thread 0x00007ff83ea00640 (most recent call first):
  File "/data/users/williamwen/triton/python/triton/runtime/cache.py", line 123 in put
  File "/data/users/williamwen/triton/python/triton/compiler/compiler.py", line 284 in compile
  File "/data/users/williamwen/pytorch/torch/_inductor/runtime/triton_heuristics.py", line 397 in _precompile_config
  File "/data/users/williamwen/pytorch/torch/_inductor/runtime/triton_heuristics.py", line 203 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/autotune_process.py", line 643 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 781 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 1031 in precompile_with_captured_stdout
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 1035 in <lambda>
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/concurrent/futures/thread.py", line 58 in run
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/concurrent/futures/thread.py", line 92 in _worker
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1010 in run
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1030 in _bootstrap

Thread 0x00007ff83f400640 (most recent call first):
  File "/data/users/williamwen/triton/python/triton/runtime/cache.py", line 122 in put
  File "/data/users/williamwen/triton/python/triton/compiler/compiler.py", line 284 in compile
  File "/data/users/williamwen/pytorch/torch/_inductor/runtime/triton_heuristics.py", line 397 in _precompile_config
  File "/data/users/williamwen/pytorch/torch/_inductor/runtime/triton_heuristics.py", line 203 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/autotune_process.py", line 643 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 781 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 1031 in precompile_with_captured_stdout
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 1035 in <lambda>
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/concurrent/futures/thread.py", line 58 in run
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/concurrent/futures/thread.py", line 92 in _worker
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1010 in run
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1030 in _bootstrap

Thread 0x00007ff83fe00640 (most recent call first):
  File "/data/users/williamwen/triton/python/triton/runtime/cache.py", line 122 in put
  File "/data/users/williamwen/triton/python/triton/compiler/compiler.py", line 284 in compile
  File "/data/users/williamwen/pytorch/torch/_inductor/runtime/triton_heuristics.py", line 397 in _precompile_config
  File "/data/users/williamwen/pytorch/torch/_inductor/runtime/triton_heuristics.py", line 203 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/autotune_process.py", line 643 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 781 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 1031 in precompile_with_captured_stdout
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 1035 in <lambda>
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/concurrent/futures/thread.py", line 58 in run
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/concurrent/futures/thread.py", line 92 in _worker
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1010 in run
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1030 in _bootstrap

Current thread 0x00007ff842e00640 (most recent call first):
  File "/data/users/williamwen/triton/python/triton/backends/nvidia/compiler.py", line 212 in make_llir
  File "/data/users/williamwen/triton/python/triton/backends/nvidia/compiler.py", line 302 in <lambda>
  File "/data/users/williamwen/triton/python/triton/compiler/compiler.py", line 282 in compile
  File "/data/users/williamwen/pytorch/torch/_inductor/runtime/triton_heuristics.py", line 397 in _precompile_config
  File "/data/users/williamwen/pytorch/torch/_inductor/runtime/triton_heuristics.py", line 203 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/autotune_process.py", line 643 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 781 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 1031 in precompile_with_captured_stdout
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 1035 in <lambda>
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/concurrent/futures/thread.py", line 58 in run
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/concurrent/futures/thread.py", line 92 in _worker
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1010 in run
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 1030 in _bootstrap

Thread 0x00007ffff7ed6740 (most recent call first):
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 355 in wait
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 655 in wait
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/threading.py", line 997 in start
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/concurrent/futures/thread.py", line 202 in _adjust_thread_count
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/concurrent/futures/thread.py", line 179 in submit
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/concurrent/futures/_base.py", line 608 in map
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 1034 in precompile
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 1107 in __call__
  File "/data/users/williamwen/pytorch/torch/_inductor/select_algorithm.py", line 1405 in autotune_select_algorithm
  File "/data/users/williamwen/pytorch/torch/_inductor/kernel/mm.py", line 398 in tuned_mixed_mm
  File "/data/users/williamwen/pytorch/torch/_inductor/fx_passes/post_grad.py", line 312 in mixed_mm
  File "/data/users/williamwen/pytorch/torch/_inductor/graph.py", line 923 in call_function
  File "/data/users/williamwen/pytorch/torch/fx/interpreter.py", line 202 in run_node
  File "/data/users/williamwen/pytorch/torch/_inductor/graph.py", line 1199 in run_node
  File "/data/users/williamwen/pytorch/torch/fx/interpreter.py", line 145 in run
  File "/data/users/williamwen/pytorch/torch/_inductor/graph.py", line 730 in run
  File "/data/users/williamwen/pytorch/torch/_dynamo/utils.py", line 273 in time_wrapper
  File "/data/users/williamwen/pytorch/torch/_inductor/compile_fx.py", line 784 in fx_codegen_and_compile
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/contextlib.py", line 81 in inner
  File "/data/users/williamwen/pytorch/torch/_inductor/codecache.py", line 987 in load
  File "/data/users/williamwen/pytorch/torch/_inductor/compile_fx.py", line 498 in compile_fx_inner
  File "/data/users/williamwen/pytorch/torch/_dynamo/utils.py", line 273 in time_wrapper
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/contextlib.py", line 81 in inner
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/contextlib.py", line 81 in inner
  File "/data/users/williamwen/pytorch/torch/_inductor/debug.py", line 304 in inner
  File "/data/users/williamwen/pytorch/torch/_dynamo/repro/after_aot.py", line 83 in debug_wrapper
  File "/data/users/williamwen/pytorch/torch/_inductor/compile_fx.py", line 1382 in fw_compiler_base
  File "/data/users/williamwen/pytorch/torch/_dynamo/utils.py", line 273 in time_wrapper
  File "/data/users/williamwen/pytorch/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 169 in aot_dispatch_base
  File "/data/users/williamwen/pytorch/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 672 in aot_wrapper_synthetic_base
  File "/data/users/williamwen/pytorch/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 470 in aot_wrapper_dedupe
  File "/data/users/williamwen/pytorch/torch/_functorch/aot_autograd.py", line 685 in create_aot_dispatcher_function
  File "/data/users/williamwen/pytorch/torch/_dynamo/utils.py", line 273 in time_wrapper
  File "/data/users/williamwen/pytorch/torch/_functorch/aot_autograd.py", line 958 in aot_module_simplified
  File "/data/users/williamwen/pytorch/torch/_dynamo/backends/common.py", line 65 in compiler_fn
  File "/data/users/williamwen/pytorch/torch/_inductor/compile_fx.py", line 1478 in compile_fx
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/contextlib.py", line 81 in inner
  File "/data/users/williamwen/pytorch/test/inductor/test_torchinductor.py", line 434 in compile_fx_wrapper
  File "/data/users/williamwen/pytorch/torch/_dynamo/repro/after_dynamo.py", line 127 in debug_wrapper
  File "/data/users/williamwen/pytorch/torch/_dynamo/repro/after_dynamo.py", line 127 in debug_wrapper
  File "/data/users/williamwen/pytorch/torch/_dynamo/output_graph.py", line 1368 in call_user_compiler
  File "/data/users/williamwen/pytorch/torch/_dynamo/utils.py", line 273 in time_wrapper
  File "/data/users/williamwen/pytorch/torch/_dynamo/output_graph.py", line 1296 in compile_and_call_fx_graph
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/contextlib.py", line 81 in inner
  File "/data/users/williamwen/pytorch/torch/_dynamo/output_graph.py", line 1079 in compile_subgraph
  File "/data/users/williamwen/pytorch/torch/_dynamo/symbolic_convert.py", line 2409 in _return
  File "/data/users/williamwen/pytorch/torch/_dynamo/symbolic_convert.py", line 2424 in RETURN_VALUE
  File "/data/users/williamwen/pytorch/torch/_dynamo/symbolic_convert.py", line 799 in step
  File "/data/users/williamwen/pytorch/torch/_dynamo/symbolic_convert.py", line 884 in run
  File "/data/users/williamwen/pytorch/torch/_dynamo/symbolic_convert.py", line 2234 in run
  File "/data/users/williamwen/pytorch/torch/_dynamo/convert_frame.py", line 517 in transform
  File "/data/users/williamwen/pytorch/torch/_dynamo/convert_frame.py", line 172 in _fn
  File "/data/users/williamwen/pytorch/torch/_dynamo/bytecode_transformation.py", line 1167 in transform_code_object
  File "/data/users/williamwen/pytorch/torch/_dynamo/convert_frame.py", line 570 in compile_inner
  File "/data/users/williamwen/pytorch/torch/_dynamo/utils.py", line 273 in time_wrapper
  File "/data/users/williamwen/pytorch/torch/_dynamo/convert_frame.py", line 703 in _compile
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/contextlib.py", line 81 in inner
  File "/data/users/williamwen/pytorch/torch/_utils_internal.py", line 70 in wrapper_function
  File "/data/users/williamwen/pytorch/torch/_dynamo/convert_frame.py", line 410 in _convert_frame_assert
  File "/data/users/williamwen/pytorch/torch/_dynamo/convert_frame.py", line 981 in catch_errors
  File "/data/users/williamwen/pytorch/torch/_dynamo/eval_frame.py", line 420 in _fn
  File "/data/users/williamwen/pytorch/test/inductor/test_torchinductor.py", line 442 in check_model
  File "/data/users/williamwen/pytorch/test/inductor/test_torchinductor.py", line 595 in check_model_gpu
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/contextlib.py", line 81 in inner
  File "/data/users/williamwen/pytorch/test/inductor/test_torchinductor.py", line 2507 in test_mixed_mm
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/contextlib.py", line 81 in inner
  File "/data/users/williamwen/pytorch/test/inductor/test_torchinductor.py", line 9843 in new_test
  File "/data/users/williamwen/pytorch/torch/testing/_internal/common_utils.py", line 2763 in wrapper
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/unittest/case.py", line 589 in _callTestMethod
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/unittest/case.py", line 634 in run
  File "/data/users/williamwen/pytorch/torch/testing/_internal/common_utils.py", line 2862 in _run_custom
  File "/data/users/williamwen/pytorch/torch/testing/_internal/common_utils.py", line 2890 in run
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/unittest/case.py", line 690 in __call__
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/_pytest/unittest.py", line 343 in runtest
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/_pytest/runner.py", line 173 in pytest_runtest_call
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/_pytest/runner.py", line 241 in <lambda>
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/_pytest/runner.py", line 341 in from_call
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/_pytest/runner.py", line 240 in call_and_report
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/_pytest/runner.py", line 135 in runtestprotocol
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/_pytest/runner.py", line 116 in pytest_runtest_protocol
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/_pytest/main.py", line 364 in pytest_runtestloop
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/_pytest/main.py", line 339 in _main
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/_pytest/main.py", line 285 in wrap_session
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/_pytest/main.py", line 332 in pytest_cmdline_main
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/_pytest/config/__init__.py", line 178 in main
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/_pytest/config/__init__.py", line 206 in console_main
  File "/data/users/williamwen/py312-debug-env/lib/python3.12/site-packages/pytest/__main__.py", line 7 in <module>
  File "/home/williamwen/local/installs/python3.12/debug/install/lib/python3.12/runpy.py", line 88 in _run_code
  ...

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, markupsafe._speedups, psutil._psutil_linux, psutil._psutil_posix, cuda_utils (total: 24)

Thread 68 "pt_main_thread" received signal SIGSEGV, Segmentation fault.

Notes: Running the tests on CPU does not segfault. Running test_mixed_mm_cuda alone passes (most of the time). I also see a segfault on the test_mixed_mm2_cuda test (also preceded by test_multi_gpu_recompile_on_index_cpu).

ThomasRaoux commented 5 months ago

Sharing the IR usually makes it easier for people to repro/debug. You can get it by running export MLIR_ENABLE_DUMP=1 before the test.
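For anyone who wants to collect that dump non-interactively, here is a minimal sketch of one way to do it. It assumes it is run from the root of the PyTorch checkout; the helper names (build_repro, run_and_capture) and the log file name are made up for illustration, and only the MLIR_ENABLE_DUMP variable and the two test IDs come from this thread.

```python
import os
import subprocess
import sys

# Test IDs copied from the repro command in the original report.
# The CPU test must run first; the CUDA test alone usually passes.
TESTS = [
    "test/inductor/test_torchinductor.py::CpuTests::test_multi_gpu_recompile_on_index_cpu",
    "test/inductor/test_torchinductor.py::GPUTests::test_mixed_mm_cuda",
]


def build_repro(tests=TESTS, base_env=None):
    """Return (cmd, env) for the repro with the MLIR dump enabled.

    Hypothetical helper: it only assembles the command and does not run it.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MLIR_ENABLE_DUMP"] = "1"  # the variable suggested in the comment above
    cmd = [sys.executable, "-m", "pytest", *tests]
    return cmd, env


def run_and_capture(log_path="mlir_dump.log"):
    """Run the repro and write stdout and stderr to log_path (hypothetical name)."""
    cmd, env = build_repro()
    with open(log_path, "wb") as log:
        # Capture both streams so the dump is kept whichever one it goes to.
        proc = subprocess.run(cmd, env=env, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode
```

Calling run_and_capture() leaves the output in mlir_dump.log, which can then be attached as a gist. On Linux, a negative return code means the child was killed by a signal, so -11 would correspond to the SIGSEGV seen here.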

williamwen42 commented 5 months ago

https://gist.github.com/williamwen42/7d269e24b6f3cdb85acdef3690cc9a07

williamwen42 commented 5 months ago

Hi, any updates on this? Ideally this should be fixed ahead of the PyTorch 2.4 release, since we will package a Triton binary with it.

williamwen42 commented 5 months ago

This may be related to https://github.com/triton-lang/triton/issues/2853.

I've isolated a minimal PyTorch repro: https://gist.github.com/williamwen42/b8e8bb1c70c87525430525cecd6fd85e and a minimal Triton repro: https://gist.github.com/williamwen42/e75ffd8389aa4b908ffcbc00cdad5790.

This is not limited to Python 3.12.