williamwen42 opened 5 months ago
Sharing the IR usually makes it easier for people to repro/debug; you can get it by setting export MLIR_ENABLE_DUMP=1
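For reference, a minimal sketch of how the dump can be enabled before running the repro (the pytest target is the GPU test from the repro command in this issue; running it assumes a PyTorch checkout with the test suite available):

```shell
# MLIR_ENABLE_DUMP=1 makes Triton dump the kernel IR at each compilation
# stage, which is usually enough for others to reproduce and debug a crash.
export MLIR_ENABLE_DUMP=1

# Then run the failing test (assumes a PyTorch source checkout):
# python -m pytest test/inductor/test_torchinductor.py::GPUTests::test_mixed_mm_cuda
```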
Hi, any updates on this? This should ideally be fixed ahead of the PyTorch 2.4 release, as we will package a Triton binary with it.
This may be related to https://github.com/triton-lang/triton/issues/2853.
I've isolated a pytorch minimal repro: https://gist.github.com/williamwen42/b8e8bb1c70c87525430525cecd6fd85e and a triton minimal repro: https://gist.github.com/williamwen42/e75ffd8389aa4b908ffcbc00cdad5790.
This is happening on Python versions beyond 3.12 as well.
Repro command:
gdb --args python -m pytest test/inductor/test_torchinductor.py::CpuTests::test_multi_gpu_recompile_on_index_cpu test/inductor/test_torchinductor.py::GPUTests::test_mixed_mm_cuda
(may have to run on a debug Python 3.12 build)

gdb backtrace:
Segfault output:
Notes: Running the tests on CPU does not segfault. Running test_mixed_mm_cuda alone passes (most of the time). I also see a segfault on the test_mixed_mm2_cuda test (also preceded by test_multi_gpu_recompile_on_index_cpu).