intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs
MIT License

Segmentation fault with PyTorch upstream with Agama Rolling #1754

Closed pbchekin closed 3 months ago

pbchekin commented 3 months ago

There are segmentation faults in test_dot, reproducible with both PyTorch 2.4 and main, when running with the Agama Rolling driver. Example: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10188871774/job/28185873132#step:17:24255

Current thread 0x00007f969a040200 (most recent call first):
  File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/triton/compiler/compiler.py", line 386 in _init_handles
  File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/triton/compiler/compiler.py", line 391 in __getattribute__
  File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/triton/runtime/jit.py", line 672 in run
  File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/triton/runtime/jit.py", line 326 in <lambda>
  File "/runner/_work/intel-xpu-backend-for-triton/intel-xpu-backend-for-triton/python/test/unit/language/test_core.py", line 3266 in test_dot
...

This does not happen with the Agama LTS driver (https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10188871774/job/28185872992#step:28:40).

anmyachev commented 3 months ago

@pbchekin I no longer see this problem in CI (run with a pytorch-upstream commit): https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10292166320/job/28485982261. Can we close this issue?

pbchekin commented 3 months ago

No longer reproducible; closing.