The flash attention benchmark fails after the changes to use upstream PyTorch. This looks like a PyTorch issue rather than a Triton one.
Traceback (most recent call last):
File "/runner/_work/intel-xpu-backend-for-triton/intel-xpu-backend-for-triton/benchmarks/key_benchmarks/flash_attention_fwd_benchmark.py", line 245, in <module>
benchmark.run(show_plots=False, print_data=True)
File "/runner/_work/intel-xpu-backend-for-triton/intel-xpu-backend-for-triton/benchmarks/key_benchmarks/triton_kernels_benchmark/benchmark_testing.py", line 249, in run
result_dfs.append(self._run(bench, save_path, show_plots, print_data, **kwargs))
File "/runner/_work/intel-xpu-backend-for-triton/intel-xpu-backend-for-triton/benchmarks/key_benchmarks/triton_kernels_benchmark/benchmark_testing.py", line 179, in _run
ret = self.fn(**x_args, **{bench.line_arg: y}, **bench.args, **kwrags)
File "/runner/_work/intel-xpu-backend-for-triton/intel-xpu-backend-for-triton/benchmarks/key_benchmarks/flash_attention_fwd_benchmark.py", line 228, in benchmark
benchmark_suit.assert_close(triton_fn(), torch_fn(), atol=atol, rtol=1e-3, err_msg="triton to torch")
File "/runner/_work/intel-xpu-backend-for-triton/intel-xpu-backend-for-triton/benchmarks/key_benchmarks/flash_attention_fwd_benchmark.py", line 225, in <lambda>
torch_fn = lambda: torch.nn.functional.scaled_dot_product_attention(
RuntimeError: XPU out of memory, please use `empty_cache` to release all unoccupied cached memory.
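The OOM is raised while building the eager SDPA reference, after the Triton kernel has already run, so one mitigation to try is releasing the allocator's cached blocks between the two runs, as the error message itself suggests. Below is a minimal sketch of that idea, assuming `torch.xpu.empty_cache()` is available in the upstream PyTorch build; `checked_close` is a hypothetical helper, not the benchmark's actual `benchmark_suit.assert_close`:

```python
import torch

def checked_close(triton_fn, torch_fn, atol, rtol):
    # Hypothetical helper sketching a possible workaround: materialize the
    # Triton result first, then release unoccupied cached XPU memory before
    # running the eager SDPA reference (as the OOM message suggests).
    triton_out = triton_fn()
    torch.xpu.synchronize()
    torch.xpu.empty_cache()
    torch.testing.assert_close(torch_fn(), triton_out, atol=atol, rtol=rtol)
```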
CI: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10609254853/job/29404643614
Repro: use the PoC branch `feature/deprecate_benchmark_ipex`.
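To check whether the failure comes from `scaled_dot_product_attention` itself rather than from the benchmark harness, the reference call can be run in isolation in a fresh process. This is only a sketch; the shapes `Z, H, N_CTX, D_HEAD` below are placeholders, not the exact failing benchmark configuration:

```python
import torch

assert torch.xpu.is_available()

# Placeholder shapes; substitute the failing benchmark configuration.
Z, H, N_CTX, D_HEAD = 4, 48, 4096, 64

q, k, v = (torch.randn((Z, H, N_CTX, D_HEAD), device="xpu", dtype=torch.float16)
           for _ in range(3))

# Reference attention on XPU, matching the call in the traceback above.
out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
torch.xpu.synchronize()
print("SDPA reference completed:", out.shape, out.dtype)
```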
Related: https://github.com/intel/intel-xpu-backend-for-triton/pull/1905