The flash attention benchmark fails after the changes to use upstream PyTorch. This looks like a PyTorch issue rather than a Triton one.
Traceback (most recent call last):
File "/runner/_work/intel-xpu-backend-for-triton/intel-xpu-backend-for-triton/benchmarks/key_benchmarks/flash_attention_fwd_benchmark.py", line 245, in <module>
benchmark.run(show_plots=False, print_data=True)
File "/runner/_work/intel-xpu-backend-for-triton/intel-xpu-backend-for-triton/benchmarks/key_benchmarks/triton_kernels_benchmark/benchmark_testing.py", line 249, in run
result_dfs.append(self._run(bench, save_path, show_plots, print_data, **kwargs))
File "/runner/_work/intel-xpu-backend-for-triton/intel-xpu-backend-for-triton/benchmarks/key_benchmarks/triton_kernels_benchmark/benchmark_testing.py", line 179, in _run
ret = self.fn(**x_args, **{bench.line_arg: y}, **bench.args, **kwrags)
File "/runner/_work/intel-xpu-backend-for-triton/intel-xpu-backend-for-triton/benchmarks/key_benchmarks/flash_attention_fwd_benchmark.py", line 228, in benchmark
benchmark_suit.assert_close(triton_fn(), torch_fn(), atol=atol, rtol=1e-3, err_msg="triton to torch")
File "/runner/_work/intel-xpu-backend-for-triton/intel-xpu-backend-for-triton/benchmarks/key_benchmarks/flash_attention_fwd_benchmark.py", line 225, in <lambda>
torch_fn = lambda: torch.nn.functional.scaled_dot_product_attention(
RuntimeError: XPU out of memory, please use `empty_cache` to release all unoccupied cached memory.
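The OOM is raised while building the eager SDPA reference, after the Triton kernel has already run, so one mitigation to try is releasing the allocator's cached blocks between the two runs, as the error message itself suggests. Below is a minimal sketch of that idea, assuming `torch.xpu.empty_cache()` is available in the upstream PyTorch build; `checked_close` is a hypothetical helper, not the benchmark's actual `benchmark_suit.assert_close`:

```python
import torch

def checked_close(triton_fn, torch_fn, atol, rtol):
    # Hypothetical helper sketching a possible workaround: materialize the
    # Triton result first, then release unoccupied cached XPU memory before
    # running the eager SDPA reference (as the OOM message suggests).
    triton_out = triton_fn()
    torch.xpu.synchronize()
    torch.xpu.empty_cache()
    torch.testing.assert_close(torch_fn(), triton_out, atol=atol, rtol=rtol)
```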
CI: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10609254853/job/29404643614
Repro: use the PoC branch `feature/deprecate_benchmark_ipex`.
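To check whether the failure comes from `scaled_dot_product_attention` itself rather than from the benchmark harness, the reference call can be run in isolation in a fresh process. This is only a sketch; the shapes `Z, H, N_CTX, D_HEAD` below are placeholders, not the exact failing benchmark configuration:

```python
import torch

assert torch.xpu.is_available()

# Placeholder shapes; substitute the failing benchmark configuration.
Z, H, N_CTX, D_HEAD = 4, 48, 4096, 64

q, k, v = (torch.randn((Z, H, N_CTX, D_HEAD), device="xpu", dtype=torch.float16)
           for _ in range(3))

# Reference attention on XPU, matching the call in the traceback above.
out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
torch.xpu.synchronize()
print("SDPA reference completed:", out.shape, out.dtype)
```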
Related: https://github.com/intel/intel-xpu-backend-for-triton/pull/1905