Closed vlad-penkin closed 2 days ago
The issue is still reproducible with the new Agama Rolling 881.19.
The issue is still reproducible with open-linux-driver-ci-dev_igc-17139.
After https://github.com/intel/intel-xpu-backend-for-triton/pull/853, there are two more failures, likely due to the same problem:
FAILED operators/test_matmul.py::test_op[128-256-64-1-8-3-256-512-160-True-False-float32-float32-None-True-None-None] - AssertionError: Tensor-likes are not close!
FAILED operators/test_matmul.py::test_op[128-256-64-1-8-3-256-512-160-False-False-float32-float32-None-True-None-None] - AssertionError: Tensor-likes are not close!
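To see at a glance how the failing variants differ, the `FAILED` summary lines can be parsed and compared field by field. The following is only a sketch: the helper names `parse_failed_line` and `differing_positions` are hypothetical, and it assumes that no individual parameter id contains a `-` (true for the ids quoted in this thread, but not guaranteed by pytest in general).

```python
import re

# Matches pytest short-summary lines of the form:
#   FAILED <file>::<test>[<param-ids>] - <error message>
FAILED_RE = re.compile(
    r"^FAILED (?P<file>\S+)::(?P<test>\w+)\[(?P<params>[^\]]*)\] - (?P<error>.*)$"
)


def parse_failed_line(line):
    """Split one FAILED summary line into file, test name, params and error."""
    match = FAILED_RE.match(line.strip())
    if match is None:
        return None
    return {
        "file": match["file"],
        "test": match["test"],
        # Assumption: parameter ids are joined with "-" and contain no "-" themselves.
        "params": match["params"].split("-"),
        "error": match["error"],
    }


def differing_positions(params_a, params_b):
    """Return (index, value_a, value_b) for every position where the ids differ."""
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(params_a, params_b))
        if a != b
    ]
```

Applied to the two failures listed above, this reports a single differing position (index 9, `True` versus `False`); what that parameter means is defined by the parametrization in `test_matmul.py`, not by this sketch.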
- May I know which Agama version corresponds to IGC 1.0.16510.18?
- May I know the target branch or commit? @whitneywhtsang @vlad-penkin
You can check the IGC version with `dpkg -l | grep libigc1`. For this particular issue, it starts to fail in Agama 881.12.
Please check if it passes on the pre-release driver 914.16 at Triton commit 61042a1031e97d2f0b39139ba324f8dc5e8294b3 with https://github.com/intel/intel-xpu-backend-for-triton/pull/1443.
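For scripted checks, the same lookup can be done from Python by parsing the `dpkg -l` listing. This is a minimal sketch, not part of the Triton tooling: the function names are hypothetical, the sample listing is made up for illustration, and it assumes a Debian-based system where the IGC package name starts with `libigc1`.

```python
import subprocess


def parse_dpkg_versions(dpkg_output, prefix="libigc1"):
    """Return {package: version} for installed packages whose name starts with prefix."""
    versions = {}
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Installed packages are listed as: ii  <name>  <version>  <arch>  <description>
        if len(fields) >= 3 and fields[0] == "ii" and fields[1].startswith(prefix):
            name = fields[1].split(":")[0]  # drop an optional ":amd64" suffix
            versions[name] = fields[2]
    return versions


def installed_igc_versions():
    """Run `dpkg -l` and extract the IGC package versions (Debian/Ubuntu only)."""
    result = subprocess.run(
        ["dpkg", "-l"], capture_output=True, text=True, check=True
    )
    return parse_dpkg_versions(result.stdout)


# Hypothetical listing, used only to illustrate the parsing step.
SAMPLE_LISTING = """\
Desired=Unknown/Install/Remove/Purge/Hold
ii  libc6:amd64  2.35-0ubuntu3  amd64  GNU C Library: Shared libraries
ii  libigc1      1.0.16510.18   amd64  Intel graphics compiler
"""

if __name__ == "__main__":
    print(parse_dpkg_versions(SAMPLE_LISTING))
```

Keeping the parsing separate from the `subprocess` call makes the logic testable on machines without `dpkg`.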
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <triton.compiler.compiler.CompiledKernel object at 0x7f3a94586a10>

    def _init_handles(self):
        if self.module is not None:
            return
        device = driver.active.get_current_device()
        # create launcher
        self.run = driver.active.launcher_cls(self.src, self.metadata)
        # not enough shared memory to run the kernel
        max_shared = driver.active.utils.get_device_properties(device)["max_shared_mem"]
        if self.metadata.shared > max_shared:
            raise OutOfResources(self.metadata.shared, max_shared, "shared memory")
        # TODO: n_regs, n_spills should be metadata generated when calling `ptxas`
        self.module, self.function, self.n_regs, self.n_spills = driver.active.utils.load_binary(
            self.name, self.kernel, self.metadata.shared, device)
E       RuntimeError: Triton Error [ZE]: 0x78000018
python/triton/compiler/compiler.py:376: RuntimeError
============================================================== warnings summary ==============================================================
../../mambaforge/envs/junhui-py310/lib/python3.10/site-packages/intel_extension_for_pytorch/cpu/tpp/init.py:1
  /home/lijunhui/mambaforge/envs/junhui-py310/lib/python3.10/site-packages/intel_extension_for_pytorch/cpu/tpp/init.py:1: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    import pkg_resources

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================== short test summary info ===========================================================
FAILED python/test/unit/operators/test_matmul.py::test_op[128-256-64-1-8-3-256-512-160-True-True-float32-float32-None-True-None-None] - RuntimeError: Triton Error [ZE]: 0x78000018
======================================================= 1 failed, 1 warning in 12.77s ========================================================
I followed the setup described here, and the 3 matmul tests pass on the 914.16 driver + 0.5.2 PTDB.
Could we automate this kind of testing, for example by detecting and switching drivers automatically in CI and then sending the results by email?
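As a rough illustration of the reporting half of that idea, per-driver results could be folded into an email message with the standard library. This is only a sketch under assumptions: `build_report` is a hypothetical helper, the addresses are placeholders, and detecting or switching drivers is left out because it depends on the CI environment.

```python
from email.message import EmailMessage


def build_report(results, sender, recipient):
    """Build an email summarizing failures per driver.

    `results` maps a driver label to the list of failed test ids
    (an empty list means every test passed on that driver).
    """
    failed_total = sum(len(failures) for failures in results.values())

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = (
        f"matmul tests: {failed_total} failed across {len(results)} driver(s)"
    )

    lines = []
    for driver, failures in results.items():
        status = "PASS" if not failures else f"FAIL ({len(failures)})"
        lines.append(f"{driver}: {status}")
        lines.extend(f"  {test_id}" for test_id in failures)
    msg.set_content("\n".join(lines))
    return msg


if __name__ == "__main__":
    # Driver labels taken from this thread; addresses are placeholders.
    report = build_report(
        {
            "881.12": [
                "test_op[128-256-64-1-8-3-256-512-160-True-True-"
                "float32-float32-None-True-None-None]"
            ],
            "914.16": [],
        },
        sender="ci@example.com",
        recipient="team@example.com",
    )
    print(report)
```

The resulting message could then be handed to `smtplib.SMTP.send_message`; the server and credentials are deployment-specific and not shown here.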
IGC version - 1.0.16510.18
Test variant - [128-256-64-1-8-3-256-512-160-True-True-float32-float32-None-True-None-None]