intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs
MIT License

Investigate `test_dot` failures on A770 #983

Open alexbaden opened 6 months ago

alexbaden commented 6 months ago
=============================================== short test summary info ================================================
FAILED language/test_core.py::test_dot[1-128-128-64-4-True-False-none-tf32-int8-int8-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-32-128-64-2-False-False-none-tf32-int8-int8-1_0] - AssertionError:
FAILED language/test_core.py::test_dot[1-128-128-64-4-True-False-none-tf32-float16-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-32-128-64-2-False-False-none-tf32-int8-int8-1_1] - AssertionError:
FAILED language/test_core.py::test_dot[1-128-128-64-4-True-False-none-tf32-float16-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-4-True-False-none-tf32-float32-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-4-True-True-none-tf32-int8-int8-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-4-True-False-none-tf32-float32-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-32-128-64-2-False-False-none-tf32-float16-float32-1_0] - AssertionError:
FAILED language/test_core.py::test_dot[1-32-128-64-2-False-False-none-tf32-float16-float32-1_1] - AssertionError:
FAILED language/test_core.py::test_dot[1-128-128-64-4-True-True-none-tf32-int8-int8-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-4-False-True-none-tf32-int8-int8-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-4-False-True-none-tf32-int8-int8-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-4-True-True-none-tf32-float16-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-4-True-True-none-tf32-float16-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-4-True-True-none-tf32-float32-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-4-True-True-none-tf32-float32-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-4-False-True-none-tf32-float16-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-4-True-False-none-tf32-int8-int8-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-4-False-True-none-tf32-float16-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-4-False-True-none-tf32-float32-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-4-False-True-none-tf32-float32-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-True-True-none-tf32-int8-int8-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-4-False-False-none-tf32-int8-int8-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-4-False-False-none-tf32-int8-int8-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-True-True-none-tf32-int8-int8-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-4-False-False-none-tf32-float16-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-True-True-none-tf32-float16-float16-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-4-False-False-none-tf32-float16-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-4-False-False-none-tf32-float32-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-4-False-False-none-tf32-float32-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-True-True-none-tf32-float16-float16-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-True-True-none-tf32-float16-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-4-True-True-none-tf32-float16-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-4-True-True-none-tf32-float16-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-4-True-True-none-tf32-float32-float32-1_0] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 98304, Hardware limit: 65536. Reduc...
FAILED language/test_core.py::test_dot[1-64-128-128-4-True-True-none-tf32-float32-float32-1_1] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 98304, Hardware limit: 65536. Reduc...
FAILED language/test_core.py::test_dot[1-128-128-64-2-True-True-none-tf32-float16-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-4-True-False-none-tf32-int8-int8-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-4-True-False-none-tf32-int8-int8-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-True-True-none-tf32-float32-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-True-True-none-tf32-float32-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-4-True-False-none-tf32-float16-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-4-True-False-none-tf32-float16-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-4-True-False-none-tf32-float32-float32-1_0] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 98304, Hardware limit: 65536. Reduc...
FAILED language/test_core.py::test_dot[1-64-128-128-4-True-False-none-tf32-float32-float32-1_1] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 98304, Hardware limit: 65536. Reduc...
FAILED language/test_core.py::test_dot[1-128-128-64-2-True-False-none-tf32-int8-int8-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-4-False-True-none-tf32-float16-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-True-False-none-tf32-int8-int8-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-4-False-True-none-tf32-float16-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-4-False-True-none-tf32-float32-float32-1_0] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 98304, Hardware limit: 65536. Reduc...
FAILED language/test_core.py::test_dot[1-64-128-128-4-False-True-none-tf32-float32-float32-1_1] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 98304, Hardware limit: 65536. Reduc...
FAILED language/test_core.py::test_dot[1-64-128-128-4-False-False-none-tf32-int8-int8-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-4-False-False-none-tf32-int8-int8-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-True-False-none-tf32-float16-float16-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-True-False-none-tf32-float16-float16-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-4-False-False-none-tf32-float16-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-4-False-False-none-tf32-float16-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-4-False-False-none-tf32-float32-float32-1_0] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 98304, Hardware limit: 65536. Reduc...
FAILED language/test_core.py::test_dot[1-64-128-128-4-False-False-none-tf32-float32-float32-1_1] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 98304, Hardware limit: 65536. Reduc...
FAILED language/test_core.py::test_dot[1-128-128-64-2-True-False-none-tf32-float16-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-True-False-none-tf32-float16-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-True-False-none-tf32-float32-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-True-False-none-tf32-float32-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-32-128-64-2-False-True-none-tf32-int8-int8-1_0] - AssertionError:
FAILED language/test_core.py::test_dot[1-32-128-64-2-False-True-none-tf32-int8-int8-1_1] - AssertionError:
FAILED language/test_core.py::test_dot[1-32-128-64-2-False-True-none-tf32-float16-float32-1_0] - AssertionError:
FAILED language/test_core.py::test_dot[1-32-128-64-2-False-True-none-tf32-float16-float32-1_1] - AssertionError:
FAILED language/test_core.py::test_dot[1-128-128-64-2-False-True-none-tf32-int8-int8-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-False-True-none-tf32-int8-int8-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-False-True-none-tf32-float16-float16-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-False-True-none-tf32-float16-float16-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-False-True-none-tf32-float16-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-False-True-none-tf32-float16-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-False-True-none-tf32-float32-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-False-True-none-tf32-float32-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-False-False-none-tf32-int8-int8-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-False-False-none-tf32-int8-int8-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-False-False-none-tf32-float16-float16-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-False-False-none-tf32-float16-float16-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-False-False-none-tf32-float16-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-False-False-none-tf32-float16-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-False-False-none-tf32-float32-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-128-128-64-2-False-False-none-tf32-float32-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-True-True-none-tf32-int8-int8-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-True-True-none-tf32-int8-int8-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-True-True-none-tf32-float16-float16-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-True-True-none-tf32-float16-float16-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-True-True-none-tf32-float16-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-True-True-none-tf32-float16-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-True-True-none-tf32-float32-float32-1_0] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 98304, Hardware limit: 65536. Reduc...
FAILED language/test_core.py::test_dot[1-64-128-128-2-True-True-none-tf32-float32-float32-1_1] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 98304, Hardware limit: 65536. Reduc...
FAILED language/test_core.py::test_dot[1-64-128-128-2-True-False-none-tf32-int8-int8-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-True-False-none-tf32-int8-int8-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-True-False-none-tf32-float16-float16-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-True-False-none-tf32-float16-float16-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-True-False-none-tf32-float16-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-True-False-none-tf32-float16-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-True-False-none-tf32-float32-float32-1_0] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 98304, Hardware limit: 65536. Reduc...
FAILED language/test_core.py::test_dot[1-64-128-128-2-True-False-none-tf32-float32-float32-1_1] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 98304, Hardware limit: 65536. Reduc...
FAILED language/test_core.py::test_dot[1-64-128-128-2-False-True-none-tf32-int8-int8-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-False-True-none-tf32-int8-int8-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-False-True-none-tf32-float16-float16-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-False-True-none-tf32-float16-float16-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-False-True-none-tf32-float16-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-False-True-none-tf32-float16-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-False-True-none-tf32-float32-float32-1_0] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 98304, Hardware limit: 65536. Reduc...
FAILED language/test_core.py::test_dot[1-64-128-128-2-False-True-none-tf32-float32-float32-1_1] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 98304, Hardware limit: 65536. Reduc...
FAILED language/test_core.py::test_dot[1-64-128-128-2-False-False-none-tf32-int8-int8-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-False-False-none-tf32-int8-int8-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-False-False-none-tf32-float16-float16-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-False-False-none-tf32-float16-float16-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-False-False-none-tf32-float16-float32-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-False-False-none-tf32-float16-float32-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-64-128-128-2-False-False-none-tf32-float32-float32-1_0] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 98304, Hardware limit: 65536. Reduc...
FAILED language/test_core.py::test_dot[1-64-128-128-2-False-False-none-tf32-float32-float32-1_1] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 98304, Hardware limit: 65536. Reduc...

These failures are likely related to problems running fused multiply-add without DPAS.
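To see which shapes and dtypes map to which error, the summary can be tallied with a short script. This is only a triage sketch: it assumes the fields in each test ID after the leading `1` are M, N, K, and that the input dtype is the tenth field. Neither is confirmed in this thread. The sample lines are copied from the summary above; paste the full summary in to tally everything.

```python
import re
from collections import Counter

# A few lines copied verbatim from the summary above.
SAMPLE = """\
FAILED language/test_core.py::test_dot[1-128-128-64-4-True-False-none-tf32-int8-int8-1_1] - RuntimeError: Triton Error [ZE]: 0x78000011
FAILED language/test_core.py::test_dot[1-32-128-64-2-False-False-none-tf32-int8-int8-1_0] - AssertionError:
FAILED language/test_core.py::test_dot[1-64-128-128-4-True-True-none-tf32-float32-float32-1_0] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 98304, Hardware limit: 65536. Reduc...
FAILED language/test_core.py::test_dot[1-64-128-128-2-False-False-none-tf32-float16-float16-1_0] - RuntimeError: Triton Error [ZE]: 0x78000011
"""

# Captures the bracketed test ID and the exception name that follows " - ".
LINE_RE = re.compile(r"^FAILED \S+::test_dot\[(?P<id>[^\]]+)\] - (?P<err>[\w.]+):")


def tally(summary: str) -> Counter:
    """Count failures keyed by (error name, 'M-N-K', input dtype)."""
    counts = Counter()
    for line in summary.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        fields = m.group("id").split("-")
        shape = "-".join(fields[1:4])   # assumption: M, N, K
        in_dtype = fields[9]            # assumption: input dtype
        err = m.group("err").rsplit(".", 1)[-1]  # drop the module path
        counts[(err, shape, in_dtype)] += 1
    return counts


if __name__ == "__main__":
    for key, n in sorted(tally(SAMPLE).items()):
        print(n, *key)
```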

pbchekin commented 6 months ago

Where does this error come from?

triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 98304, Hardware limit: 65536

`Hardware limit: 65536` looks strange; the A770 should have more than that. Where did you run the tests?

alexbaden commented 6 months ago

It's shared memory: the A770 has only 64K, while PVC has 128K. For some of these tests we will need to reduce the size or skip based on shape. For others we will need DPAS or some other codegen, presumably because the unrolled mma puts too much register pressure on the GPU.
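For reference, the `Required: 98304` figure matches the size of one A tile plus one B tile in float32 for the 64-128-128 cases. The sketch below reproduces that arithmetic. It assumes the test ID fields are M, N, K and that both operand tiles are staged in shared memory at once; that the compiler actually computes the requirement this way is a guess, not something confirmed here. The function names are made up for illustration.

```python
# Element sizes in bytes for the input dtypes that appear in the summary.
DTYPE_BYTES = {"int8": 1, "float16": 2, "float32": 4}


def operand_tile_bytes(m: int, n: int, k: int, in_dtype: str) -> int:
    """Bytes to hold one M x K tile of A plus one K x N tile of B."""
    return (m * k + k * n) * DTYPE_BYTES[in_dtype]


def exceeds_limit(m: int, n: int, k: int, in_dtype: str, limit: int = 65536) -> bool:
    """True if the estimated footprint is above the shared memory limit."""
    return operand_tile_bytes(m, n, k, in_dtype) > limit


if __name__ == "__main__":
    print(operand_tile_bytes(64, 128, 128, "float32"))          # 98304
    print(exceeds_limit(64, 128, 128, "float32"))                # True with 64K
    print(exceeds_limit(64, 128, 128, "float32", limit=131072))  # False with 128K
```

Under the same assumptions, the 128-128-64 float32 cases come out at exactly 65536 bytes, which does not exceed the limit, and those cases fail with the `[ZE]` error rather than `OutOfResources`. A check like `exceeds_limit` could be one basis for skipping by shape.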

alexbaden commented 4 months ago

The A770 is not using DPAS for any of the test_dot kernels; they are all fully unrolled scalar multiplies and adds. This results in very long kernels and out-of-resources errors (mostly from running out of registers, we think). To fix this, we can either try to use DPAS on A770 (#991) or try not unrolling the loop to save on shared memory and register pressure, though the latter option may be unacceptably slow.
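A rough count shows why the fully unrolled scalar kernels get so long. An M x K by K x N product needs M*N*K multiply-adds, and every output element needs an accumulator. The sketch below divides that evenly across threads. `threads_per_warp` is a hypothetical parameter: the value 32 in the example is only an illustration, and the sub-group size actually used on A770 is not stated in this thread.

```python
def unrolled_work(m: int, n: int, k: int, num_warps: int, threads_per_warp: int) -> dict:
    """Estimate scalar work per thread for a fully unrolled M x N x K dot.

    Assumes work and accumulators are split evenly across all threads.
    """
    threads = num_warps * threads_per_warp
    total_fma = m * n * k  # one multiply-add per (i, j, k) triple
    return {
        "total_fma": total_fma,
        "fma_per_thread": total_fma // threads,
        "accumulators_per_thread": (m * n) // threads,
    }


if __name__ == "__main__":
    # 128-128-64 with 4 warps, using an assumed 32 threads per warp.
    print(unrolled_work(128, 128, 64, num_warps=4, threads_per_warp=32))
```

With those assumed numbers each thread would issue 8192 multiply-adds and hold 128 accumulators, which is at least consistent with the register pressure described above.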