Closed by pbchekin 4 months ago
RuntimeError: Triton Error [ZE]: 0x70000004
- No FAILED cases on Rolling Agama 881(jupyterhub) with PTDB 0.5.2
Please also validate with Agama 914.32 (in the newly added jupyterhub session).
- 110 FAILED in Language test on LTS Agama 803 (fox125) with PTDB 0.5.2
RuntimeError: Triton Error [ZE]: 0x70000004
Looks like at least some of the failed tests are in the lts skip list, for example:
language/test_core.py::test_dot[1-64-128-128-4-False-True-none-tf32-float16-float32-1_0]
Please re-validate with the lts skip list, or check if all failed tests are in the lts skip list.
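One way to check whether every failed test is already covered by the lts skip list is a plain set difference. The sketch below is only an illustration: the helper name `not_in_skiplist`, the file of failed test IDs, and the assumption that the skip list directory holds `*.txt` files with one pytest test ID per line are all hypothetical and should be adapted to the real layout.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository scripts).
# Prints the failed test IDs that are NOT present in any skip list file.
# Assumptions: $1 is a file with one pytest test ID per line;
#              $2 is a directory of *.txt skip list files, one ID per line.
not_in_skiplist() {
  local failed_file="$1" skiplist_dir="$2"
  # -F: treat IDs as fixed strings (they contain [ ] and -)
  # -x: match whole lines only
  # -v: keep lines that match none of the patterns
  # -f -: read the patterns (skip list entries) from stdin
  cat "$skiplist_dir"/*.txt | grep -F -x -v -f - "$failed_file"
}
```

Called as `not_in_skiplist failed.txt /PATH/TO/skiplist/lts`, it prints nothing when all failed tests are in the skip list; any line it does print is a failure the skip list does not account for.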
110 FAILED in Language test on LTS Agama 803 (fox125) with PTDB 0.5.2
Sorry, on LTS the skiplist should have been specified before execution. So:
TRITON_TEST_SKIPLIST_DIR=/PATH/TO/skiplist/lts ./scripts/test-triton.sh --ignore-errors
There are no FAILED cases.
Please also validate with Agama 914.32 (in the newly added jupyterhub session).
Summary:
Two questions:
ls /opt/intel/oneapi/logs still returns v=0.5.1. When I tried to re-install PTDB 0.5.2, the installer told me that PTDB 0.5.2 is already installed, so I assume PTDB is updated. Is there another way of verifying the PTDB version? If it really is not PTDB 0.5.2, Rolling Agama 881 and Agama 914.32 will be re-validated with PTDB 0.5.2.
CI runners have been updated to use PTDB 0.5.2 in #1644. No new failed tests with Agama 914.32 and PTDB 0.5.2.
We might need to verify only the cases in the skiplists.
I don't think it is really needed. After big updates like the recent one, just use your "enable unskip" mode to see if there are tests that no longer fail, and update the skip lists accordingly.
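Updating a skip list after an unskip run amounts to dropping the entries that now pass. A minimal sketch, assuming a skip list file with one test ID per line and a separately collected file of test IDs that passed in the unskip run; both helper names and both file formats are hypothetical, not something the test scripts are known to produce:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for pruning a skip list after an unskip run.
# $1: skip list file, one test ID per line
# $2: file of test IDs that passed in the unskip run, one per line

# Skip list entries that passed, i.e. candidates for removal.
no_longer_failing() {
  local skiplist_file="$1" passed_file="$2"
  grep -F -x -f "$passed_file" "$skiplist_file"
}

# Skip list entries that did not pass and should stay in the list.
still_needed() {
  local skiplist_file="$1" passed_file="$2"
  grep -F -x -v -f "$passed_file" "$skiplist_file"
}
```

Reviewing the output of `no_longer_failing` before rewriting the file with `still_needed` keeps the update auditable, which matters when a test passes only intermittently.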
I'll find machines for Rolling and LTS, then do the following: