Closed pbchekin closed 3 months ago
It seems these 4 cases are already in the default skiplist, and from my local test these 4 cases have been skipped:

```shell
# env: PVC with agama 821.35
source ./scripts/pytest-utils.sh
cd python/test/unit
TRITON_TEST_SUITE=language pytest -vvv -n 8 --device xpu language/ --ignore=language/test_line_info.py --ignore=language/test_subprocess.py
```
Maybe this has been solved somehow.
Cases in test-triton.sh have 5 result statuses: failed, passed, skipped, xfailed, warnings.
Are we supposed to put `failed` and `skipped` into the skiplist, or just `skipped`?
> Maybe this has been solved somehow.
This issue is to track 4 failing test cases with Agama 821.35. These test cases passed before, so it is a regression and needs to be investigated.
> Are we supposed to put `failed` and `skipped` into the skiplist, or just `skipped`?
There are two options to skip a test case:

1. `pytest.skip` for specific conditions.
2. Adding the test case to a skip list.

We probably want to use the latter method, because it allows skipping tests depending on the environment. For example, we can have a skip list for PVC with the rolling driver, PVC with the LTS driver, A770 with the rolling/LTS driver, and so on. We currently use both methods as a transitional step; the plan is to use the skip list for new failures (regressions) and gradually replace `pytest.skip` with adding test cases to the corresponding skip list.
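For reference, the two mechanisms can be sketched roughly like this (a minimal sketch; file names such as `skiplist-pvc-rolling.txt` and the test name are hypothetical, not the repo's actual layout):

```python
# Sketch of the two skip mechanisms. Names here are illustrative only.
import os
import pytest


def test_example_xpu_known_failure():
    # Method 1: pytest.skip inside the test body,
    # conditional on the environment.
    if os.environ.get("TRITON_TEST_DEVICE") == "xpu":
        pytest.skip("known failure on XPU with Agama 821.35")


def load_skiplist(path):
    """Return the set of test node IDs listed in a skip-list file."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {line.strip() for line in f
                if line.strip() and not line.startswith("#")}


def pytest_collection_modifyitems(config, items):
    # Method 2: one skip list per platform/driver combination,
    # applied to the collected tests from conftest.py.
    skiplist = load_skiplist("skiplist-pvc-rolling.txt")
    for item in items:
        if item.nodeid in skiplist:
            item.add_marker(pytest.mark.skip(reason="in skip list"))
```

The advantage of method 2 is that the tests themselves stay unchanged; selecting a different skip-list file per driver/GPU combination is enough to adapt to a new environment.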
@AshburnLee please verify if these tests fail with the latest rolling.
> @AshburnLee please verify if these tests fail with the latest rolling.
They still fail on the latest llvm-target branch with the current Rolling (821.35). On the latest rolling? Do I need to update the driver version, or is there a platform with the latest Rolling that I can borrow?
> They still fail on the latest llvm-target branch with the current Rolling (821.35).
Thanks. The latest rolling is 821.36, I think. It would be nice to check it as well. We want to keep this issue open until the driver or the tests are fixed.
> We want to keep this issue open until the driver or the tests are fixed.
Oh, so we just track it, and there is no need to find the commit that causes those 4 failures? We can do that, but building Triton from very early commits takes effort and time.
> Oh, so we just track it, and there is no need to find the commit that causes those 4 failures? We can do that, but building Triton from very early commits takes effort and time.
Right, just track it at the moment. I don't think a Triton commit caused the failures; they started to fail when we updated the GPU driver.
4 cases still fail on the latest llvm-target branch with the current Rolling (821.35). 6/5/2024

4 cases still fail on the latest llvm-target branch with the current Rolling (821.35). 6/12/2024
Plus 101 additional FAILED cases in `test_dot`: `RuntimeError: Triton Error [ZE]: 0x78000018`.
4 cases still fail on the latest llvm-target branch with Rolling 881.19; no extra failed cases.

4 cases still fail on the latest llvm-target branch with Rolling 881.19. 4 cases passed on the latest llvm-target branch with Rolling 821. 6/17/2024
Got an error while running the tests on 821.35:

```
/lib/python3.10/site-packages/intel_extension_for_pytorch/lib/libintel-ext-pt-gpu.so: undefined symbol: _ZNK4sycl3_V16device32ext_oneapi_supports_cl_extensionERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPNS0_3ext6oneapi12experimental10cl_versionE
```
4 cases PASSED on Agama 914 (914.27).
Waiting for the new Agama release.
Errors:
See also https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/8757168039/job/24035243552#step:12:23117