Open hiraditya opened 5 months ago
@llvm/issue-subscribers-backend-x86
Author: AdityaK (hiraditya)
cc @LuoYuanke @XinWang10 @yubingex007-a11y
There's another AMX codegen crash, similar to https://github.com/llvm/llvm-project/issues/90954. When I filed that issue, building the same file would fail non-deterministically with either the error from that issue or the error in this one.
This also raises the question of whether anybody is building and running those tests on a CPU with AMX enabled. If not, should we remove the tests?
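For anyone wanting to check whether their own machine advertises AMX, here is a minimal sketch. It assumes a Linux host, where `/proc/cpuinfo` lists the feature flags `amx_tile`, `amx_int8` and `amx_bf16` on AMX-capable CPUs; the helper names `amx_flags` and `host_amx_flags` are hypothetical and not part of LLVM or the test-suite. This only matters for running the tests; compiling them does not require an AMX-capable host.

```python
# Hypothetical helper: report which AMX feature flags a Linux host advertises.
AMX_FLAGS = {"amx_tile", "amx_int8", "amx_bf16"}


def amx_flags(cpuinfo_text):
    """Return the AMX-related flags found in /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Each logical CPU has one "flags" line listing space-separated features.
        if sep and key.strip() == "flags":
            found |= AMX_FLAGS & set(value.split())
    return found


def host_amx_flags(path="/proc/cpuinfo"):
    """Read the host's cpuinfo; return an empty set if it is unavailable."""
    try:
        with open(path) as f:
            return amx_flags(f.read())
    except OSError:
        return set()


if __name__ == "__main__":
    flags = host_amx_flags()
    print("AMX flags:", " ".join(sorted(flags)) or "none")
```

An empty result means either that the CPU lacks AMX or that the file could not be read (for example on a non-Linux host).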
We run these tests internally, along with more complicated ones. These tests are used to catch bugs from the community, as demonstrated well by https://github.com/llvm/llvm-project/issues/90954. So I think we do need these tests.
This should be an unrelated issue. The AMX programming model has some constraints. Maybe the optimizations at Oz produce cases that cannot be handled. I'll take a look when I have some bandwidth.
@phoebewang fair point. I think one problem is that https://github.com/llvm/llvm-project/issues/90954 had been an issue for a while without anyone noticing. It would be great if the tests could be actively monitored, ideally with an upstream build bot, to prevent this in the future. Unfortunately, breaking the test-suite build is quite inconvenient for contributors.
Can we at least disable it by default until the issue has been fixed?
I'm surprised there isn't a build bot monitoring it, but that seems to be the case, because I didn't receive a failure report after that patch. I took another look at the test patch: it is a compilation-only test, but it requires exactly a "skylake-avx512" target. I think that's why it's not being monitored. We should relax this requirement. We have such a machine internally, but it takes longer for a patch to land internally and trigger the failure there.
Yeah, we could start by skipping the Oz option.
Using -Oz.cmake
$ cmake -DCMAKE_C_COMPILER=/usr/local/llvm-project/build/bin/clang-19 -C ../cmake/caches/Oz.cmake .. && make -j32 -k