In a recent PR, the TopK e2e test fails in CI: https://github.com/iree-org/iree/actions/runs/11107992173/job/30867743807?pr=18634
The failing test is `iree/tests/e2e/linalg_ext_ops/check_rocm_hip_top-k.mlir`.
This can be run by building the `iree-test-deps` target and then running `ctest -R iree/tests/e2e/linalg_ext_ops/check_rocm_hip_top-k.mlir --output-on-failure`. Based on the CI results, as far as I can tell this only reproduces on MI250 (gfx90a) cards. I have not had access to an MI250 machine yet to reproduce it myself, but it does not reproduce on MI300 (gfx942) cards.

The change in the PR stops `linalg.fill` ops from being constant folded, so the difference the PR makes in this test is that `%out_values` and `%out_indices` stay as `linalg.fill` ops instead of being folded into a splat `arith.constant`.

Repro Instructions

1. From the `iree` directory, run `cmake --build ../iree-build --target iree-test-deps`.
2. From the build directory, run `ctest -R iree/tests/e2e/linalg_ext_ops/check_rocm_hip_top-k.mlir --output-on-failure`.

Additional Clues
When comparing the compilation results for `gfx942` and `gfx90a`, there are very few differences in the LLVM IR dumped with `--iree-hal-dump-executable-intermediates-to`. Running a diff on the optimized LLVM IR shows that the only difference is that the kernel arguments are marked `inreg` for `gfx942`.

This gist has the resulting rocasm for each target chip: https://gist.github.com/Max191/10e96721ab25c9c14cba1a5cfd3f4db6
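For anyone picking this up on an MI250 machine, here is a minimal bash sketch of the repro steps and the IR comparison described above. It assumes an IREE checkout with a sibling `../iree-build` directory already configured with the ROCm/HIP backend. The helper names (`run`, `repro_topk`, `diff_optimized_ir`) and the `DRY_RUN` switch are hypothetical, not part of IREE. The `*.optimized.ll` file pattern is also an assumption about what `--iree-hal-dump-executable-intermediates-to` writes, so adjust it to the actual dump contents. Producing the two dump directories (one compile per target chip) is left out because the target-selection flag depends on the IREE version.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reproducing the TopK e2e failure and comparing
# dumped LLVM IR between two GPU targets. Source this file, then call the
# functions; nothing runs at load time.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# repro_topk [BUILD_DIR]
# Call from the iree source directory. BUILD_DIR defaults to ../iree-build.
repro_topk() {
  local build_dir="${1:-../iree-build}"
  # Step 1: build the iree-test-deps target (dependencies of the e2e tests).
  run cmake --build "$build_dir" --target iree-test-deps || return 1
  # Step 2: run only the failing TopK test from inside the build directory.
  (
    cd "$build_dir" || exit 1
    run ctest -R iree/tests/e2e/linalg_ext_ops/check_rocm_hip_top-k.mlir \
      --output-on-failure
  )
}

# diff_optimized_ir DUMP_DIR_A DUMP_DIR_B [GLOB]
# Pairs files matching GLOB by file name across two dump directories and
# prints a unified diff for each pair. GLOB defaults to *.optimized.ll, an
# ASSUMED name pattern for the dumped optimized LLVM IR.
# Returns 0 only if every pair is identical and nothing is missing in B.
diff_optimized_ir() {
  local dir_a="$1" dir_b="$2" glob="${3:-*.optimized.ll}"
  local status=0 file_a name
  for file_a in "$dir_a"/$glob; do
    [[ -e "$file_a" ]] || continue  # the glob matched nothing
    name="$(basename "$file_a")"
    if [[ -e "$dir_b/$name" ]]; then
      diff -u "$file_a" "$dir_b/$name" || status=1
    else
      echo "only in $dir_a: $name" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, after `source repro.sh` (hypothetical file name) from the `iree` directory, `DRY_RUN=1 repro_topk` prints the two commands without running them, and `diff_optimized_ir dump-gfx90a dump-gfx942` prints one unified diff per matching dump file.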