Open weilinquan opened 1 month ago
I don't have a setup to reproduce this anymore. Does the code itself work? Note that this was tested on a much older CUDA version, so it may no longer work with current tooling.
In this post I discussed a somewhat more extended pass pipeline with more optimizations: https://github.com/spcl/open-earth-compiler/issues/46#issuecomment-1175400479
Maybe this works. However, if the code is functional, then I suspect there is some tool incompatibility.
I compiled my example with the following pass pipeline:

oec-opt --stencil-shape-inference --convert-stencil-to-std --cse --parallel-loop-tiling='parallel-loop-tile-sizes=128,1,1' --canonicalize --test-gpu-greedy-parallel-loop-mapping --convert-parallel-loops-to-gpu --canonicalize --lower-affine --convert-scf-to-std --stencil-kernel-to-cubin ../test/Examples/test.mlir > temp.mlir
mlir-translate --mlir-to-llvmir temp.mlir > temp.bc
llc -O3 temp.bc -o temp.s
clang -c temp.s -o temp.o
nvcc --default-stream per-thread -allow-unsupported-compiler -ccbin clang main.cc temp.o -lcuda-runtime-wrappers -lcudart -lcuda

The main.cc and test.mlir files are in the attached zip. Are any steps in my pipeline wrong? I want to use ncu to profile in more detail. Thank you for your help!

test.zip
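To narrow down which step fails, the commands above could be wrapped in a small script that stops at the first error. This is only a sketch: the `run`, `run_to` and `build` helpers and the `DRY_RUN` and `INPUT` variables are hypothetical names introduced here, and the flags are copied unchanged from the pipeline above. By default it only prints the commands; setting `DRY_RUN=0` executes them, which assumes `oec-opt`, `mlir-translate`, `llc`, `clang` and `nvcc` are on `PATH`.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and variables are illustrative, not part of the project.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"                       # 1 = print commands only, 0 = execute them
INPUT="${INPUT:-../test/Examples/test.mlir}"  # input path taken from the pipeline above

# Print a command and, when DRY_RUN=0, run it with stdout redirected to a file.
run_to() {
  local out="$1"; shift
  echo "+ $* > $out"
  if [ "$DRY_RUN" = "0" ]; then "$@" > "$out"; fi
}

# Print a command and, when DRY_RUN=0, run it.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# The five steps of the pipeline, in order. With `set -e`, the first failing
# step aborts the script, which shows where the toolchain breaks.
build() {
  run_to temp.mlir oec-opt \
    --stencil-shape-inference --convert-stencil-to-std --cse \
    "--parallel-loop-tiling=parallel-loop-tile-sizes=128,1,1" \
    --canonicalize --test-gpu-greedy-parallel-loop-mapping \
    --convert-parallel-loops-to-gpu --canonicalize --lower-affine \
    --convert-scf-to-std --stencil-kernel-to-cubin "$INPUT"
  run_to temp.bc mlir-translate --mlir-to-llvmir temp.mlir
  run llc -O3 temp.bc -o temp.s
  run clang -c temp.s -o temp.o
  run nvcc --default-stream per-thread -allow-unsupported-compiler -ccbin clang \
    main.cc temp.o -lcuda-runtime-wrappers -lcudart -lcuda
}

build
```

Saved under any file name and run without arguments, the script lists the five commands; with `DRY_RUN=0` it stops at the first step that exits with a non-zero status.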