spcl / open-earth-compiler

development repository for the open earth compiler
https://arxiv.org/abs/2005.13014

Nsight Compute fails when profiling kernel performance #50

Open weilinquan opened 1 month ago

weilinquan commented 1 month ago

I compiled my example with the following pass pipeline:

    oec-opt --stencil-shape-inference --convert-stencil-to-std --cse \
        --parallel-loop-tiling='parallel-loop-tile-sizes=128,1,1' --canonicalize \
        --test-gpu-greedy-parallel-loop-mapping --convert-parallel-loops-to-gpu \
        --canonicalize --lower-affine --convert-scf-to-std --stencil-kernel-to-cubin \
        ../test/Examples/test.mlir > temp.mlir
    mlir-translate --mlir-to-llvmir temp.mlir > temp.bc
    llc -O3 temp.bc -o temp.s
    clang -c temp.s -o temp.o
    nvcc --default-stream per-thread -allow-unsupported-compiler -ccbin clang main.cc temp.o -lcuda-runtime-wrappers -lcudart -lcuda

The main.cc and test.mlir files are in the attached zip. Are any steps in my pipeline wrong? I want to use ncu to profile the kernel in more detail. Thank you for your help!

test.zip

gysit commented 1 month ago

I don't have a setup to reproduce this anymore. Does the code itself work? Note that this was tested on a much older CUDA version, so things may no longer work with current versions.

In this post I discussed a somewhat more extended pass pipeline with more optimizations: https://github.com/spcl/open-earth-compiler/issues/46#issuecomment-1175400479

Maybe this works. However, if the code is functional, then I suspect there is some tool incompatibility.
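One way to tell a broken build apart from a tool incompatibility is to run each stage on its own and execute the binary once outside the profiler before invoking ncu. The following is a minimal sketch, assuming the commands from the original post. `run_step` and the stage function names are hypothetical helpers introduced here for illustration, and `a.out` is assumed as the output name only because the nvcc command in the post passes no `-o`.

```shell
#!/bin/sh
# Sketch only: run_step and the stage functions are hypothetical helpers,
# not part of the Open Earth Compiler or of Nsight Compute.

# Run one named stage; report on stderr which stage failed.
run_step() {
  step_name=$1
  shift
  printf '[step] %s\n' "$step_name" >&2
  if ! "$@"; then
    printf '[fail] %s\n' "$step_name" >&2
    return 1
  fi
}

# Stages copied from the original post, one function per tool invocation.
lower_stencil() {
  oec-opt --stencil-shape-inference --convert-stencil-to-std --cse \
    --parallel-loop-tiling='parallel-loop-tile-sizes=128,1,1' --canonicalize \
    --test-gpu-greedy-parallel-loop-mapping --convert-parallel-loops-to-gpu \
    --canonicalize --lower-affine --convert-scf-to-std --stencil-kernel-to-cubin \
    ../test/Examples/test.mlir > temp.mlir
}
translate_llvm() { mlir-translate --mlir-to-llvmir temp.mlir > temp.bc; }
compile_asm()    { llc -O3 temp.bc -o temp.s; }
assemble_obj()   { clang -c temp.s -o temp.o; }
link_binary() {
  nvcc --default-stream per-thread -allow-unsupported-compiler -ccbin clang \
    main.cc temp.o -lcuda-runtime-wrappers -lcudart -lcuda
}

build_and_profile() {
  run_step "oec-opt"        lower_stencil  &&
  run_step "mlir-translate" translate_llvm &&
  run_step "llc"            compile_asm    &&
  run_step "clang"          assemble_obj   &&
  run_step "nvcc"           link_binary    &&
  # Assumption: nvcc wrote ./a.out. Check that it runs without the profiler.
  run_step "plain run"      ./a.out        &&
  run_step "ncu"            ncu ./a.out
}

# Only build when explicitly asked, so the helpers can be sourced safely.
if [ "${1:-}" = "run" ]; then
  build_and_profile
fi
```

If every stage up to "plain run" succeeds and only the "ncu" stage fails, that would point toward a profiler or CUDA version mismatch rather than an error in the pass pipeline.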