It'd be useful to be able to easily identify optimization issues in runtime tools (nsight, tracy, perf, etc.) when triaging performance on new inputs. Instead of having a dispatch_32_matmul that wasn't vectorized and needing to dig into the IR to find out, we could just name it dispatch_32_matmul_FAILED_VECTORIZATION (or whatever) and make it blindingly obvious. This doesn't scale to all kinds of things, but there are a few key paths that we know will cause performance issues. Changing symbol names immediately before executable serialization is pretty easy: by then all ordinals have been propagated to external code, so doing so would just change the names used at runtime. I'm thinking of some hal.remark attr that codegen could add to entry points and that then automatically gets merged into the name.
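
To make the idea concrete, here is a rough Python sketch of the rename step, not actual compiler code. Everything in it is hypothetical: the `EntryPoint` record, the `remarks` field standing in for a hal.remark attr, and the `merge_remarks` helper are made up for illustration. It only shows the intended behavior: remarks get appended to the symbol name, and ordinals are left untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EntryPoint:
    """Hypothetical stand-in for an executable entry point."""
    ordinal: int                    # already propagated to external code
    name: str                       # symbol name shown by runtime tools
    remarks: tuple[str, ...] = ()   # stand-in for a hal.remark attr set by codegen


def merge_remarks(entry_points):
    """Fold each entry point's remarks into its name; ordinals stay as-is."""
    renamed = []
    for ep in entry_points:
        # Each remark becomes an upper-cased suffix, e.g. _FAILED_VECTORIZATION.
        suffix = "".join(f"_{remark.upper()}" for remark in ep.remarks)
        renamed.append(replace(ep, name=ep.name + suffix, remarks=()))
    return renamed


entry_points = [
    EntryPoint(31, "dispatch_31_elementwise"),
    EntryPoint(32, "dispatch_32_matmul", ("failed_vectorization",)),
]
renamed = merge_remarks(entry_points)
for ep in renamed:
    print(ep.ordinal, ep.name)
```

Running this right before serialization would mean only the names seen at runtime change; anything that refers to an entry point by ordinal keeps working.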