Open fe1f8067-743b-4361-8fe5-f444cfe5d0be opened 5 years ago
May as well leave this one open to keep the discussion in one place.
Thanks Max for the docs improvement.
We can keep this bug open. However, the bug title should be changed to something like "add support for source level llvm-mca markers". Alternatively we can just resolve this bug and maybe raise a new one to track any progress on the other task about adding source level markers support.
Documentation updated in https://reviews.llvm.org/D63040.
Thanks for the quick reply. Yeah, it looks tricky.
Here's another idea, though. Source line mapping seems to mostly survive optimization. Would it be viable to just use that? The block for llvm-mca to analyze would be all instructions that come from source lines that are lexically between the two markers, as well as any other instructions that happen to be interleaved within them.
That still leaves the problem of providing some kind of marker builtin that doesn't interfere with optimization. But now it wouldn't have to be preserved in the instruction stream; its only purpose would be to record the line number on which it appeared, which seems easier to accomplish.
Thanks Max for reporting the issue with the source level markers suggested by the llvm-mca docs.
This is unfortunately a well known issue. The inline assembly used to inject mca markers inevitably interferes with some transformations (notably the auto-vectorizer).
To fix this issue, we need a different approach. As long as markes are inserted via inline assembly there is always the potential of affecting the optimizers.
In future, we should evaluate the possibility of adding special builtins to guard mca code regions. Those builtin would have to be treated like special "meta instructions" that don't produce any output and that don't have side-effects. I guess, the difficult part would be to teach passes how to preserve those builtins (assuming that it is always possible; it may not always be the case when dealing with loop transforms that restructure the loop-nest).
Alternatively we could also use an instrumentation-like approach to inject markers with the help of a pass that runs before codegen. That would also require a special source level syntax to let users mark loop bodies/code regions.
These are just ideas. Implementing this feature is not straightforward. Strictly speaking, the inline assembly approach was suggested as an alternative (sub-optimal) approach to compensate for the lack of such feature.
I can update the docs to mention this issue. However I don't have a short term workaround for it.
Extended Description
The docs for llvm-mca (https://llvm.org/docs/CommandGuide/llvm-mca.html#using-markers-to-analyze-specific-code-blocks) suggest using inline assembly to mark the region that llvm-mca should examine, e.g.
However, these directives prevent auto-vectorization.
Compiler Explorer demo, on Clang 8.0.0: https://godbolt.org/z/NSQchu
We should be able to use llvm-mca markers without affecting optimization and code generation.