llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.96k stars 11.94k forks source link

Support source-level region markers #41518

Open fe1f8067-743b-4361-8fe5-f444cfe5d0be opened 5 years ago

fe1f8067-743b-4361-8fe5-f444cfe5d0be commented 5 years ago
Bugzilla Link 42173
Version 8.0
OS Linux
CC @adibiagio,@gregbedwell,@RKSimon

Extended Description

The docs for llvm-mca (https://llvm.org/docs/CommandGuide/llvm-mca.html#using-markers-to-analyze-specific-code-blocks) suggest using inline assembly to mark the region that llvm-mca should examine, e.g.

for(size_t index = 0; index < count; index++) { asm volatile("# LLVM-MCA-BEGIN"); result += source[index]; asm volatile("# LLVM-MCA-END"); }

However, these directives prevent auto-vectorization.

:8:3: remark: loop not vectorized: call instruction cannot be vectorized [-Rpass-analysis=loop-vectorize] __asm volatile("# LLVM-MCA-BEGIN sum_marked"); ^

:6:2: remark: loop not vectorized: read with atomic ordering or volatile read [-Rpass-analysis=loop-vectorize] for (size_t index = 0; index < count; index++) ^

Compiler Explorer demo, on Clang 8.0.0: https://godbolt.org/z/NSQchu

We should be able to use llvm-mca markers without affecting optimization and code generation.

fe1f8067-743b-4361-8fe5-f444cfe5d0be commented 5 years ago

May as well leave this one open to keep the discussion in one place.

adibiagio commented 5 years ago

Thanks Max for the docs improvement.

We can keep this bug open. However, the bug title should be changed to something like "add support for source level llvm-mca markers". Alternatively we can just resolve this bug and maybe raise a new one to track any progress on the other task about adding source level markers support.

fe1f8067-743b-4361-8fe5-f444cfe5d0be commented 5 years ago

Documentation updated in https://reviews.llvm.org/D63040.

fe1f8067-743b-4361-8fe5-f444cfe5d0be commented 5 years ago

Thanks for the quick reply. Yeah, it looks tricky.

Here's another idea, though. Source line mapping seems to mostly survive optimization. Would it be viable to just use that? The block for llvm-mca to analyze would be all instructions that come from source lines that are lexically between the two markers, as well as any other instructions that happen to be interleaved within them.

That still leaves the problem of providing some kind of marker builtin that doesn't interfere with optimization. But now it wouldn't have to be preserved in the instruction stream; its only purpose would be to record the line number on which it appeared, which seems easier to accomplish.

adibiagio commented 5 years ago

Thanks Max for reporting the issue with the source level markers suggested by the llvm-mca docs.

This is unfortunately a well known issue. The inline assembly used to inject mca markers inevitably interferes with some transformations (notably the auto-vectorizer).

To fix this issue, we need a different approach. As long as markes are inserted via inline assembly there is always the potential of affecting the optimizers.

In future, we should evaluate the possibility of adding special builtins to guard mca code regions. Those builtin would have to be treated like special "meta instructions" that don't produce any output and that don't have side-effects. I guess, the difficult part would be to teach passes how to preserve those builtins (assuming that it is always possible; it may not always be the case when dealing with loop transforms that restructure the loop-nest).

Alternatively we could also use an instrumentation-like approach to inject markers with the help of a pass that runs before codegen. That would also require a special source level syntax to let users mark loop bodies/code regions.

These are just ideas. Implementing this feature is not straightforward. Strictly speaking, the inline assembly approach was suggested as an alternative (sub-optimal) approach to compensate for the lack of such feature.

I can update the docs to mention this issue. However I don't have a short term workaround for it.