llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[llvm-exegesis] Latency test code breaking register dependencies #36916

Open RKSimon opened 6 years ago

RKSimon commented 6 years ago
Bugzilla Link 37568
Version trunk
OS Windows NT
CC @adibiagio,@legrosbuffle,@topperc,@gchatelet,@gregbedwell

Extended Description

I'm seeing llvm-exegesis report far too low latency values for a number of instructions, because the measured instructions repeat the same source register. This lets the frontend break the register dependency. It happens most commonly for instructions that are guaranteed to produce an all-zeros or all-ones result.

Search Agner Fog's microarchitecture manual (http://www.agner.org/optimize/microarchitecture.pdf) for "Breaking dependency chains" and "Dependency-breaking instructions". These instruction types include:

X86 instructions: XOR, SUB, SBB (depends on the carry flag only), CMP

MMX/SSE/AVX instructions: PXOR/XORPS/XORPD, PANDN/ANDNPS/ANDNPD, PSUBx, PCMPEQx/PCMPGTx

Depending on the CPU, this may also include the MMX/AVX1/AVX2 variants.
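To make the zero/all-ones point concrete, here is a minimal C++ sketch. It assumes single 32-bit lanes and ignores flags except SBB's carry input; all function names are hypothetical and this is not LLVM or llvm-exegesis code. CMP is left out because it only writes flags. The sketch checks that when both sources are the same register, each listed idiom yields the same value whatever that register holds, which is why the CPU need not wait for the register and a latency chain through it is cut.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical single-lane (32-bit) models of the idioms listed above, with
// both source operands reading the same register value x. Not LLVM code.
uint32_t xorSame(uint32_t x) { return x ^ x; }    // XOR, PXOR, XORPS/XORPD
uint32_t subSame(uint32_t x) { return x - x; }    // SUB, PSUBx
uint32_t andnSame(uint32_t x) { return ~x & x; }  // PANDN, ANDNPS/ANDNPD
uint32_t pcmpeqSame(uint32_t x) { return x == x ? 0xFFFFFFFFu : 0u; }  // PCMPEQx
uint32_t pcmpgtSame(uint32_t x) { return x > x ? 0xFFFFFFFFu : 0u; }   // PCMPGTx
// SBB r, r computes r - r - CF: only the carry flag matters, not r.
uint32_t sbbSame(uint32_t x, bool carry) { return x - x - (carry ? 1u : 0u); }

// True if fn returns the same value for every sample input, i.e. the result
// does not depend on the source register's value.
template <typename Fn>
bool independentOfSource(Fn fn) {
  const std::array<uint32_t, 5> samples = {0u, 1u, 0x7FFFFFFFu, 0x80000000u,
                                           0xDEADBEEFu};
  const uint32_t first = fn(samples[0]);
  for (uint32_t x : samples)
    if (fn(x) != first) return false;
  return true;
}

// Spot-checks the all-zeros / all-ones results the issue refers to.
void checkIdioms() {
  const std::array<uint32_t, 2> values = {3u, 0xFFFFFFFFu};
  for (uint32_t x : values) {
    assert(xorSame(x) == 0u && subSame(x) == 0u && andnSame(x) == 0u);
    assert(pcmpeqSame(x) == 0xFFFFFFFFu && pcmpgtSame(x) == 0u);
    assert(sbbSame(x, true) == 0xFFFFFFFFu && sbbSame(x, false) == 0u);
  }
}
```

An ordinary operation such as x + x fails independentOfSource, which is the case where repeating the source register still measures a true dependency.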

adibiagio commented 6 years ago

Hi Simon,

Thanks for reporting, and yes, we're aware of that.

Ideally we would model this in LLVM because the scheduler should know about this. IIUC, codegen already knows that it should emit xor eax,eax to zero a register, but the post-RA scheduler does not use this information to do a better job.

For the record:

We now have a TargetSubtargetInfo hook to check whether an instruction is a dependency-breaking instruction.

So we can improve the analysis in the post-RA scheduler.

I committed that change here: http://llvm.org/viewvc/llvm-project?view=revision&revision=342555

Basically:

You can call this method on TargetSubtargetInfo to obtain a mask of "broken dependencies":

bool isDependencyBreaking(const MachineInstr *MI, APInt &Mask) const;

You can use that information to bias the construction of the DAG for a scheduling region.
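The following C++ sketch illustrates how such a mask could bias DAG construction. It is a hypothetical, heavily simplified model rather than LLVM's scheduler code: std::bitset stands in for APInt, the Instr struct and buildDataEdges are invented for illustration, and it assumes that bit i of the mask marks input operand i as having its dependency broken, so no read-after-write edge is added for that operand.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical instruction model (not llvm::MachineInstr).
struct Instr {
  int def;                    // register written
  std::vector<int> uses;      // registers read, in operand order
  std::bitset<8> brokenMask;  // assumed: bit i set => operand i is independent
};

// Returns (producer, consumer) index pairs for read-after-write edges in a
// scheduling region, skipping operands whose dependency the mask marks broken.
std::vector<std::pair<std::size_t, std::size_t>>
buildDataEdges(const std::vector<Instr> &region) {
  std::vector<std::pair<std::size_t, std::size_t>> edges;
  for (std::size_t i = 0; i < region.size(); ++i) {
    const Instr &mi = region[i];
    assert(mi.uses.size() <= mi.brokenMask.size());
    for (std::size_t op = 0; op < mi.uses.size(); ++op) {
      if (mi.brokenMask.test(op))
        continue;  // dependency broken: no edge to the previous writer
      // Link to the most recent earlier writer of this register, if any.
      for (std::size_t j = i; j-- > 0;) {
        if (region[j].def == mi.uses[op]) {
          edges.emplace_back(j, i);
          break;
        }
      }
    }
  }
  return edges;
}
```

For a region "r1 = ...; r1 = xor r1, r1; r2 = use r1", setting both mask bits on the xor drops its two edges to the first instruction, while the later use of r1 still depends on the xor itself.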

RKSimon commented 6 years ago

Andrea's x86 variant support (https://reviews.llvm.org/D47374) is close to being added to trunk, so once a scheduling model supports it you should be able to compare register-specific cases through it.

llvmbot commented 6 years ago

To measure the case where the two registers are different, we are planning to force the generation of a longer sequence even when the instruction is self-serial. This would generate e.g. sub R0, R1; xor R1, R0 to measure the real latency of sub.

Some things will be impossible to measure this way: for example, MOV32rr will not be able to use this trick.
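A toy C++ model of the sub/xor pairing described above, under loudly stated assumptions: only true register dependencies are modeled (no ports, throughput, or renamer limits), latencies are made-up parameters, and none of the names (Op, criticalPath, repeat) come from llvm-exegesis. Repeating the pair makes each iteration cost lat(sub) + lat(xor), so lat(sub) can be recovered by subtracting a known xor latency, whereas a self-serial sub R0, R0 treated as a zero idiom collapses to a single latency no matter how often it is repeated.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical instruction model for a dependency-chain estimate.
struct Op {
  std::string name;       // e.g. "sub", "xor"
  int def;                // destination register number
  std::vector<int> uses;  // source register numbers
  int latency;            // cycles until the result is available
  bool breaksDeps;        // true if the CPU ignores the sources (zero idiom)
};

// Cycle at which the last result of the sequence is ready.
int criticalPath(const std::vector<Op> &ops) {
  std::map<int, int> readyAt;  // register -> ready cycle (live-ins: cycle 0)
  int end = 0;
  for (const Op &op : ops) {
    assert(op.latency > 0 && "latency must be positive");
    int start = 0;
    if (!op.breaksDeps)
      for (int r : op.uses)
        start = std::max(start, readyAt[r]);
    readyAt[op.def] = start + op.latency;
    end = std::max(end, readyAt[op.def]);
  }
  return end;
}

// Repeats a snippet n times, as a benchmark loop body would.
std::vector<Op> repeat(const std::vector<Op> &snippet, int n) {
  std::vector<Op> out;
  for (int i = 0; i < n; ++i)
    out.insert(out.end(), snippet.begin(), snippet.end());
  return out;
}
```

With illustrative values lat(sub) = 3 and lat(xor) = 1, ten repetitions of the pair give a 40-cycle critical path, i.e. 4 cycles per pair, and 4 - 1 = 3 recovers lat(sub).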