Open RKSimon opened 6 years ago
Hi Simon,
Thanks for reporting, and yes, we're aware of that.
Ideally we would model this in LLVM, because the scheduler should know about it. IIUC, codegen already knows that it should emit
xor eax,eax
to zero a register, but the PostRA scheduler does not use this information to do a better job.
For the record:
We now have a TargetSubtargetInfo hook to check whether an instruction is dependency-breaking,
so we can improve the analysis in the post-RA scheduler.
I committed that change here: http://llvm.org/viewvc/llvm-project?view=revision&revision=342555
Basically:
you can call this method on the TargetSubtargetInfo, and obtain a mask of "broken dependencies"
bool isDependencyBreaking(const MachineInstr *MI, APInt &Mask) const;
You can use that information to bias the construction of the DAG for a scheduling region.
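As a rough illustration of how such a mask could bias DAG construction, here is a minimal standalone sketch. It does not use LLVM: `Instr`, `BreakFn`, `buildDataEdges` and `sameSourceBreaks` are hypothetical names invented for this example, and the convention that bit I of the mask corresponds to the I-th source operand is an assumption, not the documented meaning of the real hook.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical, simplified instruction: one written register and a list of
// read registers. This is NOT LLVM's MachineInstr.
struct Instr {
  int Def;
  std::vector<int> Uses;
};

// Stand-in for the hook: returns true and sets bit I of Mask when the
// dependency through Uses[I] is broken (assumed convention).
using BreakFn = std::function<bool(const Instr &, uint64_t &)>;

// Builds read-after-write edges (producer index, consumer index) for a
// straight-line region, skipping operands the hook reports as broken.
std::set<std::pair<int, int>>
buildDataEdges(const std::vector<Instr> &Region, const BreakFn &IsBreaking) {
  std::set<std::pair<int, int>> Edges;
  std::map<int, int> LastDef; // register -> index of its most recent writer
  for (int I = 0; I < static_cast<int>(Region.size()); ++I) {
    uint64_t Mask = 0;
    const bool Breaks = IsBreaking(Region[I], Mask);
    for (std::size_t U = 0; U < Region[I].Uses.size(); ++U) {
      if (Breaks && U < 64 && ((Mask >> U) & 1))
        continue; // dependency broken: no edge to the previous writer
      auto It = LastDef.find(Region[I].Uses[U]);
      if (It != LastDef.end())
        Edges.insert(std::make_pair(It->second, I));
    }
    LastDef[Region[I].Def] = I;
  }
  return Edges;
}

// Toy predicate: treats any two-source instruction that reads the same
// register twice as dependency-breaking on both sources.
bool sameSourceBreaks(const Instr &MI, uint64_t &Mask) {
  if (MI.Uses.size() == 2 && MI.Uses[0] == MI.Uses[1]) {
    Mask = 0x3;
    return true;
  }
  return false;
}
```

With the toy predicate, an instruction that reads r0 twice right after a write to r0 gets no edge back to that write, so a scheduler built on these edges would be free to move it earlier.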
Andrea's x86 variant support (https://reviews.llvm.org/D47374) is close to being added to trunk, so you should be able to compare register specific cases through this once a model supports it.
To measure the case where the two registers are different, we are planning to force the generation of a longer sequence even when the instruction is self-serial. This would generate e.g. sub R0, R1; xor R1, R0
to measure the real latency of sub.
Some things will be impossible to measure: for example, MOV32rr will not be able to use this trick.
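A minimal sketch of that two-register chain follows. It is not llvm-exegesis code: `makeTwoRegisterChain` and `estimateTargetLatency` are hypothetical helpers, the register names are placeholders, and subtracting a known helper latency from the measured per-pair time is an assumption about how the final number would be derived.

```cpp
#include <string>
#include <vector>

// Builds a textual snippet (destination operand first) in which the target
// instruction and a helper instruction feed each other through two distinct
// registers, so neither instruction repeats a source register.
std::vector<std::string>
makeTwoRegisterChain(const std::string &Target, const std::string &Helper,
                     const std::string &RegA, const std::string &RegB,
                     unsigned Repeats) {
  std::vector<std::string> Out;
  for (unsigned I = 0; I < Repeats; ++I) {
    Out.push_back(Target + " " + RegA + ", " + RegB); // writes RegA
    Out.push_back(Helper + " " + RegB + ", " + RegA); // reads RegA, writes RegB
  }
  return Out;
}

// Assumption: one target+helper pair forms one link of the serial chain, so
// the target latency is the measured cycles per pair minus the helper latency.
double estimateTargetLatency(double CyclesPerPair, double HelperLatency) {
  return CyclesPerPair - HelperLatency;
}
```

Calling `makeTwoRegisterChain("sub", "xor", "R0", "R1", 2)` yields the `sub R0, R1` / `xor R1, R0` pair twice, each instruction depending on the result of the one before it.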
Extended Description
I'm seeing llvm-exegesis report far too fast latency values for a number of instructions because the generated snippets repeat source registers, which permits breaking register dependencies in the frontend. This is most common for cases that are guaranteed to generate zero/allbits results.
Search Agner (http://www.agner.org/optimize/microarchitecture.pdf) for "Breaking dependency chains" and "Dependency-breaking instructions". These instruction types include:
X86 Instructions: XOR SUB SBB (depends on carry flag only) CMP
MMX/SSE/AVX Instructions: PXOR/XORPS/XORPD PANDN/ANDNPS/ANDNPD PSUBx PCMPEQx/PCMPGTx
Depending on the CPU this might include MMX/AVX1/AVX2 variants as well.
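For illustration, the list above could be turned into a simple check for snippets that are likely to hit this problem. This is only a sketch: `IdiomInfo` and `classifyIdiom` are hypothetical names, the mnemonics are taken from the list in this report, matching PSUBx/PCMPEQx/PCMPGTx by prefix is an assumption, and whether a given CPU really breaks the dependency is model-specific.

```cpp
#include <string>

// Result of the hypothetical classification.
struct IdiomInfo {
  bool BreaksRegisterDependency; // sources match and mnemonic is on the list
  bool DependsOnCarryFlag;       // SBB still reads the carry flag
};

// Reports whether an instruction with two source operands looks like one of
// the dependency-breaking forms listed in this report.
IdiomInfo classifyIdiom(const std::string &Mnemonic, const std::string &Src0,
                        const std::string &Src1) {
  static const char *const ExactNames[] = {"XOR",   "SUB",   "SBB",   "CMP",
                                           "PXOR",  "XORPS", "XORPD", "PANDN",
                                           "ANDNPS", "ANDNPD"};
  // Families written with a trailing "x" in the list (assumed prefix match).
  static const char *const Prefixes[] = {"PSUB", "PCMPEQ", "PCMPGT"};

  if (Src0 != Src1)
    return {false, false}; // only same-register forms qualify

  bool Match = false;
  for (const char *Name : ExactNames)
    if (Mnemonic == Name)
      Match = true;
  for (const char *Prefix : Prefixes)
    if (Mnemonic.rfind(Prefix, 0) == 0) // true when Mnemonic starts with Prefix
      Match = true;

  if (!Match)
    return {false, false};
  return {true, Mnemonic == "SBB"};
}
```

A benchmark generator could use a check like this to avoid repeating a source register for these opcodes, or to flag the resulting latency as suspect.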