Open 837951602 opened 1 year ago
We only model a single EFLAGS register. INC is modeled as writing EFLAGS but not reading it. Because INC preserves C, it should technically read EFLAGS too. CodeGen never relies on INC leaving the C flag unchanged, so it would not generate the code seen here.
From a microarchitecture perspective, Skylake renames C separately from OSPAZ. This allows INC to execute early, since it doesn't need to read the C flag to preserve it. Some older microarchitectures don't do this.
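To make the difference concrete, here is a minimal sketch of last-writer dependency tracking for an `adc; inc; adc` sequence under three flag models. It is only an illustration: `flag_deps` and the three tables are hypothetical names, general-purpose registers are left out, and none of this is how llvm-mca or CodeGen is actually implemented.

```python
def flag_deps(instrs):
    """For each instruction, list the indices of the earlier instructions it reads from."""
    last_writer = {}  # resource name -> index of the most recent writer
    deps = []
    for idx, (_name, reads, writes) in enumerate(instrs):
        deps.append(sorted({last_writer[r] for r in reads if r in last_writer}))
        for w in writes:
            last_writer[w] = idx
    return deps


# Each entry is (mnemonic, flag resources read, flag resources written).

# One EFLAGS resource, INC only writes it (the modeling described above).
EFLAGS_WRITE_ONLY = [
    ("adc", {"EFLAGS"}, {"EFLAGS"}),
    ("inc", set(), {"EFLAGS"}),
    ("adc", {"EFLAGS"}, {"EFLAGS"}),
]

# One EFLAGS resource, INC also reads it because it preserves C.
EFLAGS_READ_WRITE = [
    ("adc", {"EFLAGS"}, {"EFLAGS"}),
    ("inc", {"EFLAGS"}, {"EFLAGS"}),
    ("adc", {"EFLAGS"}, {"EFLAGS"}),
]

# C tracked separately from OSPAZ, as on Skylake.
SPLIT_FLAGS = [
    ("adc", {"C"}, {"C", "OSPAZ"}),
    ("inc", set(), {"OSPAZ"}),
    ("adc", {"C"}, {"C", "OSPAZ"}),
]

if __name__ == "__main__":
    for label, prog in [
        ("write-only EFLAGS", EFLAGS_WRITE_ONLY),
        ("read+write EFLAGS", EFLAGS_READ_WRITE),
        ("split C / OSPAZ", SPLIT_FLAGS),
    ]:
        print(label, flag_deps(prog))
```

With the write-only model the second `adc` appears to depend on `inc` and the carry edge from the first `adc` is lost. With the read+write model the carry reaches the second `adc` only through `inc`, which puts `inc` on the chain. With split flags the second `adc` depends directly on the first and `inc` is free to execute early.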
@adibiagio @RKSimon is there some way we can model this dependency in llvm-mca without affecting the EFLAGS register behavior in CodeGen?
Not easily, and something like this done just for MCA would be very brittle. I'd love to see EFLAGS remodeled so we can (optionally) update instructions to show which individual flags they read/write/clear/set/undef, but that will take some time.
Are there many instructions that leave a flag unmodified instead of undefined?
One weird case I can remember is that shifts don't update any flags when the shift count is 0, and the overflow flag is only defined when the shift count is 1.
As well as INC, RCL/RCR and the ADX instructions are ones that I know of - another part of the problem is that Intel + AMD haven't always matched UNDEF vs passthrough behaviour.
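As a rough sketch of what per-flag modeling could look like, the table below records an effect for each flag an instruction touches and treats every unlisted flag as passed through. The names `EFFECTS`, `preserved` and `needs_merge` are made up for this example and are not LLVM APIs. Count-dependent cases such as the shifts and RCL/RCR are left out because a static table cannot express them, and counting `undef` as a change is an assumption of the sketch, not a statement about any particular CPU.

```python
FLAGS = "COSPAZ"

# Per-flag effects. A flag that is missing from an entry is preserved (passed through).
EFFECTS = {
    "INC":  {"O": "write", "S": "write", "P": "write", "A": "write", "Z": "write"},
    "STC":  {"C": "set"},
    "BT":   {"C": "write", "O": "undef", "S": "undef", "P": "undef", "A": "undef"},
    "ADCX": {"C": "read+write"},
    "ADOX": {"O": "read+write"},
}


def preserved(instr):
    """Flags whose old value must still be visible after the instruction."""
    return {f for f in FLAGS if f not in EFFECTS[instr]}


def needs_merge(instr, unit):
    """True if instr changes only part of a rename unit and so must read the old unit.

    Assumption of this sketch: 'undef' counts as a change to the flag.
    """
    touched = {f for f in unit if f in EFFECTS[instr]}
    return bool(touched) and touched != set(unit)


if __name__ == "__main__":
    for name in EFFECTS:
        print(name, "preserves", "".join(sorted(preserved(name))))
    # One EFLAGS unit versus C renamed separately from OSPAZ.
    print(needs_merge("INC", "COSPAZ"), needs_merge("INC", "OSPAZ"))
```

In this model INC needs the old value when all six flags form one unit, but not when C is its own unit, which matches the Skylake behaviour described earlier in the thread.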
STC (modifies C, preserves OSPAZ) and BT (modifies C, preserves Z, leaves OSPA undefined) break the dependency on the carry flag only. On my znver1, they do break it.
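	.section .text
	.globl main
main:
	stc                  # sets C without needing its previous value
	adcq %r8, %r9        # reads C from stc, writes C
	adcq %r10, %r11      # reads C from the previous adcq
	adcq %r12, %r13      # reads C from the previous adcq
	jmp main             # the next iteration's stc starts a new carry chain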
[0,5] is executed on cycle 2 but relies on result from cycle 4.
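For comparison, here is a small sketch of the earliest start cycles for two iterations of the loop above, once with STC treated as a pure write of C and once with STC also reading the old flags. It assumes a 1-cycle latency for every instruction, unlimited execution resources, and ignores `jmp` and the OSPAZ flags. `earliest_cycles` and `two_iterations` are hypothetical helpers, not llvm-mca output.

```python
def earliest_cycles(instrs, latency=1):
    """Earliest start cycle per instruction, limited only by data dependencies."""
    ready = {}  # resource name -> cycle at which its latest value is available
    starts = []
    for _name, reads, writes in instrs:
        start = max((ready.get(r, 0) for r in reads), default=0)
        starts.append(start)
        for w in writes:
            ready[w] = start + latency
    return starts


def two_iterations(stc_reads):
    """Two unrolled loop iterations; stc_reads is the set of resources STC reads."""
    body = [
        ("stc", stc_reads, {"C"}),
        ("adcq %r8, %r9", {"C", "r8", "r9"}, {"C", "r9"}),
        ("adcq %r10, %r11", {"C", "r10", "r11"}, {"C", "r11"}),
        ("adcq %r12, %r13", {"C", "r12", "r13"}, {"C", "r13"}),
    ]
    return body * 2


if __name__ == "__main__":
    print("stc breaks the chain:", earliest_cycles(two_iterations(set())))
    print("stc reads old flags: ", earliest_cycles(two_iterations({"C"})))
```

When STC does not read the old carry, the second iteration is held back only by the register dependencies (for example `%r9`), so the iterations overlap. When STC is made to read the old flags, the second iteration cannot start until the last `adcq` of the first one has finished.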
Problem found while discussing https://stackoverflow.com/questions/76151320/