Open davidbolvansky opened 4 years ago
We currently only try to use flags from shifts if the code layout happens to put the shift immediately before the branch. This is handled by X86InstrInfo::optimizeCompareInstr.
We do match flags from ADD/SUB/AND/OR/XOR during isel and never isel the TEST. But if there is another constraint that prevents the the AND/SUB/AND/OR/XOR from being the last flag op before the branch, we can end up with two AND/SUB/AND/OR/XOR instructions. One for data and one for flags with copies of GPRs to make that work. We could maybe do the same for shifts at the risk of potentially creating 2 shifts sometimes. Shifts are port restricted on Intel CPUs and have historically had bad partial flag stall behavior so this could be worse in some cases.
For this specific example we should probably just remove the loop entirely and use bsr like icc does.
Extended Description
https://c9x.me/x86/html/file_module_x86_id_285.html
"The SF, ZF, and PF flags are set according to the result. If the count is 0, the flags are not affected. For a non-zero count, the AF flag is undefined."
So 'test' could be eliminated here.