Loosing optimization for X % C == 0


Bugzilla Link	50334
Version	12.0
OS	Linux
CC	@LebedevRI,@RKSimon,@rotateright

Extended Description

Since version 9, the code generated for say, y % 400 == 0, uses the modular inverse optimisation which performs just one multiplication (as opposed to two as it used to do in (y/400)*400 == y).

However, in a reasonable implementation of is_leap_year the code generated looses the optimisation and falls back to the old implementation. Here are C++ and corresponding assembly. (Notice that for y % 100 it has no issue in applying the optimisation.)

auto is_leap_year(unsigned y) { return y % 100 != 0 ? y % 4 == 0 : y % 400 == 0; }

is_leap_year(unsigned int): # @is_leap_year(unsigned int) imull $-1030792151, %edi, %eax # imm = 0xC28F5C29 rorl $2, %eax cmpl $42949673, %eax # imm = 0x28F5C29 jb .LBB2_2 andl $3, %edi testl %edi, %edi sete %al retq .LBB2_2: movl %edi, %eax imulq $1374389535, %rax, %rax # imm = 0x51EB851F shrq $39, %rax imull $400, %eax, %eax # imm = 0x190 # <--- Multiplication by 400 subl %eax, %edi testl %edi, %edi sete %al retq

Worse, when y has type type it doesn't even apply the old Granlund and Montgomery optimisation and resorts to an idiv instruction. It seems it prefers avoiding branching (using cmov). I doubt this is a good idea.

auto is_leap_year(int y) { return y % 100 != 0 ? y % 4 == 0 : y % 400 == 0; }

is_leap_year(int): # @is_leap_year(int) movl %edi, %eax imull $-1030792151, %edi, %ecx # imm = 0xC28F5C29 addl $85899344, %ecx # imm = 0x51EB850 rorl $2, %ecx cmpl $42949673, %ecx # imm = 0x28F5C29 movl $400, %ecx # imm = 0x190 movl $4, %esi cmovbl %ecx, %esi cltd idivl %esi # <-- Division by %esi which contains either 4 or 400 testl %edx, %edx sete %al retq

See the above and other relevant code here: https://godbolt.org/z/zdGPTT83q

You might also be interested in these (non scientific) benchmarks: https://quick-bench.com/q/zQp1vXKKpWFvH0Et7nxG6MHnNfI

llvm / llvm-project

Loosing optimization for X % C == 0 #49678

Extended Description