llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Division followed by modulo generates longer machine code than vice versa #23480

Open EdSchouten opened 9 years ago

EdSchouten commented 9 years ago
Bugzilla Link 23106
Version trunk
OS Linux
CC @emaste,@rotateright

Extended Description

Consider the following piece of C code:

    #include <stdint.h>

    struct tv {
        int64_t tv_sec;
        int32_t tv_usec;
    };

    void convert1(uint64_t ts, struct tv *tv) {
        tv->tv_sec = ts / 1000000000;
        tv->tv_usec = (ts % 1000000000) / 1000;
    }

    void convert2(uint64_t ts, struct tv *tv) {
        ts /= 1000;
        tv->tv_sec = ts / 1000000;
        tv->tv_usec = ts % 1000000;
    }

Both functions convert a UNIX timestamp in nanoseconds to a struct timeval-like structure (with microsecond precision). They should produce identical results for every input.
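The equivalence follows from floor(floor(ts / 1000) / 1000000) = floor(ts / 1000000000), and likewise (ts % 1000000000) / 1000 = (ts / 1000) % 1000000. A minimal sketch that checks this on individual inputs; `check_same` is a hypothetical helper that is not part of the original report:

```c
#include <assert.h>
#include <stdint.h>

struct tv {
    int64_t tv_sec;
    int32_t tv_usec;
};

/* Seconds first, then microseconds from the nanosecond remainder. */
void convert1(uint64_t ts, struct tv *tv) {
    tv->tv_sec = ts / 1000000000;
    tv->tv_usec = (ts % 1000000000) / 1000;
}

/* Reduce to microseconds first, then split into seconds and remainder. */
void convert2(uint64_t ts, struct tv *tv) {
    ts /= 1000;
    tv->tv_sec = ts / 1000000;
    tv->tv_usec = ts % 1000000;
}

/* Hypothetical helper: aborts if the two conversions disagree for ts. */
void check_same(uint64_t ts) {
    struct tv a, b;
    convert1(ts, &a);
    convert2(ts, &b);
    assert(a.tv_sec == b.tv_sec);
    assert(a.tv_usec == b.tv_usec);
}
```

Calling `check_same` on boundary values such as 999, 1000, 999999999, 1000000000 and UINT64_MAX exercises the points where the two roundings could in principle diverge.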

However, when I compile both with Clang r233700 at -O3, the generated machine code differs noticeably in length:

0000000000000000 <convert1>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 89 f8                mov    %rdi,%rax
   7:   48 c1 e8 09             shr    $0x9,%rax
   b:   48 b9 53 5a 9b a0 2f    mov    $0x44b82fa09b5a53,%rcx
  12:   b8 44 00
  15:   48 f7 e1                mul    %rcx
  18:   48 c1 ea 0b             shr    $0xb,%rdx
  1c:   48 89 16                mov    %rdx,(%rsi)
  1f:   48 69 c2 00 ca 9a 3b    imul   $0x3b9aca00,%rdx,%rax
  26:   48 29 c7                sub    %rax,%rdi
  29:   48 c1 ef 03             shr    $0x3,%rdi
  2d:   48 b9 cf f7 53 e3 a5    mov    $0x20c49ba5e353f7cf,%rcx
  34:   9b c4 20
  37:   48 89 f8                mov    %rdi,%rax
  3a:   48 f7 e1                mul    %rcx
  3d:   48 c1 ea 04             shr    $0x4,%rdx
  41:   89 56 08                mov    %edx,0x8(%rsi)
  44:   5d                      pop    %rbp
  45:   c3                      retq

0000000000000000 <convert2>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 89 f8                mov    %rdi,%rax
   7:   48 c1 e8 03             shr    $0x3,%rax
   b:   48 b9 cf f7 53 e3 a5    mov    $0x20c49ba5e353f7cf,%rcx
  12:   9b c4 20
  15:   48 f7 e1                mul    %rcx
  18:   48 89 d1                mov    %rdx,%rcx
  1b:   48 c1 e9 04             shr    $0x4,%rcx
  1f:   48 c1 ef 09             shr    $0x9,%rdi
  23:   48 ba 53 5a 9b a0 2f    mov    $0x44b82fa09b5a53,%rdx
  2a:   b8 44 00
  2d:   48 89 f8                mov    %rdi,%rax
  30:   48 f7 e2                mul    %rdx
  33:   48 c1 ea 0b             shr    $0xb,%rdx
  37:   48 89 16                mov    %rdx,(%rsi)
  3a:   48 ba db 34 b6 d7 82    mov    $0x431bde82d7b634db,%rdx
  41:   de 1b 43
  44:   48 89 c8                mov    %rcx,%rax
  47:   48 f7 e2                mul    %rdx
  4a:   48 c1 ea 12             shr    $0x12,%rdx
  4e:   69 c2 40 42 0f 00       imul   $0xf4240,%edx,%eax
  54:   29 c1                   sub    %eax,%ecx
  56:   89 4e 08                mov    %ecx,0x8(%rsi)
  59:   5d                      pop    %rbp
  5a:   c3                      retq
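For readers less familiar with the idiom: each shr/mul/shr group in the listings above is a division by a constant, carried out as a multiplication by a precomputed reciprocal with the result taken from the high half of the 128-bit product. The sketch below transliterates the three constants from the listings into portable C. The names `mulhi64`, `div_1000`, `div_1e9` and `div_1e6` are made up for illustration and do not come from LLVM.

```c
#include <assert.h>
#include <stdint.h>

/* High 64 bits of the 128-bit product a * b, i.e. what x86-64 `mul`
   leaves in %rdx. Built from 32-bit halves to stay portable. */
uint64_t mulhi64(uint64_t a, uint64_t b) {
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;
    /* Carry out of bits 32..63 of the full product. */
    uint64_t carry = ((p0 >> 32) + (uint32_t)p1 + (uint32_t)p2) >> 32;
    return p3 + (p1 >> 32) + (p2 >> 32) + carry;
}

/* shr $0x3 ; mul 0x20c49ba5e353f7cf ; shr $0x4  ==>  x / 1000 */
uint64_t div_1000(uint64_t x) {
    return mulhi64(x >> 3, 0x20c49ba5e353f7cfULL) >> 4;
}

/* shr $0x9 ; mul 0x44b82fa09b5a53 ; shr $0xb  ==>  x / 1000000000 */
uint64_t div_1e9(uint64_t x) {
    return mulhi64(x >> 9, 0x44b82fa09b5a53ULL) >> 11;
}

/* mul 0x431bde82d7b634db ; shr $0x12  ==>  x / 1000000 */
uint64_t div_1e6(uint64_t x) {
    return mulhi64(x, 0x431bde82d7b634dbULL) >> 18;
}
```

Read this way, the first listing contains two such divisions and the second contains three.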

As a 30% increase in code size is not negligible, I thought it would make sense to file a bug. Maybe there is room for an optimization here?
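One possible reading of the two listings, offered as an observation rather than a diagnosis: the longer one issues three mul instructions where the shorter one issues two, and the third recomputes (ts / 1000) / 1000000, which has the same value as the ts / 1000000000 quotient already stored in tv_sec. A source-level variant that reuses that quotient might look like the sketch below. `convert3` is a hypothetical name, and whether it produces shorter code depends on the compiler version.

```c
#include <assert.h>
#include <stdint.h>

struct tv {
    int64_t tv_sec;
    int32_t tv_usec;
};

/* Hypothetical variant: derive the microsecond remainder from the
   seconds quotient instead of requesting a separate modulo. */
void convert3(uint64_t ts, struct tv *tv) {
    uint64_t usecs = ts / 1000;        /* total microseconds */
    uint64_t secs = ts / 1000000000;   /* total seconds */
    tv->tv_sec = (int64_t)secs;
    /* usecs - secs * 1000000 equals usecs % 1000000 and is below 1000000. */
    tv->tv_usec = (int32_t)(usecs - secs * 1000000);
}
```

The result matches convert1 and convert2 for every input, because secs is exactly usecs / 1000000.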

StephanTLavavej commented 2 years ago

mentioned in issue llvm/llvm-bugzilla-archive#38217

StephanTLavavej commented 2 years ago

mentioned in issue llvm/llvm-bugzilla-archive#37983