llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.92k stars 11.52k forks source link

Performance problem with an integer division benchmark #20579

Open llvmbot opened 10 years ago

llvmbot commented 10 years ago
Bugzilla Link 20205
Version trunk
OS Windows XP
Reporter LLVM Bugzilla Contributor
CC @chfast,@DougGregor,@zygoloid,@TNorthover

Extended Description

While running a little benchmark on the LDC compiler (version 0.13.0 based on DMD v2.064 and LLVM 3.4.2), I have found it being significantly (like five times) slower than equivalent C code compiled with GCC 4.8.0 (using -O3 -std=c99, do not use -O2) on a 32 bit Windows system. Later others in the D newsgroup (http://forum.dlang.org/thread/chujnioihfjbqtjpshoz@forum.dlang.org ) have confirmed it's not a LDC problem, but a back-end problem.

The title of this issue is generic because I don't know the exact causes. Feel free to rename it.

The C code:

#include <stdio.h>
#include <stdbool.h>

const int t = 20;

bool isEvenlyDivisible(const int i, const int a, const int b) {
    if (i > b)
        return true;
    else
        return (a % i == 0) && isEvenlyDivisible(i + 1, a, b);
}

void run() {
    int i = 10;
    while (!isEvenlyDivisible(2, i, t))
        i += 2;
    printf("%d\n", i);
}

int main() {
    for (int i = 0; i < 5; i++)
      run();
    return 0;
}

The C-style D version:

import core.stdc.stdio;

enum int t = 20;

bool isEvenlyDivisible(in int i, in int a, in int b)
pure nothrow @safe {
    if (i > b)
        return true;
    else
        return (a % i == 0) && isEvenlyDivisible(i + 1, a, b);
}

void run() nothrow {
    int i = 10;
    while (!isEvenlyDivisible(2, i, t))
        i += 2;
    printf("%d\n", i);
}

void main() {
    for (int i = 0; i < 5; i++)
      run;
}
ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 10 years ago

Recent GCC peels off the divisibility by 2, 3, 4, and 5 checks. It removes the first, turns the second into a multiplication, turns the third into a bitwise check, and turns the fourth into a multiplication. (It still issues an idiv for divisibility by 6 onwards, but that's then an extremely uncommon case.)

(Older GCC just eliminated the redundant divisibility by 2 check.)

Similar performance can be obtained by manually peeling the loop:

bool isEvenlyDivisible(const int i, const int a, const int b) {
    if (i > b)
        return true;
    else
        return (a % i == 0) && isEvenlyDivisible(i + 1, a, b);
}

bool isEvenlyDivisible5(const int a, const int b) {
  return (a % 5 == 0) && isEvenlyDivisible(6, a, b);
}

bool isEvenlyDivisible4(const int a, const int b) {
  return (a % 4 == 0) && isEvenlyDivisible5(a, b);
}

bool isEvenlyDivisible3(const int a, const int b) {
  return (a % 3 == 0) && isEvenlyDivisible4(a, b);
}

bool isEvenlyDivisible2(const int a, const int b) {
  return (a % 2 == 0) && isEvenlyDivisible3(a, b);
}

void run() {
    int i = 10;
    while (!isEvenlyDivisible2(i, t))
        i += 2;
    printf("%d\\n", i);
}

LLVM also makes some rather poor loop unrolling decisions here: it doesn't unroll the innermost loop in isEvenlyDivisible, which would be somewhat profitable, and instead unrolls the outermost loop in main, which is entirely pointless (but at least it only hurts code size and not performance).

llvmbot commented 10 years ago

While running a little benchmark on the LDC compiler (version 0.13.0 based on DMD v2.064 and LLVM 3.4.2), I have found it being significantly (like five times) slower than equivalent C code compiled with GCC 4.8.0 (using -O3 -std=c99, do not use -O2) on a 32 bit Windows system.

To clarify: This also occurs with Clang 3.4.2 vs. GCC 4.9 on Linux x86_64 in a quick test, so it's not an LDC-specific problem.