nordlow / compiler-benchmark

Benchmarks compilation speeds of different combinations of languages and compilers.
MIT License

nim: improve build time by 1.5x #11

Closed timotheecour closed 3 years ago

timotheecour commented 3 years ago

/cc @xflywind

note

the good news is that most of the time is spent in the C backend compiling the cgen output, not in the Nim frontend:

clang -o /tmp/z01 generated/c/main.c # 2s

XDG_CONFIG_HOME= nim c --compileonly -o:/tmp/z04x --nimcache:/tmp/c07f --checks:off --stacktrace:off --opt:none --hints:off generated/nim/main.nim # 1.18s

XDG_CONFIG_HOME= nim c -o:/tmp/z05 --nimcache:/tmp/c07f --checks:off --stacktrace:off --opt:none --hints:off generated/nim/main.nim # 4.1s

note 1

with --hint:cc --listcmd it shows:

clang -c -w -ferror-limit=3 -I/Users/timothee/git_clone/nim/Nim_devel/lib -I/Users/timothee/git_clone/nim/temp/compiler-benchmark/generated/nim -o /tmp/c08d/@mmain.nim.c.o /tmp/c08d/@mmain.nim.c # 2.8s
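To make the split explicit, here is a minimal sketch of the arithmetic behind the timings quoted so far. It assumes that `--compileonly` covers the Nim frontend plus C code generation and skips the C compile and link steps; the helper names are made up for illustration.

```c
#include <assert.h>

/* Wall-clock timings in seconds, copied from the commands quoted in this thread. */
static const double NIM_TOTAL = 4.1;         /* full `nim c` build */
static const double NIM_COMPILE_ONLY = 1.18; /* `nim c --compileonly` */
static const double CLANG_ON_CGEN = 2.8;     /* clang -c on the generated C file */

/* Time not accounted for by the --compileonly run (assumed: C compile plus link). */
static double backend_seconds(double total, double compile_only) {
    assert(total >= compile_only);
    return total - compile_only;
}

/* Fraction of the full build spent outside the --compileonly step. */
static double backend_share(double total, double compile_only) {
    assert(total > 0.0);
    return backend_seconds(total, compile_only) / total;
}
```

`backend_seconds(NIM_TOTAL, NIM_COMPILE_ONLY)` comes out at about 2.92 s, roughly 71% of the full build, which is close to the 2.8 s that clang alone takes on the generated file.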

note 2

https://github.com/nordlow/compiler-benchmark/issues/8#issuecomment-821872378

I've added support for check and debug builds. Feel free to modify and propose pull requests as you wish. compiler-benchmark currently only tests debug build performance.

the benchmark should allow reporting more than just debug builds; e.g. nim enables lots of checks by default, which are optimized for development speed and better debugging but can slow down compilation; languages shouldn't be penalized for having more debugging checks on by default :)

note 3

--gc:arc is about 1.2x slower; the cgen contains this:

N_LIB_PRIVATE N_NIMCALL(NI64, add_int64_n77_h5_main_23116)(NI64 x) {
    NI64 result;
    NI64 T1_;
    NIM_BOOL* nimErr_;
{   nimErr_ = nimErrorFlag();
    result = (NI64)0;
    T1_ = (NI64)0;
    T1_ = add_int64_n77_h4_main_23113(x);
    if (NIM_UNLIKELY(*nimErr_)) goto BeforeRet_;
    result = (NI64)((NI64)(x + T1_) + IL64(47509));
    goto BeforeRet_;
    }BeforeRet_: ;
    return result;
}

which has more instructions than the --gc:refc output:

N_LIB_PRIVATE N_NIMCALL(NI64, add_int64_n77_h5_main_23116)(NI64 x) {
    NI64 result;
    NI64 T1_;
{   result = (NI64)0;
    T1_ = (NI64)0;
    T1_ = add_int64_n77_h4_main_23113(x);
    result = (NI64)((NI64)(x + T1_) + IL64(47509));
    goto BeforeRet_;
    }BeforeRet_: ;
    return result;
}
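A self-contained sketch of the difference, for readers without `nimbase.h` at hand. Everything here is a stand-in: `error_flag()` plays the role of `nimErrorFlag()`, `callee` is a hypothetical function that sets the flag on negative input, and the typedef replaces the real Nim one. The point is only that the first shape pays for one extra load and branch after every call; how errors propagate under refc is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t NI64;         /* assumption: stand-in for the nimbase.h typedef */
static bool err_flag = false; /* hypothetical stand-in for the runtime error flag */

/* Plays the role of nimErrorFlag(). */
static bool *error_flag(void) { return &err_flag; }

/* Hypothetical callee: "raises" by setting the flag when x is negative. */
static NI64 callee(NI64 x) {
    if (x < 0) { *error_flag() = true; return 0; }
    return x + 1;
}

/* Shape of the --gc:arc output: fetch the flag, test it after the call. */
static NI64 add_checked(NI64 x) {
    NI64 result = 0;
    bool *err = error_flag();
    NI64 t = callee(x);
    if (*err) goto before_ret; /* the extra load and branch per call */
    result = x + t + 47509;
before_ret:
    return result;
}

/* Shape of the refc output: no flag, no branch. */
static NI64 add_unchecked(NI64 x) {
    NI64 t = callee(x);
    return x + t + 47509;
}

/* Exercises both shapes; resets the flag before and after. */
static void self_check(void) {
    err_flag = false;
    assert(add_checked(1) == 47512);
    assert(add_unchecked(1) == 47512);
    assert(add_checked(-1) == 0); /* bails out before the addition */
    assert(err_flag);
    err_flag = false;
}
```

With thousands of generated functions of this shape, the extra declaration, call, and branch in each one is a plausible source of the slowdown seen with `--gc:arc`.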

note 4

codegen generates:

N_LIB_PRIVATE N_NIMCALL(int, add_cint_n68_h73_main_20620)(int x) {
    int result;
    int T1_;
{   result = (int)0;
    T1_ = (int)0;
    T1_ = add_cint_n68_h72_main_20617(x);
    result = (NI32)((NI32)(x + T1_) + ((NI32) 42695));
    goto BeforeRet_;
    }BeforeRet_: ;
    return result;
}

if it generated the following simplified code instead:

N_LIB_PRIVATE N_NIMCALL(int, add_cint_n68_h73_main_20620)(int x) {
    int T1_ = add_cint_n68_h72_main_20617(x);
    return (NI32)((NI32)(x + T1_) + ((NI32) 42695));
}

it would cut clang compilation time by ~1.15x; not sure whether it's worth it, since in practice other factors dominate (e.g. the nim VM, IC, etc.)
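A quick way to convince yourself the two shapes compute the same thing is a standalone sketch like the one below. The `nimbase.h` macros are dropped, `NI32` is replaced by a local typedef, and `callee` is a hypothetical stand-in for `add_cint_n68_h72_main_20617`, whose body is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t NI32; /* assumption: stand-in for the nimbase.h typedef */

/* Hypothetical callee; the real one is not shown in the thread. */
static int callee(int x) { return x + 1; }

/* Shape currently emitted by the codegen. */
static int add_verbose(int x) {
    int result;
    int T1_;
    {
        result = (int)0;
        T1_ = (int)0;
        T1_ = callee(x);
        result = (NI32)((NI32)(x + T1_) + ((NI32)42695));
        goto BeforeRet_;
    }
BeforeRet_: ;
    return result;
}

/* Proposed simplified shape. */
static int add_simplified(int x) {
    int T1_ = callee(x);
    return (NI32)((NI32)(x + T1_) + ((NI32)42695));
}

/* Returns 1 if both shapes agree for every x in [lo, hi].
   Keep the range small to stay clear of signed overflow. */
static int shapes_agree(int lo, int hi) {
    assert(lo <= hi);
    for (int x = lo; x <= hi; ++x) {
        if (add_verbose(x) != add_simplified(x)) return 0;
    }
    return 1;
}
```

Calling `shapes_agree(-1000, 1000)` returns 1, so the change would only affect how much text the C compiler has to parse, not the result.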

ringabout commented 3 years ago

I can confirm the speedup.

nordlow commented 3 years ago

Thanks

timotheecour commented 3 years ago

@nordlow i don't understand the numbers in the benchmark:

Nim | Build | No | 1051.5 | 613.0 [C] | 38 | 1.4.6 | nim

i can't run the benchmark locally because of https://github.com/nordlow/compiler-benchmark/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc (i'm on osx), but by doing it manually i get a 2X slowdown in compilation time compared to C, not 613X:

clang -o /tmp/z01 generated/c/main.c
2s

rm -rf /tmp/c07f # start from empty cache
XDG_CONFIG_HOME= nim c -o:/tmp/z05 --nimcache:/tmp/c07f --checks:off --stacktrace:off --opt:none --hints:off generated/nim/main.nim
4.1s

what am i missing?

ringabout commented 3 years ago

yeah on my PC

C++

| Lang-uage | Temp-lated | Check Time [us/func] | Build Time [us/func] | Run Time [us/func] | RSS Mem Max Usage | Exec Version | Exec Path | 
| :-------: | ---------- | :------------------: | :------------------: | :----------------: | :---------------: | :----------: | :-------: | 
| C++       | No         |  133.8 (  1.8 C++)   |    N/A               |    N/A             |       62,517,248  | 9.3.0        | `g++`     | 
| C++       | No         |   79.4 (  1.0 C++)   |    N/A               |    N/A             |       60,014,592  | 9.3.0        | `g++-9`   | 
| C++       | No         |  125.5 (  1.7 C++)   |    N/A               |    N/A             |       58,777,600  | 10.2.0       | `g++-10`  | 
| C++       | No         |   75.8 (  1.0 C++)   |    N/A               |    N/A             |       63,897,600  | 10.0.0-4ubuntu1 | `clang++-10` | 
| C++       | Yes        |  156.6 (  2.1 C++)   |    N/A               |    N/A             |       93,147,136  | 9.3.0        | `g++`     | 
| C++       | Yes        |  118.8 (  1.6 C++)   |    N/A               |    N/A             |       93,028,352  | 9.3.0        | `g++-9`   | 
| C++       | Yes        |  117.8 (  1.6 C++)   |    N/A               |    N/A             |       91,037,696  | 10.2.0       | `g++-10`  | 
| C++       | Yes        |  134.0 (  1.8 C++)   |    N/A               |    N/A             |       83,329,024  | 10.0.0-4ubuntu1 | `clang++-10` | 
| C++       | No         |    N/A               |  860.4 (  2.1 C++)   | 296 (  1.1 C++)    |      240,738,304  | 9.3.0        | `g++`     | 
| C++       | No         |    N/A               |  838.4 (  2.0 C++)   | 356 (  1.4 C++)    |      241,700,864  | 9.3.0        | `g++-9`   | 
| C++       | No         |    N/A               |  879.9 (  2.1 C++)   | 258 (  1.0 C++)    |      241,025,024  | 10.2.0       | `g++-10`  | 
| C++       | No         |    N/A               |  414.4 (  1.0 C++)   | 2757 ( 10.7 C++)   |      183,558,144  | 10.0.0-4ubuntu1 | `clang++-10` | 
| C++       | Yes        |    N/A               |  939.4 (  2.3 C++)   | 267 (  1.0 C++)    |      280,985,600  | 9.3.0        | `g++`     | 
| C++       | Yes        |    N/A               |  943.1 (  2.3 C++)   | 281 (  1.1 C++)    |      281,518,080  | 9.3.0        | `g++-9`   | 
| C++       | Yes        |    N/A               | 1010.1 (  2.4 C++)   | 281 (  1.1 C++)    |      278,781,952  | 10.2.0       | `g++-10`  | 
| C++       | Yes        |    N/A               |  494.9 (  1.2 C++)   | 4288 ( 16.6 C++)   |      230,973,440  | 10.0.0-4ubuntu1 | `clang++-10` | 

C

| Lang-uage | Temp-lated | Check Time [us/func] | Build Time [us/func] | Run Time [us/func] | RSS Mem Max Usage | Exec Version | Exec Path | 
| :-------: | ---------- | :------------------: | :------------------: | :----------------: | :---------------: | :----------: | :-------: | 
| C         | No         |   11.0 (  1.0 C)     |    N/A               |    N/A             |        6,529,024  | 0.9.27       | `tcc`     | 
| C         | No         |   55.0 (  5.0 C)     |    N/A               |    N/A             |       45,576,192  | 9.3.0        | `gcc`     | 
| C         | No         |   28.2 (  2.6 C)     |    N/A               |    N/A             |       37,679,104  | 7.5.0        | `gcc-7`   | 
| C         | No         |   23.2 (  2.1 C)     |    N/A               |    N/A             |       42,545,152  | 9.3.0        | `gcc-9`   | 
| C         | No         |   25.9 (  2.4 C)     |    N/A               |    N/A             |       43,794,432  | 10.2.0       | `gcc-10`  | 
| C         | No         |   66.5 (  6.1 C)     |    N/A               |    N/A             |       62,603,264  | 10.0.0-4ubuntu1 | `clang-10` | 
| C         | No         |    N/A               |    7.9 (  1.0 C)     | 243 (  1.0 C)      |        9,224,192  | 0.9.27       | `tcc`     | 
| C         | No         |    N/A               |  815.7 (103.7 C)     | 252 (  1.0 C)      |      225,079,296  | 9.3.0        | `gcc`     | 
| C         | No         |    N/A               |  686.3 ( 87.2 C)     | 242 (  1.0 C)      |      216,461,312  | 7.5.0        | `gcc-7`   | 
| C         | No         |    N/A               |  767.9 ( 97.6 C)     | 243 (  1.0 C)      |      225,349,632  | 9.3.0        | `gcc-9`   | 
| C         | No         |    N/A               |  799.5 (101.6 C)     | 245 (  1.0 C)      |      223,019,008  | 10.2.0       | `gcc-10`  | 
| C         | No         |    N/A               |  348.7 ( 44.3 C)     | 2245 (  9.3 C)     |      183,029,760  | 10.0.0-4ubuntu1 | `clang-10` | 

Nim

| Lang-uage | Temp-lated | Check Time [us/func] | Build Time [us/func] | Run Time [us/func] | RSS Mem Max Usage | Exec Version | Exec Path | 
| :-------: | ---------- | :------------------: | :------------------: | :----------------: | :---------------: | :----------: | :-------: | 
| Nim       | No         |  134.9 (  1.0 Nim)   |    N/A               |    N/A             |       62,787,584  | 1.5.1        | `nim`     | 
| Nim       | No         |    N/A               | 1252.4 (  5.0 Nim)   | 361 (  1.0 Nim)    |      382,427,136  | 1.5.1        | `nim`     | 
| Nim       | Yes        |    N/A               |  252.9 (  1.0 Nim)   | 367 (  1.0 Nim)    |      115,929,088  | 1.5.1        | `nim`     | 

ringabout commented 3 years ago

BTW Nim can use tcc, gcc, clang, js as backend too

nordlow commented 3 years ago

The Tiny C Compiler is being used as the reference for C in the benchmark. It's incredibly fast.
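That choice of reference probably accounts for most of the gap between the 2X measured by hand and the much larger factor in the table. A minimal sketch of the arithmetic, using build times copied from the tables posted above (the helper and constant names are made up for illustration):

```c
#include <assert.h>

/* How many times slower `t` is than the reference `t_ref` (same units). */
static double slowdown(double t, double t_ref) {
    assert(t_ref > 0.0);
    return t / t_ref;
}

/* Build times in us/func, copied from the tables posted above. */
static const double NIM_BUILD = 1252.4;    /* Nim 1.5.1, not templated */
static const double TCC_BUILD = 7.9;       /* tcc 0.9.27 */
static const double CLANG10_BUILD = 348.7; /* clang-10 */
```

`slowdown(NIM_BUILD, TCC_BUILD)` is about 158, while `slowdown(NIM_BUILD, CLANG10_BUILD)` is about 3.6, and the manual run gives `slowdown(4.1, 2.0)`, about 2.05. The same Nim build looks two orders of magnitude worse when tcc is the baseline than when clang is.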

nordlow commented 3 years ago

Nim now uses tcc as the build backend when it's found in the executable path.

ringabout commented 3 years ago

Cool!