nordlow / compiler-benchmark

Benchmarks compilation speeds of different combinations of languages and compilers.
MIT License

New Julia benchmark #28

Open PallHaraldsson opened 1 year ago

PallHaraldsson commented 1 year ago

First, you might want to benchmark Julia on master as is (or possibly the next nightly: I just noticed one more improvement that was merged just now, "Remove alloca from codegen").

I don't know whether the issue with your very unusual benchmark is fixed. Note that Julia uses -O2 by default, so you might also want to try running with -O0 (or --inline=no, which I think is at least implied by the lowest level) or -O1. There is no Julia debug/development build mode, so that's the closest I can think of. You could even try --compile=min.
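
A minimal sketch for comparing those settings, assuming a `julia` executable on PATH and a generated benchmark file (the file name in the usage note and the variant labels are hypothetical). It only uses Julia's documented `-O`, `--inline` and `--compile` switches and reports the best of a few wall-clock runs:

```python
import subprocess
import time
from typing import Dict, List

# Julia switches discussed above; the labels are hypothetical, not from the benchmark script.
JULIA_FLAG_VARIANTS: Dict[str, List[str]] = {
    "default (-O2)": [],
    "-O1": ["-O1"],
    "-O0": ["-O0"],
    "--inline=no": ["--inline=no"],
    "--compile=min": ["--compile=min"],
}


def julia_commands(script: str, julia: str = "julia") -> Dict[str, List[str]]:
    """Build one command line per flag variant; switches must precede the script."""
    return {name: [julia, *flags, script] for name, flags in JULIA_FLAG_VARIANTS.items()}


def time_command(cmd: List[str], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs of `cmd`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best
```

For example, `time_command(julia_commands("bench.jl")["-O0"])` would time the `-O0` variant. Taking the minimum rather than the mean filters out runs disturbed by other load on the machine.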

If you do see an improvement, a further 25% improvement is available on top of that, but you have to opt into the new Julia parser for it (it will be merged into Julia, but at first it will be off by default):

https://github.com/JuliaLang/JuliaSyntax.jl/pull/228

I also wanted to point this out in case it's useful for D (or other languages).

nordlow commented 1 year ago

Note that the script is already using:

```python
JULIA_INTERPRET_FLAGS = ['--compile=min']  # See: https://github.com/JuliaLang/julia/issues/41360#issuecomment-872075102
```

nordlow commented 1 year ago

I reran with Julia master and got:

| Lang-uage | Temp-lated | Check Time [us/fn] | Compile Time [us/fn] | Build Time [us/fn] | Run Time [us/fn] | Check RSS [kB/fn] | Build RSS [kB/fn] | Exec Version | Exec Path |
| :-------: | ---------- | :----------------: | :------------------: | :----------------: | :--------------: | :---------------: | :---------------: | :----------: | :-------: |
| D | No | 7.6 (3.7x) | 17.1 (10.7x) | 21.2 (12.0x) | 78 (4.0x) | 4.4 (9.8x) | 13.4 (30.1x) | v2.103.0-rc.1-87-g7e84fb3333-dirty | dmd |
| D | No | 5.0 (2.4x) | 91.6 (57.1x) | 92.4 (52.4x) | 325 (16.6x) | 4.8 (10.7x) | 19.8 (44.3x) | 1.30.0 | ldmd2 |
| D | No | 7.3 (3.5x) | 232.8 (145.2x) | 231.3 (131.0x) | 64 (3.2x) | 4.6 (10.3x) | 19.2 (43.1x) | 11.3.0 | gdc |
| D | Yes | 19.9 (9.6x) | 32.2 (20.1x) | 36.0 (20.4x) | 49 (2.5x) | 12.6 (27.8x) | 22.0 (49.4x) | v2.103.0-rc.1-87-g7e84fb3333-dirty | dmd |
| D | Yes | 10.5 (5.1x) | 97.7 (61.0x) | 100.0 (56.7x) | 272 (13.9x) | 12.9 (28.6x) | 28.9 (64.8x) | 1.30.0 | ldmd2 |
| D | Yes | 13.3 (6.5x) | 244.7 (152.7x) | 241.7 (136.9x) | 62 (3.1x) | 13.4 (29.6x) | 28.8 (64.6x) | 11.3.0 | gdc |
| C | No | 2.1 (best) | 1.6 (best) | 1.8 (best) | 20 (best) | 0.5 (best) | 0.4 (best) | 0.9.27 | tcc |
| C | No | 9.4 (4.6x) | 293.4 (183.1x) | 303.0 (171.6x) | 36 (1.9x) | 2.7 (6.0x) | 13.6 (30.6x) | 12.1.0 | gcc |
| C | No | 5.9 (2.9x) | 207.8 (129.7x) | 203.7 (115.4x) | 60 (3.1x) | 2.7 (6.1x) | 14.2 (31.7x) | 9.5.0 | gcc-9 |
| C | No | 6.1 (3.0x) | 217.8 (135.9x) | 219.4 (124.3x) | 37 (1.9x) | 2.7 (6.1x) | 14.2 (31.8x) | 10.4.0 | gcc-10 |
| C | No | 6.7 (3.3x) | 228.2 (142.4x) | 221.2 (125.3x) | 38 (1.9x) | 2.6 (5.9x) | 14.1 (31.7x) | 11.3.0 | gcc-11 |
| C | No | 10.1 (4.9x) | 298.7 (186.4x) | 299.2 (169.5x) | 23 (1.1x) | 2.8 (6.2x) | 13.6 (30.6x) | 12.1.0 | gcc-12 |
| C | No | 18.1 (8.8x) | 119.7 (74.7x) | 120.6 (68.3x) | 612 (31.2x) | 2.1 (4.6x) | sampling error | 14.0.0-1 | clang |
| C | No | 18.1 (8.8x) | 115.6 (72.1x) | 118.6 (67.2x) | 545 (27.8x) | 2.1 (4.6x) | 9.4 (21.1x) | 14.0.0-1 | clang-14 |
| C++ | No | 14.3 (7.0x) | 233.5 (145.7x) | 233.9 (132.5x) | 38 (1.9x) | 4.4 (9.7x) | 14.0 (31.5x) | 11.3.0 | g++ |
| C++ | No | 14.3 (6.9x) | 229.4 (143.1x) | 232.4 (131.7x) | 34 (1.7x) | 4.4 (9.7x) | 14.1 (31.5x) | 10.4.0 | g++-10 |
| C++ | No | 14.1 (6.8x) | 228.5 (142.6x) | 236.8 (134.1x) | 37 (1.9x) | 4.4 (9.7x) | 14.0 (31.5x) | 11.3.0 | g++-11 |
| C++ | No | 23.1 (11.2x) | 315.3 (196.8x) | 318.3 (180.3x) | 65 (3.3x) | sampling error | 16.4 (36.8x) | 12.1.0 | g++-12 |
| C++ | No | 26.0 (12.6x) | 128.9 (80.4x) | 127.9 (72.5x) | 541 (27.6x) | 2.2 (4.8x) | 9.4 (21.1x) | 14.0.0-1 | clang |
| C++ | No | 25.2 (12.2x) | 129.4 (80.7x) | 132.7 (75.2x) | 541 (27.6x) | 2.2 (4.8x) | 9.4 (21.1x) | 14.0.0-1 | clang-14 |
| C++ | Yes | 30.5 (14.8x) | 278.3 (173.6x) | 277.9 (157.5x) | 28 (1.4x) | 8.0 (17.7x) | 20.5 (46.0x) | 11.3.0 | g++ |
| C++ | Yes | 30.9 (15.0x) | 278.2 (173.6x) | 279.6 (158.4x) | 27 (1.4x) | 8.0 (17.6x) | 21.8 (48.9x) | 10.4.0 | g++-10 |
| C++ | Yes | 29.1 (14.1x) | 281.9 (175.9x) | 280.7 (159.0x) | 27 (1.4x) | 8.0 (17.7x) | 20.6 (46.1x) | 11.3.0 | g++-11 |
| C++ | Yes | 41.7 (20.3x) | 371.1 (231.6x) | 366.9 (207.8x) | 26 (1.3x) | 8.0 (17.7x) | 20.6 (46.1x) | 12.1.0 | g++-12 |
| C++ | Yes | 40.0 (19.4x) | 129.5 (80.8x) | 134.5 (76.2x) | 381 (19.5x) | 4.0 (8.8x) | 12.6 (28.3x) | 14.0.0-1 | clang |
| C++ | Yes | 39.1 (19.0x) | 132.9 (82.9x) | 136.4 (77.3x) | 622 (31.7x) | 4.0 (8.8x) | 12.6 (28.3x) | 14.0.0-1 | clang-14 |
| Ada | No | N/A | N/A | 943.7 (534.7x) | 68 (3.5x) | N/A | 31.3 (70.2x) | 12.1.0 | gnat |
| Ada | No | N/A | N/A | 950.3 (538.4x) | 69 (3.5x) | N/A | 31.4 (70.3x) | 12.1.0 | gnat-12 |
| Go | No | 16.0 (7.8x) | N/A | N/A | N/A | 4.0 (8.9x) | N/A | 1.18.3 | gotype |
| N/A | N/A | N/A | N/A | N/A | N/A | 6.5 (14.5x) | 24.3 (54.4x) | N/A | N/A |
| N/A | N/A | N/A | N/A | N/A | N/A | 11.2 (24.8x) | 23.5 (52.7x) | N/A | N/A |
| Go | No | N/A | N/A | 166.0 (94.0x) | 132 (6.7x) | N/A | 28.3 (63.4x) | 1.18.3 | go |
| N/A | N/A | N/A | N/A | N/A | N/A | N/A | 18.4 (41.1x) | N/A | N/A |
| N/A | N/A | N/A | N/A | N/A | N/A | N/A | 50.3 (112.8x) | N/A | N/A |
| Zig | No | 22.5 (10.9x) | N/A | 531.6 (301.2x) | 1150 (58.7x) | 5.6 (12.5x) | 34.8 (78.1x) | 0.11.0-dev.2545+311d50f9d | zig |
| Zig | Yes | 27.2 (13.2x) | N/A | 547.6 (310.2x) | 1123 (57.3x) | 5.6 (12.5x) | 35.9 (80.5x) | 0.11.0-dev.2545+311d50f9d | zig |
| Rust | No | 73.5 (35.7x) | N/A | 230.6 (130.6x) | 1474 (75.2x) | 13.6 (30.1x) | 29.7 (66.6x) | 1.70.0-nightly | rustc |
| Rust | Yes | 84.9 (41.2x) | N/A | 148.9 (84.4x) | 1442 (73.6x) | 15.7 (34.8x) | 18.6 (41.6x) | 1.70.0-nightly | rustc |
| Nim | No | 36.7 (17.8x) | N/A | 80.5 (45.6x) | 66 (3.3x) | 4.2 (9.3x) | 8.0 (18.0x) | 1.4.6 | nim |
| C# | No | N/A | N/A | 21.6 (12.2x) | 384 (19.6x) | N/A | 4.4 (9.8x) | 6.12.0.182 | mcs |
| N/A | N/A | N/A | N/A | N/A | N/A | N/A | 13.2 (29.6x) | N/A | N/A |
| OCaml | No | N/A | N/A | 445.5 (252.4x) | 637 (32.5x) | N/A | 34.6 (77.5x) | 4.13.1 | ocamlopt |
| OCaml | No | N/A | N/A | 87.6 (49.6x) | 907 (46.3x) | N/A | 17.7 (39.6x) | 4.13.1 | ocamlc |
| Julia | No | N/A | N/A | 410.5 (232.6x) | N/A | N/A | 25.6 (57.4x) | 1.10.0-DEV | julia |
| Julia | Yes | N/A | N/A | 335.6 (190.1x) | N/A | N/A | 25.4 (56.8x) | 1.10.0-DEV | julia |
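
The parenthesized factors appear to be each value divided by the best (smallest) value in its column, with "(best)" marking that minimum. A minimal sketch of that calculation, assuming this reading (the function name is hypothetical). Because the table prints rounded values, recomputing from them can differ slightly from the printed factors: 7.6 / 2.1 gives 3.6x, where the table shows 3.7x.

```python
from typing import List, Optional


def relative_factors(values: List[Optional[float]]) -> List[str]:
    """Format each value relative to the column minimum; None stands for N/A."""
    present = [v for v in values if v is not None]
    if not present:
        return ["N/A"] * len(values)
    best = min(present)
    out = []
    for v in values:
        if v is None:
            out.append("N/A")
        elif v == best:
            out.append("best")
        else:
            out.append(f"{v / best:.1f}x")
    return out
```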


PallHaraldsson commented 1 year ago

Since the script uses --compile=min, you could alternatively drop it to see whether the default is faster, or try e.g. -O0.

Anyway, it's at least going in the right direction. 1.8.0-DEV is of course very outdated; I expect 1.9.0 to be released in a week or so, so it's time for 1.10.0-DEV.

nordlow commented 1 year ago

Using -O0 is slower than --compile=min. I checked.

nordlow commented 1 year ago

Closing this.

PallHaraldsson commented 1 year ago

Good to know about -O0 (is it also slower than the default -O2, or -O1?). You can get 25% faster parsing with JuliaSyntax.jl, but since parsing likely isn't the bottleneck (your call whether to check, or to use that non-default option), I guess you can ignore it.
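
Whether a faster parser matters depends on how much of the total time is spent parsing (Amdahl's law). A minimal sketch, assuming "25% faster" means 25% less parsing time; the parse share is a hypothetical input, since the table doesn't break it out for Julia:

```python
def remaining_time_fraction(parse_share: float, parse_speedup_pct: float) -> float:
    """Fraction of the original total time left after speeding up only parsing.

    parse_share: fraction of total time spent parsing (0..1), a hypothetical input.
    parse_speedup_pct: percent reduction in parse time (e.g. 25 for "25% faster").
    """
    if not 0.0 <= parse_share <= 1.0:
        raise ValueError("parse_share must be between 0 and 1")
    # Non-parse work is unchanged; only the parse share shrinks.
    return (1.0 - parse_share) + parse_share * (1.0 - parse_speedup_pct / 100.0)
```

For instance, if parsing were 10% of the build time, a 25% faster parser would cut the total by only 2.5%.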

PallHaraldsson commented 1 year ago

They did make constprop faster, but there's no way to turn that optimization off completely. Doing away with it, or with all optimization, doesn't seem to be a priority, because you don't compile code that often. In Julia 1.9, packages are fully precompiled to native code. [Changing your code into a package/module would be an option, but I don't think a module alone will do it, and I think you want to test the actual compilation time, not ways to get around it.]

You could at least update the actual README with the latest numbers, as in the table above. I might look into this extra 25% speedup; I understand if it's not a priority for you, and I'm not sure it is for me either (i.e. for this benchmark).

PallHaraldsson commented 1 year ago

FYI: I can confirm that with JuliaSyntax.jl I get a 21% speedup (it's easy to use, but for the benchmark as is, it needs to be compiled into the sysimage).

Perhaps you should also try compiling the code for the other languages with optimizations on, i.e. -O2 (or -O3?), for a fair comparison with Julia on its defaults? It would at least be good to see two tables, so you could add another one for that.
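
A sketch of how the two tables could be driven, assuming one hypothetical "debug" and one "optimized" flag set per compiler. Only compilers with well-known optimization switches are listed (GCC/Clang `-O0`/`-O2`, rustc `-C opt-level`); the other compilers in the table would need their own entries:

```python
from typing import Dict, List

# Hypothetical build profiles; Julia itself already defaults to -O2.
OPT_FLAGS: Dict[str, Dict[str, List[str]]] = {
    "gcc": {"debug": ["-O0"], "optimized": ["-O2"]},
    "g++": {"debug": ["-O0"], "optimized": ["-O2"]},
    "clang": {"debug": ["-O0"], "optimized": ["-O2"]},
    "rustc": {"debug": ["-C", "opt-level=0"], "optimized": ["-C", "opt-level=2"]},
}


def compile_command(compiler: str, profile: str, source: str) -> List[str]:
    """Command line compiling `source` with the given profile's optimization flags."""
    return [compiler, *OPT_FLAGS[compiler][profile], source]


def commands_for_profile(profile: str, sources: Dict[str, str]) -> Dict[str, List[str]]:
    """One command per compiler, so each profile's timings can fill a separate table."""
    return {c: compile_command(c, profile, src) for c, src in sources.items()}
```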

PallHaraldsson commented 1 year ago

FYI: "Add native UTF-8 Validation using fast shift based DFA #47880" was just merged, and it seems to be 20x faster.

I'm not actually sure whether the parser uses it, but instead of looking into it, we can see whether the parser gets faster in the next nightly. So you may want to hold off on publishing new results. [I only see the new parser calling isvalid on individual Chars, not Strings; what do you think Dlang does?]
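
For reference, DFA-based UTF-8 validation walks the bytes once, with each byte moving an automaton between a handful of states. The sketch below is a plain table-driven DFA following the RFC 3629 byte ranges, not the shift-based variant from the PR; it validates a whole byte string in one pass, as opposed to checking one Char at a time. The state names are hypothetical labels.

```python
from typing import Dict, List, Optional, Tuple

ACCEPT = "accept"

# state -> list of (low byte, high byte, next state); any byte not covered rejects.
TRANSITIONS: Dict[str, List[Tuple[int, int, str]]] = {
    ACCEPT: [
        (0x00, 0x7F, ACCEPT),   # ASCII
        (0xC2, 0xDF, "tail1"),  # 2-byte lead (C0/C1 would be overlong)
        (0xE0, 0xE0, "e0"),     # 3-byte lead, next byte excludes overlongs
        (0xE1, 0xEC, "tail2"),
        (0xED, 0xED, "ed"),     # 3-byte lead, next byte excludes UTF-16 surrogates
        (0xEE, 0xEF, "tail2"),
        (0xF0, 0xF0, "f0"),     # 4-byte lead, next byte excludes overlongs
        (0xF1, 0xF3, "tail3"),
        (0xF4, 0xF4, "f4"),     # 4-byte lead, next byte caps at U+10FFFF
    ],
    "tail1": [(0x80, 0xBF, ACCEPT)],   # one continuation byte left
    "tail2": [(0x80, 0xBF, "tail1")],  # two continuation bytes left
    "tail3": [(0x80, 0xBF, "tail2")],  # three continuation bytes left
    "e0": [(0xA0, 0xBF, "tail1")],
    "ed": [(0x80, 0x9F, "tail1")],
    "f0": [(0x90, 0xBF, "tail2")],
    "f4": [(0x80, 0x8F, "tail2")],
}


def step(state: str, byte: int) -> Optional[str]:
    """Next DFA state for `byte`, or None if the sequence is invalid."""
    for lo, hi, nxt in TRANSITIONS[state]:
        if lo <= byte <= hi:
            return nxt
    return None


def is_valid_utf8(data: bytes) -> bool:
    """True if `data` is well-formed UTF-8 (the DFA must end in ACCEPT)."""
    state: Optional[str] = ACCEPT
    for byte in data:
        state = step(state, byte)
        if state is None:
            return False
    return state == ACCEPT
```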

PallHaraldsson commented 1 year ago

Hi,

I think you have a long benchmark (or so I recall, maybe only after inlining). This might be relevant (worth testing once it's merged to master):

https://github.com/JuliaLang/julia/pull/50756

nordlow commented 1 year ago

Can you perform the benchmark yourself?

PallHaraldsson commented 1 year ago

I can, and did (now that the PR has been merged).

I do get a 12% improvement over 1.9.2, though it's not the big improvement I was hoping for, and the PR itself didn't help: I get similar results on the beta, which I believe doesn't include it.

```shell
$ juliaup default dev
..
```
| Lang-uage | Temp-lated | Check Time [us/fn] | Compile Time [us/fn] | Build Time [us/fn] | Run Time [us/fn] | Check RSS [kB/fn] | Build RSS [kB/fn] | Exec Version | Exec Path | 
| :-------: | ---------- | :----------------: | :------------------: | :----------------: | :--------------: | :---------------: | :---------------: | :----------: | :-------: | 
| Julia     | No         | N/A                | N/A                  |  585.9 (1.2x)      | N/A              | N/A               |   31.9 (1.1x)     | 1.11.0-DEV   | julia     | 
| Julia     | Yes        | N/A                | N/A                  |  489.9 (best)      | N/A              | N/A               |   28.8 (best)     | 1.11.0-DEV   | julia     | 

Compare that with 554.7 on 1.9.2. I also tried all settings for JULIA_INTERPRET_FLAGS and JULIA_COMPILE_FLAGS: the defaults are still much slower, though maybe with some improvement there too.
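
"12% faster" can be read as either less time or more throughput, and the two differ slightly. A small worked check, assuming the comparison is between 554.7 on 1.9.2 and the templated 1.11.0-DEV row (489.9); the comment doesn't say which row was compared, so that pairing is an assumption:

```python
def percent_less_time(old: float, new: float) -> float:
    """Reduction in time, as a percentage of the old time."""
    return (old - new) / old * 100.0


def speedup_factor(old: float, new: float) -> float:
    """How many times faster the new run is: old time / new time."""
    return old / new


# Assumed pairing: 1.9.2 result vs. the templated 1.11.0-DEV row (Build Time, us/fn).
OLD_1_9_2 = 554.7
NEW_1_11_DEV = 489.9
```

With those numbers, `percent_less_time` gives about 11.7%, matching the quoted 12%, while `speedup_factor` gives about 1.13, i.e. roughly 13% more throughput.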