certik opened 1 year ago
Results on my machine, Mac M1 2020 (8 GB):
GFortran with -ffast-math
% gfortran -O3 -march=native -ffast-math src/minpack.f90 examples/example_hybrd.f90 -o hybrd.gf
% time ./hybrd.gf
...
./hybrd.gf 0.12s user 0.01s system 19% cpu 0.641 total
% time ./hybrd.gf
...
./hybrd.gf 0.12s user 0.01s system 95% cpu 0.133 total
LFortran
% lfortran -c src/minpack.f90 && lfortran --fast examples/example_hybrd.f90 -o hybrd.lf
% time ./hybrd.lf
...
./hybrd.lf 0.22s user 0.00s system 29% cpu 0.766 total
% time ./hybrd.lf
...
./hybrd.lf 0.23s user 0.01s system 97% cpu 0.246 total
GFortran without -ffast-math
% gfortran -O3 -march=native src/minpack.f90 examples/example_hybrd.f90 -o hybrd.gf2
% time ./hybrd.gf2
...
./hybrd.gf2 0.19s user 0.01s system 27% cpu 0.692 total
% time ./hybrd.gf2
...
./hybrd.gf2 0.18s user 0.01s system 98% cpu 0.191 total
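In each pair of runs above, the first invocation takes roughly 0.5 s longer in total than the second (for example 0.641 s vs 0.133 s for hybrd.gf), so the comparison below uses the second, warm run. A minimal bash sketch that automates this by running a binary several times and keeping the fastest wall-clock time. `best_of` is a hypothetical helper name, and it relies on bash's `time` keyword and `TIMEFORMAT`, so it needs bash rather than plain sh:

```bash
#!/usr/bin/env bash
# best_of RUNS CMD [ARGS...]: print the fastest wall-clock time (seconds) over RUNS runs.
# Hypothetical helper for illustration; not part of LFortran or GFortran.
best_of() {
    local runs=$1; shift
    local TIMEFORMAT=%R        # bash's `time` keyword then reports only real (wall) seconds
    local best="" t i
    for ((i = 0; i < runs; i++)); do
        # Silence the program itself; capture only the timing report written to stderr.
        t=$( { time "$@" >/dev/null 2>&1; } 2>&1 )
        # Keep t if it is the first measurement or smaller than the current best.
        if [[ -z $best ]] || awk -v a="$t" -v b="$best" 'BEGIN { exit !(a < b) }'; then
            best=$t
        fi
    done
    echo "$best"
}
```

Usage: `best_of 5 ./hybrd.gf` and `best_of 5 ./hybrd.lf`. Taking the minimum rather than the mean discards one-off startup costs such as the slow first launch.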
Compiler | Time (s) | Time relative to LFortran (lower is faster) |
---|---|---|
GFortran with -ffast-math | 0.133 | 0.54 |
GFortran without -ffast-math | 0.191 | 0.78 |
LFortran (--fast) | 0.246 | 1.0 |
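The last column of the table is each compiler's warm-run time divided by LFortran's 0.246 s, so values below 1.0 mean faster than LFortran. A small awk sketch that recomputes it from the times in the table (`relative_times` is a hypothetical helper name):

```bash
#!/usr/bin/env bash
# relative_times: divide each warm-run time by LFortran's time.
# Times are copied from the table above (Mac M1, second run of each binary).
relative_times() {
    awk 'BEGIN {
        ref = 0.246                                        # LFortran (--fast), seconds
        printf "%-30s %.2f\n", "GFortran with -ffast-math",    0.133 / ref
        printf "%-30s %.2f\n", "GFortran without -ffast-math", 0.191 / ref
        printf "%-30s %.2f\n", "LFortran (--fast)",            ref / ref
    }'
}

relative_times
```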
I'm not sure the LFortran results are accurate, because my LFortran build is for macOS-x86_64 while my machine is macOS-arm64.
@Smit-create yes, you are benchmarking Rosetta, which translates x86-64 to arm64. I think your machine should give results similar to mine. Thanks for double-checking. By the way, you should disable the online communication, which will speed up the first run by about 0.5 s for you: https://github.com/lcompilers/lpython#speed-up-integration-tests-on-macos.
Results on my machine, Ubuntu 22.04 (8GB):
GFortran with -ffast-math
$ gfortran -O3 -march=native -ffast-math src/minpack.f90 examples/example_hybrd.f90 -o hybrd.gf
$ time ./hybrd.gf
...
real 0m0.196s
user 0m0.171s
sys 0m0.008s
LFortran
$ lfortran -c src/minpack.f90 && lfortran --fast examples/example_hybrd.f90 -o hybrd.lf
$ time ./hybrd.lf
...
real 0m0.842s
user 0m0.834s
sys 0m0.008s
GFortran without -ffast-math
$ gfortran -O3 -march=native src/minpack.f90 examples/example_hybrd.f90 -o hybrd.gf2
$ time ./hybrd.gf2
real 0m0.474s
user 0m0.469s
sys 0m0.005s
LFortran, clang with -ffast-math
$ lfortran -c src/minpack.f90 && lfortran --fast examples/example_hybrd.f90 --show-llvm > x.ll
$ clang -O3 -march=native -ffast-math x.ll -L"/home/pranavchiku/lfortran/src/bin/../runtime" -Wl,-rpath,"/home/pranavchiku/lfortran/src/bin/../runtime" -llfortran_runtime -lm -o hybrd.lf2
$ time ./hybrd.lf2
...
real 0m0.248s
user 0m0.244s
sys 0m0.004s
@Pranavchiku do I understand your timings correctly: by using Clang's LLVM optimizer, you are able to get 0.248 s vs 0.196 s for GFortran? That's about 26% slower, which is amazingly good for LFortran (as a starting point), if true.
Yes, correct. This is a great speed at this stage.
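The "about 26% slower" figure is the ratio of the two wall-clock times minus one: 0.248 / 0.196 ≈ 1.265. As a quick check in the shell (`pct_slower` is a hypothetical helper name):

```bash
#!/usr/bin/env bash
# pct_slower T_NEW T_REF: how much longer T_NEW takes than T_REF, in percent.
pct_slower() {
    awk -v new="$1" -v ref="$2" 'BEGIN { printf "%.1f%%\n", (new / ref - 1) * 100 }'
}

# LFortran LLVM IR + clang -ffast-math vs GFortran -ffast-math (Ubuntu timings above)
pct_slower 0.248 0.196
```

This prints 26.5%. For comparison, `pct_slower 0.842 0.196` gives 329.6% for the direct `lfortran --fast` build on the same machine.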
I think every machine is different. On mine (macOS Ventura 13.3.1, Apple M1, 8 GB) the largest n I can use is 825.
LFortran commit - d27eff098
--- a/examples/example_hybrd.f90
+++ b/examples/example_hybrd.f90
@@ -10,7 +10,7 @@ program example_hybrd
implicit none
- integer,parameter :: n = 9
+ integer,parameter :: n = 825
integer,parameter :: ldfjac = n
integer,parameter :: lr = (n*(n+1))/2
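Instead of editing the file by hand, the same one-line change can be applied with sed. A minimal sketch, assuming the declaration reads exactly `integer,parameter :: n = 9` as in the diff; `set_hybrd_n` is a hypothetical helper name, and `-i.bak` (with a suffix) is used because both GNU sed and the BSD sed shipped with macOS accept it:

```bash
#!/usr/bin/env bash
# set_hybrd_n FILE N: replace `n = 9` in the example's parameter declaration with N.
# Hypothetical helper; assumes the declaration text matches the diff above.
set_hybrd_n() {
    local file=$1 n=$2
    # \1 re-inserts the matched "integer,parameter :: n = " prefix;
    # the unmodified file is kept as FILE.bak.
    sed -i.bak "s/\(integer,parameter :: n = \)9\$/\1${n}/" "$file"
}
```

Usage: `set_hybrd_n examples/example_hybrd.f90 825`. Only n needs to change, because ldfjac and lr are defined in terms of it (for n = 825, lr = 825·826/2 = 340725).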
LFortran
(lf) 21:46:12:~/lfortran_project/minpack % lfortran -c src/minpack.f90 && lfortran --fast examples/example_hybrd.f90 -o hybrd.lf
./hybrd.lf 0.64s user 0.01s system 99% cpu 0.646 total
GFortran
(arm-compilers) 21:49:19:~/lfortran_project/minpack % gfortran -O3 -march=native -ffast-math src/minpack.f90 examples/example_hybrd.f90 -o hybrd.gf
./hybrd.gf 0.35s user 0.01s system 99% cpu 0.361 total
(arm-compilers) 21:50:57:~/lfortran_project/minpack % gfortran -O3 -march=native src/minpack.f90 examples/example_hybrd.f90 -o hybrd.gf2
./hybrd.gf2 0.59s user 0.01s system 99% cpu 0.602 total
Clang
(lf) 21:52:23:~/lfortran_project/minpack % clang -O3 -ffast-math x.ll -L$HOME/lfortran_project/lfortran/src/runtime/ -llfortran_runtime -Wl,-rpath -Wl,$HOME/lfortran_project/lfortran/src/runtime/ -o hybrd.lf2
./hybrd.lf2 0.66s user 0.01s system 99% cpu 0.674 total
Versions
(lf) 21:55:31:~/lfortran_project/minpack % clang --version
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: arm64-apple-darwin22.4.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
(lf) 21:55:32:~/lfortran_project/minpack % conda activate arm-compilers
(arm-compilers) 21:55:39:~/lfortran_project/minpack % gfortran --version
GNU Fortran (GCC) 11.0.1 20210403 (experimental)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
One can apply the following patch:
Larger values of n fail on my computer due to https://github.com/lfortran/lfortran/issues/1624. It is best to use the upstream modern minpack from https://github.com/fortran-lang/minpack.git; I am using commit c0b5aea9fcd2b83865af921a7a7e881904f8d3c2.
Results with GFortran 11.3.0 (from Spack) on Apple M1 Max:
And LFortran (latest master commit d27eff0987cafd590086d6a8ca7107b21e46820a):
The results seem to agree exactly, and we are 1.8x slower. But we don't use fast-math (because I haven't figured out how to enable it in LLVM yet; patches welcome!), so we can also compare against GFortran without -ffast-math:
Now we are only 7% slower.
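To check that two builds really print identical results, one can compare their standard output directly. A minimal bash sketch (process substitution is a bash feature; `same_output` is a hypothetical helper name), assuming the binaries take no arguments, as with `./hybrd.gf2` and `./hybrd.lf` above:

```bash
#!/usr/bin/env bash
# same_output PROG1 PROG2: run both programs and diff their stdout.
# Exit status 0 means byte-identical output; otherwise diff prints the differing lines.
same_output() {
    diff <("$1") <("$2")
}
```

Usage: `same_output ./hybrd.gf2 ./hybrd.lf && echo identical`. Because -ffast-math allows the compiler to reorder floating-point operations, small last-digit differences against a build without it would not by themselves indicate a bug.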
One can also try compiling to LLVM and then using Clang, but I am not sure all optimizations are enabled correctly:
I suspect this because it seems even slower than compiling with LFortran directly.