mdmaas / julia-numpy-fortran-test

Comparing Julia vs Numpy vs Fortran for performance and code simplicity
https://www.matecdev.com/posts/numpy-julia-fortran.html
MIT License

Fortran program is suboptimal #2

Closed: clavigne closed this issue 3 years ago

clavigne commented 3 years ago

Hi! I saw your post on HN and really enjoyed it! Julia is a great language and very promising for HPC.

I was kind of struck by how poor the Fortran performance was compared to pure Numpy. I started digging around and realized that there are a few key performance improvements that can be made to the Fortran version, improvements that gfortran specifically does not apply but that both numpy and Julia benefit from.

The key issue is that the GNU Fortran version uses the GNU intrinsics for exp() and sqrt(). These are well known to be poorly optimized compared to the Intel ones, which are what most numpy distributions (and, I assume, Julia) use. Instead of using ifort, which is commercial, I elected to call the MKL vector math (VML) functions directly. The current code can be compiled using a conda env with numpy in it.
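
Concretely, the change amounts to something like the sketch below. The kernel here is just a stand-in (y = exp(-sqrt(x)) element-wise, not the repo's actual expression); the point is the pattern of swapping the intrinsics for MKL's VML routines vdsqrt/vdexp, which operate on whole arrays at once:

    ! Stand-in kernel: y = exp(-sqrt(x)) element-wise, using MKL VML
    ! routines instead of the gfortran intrinsics.
    subroutine vml_kernel(n, x, y)
        implicit none
        integer, intent(in)  :: n
        real(8), intent(in)  :: x(n)
        real(8), intent(out) :: y(n)
        real(8) :: tmp(n)

        call vdsqrt(n, x, tmp)   ! tmp = sqrt(x)
        tmp = -tmp
        call vdexp(n, tmp, y)    ! y = exp(-sqrt(x))
    end subroutine vml_kernel

One possible way to build this against the MKL that ships with a conda numpy environment (exact flags may differ on your setup) is: f2py -c vml_kernel.f90 -m vml_kernel -L$CONDA_PREFIX/lib -lmkl_rt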

Here are the single-thread timings for the optimized Fortran version:

N=1000, time in Fortran:  0.09577751159667969 seconds
N=2000, time in Fortran:  0.08719778060913086 seconds
N=3000, time in Fortran:  0.1524660587310791 seconds
N=4000, time in Fortran:  0.24555754661560059 seconds
N=5000, time in Fortran:  0.3789374828338623 seconds
N=6000, time in Fortran:  0.5553531646728516 seconds
N=7000, time in Fortran:  0.740034818649292 seconds
N=8000, time in Fortran:  0.9681270122528076 seconds
N=9000, time in Fortran:  1.1795861721038818 seconds
N=10000, time in Fortran:  1.4268994331359863 seconds

and here are the corresponding Numpy timings:

N=1000, time in Numpy:  0.06533169746398926 seconds
N=2000, time in Numpy:  0.20993351936340332 seconds
N=3000, time in Numpy:  0.4487905502319336 seconds
N=4000, time in Numpy:  0.787825345993042 seconds
N=5000, time in Numpy:  1.2224347591400146 seconds
N=6000, time in Numpy:  1.7508139610290527 seconds
N=7000, time in Numpy:  2.3738491535186768 seconds
N=8000, time in Numpy:  3.0882439613342285 seconds
N=9000, time in Numpy:  4.071347951889038 seconds
N=10000, time in Numpy:  5.099215745925903 seconds

The speedup of the Fortran version over Numpy is 3.6x for the N=10,000 case and roughly 3x throughout.

With four threads on the same machine, there is roughly another 2x speedup (a note on controlling the MKL thread count follows these timings):

N=1000, time in Fortran (4 threads):  0.04375338554382324 seconds
N=2000, time in Fortran (4 threads):  0.04431009292602539 seconds
N=3000, time in Fortran (4 threads):  0.07324886322021484 seconds
N=4000, time in Fortran (4 threads):  0.11355233192443848 seconds
N=5000, time in Fortran (4 threads):  0.1703178882598877 seconds
N=6000, time in Fortran (4 threads):  0.24642515182495117 seconds
N=7000, time in Fortran (4 threads):  0.33534884452819824 seconds
N=8000, time in Fortran (4 threads):  0.4547462463378906 seconds
N=9000, time in Fortran (4 threads):  0.5772113800048828 seconds
N=10000, time in Fortran (4 threads):  0.7108926773071289 seconds
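
For completeness, the thread count was pinned through MKL's usual controls. A minimal sketch, assuming the standard MKL service routine; setting MKL_NUM_THREADS in the environment before running the Python driver works just as well:

    ! Sketch: force the number of MKL threads from Fortran.
    ! Equivalent to exporting MKL_NUM_THREADS before running the driver.
    subroutine set_threads(nthreads)
        implicit none
        integer, intent(in) :: nthreads
        call mkl_set_num_threads(nthreads)   ! MKL service routine
    end subroutine set_threads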

I don't have Julia installed on this machine to compare with, but if the performance trends are similar, this improved Fortran code should be pretty much as fast as the Julia version.

Thanks! Cyrille

mdmaas commented 3 years ago

Amazing job, Cyrille! It's the first time I've seen this kind of F2PY+MKL combination; it's really neat! I'll update my tests and the post to reflect this.

There is a small cost to pay with the new syntax, though... I'm curious what approach to Python integration they are promoting over at Fortran-lang.
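
Just to spell out that syntax cost for readers: with the intrinsics the kernel is a single elemental expression (again a stand-in expression, not necessarily the one in the repo), whereas the VML version needs explicit lengths and a temporary:

    ! Intrinsic-based form that gfortran compiles directly: exp and sqrt
    ! are elemental, so they apply element-wise with no explicit lengths.
    subroutine intrinsic_kernel(n, x, y)
        implicit none
        integer, intent(in)  :: n
        real(8), intent(in)  :: x(n)
        real(8), intent(out) :: y(n)
        y = exp(-sqrt(x))
    end subroutine intrinsic_kernel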