Amazing job Cyrille! It's the first time I've seen this kind of F2PY+MKL combination, it's really neat! I'll update my tests and the post to reflect this.
There is a small cost to pay with the new syntax, though... I'm curious what approach to Python integration they are promoting on Fortran-lang.
Hi! I saw your post on HN and really enjoyed it! Julia is a great language and very promising for HPC.
I was kinda struck by how poor the Fortran performance was compared to pure numpy. I started digging around and realized that there are a few pretty key performance improvements that can be made to the Fortran version, which gfortran specifically does not do but which both numpy and Julia do.
The key issue is that the GNU Fortran version uses the GNU intrinsics for exp() and sqrt(). These are well known to be poorly optimized compared to the Intel ones, which are what most numpy distributions use (and, I assume, Julia). Instead of using ifort, which is commercial, I elected to call the MKL vector math (VML) functions directly. The current code can be compiled using a conda environment with numpy in it.
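For reference, here is a minimal sketch of the idea (not the exact code from my branch); the wrapper name `vexp` is just illustrative, and it assumes MKL is available through the conda environment:

```fortran
! Minimal sketch: wrap MKL's vector exponential so it can replace an
! elementwise exp() over a double precision array.
subroutine vexp(n, x, y)
  implicit none
  integer, intent(in) :: n
  double precision, intent(in)  :: x(n)
  double precision, intent(out) :: y(n)
  ! MKL VML routine: computes y(i) = exp(x(i)) for i = 1..n.
  ! vdsqrt has the same interface for sqrt().
  call vdexp(n, x, y)
end subroutine vexp
```

Building against the conda-provided MKL is then something along the lines of `f2py -c vexp.f90 -m vexp -L"$CONDA_PREFIX/lib" -lmkl_rt`, though the exact paths and link flags may differ depending on how MKL was installed.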
Here are the single-thread timings for the optimized Fortran version,
and here are the corresponding numpy timings.
The speedup for Fortran is 3.6x for the 10,000 case and about 3x throughout.
With four threads on the same machine, the speedup is about 2x.
I don't have Julia installed on this machine to compare with, but if the performance trends are similar, this improved Fortran code should be pretty much as fast as the Julia version.
Thanks! Cyrille