There have been complaints about the slow execution of the code, which I share to an extent. Part of the problem is the I/O, which we can only work around to a certain extent, especially since there is a tradeoff with memory consumption, which is a major constraint in HPC environments. Another issue might be that we inhibit further implicit parallelization and vectorization by the compiler with calls to the clock routines (which are not pure) or with input parameter checking (which can lead to program exits and is therefore not pure either).
This might be a specific issue to discuss with HPC experts, although I am not sure how much speed-up we could gain.
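
To make the second point concrete, here is a minimal sketch in C of the pattern I have in mind. It is not taken from our code base: the names `scale_add` and `scale_add_checked` are hypothetical, and the kernel is just a stand-in for a hot loop. The idea is to do the parameter checking and the timing once in a thin driver, so that the inner routine has no side effects and no early exits, which is what the compiler needs in order to consider vectorizing it. Whether this gives a measurable gain for us would still have to be profiled.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel: no I/O, no clock calls, no early exit inside the
   loop. `restrict` promises the compiler that x and y do not overlap. */
static void scale_add(size_t n, const double *restrict x,
                      double *restrict y, double a)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hypothetical driver: validates the arguments and reads the clock once,
   outside the hot loop. Returns 0 on success and -1 on invalid input;
   on success the CPU time in seconds is stored in *elapsed.
   The caller must pass non-overlapping x and y. */
int scale_add_checked(size_t n, const double *x, double *y, double a,
                      double *elapsed)
{
    if (x == NULL || y == NULL || elapsed == NULL) {
        return -1; /* report the error instead of exiting in the kernel */
    }

    clock_t start = clock();
    scale_add(n, x, y, a);
    clock_t end = clock();

    *elapsed = (double)(end - start) / CLOCKS_PER_SEC;
    return 0;
}
```

A caller would only ever use `scale_add_checked`; the checks and the timer then cost one call per array rather than one per element.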