xcompact3d / x3d2

https://xcompact3d.github.io/x3d2
BSD 3-Clause "New" or "Revised" License

Performance improvement in transeq in the OpenMP backend #66

Open semi-h opened 6 months ago

semi-h commented 6 months ago

I'm copying my comment from #27 so that we don't forget about it.

https://github.com/xcompact3d/x3d2/blob/2d906a5cafe6060b8021b66aea8ff360558c3968/src/omp/exec_dist.f90#L164-L185

I realised that here we're writing three field-sized arrays to main memory unnecessarily. This is potentially increasing the runtime by 20%.

In the second phase of the algorithm we pass parts of du, dud, and d2u into der_univ_subs, and they're all rewritten in place. Later we combine them into rhs for the final result. Ideally, we want du, dud, and d2u to be read once and rhs to be written only once. However, because of the way der_univ_subs works, the updated data in the du arrays is written back to main memory after the der_univ_subs call, even though we don't need it there at all.
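To make the memory-traffic argument concrete, here is a minimal sketch in C (not the Fortran of the actual backend). Everything in it is an assumption made for illustration: `substitute` is a hypothetical stand-in for the per-element update, not the real der_univ_subs, and `combine` is an assumed form of the final rhs expression. Only the access pattern matters. The first function stores the three updated arrays before combining them; the second keeps the updated values in local variables and stores only rhs. It is not necessarily one of the fixes proposed in this issue.

```c
#include <stddef.h>

/* Hypothetical stand-in for the in-place update of the second phase.
   The real der_univ_subs is more involved; only the access pattern matters. */
double substitute(double value, double correction)
{
    return value - 0.5 * correction;
}

/* Assumed form of the final combination into rhs (illustrative only). */
double combine(double u, double du, double dud, double d2u, double nu)
{
    return -0.5 * (u * du + dud) + nu * d2u;
}

/* Pattern described above: rewrite du, dud, d2u in place, then combine.
   Returns the number of array elements stored. */
size_t rhs_in_place(size_t n, const double *u, const double *corr,
                    double *du, double *dud, double *d2u,
                    double nu, double *rhs)
{
    size_t stores = 0;
    for (size_t i = 0; i < n; ++i) { du[i]  = substitute(du[i],  corr[i]); ++stores; }
    for (size_t i = 0; i < n; ++i) { dud[i] = substitute(dud[i], corr[i]); ++stores; }
    for (size_t i = 0; i < n; ++i) { d2u[i] = substitute(d2u[i], corr[i]); ++stores; }
    for (size_t i = 0; i < n; ++i) {
        rhs[i] = combine(u[i], du[i], dud[i], d2u[i], nu);
        ++stores;
    }
    return stores; /* 4 * n: three unneeded field-sized writes plus rhs */
}

/* Read-once / write-once variant: updated values live only in locals.
   Returns the number of array elements stored. */
size_t rhs_fused(size_t n, const double *u, const double *corr,
                 const double *du, const double *dud, const double *d2u,
                 double nu, double *rhs)
{
    size_t stores = 0;
    for (size_t i = 0; i < n; ++i) {
        double du_i  = substitute(du[i],  corr[i]);
        double dud_i = substitute(dud[i], corr[i]);
        double d2u_i = substitute(d2u[i], corr[i]);
        rhs[i] = combine(u[i], du_i, dud_i, d2u_i, nu);
        ++stores;
    }
    return stores; /* n: only rhs is written */
}
```

Both functions produce the same rhs, but the in-place version performs four stores per point and the fused one performs one, which is the difference in main-memory writes discussed above.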

There are three ways we can fix this