Smearing is a severe bottleneck for MPI runs of the simulation.
This is because the routine performs many global summations, so the whole subroutine runs serially, without even OpenMP parallelization.
This can be improved by doing two sweeps of smearing.
In the first sweep, every effective cell that is fully contained within a single MPI rank is smeared in the classical way with OpenMP parallelization, and we record which cells have already been smeared.
In the second sweep, we smear the cells that were skipped in the first sweep. These effective cells all span multiple MPI ranks, so they must be handled serially with global reductions.
With this method, MPI communication happens only for the cells that span multiple ranks. In practice, most cells are fully contained within their own MPI domain, the central cells being the main exception, so there should be hardly any communication.
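The two-sweep scheme can be sketched as follows. This is a minimal illustration, not the simulation's actual code: the `Cell` structure, its `ranks` and `contributions` fields, and the trivial smearing kernel are all assumptions, and the global reduction is mimicked by an explicit sum over per-rank contributions rather than a real MPI_Allreduce.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    name: str
    ranks: set            # hypothetical: MPI ranks whose domain overlaps this effective cell
    contributions: dict   # hypothetical: rank -> partial sum of field values in the cell
    smeared: bool = False
    value: float = 0.0

def smear_two_sweeps(cells, my_rank, n_ranks):
    # Sweep 1: cells fully contained in this rank need no communication,
    # so in the real code this loop can be OpenMP-parallelized.
    for cell in cells:
        if cell.ranks == {my_rank}:
            cell.value = cell.contributions[my_rank]  # stand-in for the local smearing kernel
            cell.smeared = True

    # Sweep 2: the remaining cells span several ranks and are handled
    # serially with a global reduction (here an explicit sum stands in
    # for the MPI reduction over all ranks).
    for cell in cells:
        if not cell.smeared:
            cell.value = sum(cell.contributions.get(r, 0.0) for r in range(n_ranks))
            cell.smeared = True
    return cells
```

In this sketch, only the cells whose `ranks` set contains more than one rank ever reach the reduction in sweep 2, which is exactly why the communication volume stays small when most cells are rank-local.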
This greatly improves the scaling of the code when smearing is switched on.