rodarima / cpic

Particle in Cell simulation of plasma in C
GNU General Public License v3.0

MFT solver takes most of the time with more than 1 process #13

Closed rodarima closed 4 years ago

rodarima commented 4 years ago

It seems the communication in the FFT may be causing this issue. Can we measure the plan-creation overhead separately from the FFT computation?

$ mpirun -n 2 ./cpic conf/simd.conf
...
stats iter=47 last=4.125259e-01 mean=4.302998e-01 std=1.540215e-02 sem=2.246635e-03 rsem=5.221092e-03 mem=46628 solver=3.916919e-01
stats iter=48 last=3.771588e-01 mean=4.299295e-01 std=1.545187e-02 sem=2.230286e-03 rsem=5.187562e-03 mem=46628 solver=3.912867e-01
stats iter=49 last=3.663834e-01 mean=4.288526e-01 std=1.704752e-02 sem=2.435359e-03 rsem=5.678780e-03 mem=46628 solver=3.902378e-01
Total time: 2.138171e+01 s
1.963117e+01 91.8% field_E
1.704385e-01  0.8% particle_x
1.004637e+00  4.7% particle_wrap
3.134956e-01  1.5% field_rho
2.668671e-01  1.2% particle_E
0.000000e+00  0.0% output_particles
0.000000e+00  0.0% output_fields
Simulation ends
Total time: 2.138590e+01 s
1.946058e+01 91.0% field_E
1.763841e-01  0.8% particle_x
1.109353e+00  5.2% particle_wrap
2.112792e-01  1.0% field_rho
2.796531e-01  1.3% particle_E
0.000000e+00  0.0% output_particles
0.000000e+00  0.0% output_fields
Simulation ends
$ mpirun -n 1 ./cpic conf/simd.conf
stats iter=47 last=7.367235e-02 mean=7.470654e-02 std=4.401337e-03 sem=6.420010e-04 rsem=8.593639e-03 mem=50224 solver=1.516419e-03
stats iter=48 last=7.382230e-02 mean=7.468499e-02 std=4.356821e-03 sem=6.288529e-04 rsem=8.420071e-03 mem=50224 solver=1.513608e-03
stats iter=49 last=7.412300e-02 mean=7.466739e-02 std=4.312960e-03 sem=6.161371e-04 rsem=8.251756e-03 mem=50224 solver=1.510907e-03
Total time: 3.733411e+00 s
1.557126e-01  4.2% field_E
2.507990e-01  6.7% particle_x
2.550165e+00 68.3% particle_wrap
3.151033e-01  8.4% field_rho
4.681872e-01 12.5% particle_E
0.000000e+00  0.0% output_particles
0.000000e+00  0.0% output_fields
Simulation ends
rodarima commented 4 years ago

It turns out that the fraction of the solver time spent on the actual computation shrinks as the number of processes increases. Creating the plan dominates the solver time with 2 or more processes.

Processes  Solver comp   Solver total  comp/total
1          9.393120e-04  1.369303e-03  6.859782e-01
2          1.384648e-02  3.762607e-01  3.680022e-02
4          2.705376e-02  2.438905e+00  1.109258e-02
rodarima commented 4 years ago

Creating the plan only once, at iteration -1, solves the problem; see 56c5f4228d85928ecd75c8c28f62af183f1a176e