Closed: FractalArt closed this issue 2 years ago
Hi @FractalArt, you're right, the communication overwhelms the computation, so the code as is doesn't scale.
Try two things in tsunami.f90:
Increase the grid size:

```fortran
integer(int32), parameter :: grid_size = 1000
```
which was 100 originally, but you can try even larger values. We kept it small in the book so that the example runs fast on a single core. Increasing the grid size increases the computation, which decreases the communication-to-computation ratio (you want this ratio as small as possible for good parallel scaling).
Comment out the gather + print lines inside the time loop:

```fortran
...
! gather to image 1 and write current state to screen
!gather(is:ie)[1] = h(ils:ile) ! there is an all-to-one communication here
sync all ! this sync is still important before we move to the next time step
!if (this_image() == 1) print *, n, gather
end do time_loop
```
This reduces the communication in each step. The gather operation is only for diagnostic purposes, so removing it doesn't affect the result. Alternatively, to preserve some diagnostics, you can do the gather + print only every 10th or 100th time step.
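To sketch what that could look like (a rough illustration reusing the variable names from tsunami.f90, with a hypothetical interval of 100 steps):

```fortran
! Sketch: gather and print diagnostics only every 100th time step.
! Assumes n, gather, h, and the index bounds (is, ie, ils, ile) from tsunami.f90.
if (mod(n, 100) == 0) gather(is:ie)[1] = h(ils:ile) ! infrequent all-to-one communication
sync all ! still required every step before advancing
if (this_image() == 1 .and. mod(n, 100) == 0) print *, n, gather
```

The `sync all` stays outside the conditional because the images must stay in lockstep every step, regardless of whether diagnostics are written.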
Let me know how this works out.
Ideally, this should have been explained in the book. We had a section about it, but it didn't make the cut. However, it should still be explained at least in the README of this repo. Do you agree?
Thank you for reading and reporting this.
Hi @milancurcic,
Thanks a lot for your quick reply. You're right, if I crank up the grid size I see the improvement, and as you say, the larger the grid size, the bigger the impact of parallelization.
Regarding the explanation in the README, I have to say that I did not find it.
> Regarding the explanation in the README, I have to say that I did not find it.
Yes, there isn't any right now, I meant I should make an effort to write an explanation there.
Hi,
I am reading your book and I have arrived at chapter 7 and was wondering whether you have any timings available for comparison. I was timing the code and was surprised to find that it runs faster on a single core than on four:
I think my coarray installation works since for the weather-buoy example, I do indeed see a speedup. Could it be that there is so much synchronization going on that it ruins the benefits of parallelization?
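For what it's worth, the synchronization effect can be seen in isolation with a tiny sketch (my own, not from the book): each image does a small amount of local work and then hits `sync all`, mimicking the per-step pattern in tsunami.f90. When the work per step is small, the synchronization cost dominates and more images can run slower, not faster.

```fortran
! Minimal coarray sketch: tiny per-step work followed by a global sync.
program sync_cost
  use iso_fortran_env, only: int64, real64
  implicit none
  integer :: n
  integer(int64) :: t0, t1, rate
  real(real64) :: x
  x = 0
  call system_clock(t0, rate)
  do n = 1, 10000
    x = x + sin(real(n, real64)) ! tiny amount of local computation
    sync all                     ! every image waits here every step
  end do
  call system_clock(t1)
  if (this_image() == 1) &
    print *, 'elapsed (s):', real(t1 - t0, real64) / rate, ' checksum:', x
end program sync_cost
```

Compiled with `caf` and run with `cafrun -n 1` versus `cafrun -n 4`, the 4-image run typically takes longer here, since there is no real work to amortize the per-step synchronization.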
I am running on Ubuntu 20.04.1 LTS, with OpenCoarrays 2.9.0 (installed as described in Appendix A), on an Intel i7-8565U CPU @ 1.80GHz × 8 processor, and I use gfortran 9.3.0 as the compiler.