Closed LonelyCat124 closed 2 years ago
I have an implementation now supporting distributed memory as well which I'll push once the VPN is working for me. I've not tested it with an up-to-date compiler, I will attempt on glados also once the VPN is working for me again.
Ok, this builds on Scafellpike with OpenMPI4 & gcc 9 so I'm happy the code produced is valid. I'll check correctness next.
Non-mpi version:
ua checksum = 0.52670513E+02
va checksum = 0.57100321E+03
psykal_omp manual version:
ua checksum = 0.52670513E+02
va checksum = 0.57100321E+03
MPI task version:
ua checksum = 0.52670513E+02
va checksum = 0.57100321E+03
Looks like its correct! Think this is ready for review.
Ok, I think this now works using PSyclone master branch. My only issue is that (on my laptop) the MPI build fails with an ICE in mpifort. I will test it on glados when my VPN is behaving properly.
@arporter @sergisiso do either of you have time to have a quick look over this and check the script seems appropriate? The non-MPI version is pretty simple, the MPI one has to do a lot of checking and I don't know if there's a better way to do it.
Hi @LonelyCat124, I've had a look and I think it's OK. I think it's the same problem we tackle in the OpenACC script where we want to find the biggest region that we can put inside an ACC Kernels block, see https://github.com/stfc/PSyclone/blob/b1d9c0107057af563a782aeb8f8becbe9906627d/examples/nemo/scripts/kernels_trans.py#L456-L499 I'm not sure what we have there is much better than what you have though.
We should probably provide some sort of generalised support for implementing this pattern. A user could provide a function that determines whether it's OK to include a given node in the region and get back a bunch of regions.
Yeah it looks pretty similar.
Shall I make an issue in PSyclone for that functionality?
Yes please!
Hi @LonelyCat124, presumably this now needs a full review? (Well, as 'full' as we do for PSycloneBench.)
I think its ready now yes! (I'll try to double check a few things if I can tomorrow)
@arporter I think I fixed the review changes, but I've either broken this Makefile or I've broken my PSyclone installation. Can you double check this still builds for you?
I think I fixed the pylint issue, and had to add an extra declaration of idx
so it was visible outside the loop.
I have implemented a script using the
OMPTaskloopTrans
andOMPTaskwaitDirective
.I think this correctly captures all of the dependencies, while minimising the number of statements required to do this. This is done by lines 31-62 of the code, and lines 68-70.
What this does, is use
forward_dependence
to find the next dependency for each loop, and find the index of this dependency in the schedule. It then loops through these dependencies. If there are other dependencies between X and its dependent which are covered by the creation of such a dependence, it is deleted (line 61-62). If another dependence will be created sooner, this dependence is removed.I'm sure there are some cases which this doesn't entirely minimise, so if anyone can think of those examples (or ways to implement tests of these?). I think this still handles a case such as:
which would turn into:
This requires https://github.com/stfc/PSyclone/pull/1368 to work right now.
The dependence analysis here is a prototype that will be used to build the
OMPTaskwaitTransformation
at a later stage. I am going to add comments to explain the process now too.