Open arporter opened 3 years ago
e.g. if we have some code:
my_array(:,:) = cp_air(array1(:,:)) * array2(:,:) + array3(:,:)
where cp_air
is an array-valued function that performs compute then this is hard to parallelise/ensure it remains GPU resident. (If we put ACC KERNELS inside cp_air
then we can't have it around this assignment.) Ideally we need to refactor this:
my_tmp(:,:) = cp_air(array1(:,:))
my_array(:,:) = my_tmp(:,:)*array2(:,:) + array3(:,:)
That then allows us to have KERNELS inside cp_air() and, separately, around the assignment.
Better still would be to in-line cp_air but we have to walk before we can run.
If I remember correctly, some of the existing kernel transformations (that replace intrinsics) have logic similar to the above. Perhaps the splitting out of an intrinsic/function should have its own transformation which can then be shared.
Yes, I said as much to Chris when we were discussing this. Hopefully there's some reuse possible.
Obviously, a step 0 for this work is to be able to identify array-valued functions. Currently, all RoutineSymbols that represent Fortran functions are given DeferredType (#1294).
This is going quite well. I need to handle the case where there's a name clash for a ContainerSymbol
as we can't simply rename it in that case.
Note to self: I need to identify when a routine accesses variables made available from some outer scoping region as this prevents inlining.
The first review and subsequent work on validate
has thrown up a number of things that I'm adding TODOs for this time around:
validate
will be much nicer once we can query the datatype
of any Reference
(#1799)Range
needs extending to return the PSyIR for the number of elements it representsThe initial implementation of InlineTransformation is now on master, but some remaining TODOs need to be fixed to close this issue.
This is also one of the major issues in OpenMP for GPU now, mostly in the sbcblk region. Some functions that will need to be inlined are: q_sat, sbc_dcy, gamma_moist, cd_neutral_10m, psi_h, psi_m
Although the basic inlining functionality is on master, it still requires that the routine to be inlined be in the same Container as the call site. That means we need module inlining working in native PSyIR (currently I think it only works for PSyKAl APIs).
Alternatively, now that we have the elemental attribute, we can a resolve the cp_air
symbol (which could be imported) for:
my_array(:,:) = cp_air(array1(:,:)) * array2(:,:) + array3(:,:)
and then arrayrange2loop can decide how to convert it into loop-form depending if it is elemental or not.
(Ideally PSyclone should be able to do both, and the transformation script just chose how to approach it)
Currently a CodedKern
has a get_kernel_schedule
method. To generalise the KernelModuleInlineTrans
transformation we need to support Call
as well. Call
has routine
which gives the corresponding RoutineSymbol
. In the work I'm doing now I propose to add a RoutineSymbol.get_schedule
method which is very like the ContainerSymbol.container
method. I could name it schedule
but, since it potentially does a lot of searching, I prefer the get_
prefix. I'd argue that the container
method could also do with a get_
prefix.
I agree with the get_
but maybe it could be Routine.Symbol.get_routine()
instead of "schedule" to make it easier for people not familiar with the PSyIR concepts.
(PS: Now I would also rename Container to Module for the same reason, and I think I am to blame for picking the wrong name)
(PS: Now I would also rename Container to Module for the same reason, and I think I am to blame for picking the wrong name)
Well, we do aspire to be language-neutral so I think Container is fine. At least 'routine' is a generic concept :-)
Chatting with Martin and Hugo, it turns out that CROCO doesn't have modules but they would still like to be able to do inlining. This would mean we would need to insert routines into the FileContainer instead (or, we just extend InlineTrans
to search for the Routine outside of the current Container?).
For example: sbcblk.f90, cp_air() function.
Things that still need doing:
--profile
command-line flag;