**LonelyCat124** opened this issue 3 years ago
Could you take a step back and give a quick example of what you want to detect in the first place, independent of existing calls? We have a new dependency analysis already implemented (mostly ;) ) - if I could quickly see what you want to detect I could provide some guidance (if the new one can already do what you want, or if it can be extended, or ... if we need something new).
The most obvious to me right now is the use case in #1415. In this, there is some number of `OMPTaskloopDirective` nodes, which may have some forward dependence on another node. Usually these dependencies will be to another `OMP[Task]?LoopDirective`, though there is no requirement on that. To be able to generate legal OpenMP code (i.e. no race conditions) it needs to be able to do the following:

1. Handle dependencies between nodes inside `OMPSingleDirective(nowait=True)` or `OMPMasterDirective` subtrees.
2. Handle `OMPParallelDirective` subtrees. I could just ignore these, but it would be preferable to be able to check this, I think.
3. Place `OMPTaskwaitDirective` nodes to avoid race conditions. This is done by searching for forward dependences of each `OMPTaskloopDirective` and then removing superfluous ones (since a single `OMPTaskwaitDirective` satisfies all dependencies that span it).

It seems like removing the `self.sameParent(node)` check has resolved my immediate issues (or at least, my tests are beginning to pass), but this may not continue if I move to more complicated systems.

Edit: Also, in theory I should do something different if there is no `nogroup` clause on the `OMPTaskloopDirective`, but I've not yet looked into that.
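The "remove superfluous taskwaits" step above (a single `OMPTaskwaitDirective` satisfies every dependence that spans it) is essentially a greedy interval-stabbing problem. A minimal, self-contained sketch under my own assumptions — plain statement indices stand in for PSyIR tree positions, and `minimal_taskwaits` is a made-up helper, not PSyclone code:

```python
# Toy sketch: each dependence is a pair (src, dst) meaning a taskwait must
# sit somewhere after statement src and at-or-before statement dst.
# A single taskwait satisfies every dependence that spans it, so the
# minimal placement is the classic greedy interval-stabbing solution.

def minimal_taskwaits(dependences):
    """dependences: list of (src, dst) with src < dst (statement indices).
    Returns positions p such that inserting a taskwait just before
    statement p satisfies all dependences."""
    points = []
    # Process dependences in order of where they must be satisfied by (dst)
    for src, dst in sorted(dependences, key=lambda d: d[1]):
        if points and points[-1] > src:
            continue  # an already-placed taskwait spans this dependence
        points.append(dst)  # place as late as possible: just before dst
    return points

# Three dependences; one taskwait before statement 4 spans the first two
print(minimal_taskwaits([(1, 4), (2, 5), (6, 8)]))  # [4, 8]
```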
I don't have enough time today (NGMS Champions meeting), but I think the new DA might be quite useful. It is documented at https://psyclone-dev.readthedocs.io/en/latest/psyir_symbols.html#variable-accesses (and yes, it's hard to find; I have a ticket to move that around). It works somewhat differently from the old code, which was focused on getting next accesses. Basically, you can get a list of all variable accesses in a region. So if you consider your two regions, you could do something like this:

```python
va1 = VariablesAccessInfo(region1_psyir)
all_sigs1 = va1.all_signatures  # See details in docs; for non-structured types a signature is basically just the variable name
va2 = VariablesAccessInfo(region2_psyir)
all_sigs2 = va2.all_signatures
for sig1 in all_sigs2:
    if sig1 not in all_sigs2:
        # this variable is not used at all in the second loop and can be ignored
        continue
    # Otherwise this variable is used in both loops: get all accesses to this variable in each loop:
    access1 = va1[sig1]
    access2 = va2[sig1]
    if access1.is_read_only() and access2.is_read_only():
        # Variable is only read in both loops, it can be ignored
        continue
    if access1.is_written():
        # this variable is written in the first loop, and used (read or write,
        # that shouldn't matter) in the 2nd loop: we need a wait between them
        ...  # do taskwait here or set flag or whatever
    ...  # Similar checks for read followed by write, and write-write ...
         # though I have the feeling in all these cases a wait would be required.
```
You get the idea. There are some convenience functions missing (I am working on that). The accesses are sorted and have a statement number, so you can test if there is a read before a write etc (that's one place where we need something more convenient).
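The classification described above (ignore read-read pairs, require a wait whenever either region writes a shared variable) can be sketched with plain sets. This is a toy model only; `needs_taskwait` and the dict layout are invented here for illustration and are not the `VariablesAccessInfo` API:

```python
# Toy model of the read/write classification: a region is just a dict of
# 'reads' and 'writes' sets of variable names. NOT the PSyclone API.

def needs_taskwait(region1, region2):
    """Return the set of variables that force a wait between the regions."""
    conflicts = set()
    vars1 = region1["reads"] | region1["writes"]
    vars2 = region2["reads"] | region2["writes"]
    for var in vars1 & vars2:  # only variables used in both regions matter
        written1 = var in region1["writes"]
        written2 = var in region2["writes"]
        if not written1 and not written2:
            continue  # read-read: no dependence, can be ignored
        # Read-after-write, write-after-read or write-after-write:
        # a wait is required between the two regions.
        conflicts.add(var)
    return conflicts

r1 = {"reads": {"b"}, "writes": {"a"}}
r2 = {"reads": {"a", "b"}, "writes": {"c"}}
print(needs_taskwait(r1, r2))  # {'a'}: written in loop 1, read in loop 2
```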
Also a caveat: if statements are not 'properly' handled; the information is collected as if both branches are executed. Typically that's fine, but if you have something like (pseudo-code):

```
if something then
    a = b
else
    b = a
```

you will get that there is a write access to `a` followed by a read access. At this stage you would have to manually detect and handle if-statements if required.
Am happy to provide more details. I will soon be getting back to this and start implementing more functionality. There is also a `DependencyTools` class that provides some higher-level functionality; that's the best place to look at some examples. I thought I had a bit more on a branch somewhere (related to fusing loops, i.e. exactly the case where we have to handle two loops - all examples in the `DependencyTools` handle only a single loop iirc), but I can't find that atm :)
I think, now that (fingers crossed) the draft PR is fully tested and "complete", I'll take a look at the new dependence analysis and see if that fits better.
I had a look, and I think I can implement a `forward_dependence`-style algorithm using a tree-walk for my own use-case.
One interesting type of Node that PSyclone has no concept of right now is a barrier, which in my opinion is a useful object for parallel-code dependence analysis. An easy example of how barriers change dependencies is this use case:
```c
#pragma omp parallel
{
    #pragma omp single nowait
    {
        #pragma omp taskloop nogroup
        for(i = 0; i < n; i++){
            a[i] = b[i] + 2;
        }
    }

    #pragma omp for
    for(i = 0; i < n; i++){
        a[i] *= 2;
    }
}
```
This code has a race condition, because nothing handles the dependency between the taskloop and the parallel for loop. There are two ways to handle this dependency (I think? I'm trying to double check that the first solution works):
1. Remove the `nowait` clause from the single pragma. This adds an implicit barrier (which I think satisfies the dependency, i.e. the `taskloop`'s dependency now points at either the single or None).
2. Add an explicit wait (`taskwait` or `barrier`?) at either the end of the `single` region (`taskwait`) or after the end of the `single` region (`barrier`).

A potentially useful mixin class might be a `Barrier`, which has special rules associated with it (e.g. for any set of dependency arguments a `Barrier` counts as a `forward_dependency` or `backward_dependency` for all of them). This is not useful for low-level loop dependencies (i.e. for operations like fusing loops), but is potentially a valuable restriction for asynchronous parallel constructs (e.g. tasks and maybe some GPU operations).
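One possible shape for such a mixin, sketched with entirely hypothetical classes (none of these names or methods correspond to real PSyclone nodes or API):

```python
# Illustrative sketch only: a Barrier mixin that reports itself as a
# dependence for every access set, regardless of the variables involved.

class Node:
    def accesses(self):
        return set()

    def depends_on(self, accesses):
        """Default: a node depends on an access set only if it shares a
        variable with it (a crude stand-in for real dependence analysis)."""
        return bool(self.accesses() & set(accesses))

class Barrier:
    """Mixin: a barrier counts as a forward/backward dependence for ALL
    accesses, so it overrides the variable-overlap check entirely."""
    def depends_on(self, accesses):
        return True

class OMPTaskloopDirective(Node):
    def __init__(self, accesses):
        self._accesses = set(accesses)
    def accesses(self):
        return self._accesses

class OMPTaskwaitDirective(Barrier, Node):
    """Barrier comes first in the MRO, so its depends_on wins."""

loop = OMPTaskloopDirective({"a", "b"})
wait = OMPTaskwaitDirective()
print(loop.depends_on({"c"}))  # False: no shared variable
print(wait.depends_on({"c"}))  # True: a barrier matches any dependence
```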
Edit: For the OpenMP example above, only removing the `nowait`, removing the `nogroup`, or adding an explicit `omp barrier` resolves the dependency; a `taskwait` does not. I'm not sure whether this is specified in OpenMP or is just an artefact of the compiler I'm using.
@hiker in your example you do `for sig1 in all_sigs2` - should that be so, or should it be `for sig1 in all_sigs1`? My guess is it should do:

```python
for sig1 in all_sigs1:
    if sig1 not in all_sigs2:
        continue
    ...
```

Or do I misunderstand how this works?
Ok, I have an implementation using this, but it is oversensitive right now as it catches things like loop variables (which are private, so are false dependencies). I probably need instead to loop through the children of my nodes and get the lists of all their non-Loop/etc. children's accesses and join those together in some way. I guess something like
```python
vai = VariablesAccessInfo()
for child in taskloop.walk(Node):
    if not isinstance(child, Loop) and child is not taskloop:
        vai.merge(VariablesAccessInfo(child))
```
I will try doing something like this and see how it works.
Edit: This seems to work...:o
> @hiker in your example you do `for sig1 in all_sigs2` - should that be so, or should it be `for sig1 in all_sigs1`? My guess is it should do:
> `for sig1 in all_sigs1: if sig1 not in all_sigs2: continue ...`
> Or do I misunderstand how this works?

You are right, that's indeed a typo. I didn't test the code :)
> Ok, I have an implementation using this, but it is oversensitive right now as it catches things like loop variables (which are private, so are false dependencies). I probably need instead to loop through the children of my nodes and get the lists of all their non-Loop/etc. children's accesses and join those together in some way. I guess something like
> `vai = VariablesAccessInfo(); for child in taskloop.walk(Node): if not isinstance(child, Loop) and child is not taskloop: vai.merge(VariablesAccessInfo(child))`
> I will try doing something like this and see how it works.
> Edit: This seems to work...:o
I can't comment about the taskloop; directives are not (yet) supported: the DA was meant to be used to detect (e.g.) which omp directives to use. My best idea for this would indeed be to create code sections, and assemble the data access for each section.
Here is what I do for loop variables (in the dependency tools) when analysing variables to see if they prevent loop parallelisation:

```python
# Get the list of all loop indices
loop_vars = [loop.variable.name for loop in loop.walk(Loop)]

# Now check all variables used in the loop
for signature in var_accesses.all_signatures:
    # This string contains derived type information, e.g. "a%b"
    var_string = str(signature)
    # Ignore all loop variables - they look like reductions because of
    # the write-read access in the loop:
    if var_string in loop_vars:
        continue
    ...
```
I should also point out that the new DA is at a somewhat 'lower' level than the `forward_dependence` approach: this DA gets all implicit arguments that PSyclone adds for LFRic (which can sometimes be surprising). I intend to make this optional later, once all of LFRic is using PSyIR.
Summary:

1. The `self.sameParent(node)` check in `forward_dependency()` limits the available dependency structure (and can easily cause invalid dependencies for models such as OpenMP).
2. It would be useful to be able to call e.g. `node.forward_dependence((OMPTaskloopDirective, OMPLoopDirective, Loop))`, and have it find the first dependency for which `isinstance(forward_dependent, (OMPTaskloopDirective, OMPLoopDirective, Loop))` returns `True`.
3. A `Barrier`-style mixin (as described above) that counts as a forward/backward dependence for any set of dependency arguments.

2 & 3 would satisfy the requirements I can think of for OpenMP at least.
Applying loop transformations seems to lose dependency information, e.g.

This is lost due to `before` being deeper in the tree than `schedule1.children[1]`. If I apply the tloop to both nodes:

This still doesn't work (`before.forward_dependence()` is still None). However, if I do

This works fine again (because they're at the same level, I assume?).

If I apply the same transformation to `schedule1.children[1]`, the forward_dependence switches to point to the new node correctly. The issue I'm then having is that when applying further transformations to these nodes, the dependency information disappears:

The last assert fails here, and also here:

I think this is just due to me using apply wrong - it works for `apply(schedule.children)`. In the former case, this is safe for these specific transformations, as there are now barriers between `a` and `b`. However, it is easy to create an example where this is not safe:
I tried looking through the code to see why this dependency disappears (for the case where I apply to schedule.children[0:1]). After the later transformations are applied, inside `forward_dependence()` this line (node.py:692) finds `dependent_arg = cu_fld`, which is as expected (I'm not sure why the other args are None, but that's not an issue for now). The two nodes (`self` and `node`) both have depth 7, are not the same node, and are both `OMPTaskloopDirective` nodes; however, this breaks because the schedules containing those nodes are not the same, and thus the `self.sameParent(node)` check fails, even though as far as OpenMP cares they exist within the same barrier and so are dependent (if you ran this code with those transformations but without the taskloop transformations, you would already have a race condition).

My proposal on how to fix this would be (as a temporary fix) probably to replace the `self.sameParent(node)` check with a check to see if they are contained within the same child of the root (or child of child of the root? Essentially, whether they share a highest-level non-`Schedule` ancestor, be that an Invoke, OMPParallelDirective, ACCParallelDirective, etc.). I don't know what else this might break though; why does it currently check whether they have the same parent?

What I need ideally for dependency tracking is the next non-child, non-ancestor node which has a shared dependency, i.e. for some tree:
B's next dependents could be any of E, G, J or K
C's next dependents could be any of D, E, F, G, H, J, K, L
I was trying to work out if any requirement on >= depth would be helpful, but I don't think it would, since we could have:

And in this case LOOP_A has depth 3 whilst LOOP_B has a higher depth. It would probably be useful to be able to specify the type of node we care about, e.g. `node.forward_dependence((OMPLoopTrans, OMPTaskloopTrans, ...))`, which finds the next instance of a specific type to look for (this helps restrict what depth things can appear at, so e.g. an `OMPLoopDirective` doesn't return a following `OMPParallelDirective` if you restrict it, but rather whichever of the `OMPParallelDirective`'s first children causes that dependency).
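The "next non-child, non-ancestor node" search with an optional type restriction could be sketched roughly as below. The `Node` class and `forward_candidates` helper are invented for illustration and are not PSyclone's real tree API:

```python
# Toy sketch: find every node that appears after `node` in a depth-first
# pre-order walk of the tree, excluding node's own subtree, optionally
# restricted to given node types. (In a pre-order walk, ancestors always
# precede the node, so they are excluded automatically.)

class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        for child in self.children:
            child.parent = self
        self.parent = None

    def walk(self):
        """Depth-first, pre-order traversal of this subtree."""
        yield self
        for child in self.children:
            yield from child.walk()

def forward_candidates(root, node, node_types=None):
    """All nodes after `node` in a pre-order walk of `root` that are not
    descendants of `node`, optionally filtered by isinstance."""
    descendants = set(node.walk())  # node's own subtree, incl. itself
    seen = False
    out = []
    for n in root.walk():
        if n is node:
            seen = True
            continue
        if not seen or n in descendants:
            continue
        if node_types is None or isinstance(n, node_types):
            out.append(n)
    return out

# Tiny example tree: A -> (B -> D, C -> E)
D, E = Node("D"), Node("E")
B, C = Node("B", [D]), Node("C", [E])
A = Node("A", [B, C])
print([n.name for n in forward_candidates(A, B)])  # ['C', 'E']
```

Passing a tuple of directive classes as `node_types` would then give the restricted search proposed above.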