stfc / PSyclone

Domain-specific compiler and code transformation system for Finite Difference/Volume/Element Earth-system models in Fortran
BSD 3-Clause "New" or "Revised" License
107 stars 28 forks source link

Improve next/previous_access functions in Reference #2704

Open LonelyCat124 opened 2 months ago

LonelyCat124 commented 2 months ago

We need to enhance the next_access and previous_access for what we need for various issues related to NG-Arch. We'll use a dataflow analysis approach to do this.

LonelyCat124 commented 2 months ago

Dataflow graph will take a Symbol and ScopingNode as inputs, and I believe should produce a directed graph of accesses, either as a map or otherwise - TBC.

LonelyCat124 commented 2 months ago

I was going to make a map using a Node as the accessor but we disallowed that a while ago when we added node equality - does anyone have an alternative idea or do I need to make a custom datatype here? @sergisiso @arporter

Edit: I could use abs_position as the accessor but its a bit messy maybe. Edit2: I can use position for some things, but abs_position is non-reversible which seems bad. I think a custom datatype might be the only real option here.

arporter commented 2 months ago

You could use id(node) but that too isn't very nice. I'd drive it from the use case - how do we expect the dataflow graph to be used? In fact, I'd go so far as to say you might want to draft a section on it in the User/Developer Guide first - that will crystallise these things and help us make sure we're going in the right direction.

LonelyCat124 commented 2 months ago

The dataflow graph's use case (as we currently know) is for a given Reference we want to find the next_accesses and previous_accesses to the symbol (?, maybe symbol isn't sufficient for StructureReferences) of this Reference. So really we need to be able to do 2 things:

  1. For a given node (which we could use id(node) or abs_position or position and its equivalent here) be able to lookup the next_accesses for a give node for a DefinitionUseChain. This is the easy case and can use any of the above I think.
  2. For a given node we need to be able to work out the previous_accesses for a DefinitionUseChain. This is more difficult, as we need to look through the chain/graph/dict and find all the nodes that have an edge to our node. Looking in reverse means we can't just use id/abs_position since thy're not reversible and so having a custom graph structure (or a python inbuilt library if its part of core python) seems necessary for this case. I'll see whats available.

Edit: Short of a large library to import, which I suspect we want to avoid, the "standard" approach seems to be a dict of sets. :(

LonelyCat124 commented 2 months ago

One option I guess would be a "wrapper" object that contained a node (and maybe some other info if useful) that would be hashable automatically and solves the issue.

LonelyCat124 commented 2 months ago

Ok, so a first design idea (as I understand the book Sergi sent me) is to divide the code into "basic blocks". A basic block is a region of code with no control flow - for our purposes I'm counting a Call as not being control flow as we don't need to worry so much about the details of a Call.

A Call counts as a read + write to anything that isn't a local variable, and counts as a read + write to a local variable that is an argument.

So as far as I understand, code like this would be broken down into 4 basic blocks:

subroutine break_down()
# Initialise variables
do i = 1, 100
   a(i) = 1
   b(i) = 2
end do
c(1) = a(3)
if (a(3) > b(3)) then
    b(3) = a(3)
else
   a(3) = b(3)
end if
c(2) = a(3) 
end subroutine break_down

The 4 basic blocks would be:

  1. The Loop
  2. The c(1) = a(3) assignment
  3. The IfElseBlock - this might need to be one basic block per condition/else statement, resulting in 1 to N many basic blocks depending on the IfElseBlock's structure (else ifs will be complex)
  4. The final c(2) = a(3) assignment.

Since we are only going to care about a specific input Reference when looking at next_access or previous_access, for each basic block I think we want to care about the following things:

  1. Does the basic block read from the symbol before writing to it? In this case it is a "user". In addition we should probably keep track of all of the reads (I think, though I'm not sure we will need them).
  2. If the basic block writes from the symbol then we keep track of the last write in the region. We additionally keep track of each "killed" write in the region (a write is killed when there is a follow-up write).

Once we have these basic blocks, we should be able to merge these results together using our knowledge of the control flow, e.g. this equation subject to handling the flow: image

The difficulty of this seems to be that nothing has helped resolve the control flow (which is the hard bit I think), but it does give us an easy way to find all the blocks that we need to consider.

I assume we need to:

  1. Treat Loops as "optional control flow" - so we reach into them but don't allow Loops to kill references.
  2. If blocks are also "optional control flow" unless they have a non-elseif branch? I'm not sure how easy it is to check the latter issue.

Once we have all of the blocks before/after the Reference we care about finding the next/previous access(es) to, we can start merging the following basic blocks in until we find one that fully kills the reference, and then we can just return all of the first accesses to the relevant symbol/signature/? in each of the basic blocks that we needed to merge. In principle we could just compute basic blocks 1 by 1 until we have "killed" the accesses.

Working backwards is a bit more complex but once we have an implementation for forward I think we'd be able to do something that seems reasonable.

Let me know if this makes sense/if I've missed any other cases/control flow.

arporter commented 2 months ago

I think that makes sense. Thanks to canonicalisation, the PSyIR does not have the concept of elseif so you're OK there.

arporter commented 2 months ago

There is some complexity: if the reference we care about is to a Symbol with global scope (i.e. imported from a module) then we'd have to assume that a Call to an impure Routine could modify it.

LonelyCat124 commented 2 months ago

I think that makes sense. Thanks to canonicalisation, the PSyIR does not have the concept of elseif so you're OK there.

Yes - so I think for out purposes here though a block of

else
  if X:

is equivalent to just an elseif in terms of whether we are guarantee to pass through a block or not - it may not matter though.

There is some complexity: if the reference we care about is to a Symbol with global scope (i.e. imported from a module) then we'd have to assume that a Call to an impure Routine could modify it.

Yeah - i think assuming Calls count as read/write References to anything that isn't a locally defined variable should be sufficient for this case.

hiker commented 2 months ago

Just pointing out that a lot of the required information is in the VariableAccessInfo class (each access has a reference to the node - you would have to go up in the tree to get the actual statement, but that should be easy enough).

E.g. for the next access a very rough outline (I am aware the devil is in the details):

  1. You collect all access of the routine/code blocks you are interested in.
  2. Then given a Reference, you find that reference in the VarAccessInfo.
  3. You search all following accesses in the VarAccessInfo:
    1. If this next accesses is a sibling (i.e. same parent), this should be the next access (I believe this should work with statements in if and else blocks, they are not siblings).
    2. If the next accesses is 'deeper' down in the tree, there is a control statement. Handle it as you outlined
    3. If the next accesses is 'higher' up in the tree, then the block you are looking for is inside a control statement and needs to be handled.

I would need some time to think this through if it covers all cases correctly.

Also, if statements can likely be simplified and then ignored: if a variable is used in the if and else block, it becomes an unconditional access (i.e. you get the if and else access as next statement, you don't need to search any further).

I have written a very simple dataflow creator (that does not take control flows into account), in case that this is helpful:

joerg@Joerg-Surface:~/work/psyclone$ cat dataflow.py 
#! /usr/bin/python3

from psyclone.core.access_type import AccessType
from psyclone.psyir.frontend.fortran import FortranReader
from psyclone.psyir.backend.fortran import FortranWriter
from psyclone.core import VariablesAccessInfo
from psyclone.psyir.nodes import Statement

def find_previous_write(var_info, location, read_var):
    '''Find the statement that previously writes
    to variable 'read_var' in the list of all accesses in var_info before
    the specified location.
    '''
    all_accesses = var_info[read_var].all_accesses
    i = len(all_accesses)-1
    while i >=0:
        if not all_accesses[i].access_type in (AccessType.all_write_accesses()):
            i -= 1
            continue
        if all_accesses[i].location < location:
            return all_accesses[i].node.ancestor(Statement)
        i -= 1
    return None

code = """
subroutine foo(a, b)
real, intent(inout) :: a
real, intent(inout) :: b
real :: c, d, e, f
c = a + 1.0
e = a**2
f = cos(e)
d = c + 2.0
c = d * a
b = c + d
call bar(c, b)
b = b + c
end subroutine foo
subroutine bar(x, y)
real, intent(in) :: x
real, intent(inout) :: y
!x = x + 1.0
y = exp(x**2)
end subroutine bar
"""
reader = FortranReader()
writer = FortranWriter()
psyir= reader.psyir_from_source(code)
varinfo = VariablesAccessInfo(psyir.children[0])

print("digraph {")

# Handle each variable
for var in varinfo:
    accesses = varinfo[var]
    for written in accesses.all_write_accesses:
        statement = written.node.ancestor(Statement)
        # Now get all variables used in this statement:
        all_accessed = VariablesAccessInfo(statement)
        for read_var in all_accessed:
            # Ignore the variable with the write access we
            # are currently looking at:
            if not all_accessed.is_read(read_var):
                continue
            # If we have a write access to a variabe, but it's not
            # the variable we are currently analysing, ignore it
            # (happens if we call a subroutine with several variables written)
            if all_accessed.is_written(read_var) and \
                read_var != var:
                continue
            # Now we have a variable that is read in the current
            # statement. Find if and where it was previously
            # written:
            prev = find_previous_write(varinfo, written.location, read_var)
            if prev is None:
                # No previous write found, just use the name of the var as node
                print(f'{read_var} -> "{writer(statement).strip()}" [label="{read_var}"]')
            else:
                print(f'"{writer(prev).strip()}" -> "{writer(statement).strip()}" [label="{read_var}"]')

print("}")

Create a graph using

./dataflow.py  >xx
dot -Tsvg  xx >xx.svg

Output: dataflow

sergisiso commented 2 months ago

Regarding "elseif", I think it is not relevant here, what @LonelyCat124 meant I believe is that we can guarantee (not optional) a read if it happens in both sides of a branch (body and else_body). The elseif is syntactic sugar for two nested ifs. The will act the same, as if you guarantee the two branches of the inner if touch it then you can continue to guarantee that the two outer paths of the if touch it.

Once we have all of the blocks before/after the Reference we care about finding the next/previous access(es) to, we can start merging the following basic blocks in until we find one that fully kills the reference, and then we can just return all of the first accesses to the relevant symbol/signature/? in each of the basic blocks that we needed to merge. In principle we could just compute basic blocks 1 by 1 until we have "killed" the accesses.

What I don't understand is this bit (although it may be better to see the implementation). But why do we need the concept of "basic blocks"? Can't we just search (deep-first) the subtree starting at that point for the next reference (until we find a if or loop).

Then I was going to say that this seems very similar to the references_acesses method, but @hiker beat me to it :)

sergisiso commented 2 months ago

@hiker I guess the question is what would reference_accesses do with if, loops and calls. Would it mark any call a "possible use" to a global variable? I think it does not, as it only returns references. I believe this is why we needed a new (slightly different) method.

hiker commented 2 months ago

@hiker I guess the question is what would reference_accesses do with if, loops and calls. Would it mark any call a "possible use" to a global variable? I think it does not, as it only returns references. I believe this is why we needed a new (slightly different) method.

Yes, atm a call only marks all actual parameters as read-write. Do we really want to mark all global variables as read/write? The driver extraction queries all functions called (based on static code analysis) and determines which ones are read or written, so it would be easy to collect all accesses and mark them up as they are actually used (ideally with an option to enable/disable this?)

I just want to avoid that we are re-inventing the wheel. E.g. to handle if statements, it might be sufficient to add a 'possible/certain' flag to each access. A write in an if or else body would be marked as possible, and only if writes in both trees happen the if statement could get the access marked as 'certain'. Then you can just select the next statement by collecting all followup writes until you find a 'certain' one.

LonelyCat124 commented 2 months ago

Yes, atm a call only marks all actual parameters as read-write. Do we really want to mark all global variables as read/write?

For what we're planning to use this for we think we need to because we don't know what happens inside a call (to non-locally declared variables as Fortran doesn't even have to follow its own intents) so we have to be safe for data dependencies for things on device which can just change out of context so we can't do things like asynchonous kernels.

I'll take a look at using the VariablesAccessInfo as part of this/the main part of this then - we need to be able to go from one node to many result nodes though which I didn't think was currently supported.

hiker commented 2 months ago

VarInfo stores all accesses of a set of variables, and for each accesses also a pointer to the original PSyIR node (typically the reference, I think the one exception being loops, since the loop variable is only stored as symbol).

For calls, check CallTreeUtils().get_in_out_parameters(node_ist, collect_non_local_symbols=True), that will return a list of all variables accessed in the a call node, including the module name from where they are (within the limits of static analysis, e.g. procedure pointer, OO overwriting of methods. Generic interfaces atm. return the union of all possible calls, which we might improve at some stage I hope)

LonelyCat124 commented 2 months ago

The main thing with calls (and nested call trees and so on) is that we found there's no requirement for a nested call to follow intent rules of their parents and compilers allow this so we have to handle it somehow (regardless of whether its technically compliant with the standard or not). Mostly we don't want to have to delve down nested call trees to find this info right now because the use-cases should be ok to allow Call nodes to be inouts for all non-local variables without causing massive performance issues. Its definitely not optimal and is being overly cautious but it will be correct and we think produce performant-enough Fortran for NG-Arch, and we can improve it later if we need to.

hiker commented 2 months ago

Sure. FWIW, the CallTreeUtil do not rely on intent declaration, they use the module manager to find the source code and analyse it. Well, try to ... it's sometimes a problem with UM since it doesn't follow the coding style. Andy has done a great job in making this code more flexible, but I think I am still now and again seeing issues with psyclone not finding the right file for a module. I have a better solution for that planned (#2681 - while it puts fab in the title, it can work without fab).