stfc / PSyclone

Domain-specific compiler and code transformation system for Finite Difference/Volume/Element Earth-system models in Fortran
BSD 3-Clause "New" or "Revised" License

PSyIR Next Steps #723

Open sergisiso opened 4 years ago

sergisiso commented 4 years ago

This is an overview issue with multiple ideas we have recently talked about regarding PSyIR. I think it is useful to consider them together in this issue, but of course each of the changes needs its own smaller issue.

At the moment we have the following node hierarchy (| - ), and each node supports the children indicated by (<-).

Node
| - Container <- Unordered list of [Container | KernelSchedule]*
| - Schedule  <- OrderedList of [Statements]* (Or CFG of Statements?)
    | - InvokeSchedule
    | - KernelSchedule
| - Statements
    | - Assignment <- LHS: DataNode, RHS: DataNode
    | - Return
    | - IfBlock <- condition: DataNode, ifbody: Schedule, elsebody: Schedule
    | - Loop <- start: DataNode, stop: DataNode, step: DataNode, body: Schedule
    | - Psy Data Node <- body: Schedule
        | - Extract Node
        | - Profile Node
    | - Kern
        |- CodedKern
        |- InlinedKern
        |- BuiltIn
    | - Directive
    | - GlobalSum
    | - HaloExchange
| - DataNode
    | - Literal
    | - Operation
        | - UnaryOperations <- operand1: DataNode
        | - BinaryOperations <- operand1: DataNode, operand2: DataNode
        | - NaryOperations <-  [operand: DataNode]+
    | - Reference
        | - ArrayReference <- Ordered list of [subscripts: DataNode | Range]+

(inherits from Statement and DataNode)
    | - Codeblock 
(?)
    | - Range <- start: DataNode, stop: DataNode, step: DataNode
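
One way the child constraints in the hierarchy above could be enforced is sketched below. This is a minimal illustration only, not PSyclone's actual implementation: the `_valid_child` hook and the choice to fail (rather than steal) when a child already has a parent are assumptions made for the example.

```python
class Node:
    """Hypothetical base class, for illustration only."""

    def __init__(self):
        self.parent = None
        self.children = []

    def _valid_child(self, position, child):
        # By default a node is a leaf and accepts no children.
        return False

    def addchild(self, child):
        position = len(self.children)
        if not self._valid_child(position, child):
            raise TypeError(
                f"{type(child).__name__} is not a valid child {position} "
                f"of {type(self).__name__}")
        if child.parent is not None:
            # Assumed policy: fail instead of stealing the child.
            raise ValueError("child already has a parent")
        child.parent = self
        self.children.append(child)


class DataNode(Node):
    """Abstract class for nodes that produce a value."""


class Statement(Node):
    """Abstract class for nodes that can appear in a Schedule."""


class Literal(DataNode):
    def __init__(self, value):
        super().__init__()
        self.value = value


class Reference(DataNode):
    def __init__(self, name):
        super().__init__()
        self.name = name


class Assignment(Statement):
    # Assignment <- LHS: DataNode, RHS: DataNode
    def _valid_child(self, position, child):
        return position < 2 and isinstance(child, DataNode)


class Schedule(Node):
    # Schedule <- ordered list of Statements
    def _valid_child(self, position, child):
        return isinstance(child, Statement)


schedule = Schedule()
assignment = Assignment()
assignment.addchild(Reference("a"))
assignment.addchild(Literal("1"))
schedule.addchild(assignment)
```

With the two abstract classes in place, each node only has to state its own rule, and a Literal placed directly in a Schedule is rejected with a TypeError.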

Several things need to be thought out or updated:

  1. Parent-children relationship #294:
     1.1. We need DataNode and Statement abstract classes to simplify parent-child type checking.
     1.2. Requiring that Statements can only be children of a Schedule simplifies things. (DAG and dependence analysis done just at this level?)
     1.3. Specify the valid child types for each node.
     1.4. We have the create() method for bottom-up creation. Some __init__ methods also allow bottom-up creation with the 'children=' parameter. Should this parameter be universal, or removed? #645.
     1.5. If connections were updated at the same time, how should @parent.setter, addchild() and @children.setter behave? Steal children? Fail if the child already has a connection? #294

  2. High-level IR (more abstract PSyclone concepts) / language-level PSyIR (what a generic backend can convert to code):
     2.1. We need function/subroutine calls. Then Kern, GlobalSum and HaloExchange will be lowered to a call.
     2.2. Decide which subset of generic language-level nodes all backends need to implement. We need a pass/transformation/backend to lower any high-level concept to language-level IR.

  3. Multi-level symbol tables.
     3.1. A symbol table per schedule? Then 1.2 helps. Moving a statement within the same schedule requires no symtab updates; moving a statement to an inner or outer schedule requires symtab updates.
     3.2. I like the MLIR concept of IsolatedFromAbove symbols. (Not a dependency in the parent's schedule DAG.)
     3.3. Use (unnamed?) symbols tagged @loop_var, @omp_idx which are marked as IsolatedFromAbove. The backend can then reuse a top-level variable inside each sub-schedule to produce more readable code.
     3.4. Should symbols that are not IsolatedFromAbove be copied into the inner symbol table, be a reference to the outer symbol table, or not be in the inner symbol table at all, with the search recursing up to the next symbol table? Or should there be an aggregated view of the whole symbol table hierarchy?

  4. I am not sure what Directive is, as it looks like a statement but it can never be separated from its companion statement (in fact it loses its meaning without it). Also, it seems back-end specific rather than a concept (e.g. #omp parallel for \ for(…){…} is the same concept as Kokkos::parallel_for(…, …);).

  5. Continue the move from gen_code() to the back-end; this should be easier once (2) is done. We need to solve #523 and see whether a container other than raw language arrays is more appropriate for C++ PSyIR arrays.
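
To make point 2 more concrete, here is a rough sketch of what a lowering pass could look like. Everything in it is hypothetical: the `lower_to_language_level` method name, the `_code` suffix and the `halo_exchange` routine name are placeholders chosen for the example, and the schedule is reduced to a plain list of statements.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    """Language-level node: a plain subroutine call."""
    routine: str
    arguments: list = field(default_factory=list)


@dataclass
class Assignment:
    """Language-level node: already understood by every backend."""
    lhs: str
    rhs: str


@dataclass
class Kern:
    """High-level node: a kernel invocation."""
    name: str
    arguments: list

    def lower_to_language_level(self):
        # Placeholder naming convention for the generated call.
        return Call(self.name + "_code", list(self.arguments))


@dataclass
class HaloExchange:
    """High-level node: a halo exchange on one field."""
    field_name: str

    def lower_to_language_level(self):
        # Placeholder routine name.
        return Call("halo_exchange", [self.field_name])


def lower(statements):
    """Replace every high-level statement with its language-level form."""
    lowered = []
    for stmt in statements:
        lower_fn = getattr(stmt, "lower_to_language_level", None)
        lowered.append(lower_fn() if lower_fn else stmt)
    return lowered


schedule = [
    HaloExchange("f1"),
    Kern("testkern", ["f1", "f2"]),
    Assignment("a", "b"),
]
language_level = lower(schedule)
```

After the pass, a backend only needs to know about Call and Assignment, which is the kind of minimal node subset that 2.2 asks us to decide on.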
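
For point 3.4, the "search recurses up to the next symbol table" option could look roughly like the sketch below. It is only one of the alternatives listed and not a proposal for the final design; the class, its methods and the `isolated_from_above` flag are assumptions made for illustration.

```python
class SymbolTable:
    """Hypothetical per-schedule symbol table with a link to its parent."""

    def __init__(self, parent=None):
        self.parent = parent
        self._symbols = {}

    def add(self, name, datatype, isolated_from_above=False):
        if name in self._symbols:
            raise KeyError(f"symbol '{name}' already declared in this scope")
        self._symbols[name] = {
            "datatype": datatype,
            "isolated_from_above": isolated_from_above,
        }

    def is_local(self, name):
        return name in self._symbols

    def lookup(self, name):
        # Search this table first, then walk up through the parents.
        table = self
        while table is not None:
            if name in table._symbols:
                return table._symbols[name]
            table = table.parent
        raise KeyError(name)

    def outer_dependencies(self, used_names):
        """Names used in this scope that resolve in an enclosing table."""
        deps = []
        for name in used_names:
            self.lookup(name)  # raises KeyError if undeclared anywhere
            if not self.is_local(name):
                deps.append(name)
        return deps


outer = SymbolTable()
outer.add("field", "real")

inner = SymbolTable(parent=outer)
# A loop variable that only exists inside the inner schedule.
inner.add("i", "integer", isolated_from_above=True)

deps = inner.outer_dependencies(["i", "field"])
```

Here only `field` shows up as a dependency on the outer schedule, so moving the inner schedule around does not require touching `i` in any parent table.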

arporter commented 4 years ago

1.3 - I'm happy with this suggestion although the PsyData functionality will require some refactoring.

Instead of 'high-level' PSyIR, how about 'domain-specific'?

rupertford commented 4 years ago

PR #726 addressed some of the issues raised in this issue but not all of them. Therefore I'm reopening this issue (which was automatically closed when I merged the PR).

sergisiso commented 4 years ago

@rupertford I probably should have created a more specific sub-issue for PR #726, but anyway I will keep track of what has already been done by striking out the enumeration points in the issue description that are already implemented in master.