stfc / PSyclone

Domain-specific compiler and code transformation system for Finite Difference/Volume/Element Earth-system models in Fortran
BSD 3-Clause "New" or "Revised" License
102 stars 27 forks source link

Prototype new API for NEMO #27

Closed arporter closed 5 years ago

arporter commented 7 years ago

Discussions with NEMO developers have highlighted the fact that they really do not want to change their source code. In this issue we will investigate the feasibility (or otherwise) of adapting PSyclone to process 'raw' Fortran conforming to the NEMO coding standards.

arporter commented 7 years ago

I've created branch nemo_trial_api for this work.

arporter commented 7 years ago

Without the PSyKAl separation of concerns we have no PSy layer to generate. However, this also means that we can't apply transformations to the PSy layer and that is currently central to what PSyclone does. We will need to generate a Schedule for the single layer that we now have.

arporter commented 7 years ago

In PSyKAl, it is the PSy layer that contains loops over the mesh and all code related to parallelism. Therefore, it may be better to think of the NEMO code as a hand-written PSy layer from which we will construct a representation for manipulation by PSyclone.

arporter commented 7 years ago

I've added the tra_adv.F90 kernel from github.com/arporter/NEMO-DSL under examples/nemo/eg1. My parse_nemo() routine identifies 'kernels' (2D or 3D loop nests) and replaces the associated objects in the AST with a new, NEMOKernel object. This incorporates both the loop structure and the Fortran statements contained within the loop.

arporter commented 7 years ago

Work on NEMOKernel so that it now stores the loop bounds and the AST objects associated with the body of the innermost loop. Improved the tofortran method so that it now outputs correct Fortran for both the DO loops and the loop body.

arporter commented 7 years ago

At this point I can take the AST produced by fparser2 and replace the loop nests with my own kernel objects. However, I don't yet do any processing of the contents of the loop nest.

In order to produce correct OpenMP (for instance), I need to know the 'intents' of the various variables used within the loop.

arporter commented 7 years ago

Code now uses Habakkuk to identify those variables which are shared and those which are private (in an OpenMP context).

To progress this any further we need to actually generate additional Fortran in the form of OpenMP directives around each kernel.

We also need to add support for the use of array syntax which is very common in NEMO. e.g.:

zwx(:,:,1) = 0.0e0

or,

DO jk = 2, jpk-1   
   zwx(:,:,jk) = tmask(:,:,jk) * ( mydomain(:,:,jk-1) - mydomain(:,:,jk) )
END DO

This could be done by as a first pass over the code/AST with any such examples replaced with explicit DO loops.

arporter commented 7 years ago

In order to transform code in the way that PSyclone does, we need a higher-level AST. In particular, the one produced by fparser2 has no information on child->parent relationships. (i.e. it's not possible to go back up the tree, only down). Ideally we need some structure that is in some way consistent with the existing PSyclone Transformations support.

rupertford commented 7 years ago

I separated PSyclone and fparser1 from each other using fgenerator. The main reason for this was that PSyclone primarily needed to generate code and fparser1 did not support this, although fgenerator also includes rudimentary support for code modification (as that is all we needed) and I always thought of fgenerator as being a code generator and a code modifier.

I think the fgenerator approach is quite a powerful solution and always thought that we would also end up using fparser2 via fgenerator for both code generation and code manipulation.

Essentially fgenerator provides (or at least should provide) the higher level ast you are talking about. It also isolates you from a particular implementation. Having said that, I think there would be a reasonable amount of design and implementation to get a single fgenerator api working for both existing and new code (at least I'm pretty sure that it is difficult for fparser1, but perhaps it would be easier with fparser2).

So, in short, I am suggesting we use your requirements in this issue to inform how we might extend fparser to support what you need.

arporter commented 7 years ago

Code now produces a (crude) high-level view of the tracer kernel:

NEMOSchedule[]
    CodeBlock[<class 'fparser.Fortran2003.Call_Stmt'>]
    NEMOKern3D[]
    CodeBlock[<class 'fparser.Fortran2003.Assignment_Stmt'>]
    NEMOKern2D[]
    Loop[type='levels',field_space='None',it_space='None']
        CodeBlock[<class 'fparser.Fortran2003.Assignment_Stmt'>]
    CodeBlock[<class 'fparser.Fortran2003.Call_Stmt'>]
    Loop[type='tracers',field_space='None',it_space='None']
        NEMOKern3D[]
        CodeBlock[<class 'fparser.Fortran2003.Assignment_Stmt'>]
        NEMOKern3D[]
        CodeBlock[<class 'fparser.Fortran2003.Assignment_Stmt'>]
        NEMOKern3D[]
        NEMOKern3D[]
        NEMOKern3D[]
        NEMOKern3D[]
        CodeBlock[<class 'fparser.Fortran2003.Assignment_Stmt'>]
        Loop[type='levels',field_space='None',it_space='None']
            CodeBlock[<class 'fparser.Fortran2003.Assignment_Stmt'>]
        CodeBlock[<class 'fparser.Fortran2003.Assignment_Stmt'>]
        NEMOKern3D[]
        NEMOKern3D[]
        CodeBlock[<class 'fparser.Fortran2003.Assignment_Stmt'>]
        NEMOKern3D[]
        CodeBlock[<class 'fparser.Fortran2003.Assignment_Stmt'>]
        NEMOKern3D[]
    CodeBlock[<class 'fparser.Fortran2003.Call_Stmt'>]
    Loop[type='levels',field_space='None',it_space='None']
        Loop[type='lat',field_space='None',it_space='None']
            Loop[type='lon',field_space='None',it_space='None']
                CodeBlock[<class 'fparser.Fortran2003.Write_Stmt'>]
    CodeBlock[<class 'fparser.Fortran2003.Close_Stmt'>]
arporter commented 7 years ago

Begin work on generating Kernels from implicit loops (that use Fortran array syntax). This now gives:

NEMOSchedule[]
    CodeBlock[<class 'fparser.Fortran2003.Call_Stmt'>]
    NEMOKern[3D]
    CodeBlock[<class 'fparser.Fortran2003.Call_Stmt'>]
    NEMOKern[2D]
    CodeBlock[<class 'fparser.Fortran2003.Call_Stmt'>]
    Loop[type='levels',field_space='None',it_space='None']
        CodeBlock[<class 'fparser.Fortran2003.Assignment_Stmt'>]
    CodeBlock[<class 'fparser.Fortran2003.Call_Stmt'>]
    Loop[type='tracers',field_space='None',it_space='None']
        NEMOKern[3D]
        NEMOKern[Implicit]
        NEMOKern[Implicit]
        NEMOKern[3D]
        NEMOKern[Implicit]
        NEMOKern[Implicit]
        NEMOKern[3D]
        NEMOKern[3D]
        NEMOKern[3D]
        NEMOKern[3D]
        NEMOKern[Implicit]
        NEMOKern[Implicit]
        Loop[type='levels',field_space='None',it_space='None']
            NEMOKern[Implicit]
        NEMOKern[Implicit]
        NEMOKern[3D]
        NEMOKern[3D]
        NEMOKern[Implicit]
        CodeBlock[<class 'fparser.Fortran2003.Assignment_Stmt'>]
        NEMOKern[3D]
        CodeBlock[<class 'fparser.Fortran2003.Assignment_Stmt'>]
        NEMOKern[3D]
        CodeBlock[<class 'fparser.Fortran2003.Assignment_Stmt'>]
    CodeBlock[<class 'fparser.Fortran2003.Call_Stmt'>]
    Loop[type='levels',field_space='None',it_space='None']
        Loop[type='lat',field_space='None',it_space='None']
            Loop[type='lon',field_space='None',it_space='None']
                CodeBlock[<class 'fparser.Fortran2003.Write_Stmt'>]
    CodeBlock[<class 'fparser.Fortran2003.Call_Stmt'>]

This reveals one issue - the fact that we have an implicit i-j loop within an explicit loop over levels. This currently results in us identifying a kernel within the loop over levels. In actuality, this should be a 3D kernel.

arporter commented 6 years ago

I think the next step is to move closer to doing things 'properly' by introducing tests for the NEMO 0.1 API...

arporter commented 6 years ago

Have begun adding tests. Coverage is 76%. Am starting to think about code generation which is the nub of this issue. In all of the other APIs, the gen_code() method of a given class adds nodes into the f2pygen AST (e.g. DeclarationGen). These nodes will generate the necessary Fortran. However, the NEMO API uses fparser2 to parse the supplied code and the content of a NEMOKern is stored as a series of fparser2 objects (typically Fortran2003.Assignment_Stmts or similar). We currently have no way of generating f2pygen objects from these fparser2 objects.

arporter commented 6 years ago

I can imagine implementing some sort of load method for e.g. f2pygen.AssignGen that takes a Fortran2003.Assignment_Stmt as argument and uses it to populate the object. This is attractive in the short term because the other objects we are using (in the NEMO work) are already based on f2pygen (e.g. the Loop objects). However, it feels like a backwards way of doing things - ideally we would be moving towards having everything in terms of fparser2 and phasing-out use of fparser. This would mean re-implementing f2pygen using fparser2 classes. f2pygen is currently 920 lines long.

arporter commented 6 years ago

The prototype code that I have so far currently makes no attempt to reproduce the enclosing Program or Module structure that the input Fortran had. We will need to do this as we want to manipulate existing code rather than create something from scratch.

arporter commented 6 years ago

PSyclone transformations work on the Schedule associated with an Invoke. This is why we construct a higher-level AST than that produced by the parser.

arporter commented 6 years ago

Started to try and make a new AST using fparser2 objects. However, the __new__ method requires an instance of the parser with a suitable Fortran string. This explains why the existing f2pygen library creates objects in the way that it does. I could copy what is done there for the time being but it would be nicer if I didn't have to do that. Could I add an init method that does what I want? When is __init__ called as opposed to __new__?

arporter commented 6 years ago

Altered translate_ast() so that it updates nodes in the fparser2 AST with parent information. This then permits us to modify that AST, e.g. when inserting an OMP directive. I've prototyped this kind of thing by inserting an IF-THEN block around a loop nest. fparser2 currently has no support for Comments and thus no support for Directives.

arporter commented 6 years ago

I'm envisaging a document-view type of pattern where the (slightly enhanced) fparser2 AST is the Document and the higher-level Schedule manipulated by PSyclone is a View. The generated Fortran code would then be another View of the same Document.

arporter commented 6 years ago

Closing this issue since we have decided to adopt CLAW for performing manipulation of existing Fortran code (#114).

arporter commented 6 years ago

In the XCodeML version, I implemented new, gen_xml() methods that effectively updated the (XML) Schedule to match the transformed schedule. (Normally we call gen_code() because we are creating a Fortran AST from scratch.) In this version I need to trigger an update to the stored fparser2 AST.

arporter commented 6 years ago

In a flash of inspiration I've named the new method update(). For the majority of Nodes it will do nothing but for those inserted by a Transformation it needs to modify the underlying fparser2 AST. I have successfully implemented this for !$omp parallel do (although I don't yet handle the private/shared variables).

arporter commented 6 years ago

I've begun bringing the existing tests up-to-date. This mainly means fixing python3 issues. I also need to bring Habakkuk up-to-date as that's used for kernel analysis.

arporter commented 6 years ago

Implicit loops are tricky. Should I modify the fparser2 AST as I go and turn it into an explicit loop while building the PSyclone Schedule or should I leave modifying the actual AST until code-generation time? Either way I will have to modify the AST so I might as well implement that now.

arporter commented 6 years ago

All tests now pass on my machine. They won't on Travis though because this branch needs the master branch of fparser.

arporter commented 6 years ago

I've added in traldf_iso as a second example. Although this runs through OK it does reveal a few issues:

arporter commented 6 years ago

(Some of) the code that demonstrates these issues is:

  DO jk = 1, jpkm1
    DO jj = 1, jpj, 1
      DO ji = 1, jpi, 1
        zdk1t(ji, jj) = (ptb(ji, jj, jk, jn) - ptb(ji, jj, jk + 1, jn)) * wmask(ji, jj, jk + 1)
      END DO
    END DO
    IF (jk == 1) THEN
      zdkt(:, :) = zdk1t(:, :)
    ELSE
      zdkt(:, :) = (ptb(:, :, jk - 1, jn) - ptb(:, :, jk, jn)) * wmask(:, :, jk)
    END IF
    DO jj = 1, jpjm1
      DO ji = 1, fs_jpim1
        zabe1 = pahu(ji, jj, jk) * e2_e1u(ji, jj) * e3u_n(ji, jj, jk)
        zmsku = 1. / MAX(wmask(ji + 1, jj, jk) + wmask(ji, jj, jk + 1) + wmask(ji + 1, jj, jk + 1) + wmask(ji, jj, jk), 1.)
        zcof1 = - pahu(ji, jj, jk) * e2u(ji, jj) * uslp(ji, jj, jk) * zmsku
        zftu(ji, jj, jk) = (zabe1 * zdit(ji, jj, jk) + zcof1 * (zdkt(ji + 1, jj) + zdk1t(ji, jj) + zdk1t(ji + 1, jj) + zdkt(ji, jj))) * umask(ji, jj, jk)
      END DO
    END DO
    DO jj = 2, jpjm1
      DO ji = fs_2, fs_jpim1
        pta(ji, jj, jk, jn) = pta(ji, jj, jk, jn) + zsign * (zftu(ji, jj, jk) - zftu(ji - 1, jj, jk) + zftv(ji, jj, jk) - zftv(ji, jj - 1, jk)) * r1_e1e2t(ji, jj) / e3t_n(ji, jj, jk)
      END DO
    END DO
  END DO

The most obvious problem that this code illustrates is that we identify the body of the triply-nested loop as a kernel when it is not because it is not perfectly nested.

arporter commented 6 years ago

Actually, that was not the problem (the issue was my lazy identification of loops to which to apply OpenMP to in the example script). The real issue is the kernels contained within the if-block. In order to reason about the Schedule here (and when looking at Algorithms in LFRic), we need to have if-blocks as recognised entities within our representation. I've therefore begun adding support for this.

arporter commented 6 years ago

I've moved the business of creating the PSyclone AST into a single, ASTProcessor mix-in class. This provides the process_nodes method that can be used wherever needed. At the moment this has to work-around the fact that the fparser2 AST contains no parent information.

rupertford commented 5 years ago

PR #209 has been merged to master. Closing this issue.