stfc / PSyclone

Domain-specific compiler and code transformation system for Finite Difference/Volume/Element Earth-system models in Fortran
BSD 3-Clause "New" or "Revised" License
104 stars 27 forks source link

(Closes #2125) Fix bugs in SymbolTable deep copy and InlineTrans when array bounds reference formal arg #2633

Closed arporter closed 2 months ago

arporter commented 3 months ago

Fixes bugs in SymbolTable deep copy for for symbol initial values and shape specification. Also fixes a bug in InlineTrans when the lower bound of an array formal argument is specified by a reference to another formal argument. Finally, adds the force option to InlineTrans to permit it to ignore CodeBlocks within the target routine.

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.85%. Comparing base (029f69f) to head (565c2c3).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #2633 +/- ## ========================================== - Coverage 99.86% 99.85% -0.02% ========================================== Files 352 352 Lines 48512 47308 -1204 ========================================== - Hits 48447 47239 -1208 - Misses 65 69 +4 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

arporter commented 3 months ago

This is ready for a first look now. One for @sergisiso, @hiker or @LonelyCat124. I've triggered the integration tests.

arporter commented 3 months ago

(I am wondering whether having the "force" option to InlineTrans mean "ignore CodeBlocks" is a good idea: perhaps it should be more specific such as "permit-codeblocks"?)

arporter commented 3 months ago

Note to self: there's inevitably going to be missed coverage and I need to move quite a lot of testing out of symboltable_test and into individual tests of the various new relink methods.

arporter commented 3 months ago

The visitor approach works out very neatly but I take your point about decoupling this functionality from the Nodes themselves. There's also quite a lot to be said for keeping the same method name for datatypes and nodes - it will make the code a bit clearer I think. Although it would be nice to remove the table argument for the Node method I don't think this is practical here as all the Nodes we care about are 'dangling' (in the shape of ArrayType or initial_value expressions of Symbols).

I'll try my 2nd bullet point then... :-)

arporter commented 3 months ago

The Node.update_symbols_from(table) approach was almost as neat as the visitor so I've gone with it and renamed all the relink methods accordingly.

arporter commented 2 months ago

Just need to tidy up/remove the SymbolTable tests now that functionality has been removed from the deep_copy method.

arporter commented 2 months ago

I think this is ready for another look now. It has grown a lot so probably needs something of a fresh start.

arporter commented 2 months ago

I need to update the tests for Loop to explicitly cover the new replace_symbols_using method.

arporter commented 2 months ago

Failure is in LFRic adjoint example, poly1d_w3_reconstruction_kernel_mod. This has a loop with variable stencil and when we attempt to update this we are accidentally trying to use the symbol of the same name that is used in the metadata (i.e. imported from argument_mod). I need to see why the new symbol table doesn't have the local stencil definition in it.

It's because it is the wrong symbol table - replace_symbols_from itself recurses and so is being called on Loop but with the symbol table of the parent scoping node.

sergisiso commented 2 months ago

Failure is in LFRic adjoint example, poly1d_w3_reconstruction_kernel_mod. This has a loop with variable stencil and when we attempt to update this we are accidentally trying to use the symbol of the same name that is used in the metadata (i.e. imported from argument_mod). I need to see why the new symbol table doesn't have the local stencil definition in it.

It's because it is the wrong symbol table - replace_symbols_from itself recurses and so is being called on Loop but with the symbol table of the parent scoping node.

I was expecting this to happen and was surprised that we didn't hit this sooner, I guess loop variables is the only use case where we use nested scope variables in practice (and it needs a clash to fail because not founds are not updated). But I think this is good because fixing it will (I think) make trivial to solve #2424

I agree the solution is to change the symbol table to the inner scope when reversing symbols. But currently you are doing it at _refine_copy which means that other calls to replace_symbols_using will still have this problem. I am wondering if this behaviour is expected always for replace_symbols_using? So, if we have:

subrotine test
   integer :: a
   integer :: b = 1

  if condition
      ! PSyIR declares a shadowed locally-scoped a'
      a' = 1
      if condition2
          ! PSyIR declares a shadowed locally-scoped b'
          b' = 2
          a = a' + b'
     end if
  end if
end subroutine test

and we do, routine.replace_symbols_using(SymbolTable<contains a and b>) I would not expect this to change a' and b' to use the routine-scoped versions, as this will change the meaning of the code.

This can be done overriding the ScopingNode::replece_symbols_using, instead of modifying its _refine_copy

def replace_symbols_using(self, table):
    for child in self.children:
         child.replace_symbols_using(self.symbol_table)

What do you think?

Note that in my example I cheekily added an statement containing both a and a'. Neither what I am proposing nor the current solution solves this. And a copy will change the semantics of this PSyIR tree! This case is rare, and the PR big enough, so it does not need to be solved here, but if it is not at least it should be documented.

arporter commented 2 months ago

Finally, this is ready for another look. It's certainly nicer than it was before after the latest update so thanks for the suggestions.

sergisiso commented 2 months ago

Also check if the sphinx-link-check issue is relevant and launch an integration test since I see none of the last ones were complete.

arporter commented 2 months ago

Link check failure seems to have sorted itself out. I've triggered the integration tests again. Ready for another look, CI-permitting.

arporter commented 2 months ago

LFRic 'everything-everywhere-all-at-once' integration test failed due to hitting max. recursion depth. Problem occurs in backend when processing (inlined?) kernel mm_diagonal_kernel_code_r_double for reconstruct_w3_field_alg_mod_psy.

arporter commented 2 months ago

The problem was that in Reference.replace_symbols_using() I had added a call to replace_symbols_using on the Symbol itself. This then hit an infinite recursion if the Symbol had an initial value defined by a PSyIR expression. Since it is the responsibility of the SymbolTable to update its symbols, I've simply removed the offending call and updated a failing test.

arporter commented 2 months ago

Integration tests were green and coverage is all OK. Really ready for review...