stfc / fparser

This project maintains and develops a Fortran parser called fparser2 written purely in Python which supports Fortran 2003 and some Fortran 2008. A legacy parser fparser1 is also available but is not supported. The parsers were originally part of the f2py project by Pearu Peterson.
https://fparser.readthedocs.io
Other
64 stars 29 forks source link

(Closes #313) permit intrinsic shadowing #418

Closed arporter closed 10 months ago

arporter commented 1 year ago

Looking at the test failures and (lack of) code changes, it's clear that this branch needs refining so that it only permits a mismatch in arguments if an intrinsic has been (or could be?) overridden.

arporter commented 1 year ago

The trouble is that (in NEMOVAR), there isn't always an explicit import of the overridden intrinsic. Obviously, if we find a wildcard import and the tentative intrinsic match has an invalid number of arguments then we know it can't be an intrinsic and must be from elsewhere. However, if it happens to have the same number of arguments as the intrinsic then we will always match an Intrinsic, even if those arguments are of the wrong type. (In future we could check the type of the arguments.)

codecov[bot] commented 1 year ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (5c3808d) 91.96% compared to head (f7ebb67) 91.99%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #418 +/- ## ========================================== + Coverage 91.96% 91.99% +0.02% ========================================== Files 85 85 Lines 13644 13681 +37 ========================================== + Hits 12548 12586 +38 + Misses 1096 1095 -1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

arporter commented 1 year ago

This has been needed for NEMOVAR for a while: currently we are over zealous in checking the number of arguments to intrinsics and fell foul of the fact that NEMOVAR overrides DOT_PRODUCT with something that can accept more args than the Fortran intrinsic can. This PR alters the parser's behaviour so that if the number of arguments don't match and there's somewhere that a symbol could be coming from then we don't match an intrinsic. I had to extend SymbolTable with a new wildcard_imports property to do this. One for @sergisiso or @rupertford.

arporter commented 1 year ago

I hadn't considered submodules so thanks for spotting that. Having done some reading I see that (as you'd expect) they have access to all the symbols made available by the 'parent' module. I'll have to allow for that.

arporter commented 1 year ago

I've hit a problem with submodules which I think will be common to any situation where we need to do a context-sensitive match. At the moment, any match method knows only the string it is trying to match - it doesn't know what node (if any) will be the parent of the one it is creating. This is a problem here because I was going to allow intrinsic shadowing in a submodule on the basis that we assume the Fortran is valid and therefore the necessary symbol must be being brought into scope in the parent module (which we can't see because it is in a separate file).

arporter commented 1 year ago

I can solve my immediate problem by extending the SymbolTable so that (as in the PSyIR) it keeps a reference to the node in the parse tree with which it is associated. This is probably not a bad thing to do and is considerably lighter weight than changing the way we construct nodes. I'll do it as a separate PR though so we can have a proper discussion.

rupertford commented 1 year ago

Would your symbol table implementation help here? I'm not sure how hard it would be to create a list of nodes during a hierarchical match. Give the current implementation I would expect it to be difficult - assuming that is what you are after.

arporter commented 1 year ago

It will definitely let me determine whether we are currently within a submodule - I had a little play and it was easy. Really we need an ancestor method too but that's just syntactic sugar.

rupertford commented 1 year ago

Great. I think the problem with ancestor is that we create the nodes as we recurse back up, not down - but perhaps there is a simple solution.

arporter commented 1 year ago

Great. I think the problem with ancestor is that we create the nodes as we recurse back up, not down - but perhaps there is a simple solution.

Ah! You are of course correct. I only checked for the immediate parent (which works) but the grandparent isn't created yet :-(

arporter commented 1 year ago

Perhaps we revert to the assumption that the Fortran is correct so that if the number of arguments is invalid then we simply don't match an Intrinsic?

rupertford commented 1 year ago

When would this allow incorrect code to get through?

arporter commented 1 year ago

When would this allow incorrect code to get through?

If the Fortran code was incorrect (so meant to call an intrinsic but supplied the wrong number of args) then we wouldn't spot it - it would get matched as a call or array access instead.

Since we don't check argument types, we will also incorrectly match an intrinsic call when a Fortran compiler would complain if the wrong type was passed to an intrinsic.

rupertford commented 1 year ago

And why won't the symbol table approach help as we build this as we go along so should see some info? We could be less clever for submodules. We could have a switch which says whether to try to determine shadowing or not.

arporter commented 1 year ago

If we are parsing the following code:

  submodule (foobar) bar

    contains
    subroutine my_sub
      real :: a
      a = dot_product(1,2,3)
    end subroutine my_sub
  end

then at the point when we are trying to match dot_product, the SymbolTable approach means that we know we are within a subroutine but that's all we know. To know more than that we probably need some new data structure that keeps track of our current position in the matching hierarchy (i.e. the chain of classes we are currently trying to match). That's probably not that hard to do but feels like possibly a retrograde step c.f. #191.

There is another alternative and that is to move towards what a compiler proper would do by resolving all Symbols. In this case, we would need to find the foobar module and construct its symbol table.

rupertford commented 1 year ago

Hi @arporter, some thoughts.

Could we not have some more context from the symbol table by capturing more within it, i.e. know that the subroutine is within a submodule, or do we not know that yet due to the recursion? I'm not saying it would solve the problem, I was just wondering.

I agree the proposal in in #191 sounds great, it is just a question of effort (and trust in the new implementation I guess). Do you think that it would make sense for us to take this on, given there is no funding associated with this?

We could optionally require module information (in some standard form) from other source code if it could not be found in addition to the current implementation I suppose, but I don't think it should be mandatory otherwise we can no longer parse any code in isolation, which is the main use of the parser.

arporter commented 1 year ago

Good news! I must have got confused when I said that storing the scoping node in the SymbolTable didn't help because I was wrong. I've also added a method to the SymbolTable that reports whether it's possible for a Symbol to be brought into scope from elsewhere. With those changes I have the tests working, including the one with a submodule.

arporter commented 1 year ago

This PR has got somewhat larger: the SymbolTable now keeps a reference to the parse tree node with which it is associated. I've also extended the 'lookup' functionality so that it includes symbols imported from other modules. I've also added a "all_symbols_resolved" query method that establishes whether any symbols could be being brought into scope from elsewhere. Ready for another look.

arporter commented 12 months ago

@rupertford happily (for me) it turns out this is in fact waiting on a review from yourself :-)

arporter commented 11 months ago

Ready for another look now...

arporter commented 10 months ago

Thanks for spotting that missed line - I'd been fooled by Codecov reporting success. Happily that line was associated with #170 (which is now fixed) and so I've been able to simplify it. Ready for another look.