Closed arporter closed 10 months ago
The trouble is that (in NEMOVAR), there isn't always an explicit import of the overridden intrinsic. Obviously, if we find a wildcard import and the tentative intrinsic match has an invalid number of arguments then we know it can't be an intrinsic and must be from elsewhere. However, if it happens to have the same number of arguments as the intrinsic then we will always match an Intrinsic, even if those arguments are of the wrong type. (In future we could check the type of the arguments.)
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (
5c3808d
) 91.96% compared to head (f7ebb67
) 91.99%.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
This has been needed for NEMOVAR for a while: currently we are over zealous in checking the number of arguments to intrinsics and fell foul of the fact that NEMOVAR overrides DOT_PRODUCT with something that can accept more args than the Fortran intrinsic can. This PR alters the parser's behaviour so that if the number of arguments don't match and there's somewhere that a symbol could be coming from then we don't match an intrinsic. I had to extend SymbolTable with a new wildcard_imports
property to do this.
One for @sergisiso or @rupertford.
I hadn't considered submodules so thanks for spotting that. Having done some reading I see that (as you'd expect) they have access to all the symbols made available by the 'parent' module. I'll have to allow for that.
I've hit a problem with submodules which I think will be common to any situation where we need to do a context-sensitive match. At the moment, any match
method knows only the string it is trying to match - it doesn't know what node (if any) will be the parent of the one it is creating. This is a problem here because I was going to allow intrinsic shadowing in a submodule on the basis that we assume the Fortran is valid and therefore the necessary symbol must be being brought into scope in the parent module (which we can't see because it is in a separate file).
I can solve my immediate problem by extending the SymbolTable so that (as in the PSyIR) it keeps a reference to the node in the parse tree with which it is associated. This is probably not a bad thing to do and is considerably lighter weight than changing the way we construct nodes. I'll do it as a separate PR though so we can have a proper discussion.
Would your symbol table implementation help here? I'm not sure how hard it would be to create a list of nodes during a hierarchical match. Give the current implementation I would expect it to be difficult - assuming that is what you are after.
It will definitely let me determine whether we are currently within a submodule
- I had a little play and it was easy. Really we need an ancestor
method too but that's just syntactic sugar.
Great. I think the problem with ancestor is that we create the nodes as we recurse back up, not down - but perhaps there is a simple solution.
Great. I think the problem with ancestor is that we create the nodes as we recurse back up, not down - but perhaps there is a simple solution.
Ah! You are of course correct. I only checked for the immediate parent (which works) but the grandparent isn't created yet :-(
Perhaps we revert to the assumption that the Fortran is correct so that if the number of arguments is invalid then we simply don't match an Intrinsic?
When would this allow incorrect code to get through?
When would this allow incorrect code to get through?
If the Fortran code was incorrect (so meant to call an intrinsic but supplied the wrong number of args) then we wouldn't spot it - it would get matched as a call or array access instead.
Since we don't check argument types, we will also incorrectly match an intrinsic call when a Fortran compiler would complain if the wrong type was passed to an intrinsic.
And why won't the symbol table approach help as we build this as we go along so should see some info? We could be less clever for submodules. We could have a switch which says whether to try to determine shadowing or not.
If we are parsing the following code:
submodule (foobar) bar
contains
subroutine my_sub
real :: a
a = dot_product(1,2,3)
end subroutine my_sub
end
then at the point when we are trying to match dot_product
, the SymbolTable approach means that we know we are within a subroutine but that's all we know. To know more than that we probably need some new data structure that keeps track of our current position in the matching hierarchy (i.e. the chain of classes we are currently trying to match).
That's probably not that hard to do but feels like possibly a retrograde step c.f. #191.
There is another alternative and that is to move towards what a compiler proper would do by resolving all Symbols. In this case, we would need to find the foobar
module and construct its symbol table.
Hi @arporter, some thoughts.
Could we not have some more context from the symbol table by capturing more within it, i.e. know that the subroutine is within a submodule, or do we not know that yet due to the recursion? I'm not saying it would solve the problem, I was just wondering.
I agree the proposal in in #191 sounds great, it is just a question of effort (and trust in the new implementation I guess). Do you think that it would make sense for us to take this on, given there is no funding associated with this?
We could optionally require module information (in some standard form) from other source code if it could not be found in addition to the current implementation I suppose, but I don't think it should be mandatory otherwise we can no longer parse any code in isolation, which is the main use of the parser.
Good news! I must have got confused when I said that storing the scoping node in the SymbolTable didn't help because I was wrong. I've also added a method to the SymbolTable that reports whether it's possible for a Symbol to be brought into scope from elsewhere. With those changes I have the tests working, including the one with a submodule.
This PR has got somewhat larger: the SymbolTable now keeps a reference to the parse tree node with which it is associated. I've also extended the 'lookup' functionality so that it includes symbols imported from other modules. I've also added a "all_symbols_resolved" query method that establishes whether any symbols could be being brought into scope from elsewhere. Ready for another look.
@rupertford happily (for me) it turns out this is in fact waiting on a review from yourself :-)
Ready for another look now...
Thanks for spotting that missed line - I'd been fooled by Codecov reporting success. Happily that line was associated with #170 (which is now fixed) and so I've been able to simplify it. Ready for another look.
Looking at the test failures and (lack of) code changes, it's clear that this branch needs refining so that it only permits a mismatch in arguments if an intrinsic has been (or could be?) overridden.