stfc / fparser

This project maintains and develops a Fortran parser called fparser2 written purely in Python which supports Fortran 2003 and some Fortran 2008. A legacy parser fparser1 is also available but is not supported. The parsers were originally part of the f2py project by Pearu Peterson.
https://fparser.readthedocs.io
Other
61 stars 29 forks source link

Error handling/messages in fparser2 #52

Closed rupertford closed 5 years ago

rupertford commented 6 years ago

At the moment fparser2 will parse a lot of invalid fortran. Let's take the open statement as a simple example as I've just reviewed a change to it.

open(&^£) is parsed as the correctness of the type of a unit is not checked open(50, 51) is parsed as the number of unnamed arguments is not checked open(file='fred', 50) is parsed as there is no check that the unnamed argument should be first in the list open(50, unit=50) is parsed as fparser2 does not know that you either have a named unit or an unnamed unit open(50, asynchronous='err') is parsed as fparser2 does not know the expected values for asynchronous.

The question I would like to raise is whether we actually want/need fparser2 to raise nice errors if and when it is provided with invalid fortran. One option would be to limit the scope of fparser2 to parsing valid fortran code (at least for the time being) and rely on the user to check their code is valid using standard compilers beforehand.

I shall not flavour the discussion with my preference ... yet.

Thoughts ...

TeranIvy commented 6 years ago

Parser plus checking for valid Fortran seems to me like building a compiler. Some of the potential issues are taking care about backwards compatibility and constant updates because of missing valid features.

Still, I am not sure how to provide meaningful errors if fparser2 does not know what is valid - I guess this depends on what is the primary purpose of fparser2. If we go for validity checks, I think that separating that functionality from the text processing part would make it easier to extend.

hiker commented 6 years ago

I would be happy enough with the restriction that it only accepts valid Fortran - it will save a lot(!) of work, and the saved time can be spent on other things ;)

rupertford commented 6 years ago

OK, I'll now say what I think ...

I also don't think it is acceptable for fparser2 to accept invalid fortran. So, if we start adding appropriate tests for things that are invalid (as in the open example above) we will know the type of error we are looking for and the context (as we are coding it) so it should not be difficult to raise an appropriate exception.

rupertford commented 6 years ago

I also think we should start having basic unit tests for all of the above situations in a structured way (e.g a file for each statement type), but that goes hand in hand with adding code for it in any case.

TeranIvy commented 6 years ago

This is from the short-lived issue #93 before I realised this exists:

The current error handling in fparser2 when it cannot parse a Fortran string is via NoMatchError in the match method of each class which returns empty (None) object. If a class has tostr/tofortran method with checks for None objects then NoMatchError is reported there. There is ParseError as well, however it does not appear to be used in fparser2 at all.

There has been already some discussion in PR #72 about using ParseError with more informative message instead of NoMatchError in the match method for each class, but there were concerns that it might break the usual way of handling things (see this and subsequent comments).

An informative error message, including a line number, would be more useful and the task of this issue is to come up with the approach.

MatthewHambley commented 6 years ago

First we should determine if moving to ParseError does break things. If it does we could just update the places which are catching a NoMatchError to look for the new exception.

Lack of match seems like an internal implementation detail. This doesn't mean it is inappropriate to throw such an exception but it probably shouldn't make it out of the parser.

Another potential solution is to consider class hierarchies. Maybe NoMatchError is a child of ParseError or the other way round. Some more analysis is needed.

rupertford commented 6 years ago

From a user perspective I think it makes sense for there to be 3 types of error 1) a parse and/or syntax error, 2) an internal error and 3) an input error. These are what we tend to get from a compiler. Are there any more?

I agree with @MatthewHambley that internal control such as NoMatchError should not make it out to the user, however the information it provides (context for an error) should.

I agree with @TeranIvy that a parse and/or syntax error should provide useful information on where the error was, ideally with line number and offending text, again as existing compilers tend to do. Some good news on this is that the information we need is available when we have a NoMatchError, at least when the match is a reader object (rather than a string). In issue #92, PR #94 I have used this to improve the output (IMHO) from a NoMatchError exception.

rupertford commented 6 years ago

For input values the consensus seems to be to use the existing ValueError is there is an error in a value, or some combination of values. :-).

For internal errors I propose : InternalError (who could have predicted that).

If we know that some feature is not yet supported and have a placeholder for it I propose : UnsupportedError.

rupertford commented 6 years ago

I propose that we have a SyntaxError rather than a ParseError as that seems more appropriate.

To try to back this up I quote wikipedia on parsers: "A parser is a software component that takes input data (frequently text) and builds a data structure – often some kind of parse tree, abstract syntax tree or other hierarchical structure, giving a structural representation of the input while checking for correct syntax. "

rupertford commented 6 years ago

Here is a simple example of where we are at the moment with syntax errors (once PR #94 goes on master) and this is compared with gfortran for reference. As you can see the actual information being output by fparser is not too bad for this example (ignoring the fact we have not caught the exception) ...

rupert@ubuntu:~/proj/fparser/src$ cat error.f90 
progra error
end program error

rupert@ubuntu:~/proj/fparser/src$ gfortran error.f90 
error.f90:1:0:

 progra error
 1
Error: Unclassifiable statement at (1)
error.f90:2:17:

 end program error
                 1
Error: Syntax error in END PROGRAM statement at (1)
rupert@ubuntu:~/proj/fparser/src$ python runme4.py 
Traceback (most recent call last):
  File "runme4.py", line 9, in <module>
    program = myparser(reader)
  File "/home/rupert/proj/fparser/src/fparser/two/Fortran2003.py", line 254, in __new__
    raise NoMatchError(errmsg)
fparser.two.Fortran2003.NoMatchError: at line 1
>>>progra error

rupert@ubuntu:~/proj/fparser/src$

We could potentially try to be more specific about the token that does not match (i.e. progra) but I'm not sure how easy that would be to implement. In general it might mean modifying user written match methods which could be a lot of work.

We could aim for an output similar to gfortran i.e.

<filename>:<line number>:<line position>:
<offending line as text>
syntax error

Note, I also think that all exceptions that are caught should be directly output to stdout/stderr rather than being logged.

Any thoughts or suggestions?

MatthewHambley commented 6 years ago

In order to give more specific error messages we might need to generalise the matching of strings out of user written code and into data driven code like the rest of the parser. This represents some amount of work but I don't have a good feeling on how much.

As for output, the logging system allows stdout and/or stderr to be chosen as output streams so it is not necessary to forgo the one in order to obtain the other. Or do you mean we should not be allowing the Python run-time to spew all over the console? In which case I agree.

rupertford commented 6 years ago

Hi @MatthewHambley et al.

At the moment it looks relatively easy to output filename, line number and offending text so I recommend we start with that. I think I can implement that relatively quickly - in fact, barring a bug it is already mostly there on master.

We could then look at potentially adding more specific information (e.g. which particular type of match failed within a line and/or line position) separately.

As for output I don't think we should be logging errors, they should be handled with exceptions in my view. @arporter pointed out that logging could/should be used for information about what choices the code is making which I'm happy with i.e. things that do not cause the parser to fail. Does that make sense?

rupertford commented 5 years ago

Basic error handling is now working on master and specific information is being added where possible. There is still an outstanding problem with outputting specific information when code fails to match more than one rule (so which one should be reported). However, I think this issue can be closed.