Handle comments in fparser2

bonfus commented 6 years ago

Is it possible to parse single blocks, i.e. parse the result of

r = FortranFileReader("some_file.f90")
line_or_comment = r.get_item()
# how to parse line_or_comment ?

Thanks

arporter commented 6 years ago

In general, it's not possible to parse such a thing because you have no context for it. However, if you know what you're expecting line_or_comment to contain you can do something like:

    reader = FortranStringReader(line)
    item = next(reader)
    if not cls.match(item.get_line()):
        raise ValueError('%r does not match %s pattern' % (line, cls.__name__))
    stmt = cls(item, item)

where cls might be e.g. Call. I suggest looking at the tests to see whether they contain anything helpful.

bonfus commented 6 years ago

Ok, that makes sense. Just to give you the whole picture, what I'm trying to achieve is a sort of templating mechanism based on some "special" comments. Unfortunately, at the moment the Fortran2003 parser disregards comments (and directives too!), so I'm forced to jump between the parser and the reader to keep track of the comments' positions (and to avoid losing OpenMP directives). This, of course, is very ugly.

arporter commented 6 years ago

Ah, I see. A better solution (albeit more work in the short term) might be to extend the parser so that it (optionally?) doesn't throw away comments. Or, possibly better, so that it recognises directives.

rupertford commented 6 years ago

I would start with keeping comments and then extend that to separating directives from comments as appropriate, since directives are comments :-)
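
As a rough illustration of "directives are comments", the sketch below classifies a comment line after the fact. It does not use fparser. The function name classify_comment is hypothetical, and it assumes a directive is any comment whose text begins with a "!$" sentinel immediately followed by a word (as in "!$omp"); other sentinel forms are not handled.

```python
import re

# Assumption: a directive is a comment starting with "!$" plus a word,
# e.g. "!$omp". Other sentinel styles are deliberately ignored here.
_DIRECTIVE = re.compile(r"^\s*!\$(\w+)")


def classify_comment(text):
    """Return ("directive", sentinel) or ("comment", None) for a comment line."""
    match = _DIRECTIVE.match(text)
    if match:
        return ("directive", match.group(1).lower())
    return ("comment", None)


if __name__ == "__main__":
    print(classify_comment("!$omp parallel do"))
    print(classify_comment("! an ordinary comment"))
```

Keeping every comment first and classifying it afterwards means nothing is lost if the classification rule later turns out to be too narrow.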

Another option would be to use fparser rather than fparser2 as it does keep comments. However I am sure you have your reasons for using fparser2 over fparser and we are gradually migrating to this later version as well.

bonfus commented 6 years ago

Thanks, I'll try to figure out if I'm able to modify fparser2 to preserve comments in the parser and how long it would take me to code it.

arporter commented 6 years ago

That would be great - thanks very much!

bonfus commented 6 years ago

It was quite tricky to understand how comments may fit into the hierarchical logic of the Fortran2003 parser. Actually I didn't figure that out so I just added comments as class members since their abstraction is already done at the reader level.

I have a working version in this fork: https://github.com/bonfus/fparser

and the following test code:

from fparser.Fortran2003 import Program
from fparser.readfortran import FortranStringReader

fortran_code ="""! A comment about the program
program ma

!a comment about a use statement
use bla, ONLY: ciao

integer :: i(3)
!a comment about variable b
integer :: b(4) ! another comment about variable b

!a comment about the assignement
i(0)=2

! the final comment
end program ma"""

reader = FortranStringReader(fortran_code)

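def traverse_content(c):
    # Nodes with children keep them in a "content" attribute.
    if hasattr(c, "content"):
        traverse_content(c.content)
    elif isinstance(c, list):
        for item in c:
            traverse_content(item)
    else:
        # Leaf statement: print it with the comments attached to it.
        print(c, c.comments)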

p = Program(reader)
traverse_content(p)

produces

PROGRAM ma [Comment('! A comment about the program',(1, 1))]
USE :: bla, ONLY: ciao [Comment('',(3, 3)), Comment('!a comment about a use statement',(4, 4))]
INTEGER :: i(3) [Comment('',(6, 6))]
INTEGER :: b(4) [Comment('!a comment about variable b',(8, 8))]
i(0) = 2 [Comment('! another comment about variable b',(9, 9)), Comment('',(10, 10)), Comment('!a comment about the assignement',(11, 11))]
END PROGRAM ma [Comment('',(13, 13)), Comment('! the final comment',(14, 14))]

I'm not sure if this approach may ever fit your idea of fparser.

arporter commented 6 years ago

I hadn't realised that the reader already handled comments. That's good news. I think that it would be good if a comment were treated in the same way as other quantities in the Fortran source and thus appeared as an object in the resulting AST. This will then give us a way of building up support for directives in future. Like you, I'm not entirely sure how to do this (I'm still learning about the internals of fparser) but I will have a play and see how I get on.

Will the changes you've made to return/keep the content of comments allow you to make progress in the short term?

arporter commented 6 years ago

I'm having a play with introducing Comment objects to fparser2. Currently I have problems because telling the reader not to ignore comments also means that it no longer ignores blank lines. Those blank lines are then causing problems higher up and several tests fail.

bonfus commented 6 years ago

Trying to answer your questions:

Will the changes you've made to return/keep the content of comments allow you to make progress in the short term?

Yes! My implementation is clearly not the right one (above all, comments after the last parsed Fortran statement are lost) but it gives me exactly the information that I need, i.e. the list of comments preceding each valid statement.

I'm having a play with introducing Comment objects to fparser2.

I'm trying your version too. It still fails in some cases but your logic would perfectly fit my needs too. Thanks for your effort!

arporter commented 6 years ago

You're welcome! My version is very much a work-in-progress and I commit as I go so it may well break unexpectedly for you at the moment. I've just tweaked it so that it doesn't get upset about blank 'comment' lines. However, there are still 5 test failures. Some look involved but some are simply because the re-generated code puts what were in-line comments on a new line. Given that in-line comments can't contain directives (as far as I know) a temporary 'fix' might be to throw them away. Ideally, an in-line comment would be stored as a child of the statement which it appears alongside, however, as I write that, I can see complications from continued statements, e.g.:

my_value = my_fn(a) + 8* & ! This is an in-line comment
            my_fn(b)       ! that is continued on a second line

or, worse still, when the comment is continued but the Fortran statement isn't:

my_value = my_fn(a) + my_fn(b)  ! This is an in-line comment which we
                                ! continue on a 2nd line for no apparent reason

In this case there's no way of knowing whether the second comment 'belongs' with the first or not.
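
To make the in-line case concrete, here is a minimal sketch of separating a free-form line into its code and its trailing comment. It is not fparser code. The function name split_inline_comment is hypothetical, and it assumes that a "!" starts a comment unless it lies inside a quoted string on the same line (character strings continued across lines are not handled).

```python
def split_inline_comment(line):
    """Split a free-form Fortran line into (code, comment).

    comment is None when the line has no comment. A "!" inside a quoted
    string is treated as part of the string, not as a comment marker.
    """
    quote = None  # the quote character of the string we are inside, if any
    for pos, char in enumerate(line):
        if quote:
            if char == quote:
                quote = None
        elif char in "'\"":
            quote = char
        elif char == "!":
            return line[:pos].rstrip(), line[pos:]
    return line.rstrip(), None


if __name__ == "__main__":
    print(split_inline_comment("my_value = my_fn(a) + 8* & ! This is an in-line comment"))
    print(split_inline_comment('write(*,*) "Hello!"'))
```

The split itself is mechanical. What it cannot tell you is whether a full-line comment on the following line continues the in-line one, which is the ambiguity described above.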

arporter commented 6 years ago

I've hit what seems to be a significant issue that is shown up when parsing something like:

! A comment
program my_program
  write(*,*) "Hello"
end program

At the highest level, we use fparser2 to parse a file like so:

reader = FortranFileReader(filename)
ast = fparser.Fortran2003.Program(reader)

However, doing this for the Fortran example above results in ast just being a Fortran2003.Comment with the rest of the program lost. The question really is where does such a comment belong in the AST and what should be the root of the AST?

bonfus commented 6 years ago

This is indeed the first problem that I ran into (and the reason why I abandoned this approach while trying to come up with a quick solution). I wanted to check whether your branch solved this problem, but you got there before me.

The question really is where does such a comment belong in the AST and what should be the root of the AST?

The question that you are posing is indeed more or less what I had in mind in one of my first posts. The workaround that I chose was to link each comment to the first Fortran statement that follows it in the program. What is still missing in my implementation are comments following the last Fortran statement ('end module', 'end subroutine', 'end program', etc.) because I couldn't find a quick way to go beyond the last (significant) line. Clearly, files containing only comments also produce errors. Once more, thanks for your effort.
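
The workaround and its gap can be sketched without fparser. In the toy below, attach_leading_comments is a hypothetical name, statements are plain strings, and any line starting with "!" is assumed to be a comment. It is an illustration of the idea only, not the code in the fork.

```python
def attach_leading_comments(lines):
    """Pair each statement with the comment lines that precede it.

    Returns (pairs, leftover). leftover holds the comments that come after
    the last statement and therefore have nothing to attach to.
    """
    pairs = []
    pending = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are simply dropped in this sketch
        if stripped.startswith("!"):
            pending.append(stripped)
        else:
            pairs.append((stripped, pending))
            pending = []
    return pairs, pending


if __name__ == "__main__":
    source = ["! A comment", "program p", "end program p", "! trailing"]
    pairs, leftover = attach_leading_comments(source)
    print(pairs)
    print(leftover)
```

A file containing only comments yields an empty list of pairs, with every comment in the leftover list. That corresponds to the comment-only case mentioned above.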

arporter commented 6 years ago

Panic over. I realised that fparser2 will happily cope with the situation where a file contains more than one program unit (program, subroutine, etc.). These are all then children of the root Program node. We can therefore have Comment as a child of this node too. I've achieved this by adding Comment to the subclasses lists of Program_Unit and Main_Program0. All tests now pass apart from the one with the in-line comment.
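
The shape of the resulting tree can be sketched as follows. This is a toy, not fparser's implementation: RootNode, UnitNode, CommentNode and build_tree are hypothetical names, only "program ... end program" units are recognised, and statements are kept as plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class CommentNode:
    text: str


@dataclass
class UnitNode:
    name: str
    body: list = field(default_factory=list)


@dataclass
class RootNode:
    """Root of the tree: children are program units or comments."""
    children: list = field(default_factory=list)


def build_tree(lines):
    root = RootNode()
    unit = None  # the program unit currently being filled, if any
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        node = CommentNode(line) if line.startswith("!") else line
        if unit is not None:
            unit.body.append(node)
            if line.lower().startswith("end program"):
                root.children.append(unit)
                unit = None
        elif isinstance(node, CommentNode):
            # A comment outside any unit becomes a sibling of the units.
            root.children.append(node)
        elif line.lower().startswith("program "):
            unit = UnitNode(name=line.split()[1], body=[node])
        else:
            raise ValueError("unexpected line outside a program unit: %r" % line)
    if unit is not None:
        raise ValueError("program unit %r was never closed" % unit.name)
    return root


if __name__ == "__main__":
    tree = build_tree(["! A comment", "program my_program",
                       '  write(*,*) "Hello"', "end program"])
    print(tree.children)
```

With the leading comment stored as a sibling of the program unit, the root stays the same kind of node whether or not the file starts with a comment.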

arporter commented 6 years ago

I now have all of the fparser2 tests working. I spent ages fixing a bug that caused us to lose the first line of a free-format file if it was a comment. Other tests now fail because at some point I changed the code so that we always throw away blank lines (irrespective of whether we're keeping comments). In retrospect that doesn't seem like a good change so I will see if I can work out how to undo it...

bonfus commented 6 years ago

This is superb. Thank you again. As soon as this final fix is ready I will run the parser on the Fortran code that I'm targeting and I'll let you know if I find other problems.

bonfus commented 6 years ago

Just to let you know that line 374 fails in my Python 3 based installation: https://github.com/stfc/fparser/blob/29e24371304d540e669b23d54b2b38e369c50b6f/src/fparser/readfortran.py#L370-L376

But I think that the import will eventually be replaced by the cls variable. Of course, the workaround 'from fparser import Fortran2003' works fine.

arporter commented 6 years ago

Thanks for pointing this out - I mostly develop with Python2 but Travis runs the test suite for both 2 and 3. I've done the workaround for now but maybe Comment should be a sub-class of Line? I shall have a think.

arporter commented 6 years ago

I finally have all tests passing (and PSyclone's test suite is still happy too). However, I have some doubts about the elegance/correctness/robustness of my approach. For example, Base.__new__(cls, ...) can now return a Comment object irrespective of what cls actually is. That is not very nice and probably confusing for any newcomer to the code. In turn, that behaviour requires special-casing at several points in the code in case we did get a Comment rather than the thing we expected. If I've understood the code correctly then I think the 'right' thing to do would be to add Comment as a valid subclass of all appropriate classes and let the existing machinery take care of it. I'll cut an experimental branch and try that...
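
The two designs can be contrasted in plain Python. This is a toy with hypothetical names (CommentNode, SpecialCasedCall, CallNode, match_any) and is not how fparser's Base class is written. It only shows why a __new__ that ignores cls surprises callers, and how a list of candidates avoids that.

```python
class CommentNode:
    def __init__(self, text):
        self.text = text

    @classmethod
    def match(cls, text):
        return cls(text) if text.lstrip().startswith("!") else None


class SpecialCasedCall:
    """Design 1: __new__ may hand back a CommentNode instead of cls."""

    def __new__(cls, text):
        if text.lstrip().startswith("!"):
            return CommentNode(text)  # caller asked for cls, gets a comment
        obj = super().__new__(cls)
        obj.text = text
        return obj


class CallNode:
    """Design 2: an ordinary class; comments come from the candidate list."""

    def __init__(self, text):
        self.text = text

    @classmethod
    def match(cls, text):
        return cls(text) if text.lstrip().lower().startswith("call ") else None


def match_any(text, candidates):
    """Try each candidate class in turn and return the first match."""
    for candidate in candidates:
        node = candidate.match(text)
        if node is not None:
            return node
    raise ValueError("no candidate matched %r" % text)


if __name__ == "__main__":
    print(type(SpecialCasedCall("! note")).__name__)
    print(type(match_any("! note", [CallNode, CommentNode])).__name__)
```

In the first design every caller has to check whether it got the class it asked for. In the second, a comment is just one more alternative, so no caller needs a special case.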

arporter commented 6 years ago

This change has had the unfortunate side-effect that any in-line comments now get thrown away. This is because, in the process of parsing a line containing such a comment, the comment is stripped off and put back in the reader's FILO buffer. If there's no match for the current class with the parsed line then the parsed line is also put back on the FILO buffer. When we then subsequently move on to try and match with other classes, the fact that the comment (now the 2nd element in the FILO buffer) was originally in-line is lost - it gets treated as a normal comment.
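
The loss of the "in-line" information can be reproduced with a toy reader. ToyReader and its methods are hypothetical and much simpler than the real reader: items are (kind, text) tuples, the comment is split off at the first "!" without regard to strings, and the put-back buffer is a plain Python list used as a stack.

```python
class ToyReader:
    """Hands out (kind, text) items; put_back pushes onto a FILO stack."""

    def __init__(self, lines):
        self._lines = list(lines)
        self._stack = []

    def put_back(self, item):
        self._stack.append(item)

    def next_item(self):
        if self._stack:
            return self._stack.pop()
        line = self._lines.pop(0)
        code, bang, comment = line.partition("!")
        if bang and code.strip():
            # In-line comment: queue the comment, return the code first.
            self.put_back(("comment", bang + comment))
            return ("line", code.rstrip())
        if bang:
            return ("comment", line.strip())
        return ("line", line.rstrip())


if __name__ == "__main__":
    reader = ToyReader(["x = 1 ! set x"])
    item = reader.next_item()   # the code part; the comment is now queued
    reader.put_back(item)       # no match for the current class: put it back
    print(reader.next_item())   # the code part again
    print(reader.next_item())   # the comment, with no sign it was in-line
    print(ToyReader(["! set x"]).next_item())  # a full-line comment looks identical
```

Because the queued item carries only its kind and text, the comment that started life in-line comes out of the buffer indistinguishable from a full-line comment.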

arporter commented 6 years ago

Maybe I should abandon the idea of treating in-line comments differently?

arporter commented 6 years ago

Having in-line comments become full-line comments is actually how the parser behaved before I started this work. It therefore seems reasonable to stick with this behaviour.

bonfus commented 6 years ago

I did not actually check, but if I remember correctly the Comment object holds information about the line span, correct? This may help with in-lining comments if needed.

arporter commented 6 years ago

Thanks @bonfus. You're right - there is such information. However, I'm going to stick with just making in-line comments into full-line comments (since this was how fparser behaved originally).

rupertford commented 6 years ago

PR #71 has been merged to master. Closing this issue.

stfc / fparser

Handle comments in fparser2 #68