Comment hierarchy issue

antoine-morvan commented 1 year ago

Hello,

I am walking an AST, looking for specific comments in the tree. I encountered some unexpected behavior regarding the hierarchy of the comments. This impacts the printer too.

My sample program looks like this, with 3 comments outside the do loop, and one comment inside the loop.

program test
    integer :: arg1
    integer :: iterator
    !comment_out 1
    arg1 = 10
    !comment_out 2
    do iterator = 0,arg1
        !comment_in
        print *, iterator
    end do
    !comment_out 3
end program test

I applied the following script to expose the unexpected behavior:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

ARG_FORTRAN_FILE_INPUT_PATH = "test_fail.f90"
ARG_FORTRAN_STANDARD = 'f2008'

from fparser.common.readfortran import FortranFileReader
from fparser.two.parser import ParserFactory
from fparser.common.readfortran import FortranFileReader
from fparser.two.utils import walk
from fparser.two.Fortran2003 import Comment

print("## Parsing input")
reader = FortranFileReader("{0}".format(ARG_FORTRAN_FILE_INPUT_PATH), ignore_comments=False)
f_parser = ParserFactory().create(std=ARG_FORTRAN_STANDARD)
parse_tree = f_parser(reader)
print("## Visiting AST")
cmtlist = walk(parse_tree, (Comment))
for cmt in cmtlist:
    if cmt.items[0]:
        print("## ---" + cmt.items[0])
        print(type(cmt.parent))
        print(cmt.parent)
print("## Done")

and got the following output:

## Parsing input
## Visiting AST
## ---!comment_out 1
<class 'fparser.two.Fortran2003.Implicit_Part'>
!comment_out 1
## ---!comment_out 2
<class 'fparser.two.Fortran2003.Block_Nonlabel_Do_Construct'>
!comment_out 2
  DO iterator = 0, arg1
  !comment_in
  PRINT *, iterator
END DO
## ---!comment_in
<class 'fparser.two.Fortran2003.Block_Nonlabel_Do_Construct'>
!comment_out 2
  DO iterator = 0, arg1
  !comment_in
  PRINT *, iterator
END DO
## ---!comment_out 3
<class 'fparser.two.Fortran2003.Execution_Part'>
arg1 = 10
!comment_out 2
  DO iterator = 0, arg1
  !comment_in
  PRINT *, iterator
END DO
!comment_out 3
## Done

We can observe that the comment_out 2 is considered as a children of the do loop, whereas I expect it to be at the same level as comment_out 3, that is as part of the execution part.

I've run this with fparser2 version 0.0.16.

We can also observe that the fparser2 script fparser2 --std=f2008 test_fail.f90 prints a wrong indentation when parsing this sample program:

PROGRAM test
  INTEGER :: arg1
  INTEGER :: iterator
  !comment_out 1
  arg1 = 10
  !comment_out 2
    DO iterator = 0, arg1
    !comment_in
    PRINT *, iterator
  END DO
  !comment_out 3
END PROGRAM test

Best.

arporter commented 1 year ago

Thanks for the report and reproducer. We've just hit this problem in a different context so I'm keen to take a look. Just struggling for time at the moment as it's the summer holidays.

arporter commented 1 year ago

The problem is in the BlockBase.match method which consumes any preceding comments, includes and directives before moving on to process the loop (https://github.com/stfc/fparser/blob/3ede17259e8854e9fd3976b80324a95bf60ca107/src/fparser/two/utils.py#L590-L595). Do @rupertford or @reuterbal know whether this is because we want to associate directives with the loop? Or is it something that was there originally?

arporter commented 1 year ago

To answer my own question, git blame shows me that it was like that 4 years ago (albeit, it only handled comments back then). The question is, if I change this, am I going to break the directives support?

rupertford commented 1 year ago

If I remember correctly, #358 fixes this.

reuterbal commented 1 year ago

FWIW: I don't think it is because of directives but just an artifact of the way it was implemented? In fact, we explicitly deal with this association of comments when we convert to our IR and don't mind seeing the need for that being gone;-)

stfc / fparser

Comment hierarchy issue #366