stfc / fparser

This project maintains and develops a Fortran parser called fparser2 written purely in Python which supports Fortran 2003 and some Fortran 2008. A legacy parser fparser1 is also available but is not supported. The parsers were originally part of the f2py project by Pearu Peterson.
https://fparser.readthedocs.io
Other
62 stars 29 forks source link

Allow generic preprocessor directives #215

Closed mlange05 closed 4 years ago

mlange05 commented 4 years ago

In addition to include statements which are recognised as Include_Stmt AST nodes, it would be very useful to allow generic pre-processor directives (eg. #define or #idfef) to be internalised under some generic Directive AST node to prevent parser failures on source files that have not been pre-processed yet. This would allow source-to-source translator tools to deal with non-conformant directives downstream and thus become less intrusive to existing code.

If this is a problem for other use cases, it might also be possible to specialise this behaviour with a flag, something like

reader = FortranStringReader(fcode, ignore_comments=False, allow_directives=True)

I'm happy to provide a PR - just looking for some confirmation that this is ok and possibly some pointers where to start.

rupertford commented 4 years ago

I think it is fine to add in this support as long as it is understood that the parser will essentially treat any cpp directives as comments and therefore the code without the cpp directives must be valid Fortran. This is already the case for Fortran include statements (as explained in the fparser documentation).

rupertford commented 4 years ago

In terms of implementation, nothing needs to be done to the reader ...

rupert@ubuntu:~$ python
Python 2.7.15+ (default, Nov 27 2018, 23:36:35) 
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from fparser.common.readfortran import FortranStringReader
>>> code = '''
... subroutine test(a)
... integer a
... #include "xxx"
... #define YYY
... #ifdef YYY
... print *, "hello"
... #endif
... end
... '''
>>> reader = FortranStringReader(code)
>>> reader.next()
Line('subroutine test(a)',(2, 2),None,None,<reader>)
>>> reader.next()
Line('integer a',(3, 3),None,None,<reader>)
>>> reader.next()
Line('#include "xxx"',(4, 4),None,None,<reader>)
>>> reader.next()
Line('#define YYY',(5, 5),None,None,<reader>)
>>> reader.next()
Line('#ifdef YYY',(6, 6),None,None,<reader>)
>>> reader.next()
Line('print *, "hello"',(7, 7),None,None,<reader>)
>>> reader.next()
Line('#endif',(8, 8),None,None,<reader>)
>>> reader.next()
Line('end',(9, 9),None,None,<reader>)
>>> reader.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rupert/proj/PSyclone/external/fparser/src/fparser/common/readfortran.py", line 758, in next
    item = self._next(ignore_comments)
  File "/home/rupert/proj/PSyclone/external/fparser/src/fparser/common/readfortran.py", line 829, in _next
    raise StopIteration
StopIteration
>>> 
rupertford commented 4 years ago

To add support in the parser I would take a look at what has been done to add support for an include node and copy that. The two files to look at are fparser/two/Fortran2003.py and fparser/two/utils.py. I would suggest extending the add_comments_includes to also try to match cpp directives and then create a cpp node (or separate nodes if you prefer).

rupertford commented 4 years ago

A downside of the suggested approach is that there will be no connection between #ifdef, #else, #endif, they will be treated as independent directives and there will be no validity checking e.g. that an #ifdef has a matching #endif. However, I suspect this could be added in the future using some primitive logic in the classes.

rupertford commented 4 years ago

If there were a generic cpp class then the rule would have to support anything after the directive as an optional value e.g. #endif would be able to have a value after it because #define requires a value. To check for this it would make sense to match cpp directives separately. However, you could argue that it is not fparser's job to check the validity of cpp directives so it probably does not matter if the matching is loose.

rupertford commented 4 years ago

@mlange05, if you're happy with the above suggestions and the limitations then it would be great if you could add in support. If you prefer I would be happy to implement support as I added the include support and it is essentially an extension of that.

hiker commented 4 years ago

TBH I see a few disadvantages: the moment we appear to support PP directives, we might need to deal with things like:

1) what if a code uses a #defined value in a Fortran expression (#define PI 3.14) . This will cause problems with our upcoming symbol table. 2) What if some parameters are #ifdef'ed out, or uses #include for parameters

   call sub(   &
#ifdef HAS_SOMETHING
       something,
#endif 
#include parameter_list_for_sub
      more_paramters)

3) What if the kernel name depends on a PP directive?

Iirc 2) and 3) were used by older UM versions (can't remember the details, but UM did something convoluted to different implementations of a subroutine, and select at compile time using #ifdef which implementation to use). ROMS also uses 2) for some parameters

Having the PP directives in PSyR could cause some issues with dependency analysis, since it would not know that only one part of the code is actually used.

We also have to be careful with documenting this, since the expectations of the user might be that psyclone is PP-directive aware (ie. somehow evaluates them??)

Do we have a use case for this? The only one I can think of is if PSyclone should be used to modify a source code once (e.g. add OpenMP), but then keep on maintaining and building the code the traditional way, and I believe that is not really a way we would want to support (skipping long discussion here which we had at the UK MO in summer).If PSyclone is used in building, as it is intended to, files can easily be preprocessed.

rupertford commented 4 years ago

Hi @hiker. I agree we have no use case for this in PSyclone at the moment. However @mlange05 was suggesting this to support a different tool chain that would make use of fparser.

In answer to your examples above, they would produce a parser error as the code without the cpp directives is not valid Fortran. As you say, we would need to make it clear what is supported, but we already do that for the include statement.

hiker commented 4 years ago

Oops - my bad, I didn't notice that this was fparser, I thought it was PSyclone - I'll take it all back :)

rupertford commented 4 years ago

Still valid points @hiker

mlange05 commented 4 years ago

@rupertford Thanks for you input and suggestion. I've had a first play, but nothing ready to push yet, but I'll try and get something more meaningful done next week. @hiker many thanks for you input, I'd not considered these use cases, but they are definitely valid and need consideration.

martin-schlipf commented 4 years ago

I would also be interested to parse Fortran code with preprocessor statements. Has there be any progress on this?

rupertford commented 4 years ago

I'd be happy to pick this up if @mlange05 has too much on - they work them hard at ECMWF :-).

martin-schlipf commented 4 years ago

Do you think it is possible to cover the following use cases? They do not reduce to valid Fortran when the preprocessor statements are removed.

#define RETURN_ON_ERROR(ierr) if (ierr /= 0) return
call mpi_barrier(mpi_comm_world, ierr); RETURN_ON_ERROR(ierr)

#ifdef USE_COMPLEX
#define VAR_TYPE complex
#else
#define VAR_TYPE real
#endif
VAR_TYPE :: var
mlange05 commented 4 years ago

@rupertford @martin-schlipf Hi, sorry for dropping the ball on this a little. I have a partial implementation in a fork here but I got stuck trying to catch #ifdef .. else .. #endif regions. I was a little torn between trying to catch this with regex vs. creating a small sub-grammar when the day job caught up with me.

I'll try and find some more time once the end-of-year rush is over. :wink:

martin-schlipf commented 4 years ago

I implemented most of the preprocessor macros in a branch on my fork. You can see what kind of macros I can parse in the test_preprocess.py file. However now I'm stuck to get this working in the complete Program(reader) parser.

reuterbal commented 4 years ago

I picked up the preprocessor work from @mlange05. Using @martin-schlipf 's excellent work I created a branch that combines both attempts and yields a working version that parses preprocessor directives.

I have based it on the C99 draft, Sec 6.10, implementing all directives except for #pragma as this is also not understood by Fortran compilers (as far as I know). Directives are handled similarly to Include_Stmt, being accepted everywhere. For that, I extended the add_comments_includes function. All preprocessor classes are currently separated into another file. Originally I intended to make support for those optional, allowing to enable it in the same way as choosing the Fortran standard:

parser = ParserFactory().create(std='f2003', cpp='c99')

but I didn't find a satisfying way how to propagate this to the relevant places in Fortran2003.py and utils.py. If you have any ideas, I'm happy to add this.

Something I had to drop was the integration of #ifdef ... #elif ... #else ... #endif regions into a construct. This does work as long as the region does not span multiple Fortran blocks but this can happen quite often (at least in our code base). The bits for that are still in the code and commented out, in case its ever needed.

If you're happy with this I can create a PR

martin-schlipf commented 4 years ago

Do you think it is possible to cover the following use cases? They do not reduce to valid Fortran when the preprocessor statements are removed.

#define RETURN_ON_ERROR(ierr) if (ierr /= 0) return
call mpi_barrier(mpi_comm_world, ierr); RETURN_ON_ERROR(ierr)

#ifdef USE_COMPLEX
#define VAR_TYPE complex
#else
#define VAR_TYPE real
#endif
VAR_TYPE :: var

@reuterbal I just want to point out that the above will fail when parsing. To be fair, it never worked on my branch either and I stopped, because it would be a complicated task to get it working properly. I thought about keeping track of the Cpp_Macro_Identifier (in your branch), but if you have code that uses #include that will not work either. Perhaps you can add a method to the ParserFactory that allows one to initialize a list of macros. Example:

f2003_parser = ParserFactory().create(std='f2003')
f2003_parser.set_cpp_macro_identifiers(('USE_COMPLEX', 'VAR_TYPE'))
code = """#ifdef USE_COMPLEX
#define VAR_TYPE complex
#else
#define VAR_TYPE real
#endif
VAR_TYPE :: var"""
reader = get_reader(code)

But you can see how that would get complex very quick, which is why I did not pursue this route further.

reuterbal commented 4 years ago

@martin-schlipf these are valid points and really anything that does not reduce to valid Fortran when directives are removed will not work - however, this was a basic requirement by @rupertford from the very beginning:

I think it is fine to add in this support as long as it is understood that the parser will essentially treat any cpp directives as comments and therefore the code without the cpp directives must be valid Fortran. This is already the case for Fortran include statements (as explained in the fparser documentation).

My objective (at least at the moment) would be only to keep directives as AST nodes but not to interpret them - or putting it differently: avoid preprocessing in those cases where it is not absolutely mandatory.

We'd be very much interested in having a full preprocessing support but in my eyes mixing both parsing of Fortran and interpreting preprocesser directives does not make sense as you will always run into an edge case that will not be well-defined. After all, preprocessor directives are meant for a different compilation phase and to be resolved before any Fortran standard compliance is to be assumed by a compiler.

Thinking a bit ahead towards the two-phase parser that fparser might become in the future: I could imagine having an optional third phase in the beginning that interprets directives to produce pure Fortran code, making use of user-provided macro definitions (very much similar to GCC's -D ... -U ... -N ... flags). But that's a very different approach then.

rupertford commented 4 years ago

@reuterbal, yes the implementation in your branch (based on the previous work from @martin-schlipf and @mlange05) is very much what I was thinking. If you could make a PR I would be happy to review. Please forgive me if I request a few changes :-)

rupertford commented 4 years ago

I suggest we create a new issues to discuss a) @martin-schlipf's use cases, b) the possibility of linking ifdef ... else constructs and c) integrating cpp-functionality into fparser

reuterbal commented 4 years ago

I have created new issues for cases b) and c) but am somewhat unsure how to phrase @martin-schlipf 's use cases in an abstract way. I'll have to think some more about that tomorrow...

Another feature that is somewhat related would be to recognize predefined macro names similar to intrinsic function names. I have opened #242 for that.