stadelmanma / tree-sitter-fortran

Fortran grammar for tree-sitter
MIT License
30 stars 15 forks

More f77 features #87

Closed ZedThree closed 11 months ago

ZedThree commented 1 year ago

A few more Fortran77, obsolescent, and deleted features, and some bug fixes:

ZedThree commented 1 year ago

With these changes, the successful parse rate for the corpus I've gathered is:

Total parses: 3053; successful parses: 3010; failed parses: 43; success percentage: 98.59%

Pretty good!

stadelmanma commented 1 year ago

Great work. This one might need a rebase on master. Do the failing files have preprocessor directives or other unsupported language features?

ZedThree commented 1 year ago

I've tried to exclude files that contain preprocessor directives. There are maybe a couple of language features we're missing, like coarrays; I might try to add those today if I get the chance. There are a few things we don't get quite right (I've just found out that use mod, only: with an empty only list is a valid statement), and there are quite a few files that I think are either just invalid Fortran (sometimes deliberately) or are meant to be included in other files (for example, they consist only of type definitions or variable declarations).

ZedThree commented 1 year ago

I think this is about it for the F77 and earlier stuff. There are a couple of really annoying edge cases that I've not managed to work out and won't have time to look at for a while:

  1. labelled do statements: these can share a termination statement, which can in principle be any one of a number of ordinary statements, but in practice is either continue or end do:
  do 10 i = 1, 10
    do 10, j = 1, 10
      foo(i, j) = i * j
10 continue

  do 20 i = 1, 10
    do 20, j = 1, 10
      foo(i, j) = i * j
20 end do

I cannot work out how to support both of these. In this PR I've chosen to parse the first as two separate statements (i.e. not nested), and not to handle the second one at all. Making the terminating statement optional causes tree-sitter to parse it greedily, so that the first do ends up being closed by the 20 end do.

I don't really see how you can parse this without keeping track of the label itself and using that to end the corresponding do statement.

I did briefly also try to just allow standalone <label> end do statements, but that caused more problems than it solved.
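To make the label-tracking idea concrete, here is a minimal standalone sketch in Python. It is not how this PR or tree-sitter-fortran works; it only illustrates that a stack of open labels is enough to pair each labelled do with its shared terminating statement. The function name and the regular expressions are hypothetical, and the sketch assumes one statement per line with no continuation lines, comments, or fixed-form spacing tricks such as do10i=1,10.

```python
import re

# Hypothetical, simplified patterns (assumptions, not the grammar's rules):
# a labelled do looks like "do 10 i = ..." or "do 10, i = ...",
# optionally preceded by its own statement label.
DO_RE = re.compile(r"^\s*(?:\d+\s+)?do\s+(\d+)\s*,?\s*\w+\s*=", re.IGNORECASE)
# Any other statement that starts with a numeric label.
LABEL_RE = re.compile(r"^\s*(\d+)\s+\S")


def match_labelled_do(lines):
    """Pair each labelled do with the line of its terminating statement.

    Returns a sorted list of (do_line_index, terminator_line_index) tuples.
    """
    open_loops = []  # stack of (label, line_index) for loops not yet closed
    pairs = []
    for index, line in enumerate(lines):
        do_match = DO_RE.match(line)
        if do_match:
            open_loops.append((do_match.group(1), index))
            continue
        label_match = LABEL_RE.match(line)
        if label_match:
            label = label_match.group(1)
            # A shared terminator closes every innermost loop waiting on
            # this label, whatever kind of statement carries the label.
            while open_loops and open_loops[-1][0] == label:
                _, start = open_loops.pop()
                pairs.append((start, index))
    return sorted(pairs)
```

With this approach, 10 continue and 20 end do are treated the same way: only the label decides which loops are closed. In tree-sitter, state like this stack would presumably have to live in an external scanner rather than in the grammar rules themselves.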

  2. statement functions:
ABS1( X ) = ABS( REAL( X ) ) + ABS( AIMAG( X ) )

These look identical to assignment statements.
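The ambiguity comes from the fact that name(args) = expr is an array element assignment when name has been declared as an array, and can only be a statement function otherwise. The following Python sketch shows that disambiguation under stated assumptions: classify, its regular expressions, and the array_names set (lower-cased names of declared arrays) are all hypothetical, and the sketch ignores the rule that statement functions must appear before the executable statements. A grammar on its own does not have this declaration information, which is presumably why the two forms cannot be told apart syntactically.

```python
import re

# Hypothetical, simplified pattern for "name(args) = expression" on one line.
ASSIGN_RE = re.compile(r"^\s*([A-Za-z]\w*)\s*\(([^()]*)\)\s*=(?!=)\s*(.+)$")
NAME_RE = re.compile(r"^[A-Za-z]\w*$")


def classify(line, array_names):
    """Classify a line using a set of lower-cased declared array names."""
    match = ASSIGN_RE.match(line)
    if match is None:
        return "other"
    name, args = match.group(1), match.group(2)
    if name.lower() in array_names:
        # A declared array on the left-hand side: array element assignment.
        return "assignment"
    dummies = [arg.strip() for arg in args.split(",") if arg.strip()]
    if all(NAME_RE.match(dummy) for dummy in dummies):
        # Undeclared name whose arguments are all plain names: the only
        # remaining reading is a statement function definition.
        return "statement function"
    return "unknown"
```

For example, the ABS1 line above is classified as a statement function when array_names is empty, and as an assignment when array_names contains "abs1".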

ZedThree commented 1 year ago

@stadelmanma Are you ok if I go ahead and merge this? I can then update the fixed-form version to point to this repo, and then we can get both into https://github.com/grantjenks/py-tree-sitter-languages to make it easier to use in other projects.

stadelmanma commented 1 year ago

That all sounds great, merge away when ready.