Open stadelmanma opened 5 years ago
Here are a couple of repos containing examples highlighting specific Fortran features:
https://github.com/scivision/fortran2018-examples
https://bitbucket.org/gyrokinetics/fortran-features/src/main/
For fixed form, you might consider LAPACK and BLAS.
I took the free form list from the original post, the two repos I mentioned, plus the following known users of Ford:
https://gitlab.com/lfortran/compiler_tester
https://github.com/fortran-lang/fpm
https://github.com/QcmPlab/HoneyTools
https://github.com/cibinjoseph/naturalFRUIT
https://github.com/cibinjoseph/C81-Interface
https://github.com/cp2k/dbcsr
https://github.com/kevinhng86/faiNumber-Fortran
https://github.com/ylikx/forpy
https://github.com/D3DEnergetic/FIDASIM
https://github.com/jacobwilliams/bspline-fortran
https://github.com/szaghi/VTKFortran
https://github.com/szaghi/FLAP
https://github.com/toml-f/toml-f
https://github.com/jacobwilliams/json-fortran
https://github.com/fortran-lang/stdlib
And the following most-starred GitHub repos using Fortran:
https://github.com/wrf-model/WRF
https://github.com/mapmeld/fortran-machine
https://github.com/wavebitscientific/functional-fortran
https://github.com/modern-fortran/neural-fortran
Running tree-sitter parse '../fortran_examples/**/*.f90' --quiet --stat
gives:
This doesn't include files that need preprocessing, or fixed form files. Pretty good though!
It should be possible to write something that will spit out the failing source. I'll have a look at doing that.
EDIT: Looking at other file extensions:
It looks like a chunk of the files that require preprocessing are parseable, but most fixed form files aren't.
That sounds right. Tree-sitter was initially designed for use with code editors, so it is meant to be error tolerant: a single error doesn’t cause the entire parse to fail. Did you run this on the master branch or a temporary one with some of your fixes merged in?
This is using #76, plus a couple of other minor fixes I haven't pushed yet.
I found a few more interesting repos, and I've deleted some repeated vendored dependencies, along with some obviously non-standard Fortran files that are really templates for various custom preprocessors.
With #79, I now get:
I also wrote some Python to print the first error in each .f90 file under a directory:
from ast import literal_eval
from subprocess import run
from pathlib import Path
import re


def print_context(filename, start, end, context=2):
    """Print the source lines around an error; start and end are 0-based [row, column]."""
    contents = filename.read_text().splitlines()
    start_context = max(0, start[0] - context)
    end_context = min(len(contents), end[0] + context + 1)

    print(f"{44 * '='}")
    print(f"{filename}: {start[0]+1}:{end[0]+1}")

    # If the context would cover the entire file, don't print it
    if start_context == 0 and end_context == len(contents):
        print("WHOLE FILE")
        return

    # Limit the printed context to 12 lines
    large = (end_context - start_context) > 12
    if large:
        end_context = start_context + 12

    print()
    print("\n".join(contents[start_context: start[0]]))
    print(contents[start[0]].strip("\n"))
    # Underline the error on its first line
    print(f"{start[1] * ' '}^{(end[1] - start[1]) * '~'}")
    print("\n".join(contents[start[0]+1: end_context]))
    if large:
        print("...")
    print()


# Matches the "[row, column]" positions in the parser output
ERROR_RE = re.compile(r"\[\d+, \d+\]")


def parse_line(line):
    """Split one output line into the filename and the error's start/end positions."""
    filename, _, error_bit = line.split("\t")
    filename = Path(filename.strip())
    start, end = ERROR_RE.findall(error_bit)
    return filename, literal_eval(start), literal_eval(end)


def print_errors_for_dir(dir_name):
    """Print the first error in each .f90 file under the given corpus directory."""
    command = f"tree-sitter parse '../fortran_examples/{dir_name}/**/*.f90' --quiet"
    lines = run(command, text=True, capture_output=True, shell=True).stdout.splitlines()
    for line in lines:
        print_context(*parse_line(line))
The vast majority of the files that are left actually have preprocessor directives in them, even though their file extension is .f90.
With #81, we can successfully parse more than 90% of .f90 files in this corpus. There are a few real edge cases left, but the majority of failures are now either due to preprocessor directives or invalid Fortran (for example, the file is meant to be included in another file).
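A rough way to turn the parser output into a success rate like the one quoted above, as a minimal sketch. It assumes the same tab-separated layout that parse_line relies on (file name in the first field); `failing_files`, `success_rate`, and the sample lines are hypothetical, not real output.

```python
def failing_files(parse_output: str) -> set:
    """Collect the distinct file names that appear in error lines.

    Assumes each line is tab-separated with the file name first,
    as parse_line in the script above does.
    """
    return {
        line.split("\t")[0].strip()
        for line in parse_output.splitlines()
        if line.strip()
    }


def success_rate(total: int, failed: int) -> float:
    """Percentage of files that parsed without any error."""
    if total == 0:
        return 0.0
    return 100.0 * (total - failed) / total


# Hypothetical output: two failing files out of a three-file corpus.
sample_output = (
    "a.f90\t0 ms\t(ERROR [3, 0] - [3, 7])\n"
    "b.f90\t1 ms\t(ERROR [10, 4] - [12, 0])\n"
)
failed = failing_files(sample_output)
print(f"{success_rate(3, len(failed)):.1f}% parsed cleanly")
```

The total has to come from counting the files matched by the glob separately, since the quiet output only lists the files that failed.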
I removed flibs from my corpus as it has too many files using a custom preprocessor. WRF also uses a custom preprocessor for at least one of its submodules, so I ignored files containing KPP_REAL. Lots of projects seem happy to put preprocessor directives in .f90 files, but they're easily ignored with grep -LE "^#". That still leaves a few files that are written to be included in other files, and so aren't valid translation units. I've not worked out how to ignore them systematically yet.
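The same filtering could also be done in Python alongside the error-printing script. This is only a sketch: `needs_preprocessing` and `plain_fortran_files` are hypothetical names, and the marker list just mirrors the KPP_REAL case mentioned above.

```python
import re
from pathlib import Path

# Same test as grep -E "^#": any line starting with '#'
DIRECTIVE_RE = re.compile(r"^#", re.MULTILINE)


def needs_preprocessing(source: str, markers=("KPP_REAL",)) -> bool:
    """True if the text has a preprocessor directive or a custom-preprocessor marker."""
    if DIRECTIVE_RE.search(source):
        return True
    return any(marker in source for marker in markers)


def plain_fortran_files(root: Path, pattern: str = "**/*.f90"):
    """Yield files under root that look like they need no preprocessing."""
    for path in sorted(root.glob(pattern)):
        text = path.read_text(errors="replace")
        if not needs_preprocessing(text):
            yield path
```

For example, `list(plain_fortran_files(Path("../fortran_examples")))` would give the files to pass to the parser. It does nothing about files that are meant to be included in other files.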
It's a pity the parser isn't designed to take any options; it might be nice to be able to parse standalone snippets.
Anyway, here's the corpus I'm currently using:
And the current success rate:
I think the remaining features or edge cases are:
/ in edit descriptors: 5 format("Array sizes =", i4, "MB. Clock resolution = ", f6.3, " ms."/)
backspace, rewind, pause
call append_chunk(lun_filecount,(/ real(file_count) /), '(f10.0)', ascii_fmt)
character(1, tfc) :: space = tfc_" "
go to (1, 2, 3), n
elseif(n-1.eq.0) then
CHARACTER FMAT*22
do 11 iter=1,itmax
And maybe one or two others that are less obvious.
I think the shortcomings in the parser's CLI are just due to it being intended as a library. A small wrapper script that utilizes it would allow us some additional flexibility.
I have a Python script, which uses the Java tree-sitter language to translate parts of Java into Python, that could be repurposed. I figure for parsing “snippets”, our best bet would be to wrap them in a PROGRAM block and hope for the best.
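The wrapping idea might look something like this minimal sketch. `wrap_snippet` and `HEADER_LINES` are hypothetical names, and this is not the Java script mentioned above.

```python
# Number of lines added before the snippet; subtract this from any
# row the parser reports to get the row in the original snippet.
HEADER_LINES = 1


def wrap_snippet(snippet: str, name: str = "snippet") -> str:
    """Wrap a standalone fragment in a PROGRAM block so it forms a full program unit.

    The snippet lines are left unindented so reported columns still match.
    """
    body = snippet.rstrip("\n")
    return f"program {name}\n{body}\nend program {name}\n"


print(wrap_snippet("x = 1\nprint *, x"))
```

This only helps for fragments that are valid inside a program body. A snippet that is itself a subroutine or module would need different handling.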
Great idea!
Once I have all the key features I know of implemented, I will want to do broad testing against real Fortran code bases to see if there is anything I missed. Once I have support for fixed form Fortran, I'll do the same there as well.
Eventually, with code author permission (or if the license allows it), I'll store some good source code files in the examples directory.
Free Form:
https://github.com/firemodels/fds
https://sourceforge.net/projects/flibs/
https://github.com/astrofrog/fortranlib
https://github.com/Unidata/netcdf-fortran
https://github.com/jacobwilliams/json-fortran
https://github.com/jerryd/gtk-fortran
https://github.com/certik/fortran-utils
https://github.com/andreww/fox
Fixed Form:
https://github.com/stadelmanma/netl-ap-map-flow