Reducing permissiveness of parser

aidanheerdegen commented 8 months ago

I am using f90nml to pull namelists out of a model log file.

A description and context are available here:

https://github.com/aekiss/run_summary/issues/32

TL;DR the (MOM5) stdout contains the following text:

```
&OCEAN_SHORTWAVE_GFDL_NML
USE_THIS_MODULE = T,
READ_CHL = T,
CHL_DEFAULT = 8.000000000000000E-002,
ZMAX_PEN = 1000000.00000000 ,
SW_FRAC_TOP = 0.000000000000000E+000,
DEBUG_THIS_MODULE = F,
ENFORCE_SW_FRAC = T,
OVERRIDE_F_VIS = T,
SW_MOREL_FIXED_DEPTHS = F,
OPTICS_FOR_UNIFORM_CHL = F,
OPTICS_MOREL_ANTOINE = F,
OPTICS_MANIZZA = T
/
NOTE from PE 0: ==>Note: USING shortwave_gfdl_mod.
=>Note: Using shortwave penetration with GFDL formulaton & Manizza etal optics.
NOTE from PE 0: ==>Note: Reading in chlorophyll-a from data file for shortwave penetration.
=>Note: computing solar shortwave penetration. Assume stf has sw-radiation field included. Hence, solar shortwave penetration effects placed in sw_source will subtract out the effects of shortwave at k=1 to avoid double-counting.
==>Note: Setting optical model coefficients assuming nonuniform chl distribution.
&OCEAN_SPONGES_TRACER_NML
USE_THIS_MODULE = F,
DAMP_COEFF_3D = F
/
```

The parser interprets the & in the purely descriptive middle paragraph as the start of a namelist and parses the rest of the text like so:

```
manizza:
  ':': null
  '=':
  - '>'
  - Note
  - ':'
  - Setting
  - optical
  - model
  - coefficients
  - assuming
  - nonuniform
  - chl
  - distribution.
  penetration.:
  - '>'
  - Note
  - ':'
  - computing
  - solar
  - shortwave
  - penetration.
  - Assume
  - stf
  - has
  - sw-radiation
  - field
  - included.
  - Hence
  - solar
  - shortwave
  - penetration
  - effects
  - placed
  - in
  - sw_source
  - will
  - subtract
  - out
  - the
  - effects
  - of
  - shortwave
  - at
  k:
  - 1
  - to
  - avoid
  double-counting.: null
```

I know this is a tough ask, and not exactly a well-supported use case, but I'd really like it if I could ask the parser to be more strict. The paragraph it interpreted as a namelist group doesn't have a closing / for example.

Would it be possible, or desirable, to have a --strict mode, or similar, that required what I would describe as "well-formed" namelists, e.g. the start of a group is an & which is the first character on a line, preceded only by whitespace? Similarly for the end of a namelist group and /.
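
To make the proposal concrete, here is a minimal sketch of the "well-formed" rule applied as a preprocessing step in plain Python. It is not an f90nml feature: `extract_strict_groups` is a hypothetical helper, and the sample log layout (each group opening and closing on its own line) is an assumption about how the model writes its namelists.

```python
import re

# Hypothetical helper, not part of f90nml.
# A group opens on a line whose first non-blank character is "&",
# followed immediately by a name, and closes on a line that starts with "/".
GROUP_START = re.compile(r"^\s*&[A-Za-z_]\w*(\s|$)")
GROUP_END = re.compile(r"^\s*/")


def extract_strict_groups(text):
    """Return the well-formed namelist groups found in free-form log text."""
    groups = []
    current = None  # lines of the group being collected, or None
    for line in text.splitlines():
        if current is None:
            if GROUP_START.match(line):
                current = [line]
        else:
            current.append(line)
            if GROUP_END.match(line):
                groups.append("\n".join(current))
                current = None
    # A group that is opened but never closed is dropped.
    return groups


# Assumed layout: one assignment per line, "/" on its own line.
log = """\
 &OCEAN_SHORTWAVE_GFDL_NML
 USE_THIS_MODULE = T,
 OPTICS_MANIZZA = T
 /
=>Note: Using shortwave penetration with GFDL formulaton & Manizza etal optics.
 &OCEAN_SPONGES_TRACER_NML
 USE_THIS_MODULE = F,
 DAMP_COEFF_3D = F
 /
"""

groups = extract_strict_groups(log)
print(len(groups))
```

Each returned string could then be handed to f90nml, for example through `f90nml.reads`, so the parser never sees the descriptive text in between.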

marshallward commented 8 months ago

I think you might be in luck. The standard might actually agree with your interpretation to some extent:

> Input for a namelist input statement consists of (1) optional blanks and namelist comments, (2) the character & followed immediately by the namelist-group-name as specified in the NAMELIST statement, (3) one or more blanks, (4) a sequence of zero or more name-value subsequences separated by value separators, and (5) a slash to terminate the namelist input.

In other words, & followed by a blank does not start a namelist group, and f90nml is in error here.
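
Rule (2) alone can be sketched as a regular expression. This is only an illustration in plain Python, not f90nml code: `find_group_names` is a hypothetical name, and the pattern checks nothing except that a name follows the & with no blank in between (it would still be fooled by an & inside a quoted string, for instance).

```python
import re

# "&" followed immediately by a Fortran-style name (letter first).
# "& Manizza" (ampersand, then a blank) never matches.
GROUP_NAME = re.compile(r"&([A-Za-z][A-Za-z0-9_]*)")


def find_group_names(text):
    """Return every name that directly follows an '&' in the text."""
    return [m.group(1) for m in GROUP_NAME.finditer(text)]


sample = (
    "=>Note: Using shortwave penetration with GFDL formulaton "
    "& Manizza etal optics. "
    "&OCEAN_SPONGES_TRACER_NML USE_THIS_MODULE = F, DAMP_COEFF_3D = F /"
)

print(find_group_names(sample))  # ['OCEAN_SPONGES_TRACER_NML']
```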

I even tested this out in a test GFortran program and it had no problem skipping over the & Manizza etal content and reading &ocean_sponges_tracer_nml ... /. So again, f90nml looks like the one in error.

I have to say... this really looks like a preprocessing job on your end :P but an error's an error. I will have a crack at it.

marshallward commented 8 months ago

... and we'll just say "No comment" regarding the first requirement. In my experience, compilers have always seemed to be very generous about their handling of the space between namelist groups.

aidanheerdegen commented 8 months ago

> this really looks like a preprocessing job on your end

TBH I am using f90nml as the pre-processor, for which it does an admirable job. Thanks!

It would be a pain to effectively re-invent what f90nml is doing to figure out what isn't a legit namelist. Scrabbling around in the entrails of STDOUT trying to recreate semantic structure feels very 1980ish, and yet here we (I) still are (am).

marshallward commented 8 months ago

> It would be a pain to effectively re-invent what f90nml is doing to figure out what isn't a legit namelist.

I think this might be the problem: F90nml does a very poor job of detecting what is and isn't a valid namelist. There are a lot of hidden assumptions that the input is a namelist. So it may not actually be well suited to handling namelist groups embedded in other text. (There are similar open issues where people have tried to use F90nml to parse namelist-like files, and it rarely works correctly.)

On top of that, &Manizza without the blank would be a valid namelist group, and your typical Fortran program would crash if it were to encounter &Manizza. In this case, the correct response would be to raise an error. Or (in our case, unfortunately) feed back a bunch of garbage.

But... you have fallen into an interesting corner case with that extra whitespace, so a solution is likely.

As for that solution, I have taken a look. There is a small issue because the token iterator automatically skips over whitespace, which is why the Manizza group is being created. If I can make this optional, then I believe this can be fixed. & may be the only token in the entire namelist grammar which forbids whitespace as the next token.
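
The idea can be illustrated with a toy tokenizer. This is not f90nml's tokenizer or the actual fix, only a sketch under the assumption that whitespace runs are kept as tokens so the parser can see what follows an &; `tokenize` and `group_names` are hypothetical names.

```python
import re

# Toy tokenizer: whitespace runs are tokens too, instead of being skipped.
TOKEN = re.compile(r"\s+|[&/=,]|[^\s&/=,]+")
PUNCTUATION = {"&", "/", "=", ","}


def tokenize(text):
    """Split text into tokens, keeping each whitespace run as its own token."""
    return TOKEN.findall(text)


def group_names(text):
    """Return names that follow '&' directly, with no whitespace token between."""
    tokens = tokenize(text)
    names = []
    for tok, nxt in zip(tokens, tokens[1:]):
        if tok == "&" and not nxt.isspace() and nxt not in PUNCTUATION:
            names.append(nxt)
    return names


text = (
    "GFDL formulaton & Manizza etal optics. "
    "&ocean_sponges_tracer_nml use_this_module = F /"
)

print(group_names(text))  # ['ocean_sponges_tracer_nml']
```

If the whitespace tokens were dropped before this check, `& Manizza` and `&Manizza` would look identical, which is how the spurious group gets created.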

aidanheerdegen commented 8 months ago

As long as I'm special that's all I care about.

marshallward / f90nml

Reducing permissiveness of parser #160