marshallward / f90nml

A Python module and command line tool for working with Fortran namelists
Apache License 2.0

Only extract a specified namelist #39

Open: jacobwilliams opened this issue 7 years ago

jacobwilliams commented 7 years ago

Idea: say a file contains multiple namelists and you only want to read one of them. Maybe this could be an option where you specify the group you want, rather than having to parse the whole file. (Related to #30: if the file is very large, parsing becomes a bottleneck when you only want certain information from it.)

marshallward commented 7 years ago

Yes, I think it's a good idea, but I wonder how to do it effectively. Most of the time seems to be spent parsing and constructing the tokens (via shlex), and you would still need to sort through all the tokens preceding the specified namelist, if only to determine where each namelist begins and ends.
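To make the token cost concrete, here is a small illustration using Python's standard `shlex` module with its *default* settings. This is an assumption for illustration only: f90nml configures its own shlex instance differently, so the exact tokens it produces may differ. The point is simply that every character outside the word set becomes its own token, so array notation like `x(1)` produces more tokens than a plain name like `x_1`.

```python
import shlex


def tokenize(line):
    """Split one namelist line into tokens using shlex's default (non-POSIX) settings.

    Illustrative only: this is not f90nml's tokenizer configuration.
    """
    return list(shlex.shlex(line))


scalar_tokens = tokenize("x_1 = 1.0")
array_tokens = tokenize("x(1) = 1.0")
print(scalar_tokens)  # ['x_1', '=', '1', '.', '0']
print(array_tokens)   # ['x', '(', '1', ')', '=', '1', '.', '0']
```

With default settings the index `(1)` adds three tokens per assignment, each of which a token-driven parser has to inspect.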

It would let you exit immediately after reading the requested namelist rather than going through the whole file, which might help in some cases.
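A minimal sketch of the early-exit idea, assuming a simple layout: each group starts with `&name` at the beginning of a line and ends at the first line whose last non-blank character is `/`, with no terminating `/` hidden in strings or comments. The helper `extract_group` is hypothetical, not part of f90nml. It consumes lines lazily and returns as soon as the requested group closes, so the rest of the file is never read.

```python
import io
import re


def extract_group(lines, group_name):
    """Return the raw text of namelist group `group_name`, or None if absent.

    Hypothetical helper. Assumes `&name` starts a line and the group ends at the
    first line ending in `/`. Reading stops as soon as the group is complete.
    """
    # Fortran names are case-insensitive; \b avoids matching `&grp2` for `grp`.
    start = re.compile(r"^\s*&" + re.escape(group_name) + r"\b", re.IGNORECASE)
    collected = []
    inside = False
    for line in lines:
        if not inside and start.match(line):
            inside = True
        if inside:
            collected.append(line)
            if line.rstrip().endswith("/"):
                # Early exit: remaining lines are left unread.
                return "".join(collected)
    return None


if __name__ == "__main__":
    sample = io.StringIO("&first\n  a = 1\n/\n&second\n  b = 2, 3\n/\n&third\n  c = 4\n/\n")
    print(extract_group(sample, "second"), end="")
```

Because it accepts any iterable of lines, an open file object works too. The extracted text could then be handed to a real parser; recent f90nml releases provide `f90nml.reads` for parsing a namelist string, if your installed version has it.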

A more intelligent tokenizer (#30) might be a way forward here. Or maybe it's time to just load the entire namelist (or file) into memory and dice it up into pieces. (Maybe I should have done that from the beginning...)
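The "load it all into memory and dice it up" approach could look roughly like the sketch below. It relies on the same simplifying assumption as a naive scanner: a group starts with `&name` and ends at the first `/` that is the last non-blank character on a line. `split_groups` is a hypothetical helper; it returns a list rather than a dict because a namelist file may repeat a group name.

```python
import re

# Hypothetical splitter. Assumes each group starts with `&name` and ends at the
# first `/` that is the last non-blank character on a line (no `/` terminators
# inside strings or comments).
GROUP_RE = re.compile(r"^[ \t]*&(\w+).*?/[ \t]*$", re.MULTILINE | re.DOTALL)


def split_groups(text):
    """Split namelist text into (lowercased group name, raw group text) pairs."""
    return [(m.group(1).lower(), m.group(0)) for m in GROUP_RE.finditer(text)]


if __name__ == "__main__":
    text = "&first\n  a = 1\n/\n&Second\n  b = 2, 3\n/\n&third x = 5 /\n"
    for name, chunk in split_groups(text):
        print(name, repr(chunk))
```

Each piece can then be parsed independently, or all but the requested group can be skipped.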

jacobwilliams commented 7 years ago

Anything to speed things up has my support. Unfortunately it's only moral support right now. :)

jacobwilliams commented 5 years ago

FYI: I have a simple experiment related to this here. I tested splitting a file of multiple namelists into chunks, reading each chunk separately with multiprocessing, and then stitching the results back together at the end. Even with a single worker, it's still faster than a default read.
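The split/parse/stitch pattern can be sketched as follows. This is not the linked experiment's code: `toy_parse_group` is a hypothetical stand-in for a real parser that only understands `&name`, one `key = value` per line, and a closing `/`, and group names are assumed to be unique. The executor is injectable, so the same function runs serially, with threads, or with a `concurrent.futures.ProcessPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_parse_group(group_text):
    """Toy stand-in for a namelist parser (hypothetical, NOT f90nml).

    Handles only `&name`, one `key = value` per line, and a closing `/`.
    Values are kept as stripped strings.
    """
    lines = [ln.strip() for ln in group_text.strip().splitlines()]
    name = lines[0].lstrip("&").strip().lower()
    values = {}
    for ln in lines[1:]:
        if "=" in ln:
            key, value = ln.split("=", 1)
            values[key.strip().lower()] = value.strip()
    return name, values


def parse_chunks(chunks, executor=None):
    """Parse each group chunk, then stitch the results into one dict.

    executor=None parses serially; otherwise any concurrent.futures Executor is used.
    Assumes group names are unique.
    """
    mapper = map if executor is None else executor.map
    # Both map() and Executor.map() yield results in input order,
    # so the stitched dict keeps the groups in file order.
    return dict(mapper(toy_parse_group, chunks))


if __name__ == "__main__":
    chunks = ["&first\n  a = 1\n/", "&second\n  b = 2\n  c = 'x'\n/"]
    print(parse_chunks(chunks))
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(parse_chunks(chunks, executor=pool))
```

Threads keep the demo portable, but because of the GIL they will not speed up pure-Python parsing. For real parallelism, pass a `ProcessPoolExecutor` instead, created inside an `if __name__ == "__main__":` guard so worker processes can import the module safely.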

Also related to #30.

marshallward commented 5 years ago

Thanks, useful info! Splitting the namelists would generally be more difficult, but it shows there's value in splitting up the work. At the least, splitting the namelist into groups before parsing them individually is probably a better approach.

I think I do something like this in the new parser (whose development has unfortunately lagged), but I will make it a priority when I get back to it.

BTW I'm in the process of relocating my family overseas for a new job, so I have no idea when I'll get time to think about this.

jacobwilliams commented 5 years ago

FYI: I noticed something else. If the keys contain array notation, parsing is dramatically slower. See the examples here.

'files/test.nml'     # 112 namelists -- short keys [8 sec]
'files/test4b.nml'   # 112 namelists -- longer keys no arrays  [9 sec]
'files/test4c.nml'   # 112 namelists -- longer keys w/ types  [12 sec]
'files/test4.nml'    # 112 namelists -- longer keys w/ array [42 sec]

This is killing me since all my namelists have many arrays. :)
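To reproduce this kind of comparison without the original test files, one could generate synthetic inputs that differ only in key notation and time a parser on each. Everything below is a hypothetical harness: `make_namelist` and `best_time` are illustrative names, and the generated files are not the ones listed above. The demo times `str.splitlines` as a placeholder; with f90nml installed, a string-parsing function such as `f90nml.reads` (if your version provides it) could be passed instead.

```python
import time


def make_namelist(n_groups, n_vars, array_keys):
    """Build synthetic namelist text (hypothetical benchmark input).

    array_keys=False -> keys like `var_3 = 1.5`
    array_keys=True  -> keys like `var(3) = 1.5`
    """
    out = []
    for g in range(n_groups):
        out.append(f"&group{g}")
        for i in range(1, n_vars + 1):
            key = f"var({i})" if array_keys else f"var_{i}"
            out.append(f"  {key} = {i * 0.5}")
        out.append("/")
    return "\n".join(out) + "\n"


def best_time(parse, text, repeats=3):
    """Return the fastest wall-clock time in seconds of parse(text) over `repeats` runs."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        parse(text)
        times.append(time.perf_counter() - start)
    return min(times)


if __name__ == "__main__":
    scalar = make_namelist(112, 50, array_keys=False)
    arrays = make_namelist(112, 50, array_keys=True)
    # Placeholder parser; swap in a real namelist parser to measure the slowdown.
    for label, text in [("scalar keys", scalar), ("array keys", arrays)]:
        print(label, best_time(str.splitlines, text))
```

Taking the minimum of several runs reduces noise from other processes, which matters when comparing timings that differ by a few seconds.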