marbl-ecosys / MARBL

Marine Biogeochemistry Library
https://marbl-ecosys.github.io

New tool for generating diagnostic list file #369

Open mnlevy1981 opened 3 years ago

I had a conversation with @klindsay28 about generating ecosystem_diagnostics for POP from a spreadsheet that organizes variables by how frequently they should be written to disk. The workflow would be:

  1. Create text files that each list variables (one per line) to be output at the same frequency, e.g. one file for all the daily variables, one for all the monthly, and one for all the annual. Maybe something like:

    • daily.txt:

      var1
      var2
      var3
    • monthly.txt:

      var4
      var5
    • annual.txt:

      var1
      var6
  2. Run a script that generates the following (var1 appears in both daily.txt and annual.txt, so it gets both operators):

    var1 : high_average,low_average
    var2 : high_average
    var3 : high_average
    var4 : medium_average
    var5 : medium_average
    var6 : low_average
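A minimal Python sketch of step 2, assuming the "var : op1,op2" output format shown above and the daily/monthly/annual to high_average/medium_average/low_average mapping from this example. The function names (read_var_list, build_diag_lines) are hypothetical; operators appear in each line in the order the lists are supplied, and variables keep the order in which they are first seen.

```python
def read_var_list(path):
    """Return variable names from a text file with one name per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def build_diag_lines(freq_lists):
    """Merge per-frequency variable lists into 'var : op1,op2' lines.

    freq_lists is a sequence of (operator, [variable names]) pairs. Operators
    appear in each output line in the order the pairs are given, and
    variables are emitted in the order they are first seen.
    """
    ops_by_var = {}  # dicts keep insertion order in Python 3.7+
    for op, variables in freq_lists:
        for var in variables:
            ops = ops_by_var.setdefault(var, [])
            if op not in ops:  # ignore a variable listed twice in one file
                ops.append(op)
    return [f"{var} : {','.join(ops)}" for var, ops in ops_by_var.items()]


# The example from this issue, written inline instead of being read from
# daily.txt, monthly.txt and annual.txt with read_var_list().
example = [
    ("high_average", ["var1", "var2", "var3"]),
    ("medium_average", ["var4", "var5"]),
    ("low_average", ["var1", "var6"]),
]
print("\n".join(build_diag_lines(example)))
```

With real files, the input would be built as something like `[("high_average", read_var_list("daily.txt")), ...]`.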

The best API for the tool is still an open question. Some options under consideration:

  1. command-line arguments for each frequency, such as --high-freq-file daily.txt --medium-freq-file monthly.txt --low-freq-file annual.txt
  2. command-line arguments for lists of files and frequencies such as --files daily.txt monthly.txt annual.txt --frequencies high medium low
  3. Some sort of configuration file (YAML?) that lists all the files and their frequencies
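To make option 2 concrete, here is a hedged sketch using Python's standard argparse module. The FREQ_TO_OP mapping and the parse_args helper are hypothetical; the flag names come from option 2 above. Because option 2 pairs files and frequencies by position, the sketch rejects lists of different lengths.

```python
import argparse

# Hypothetical mapping from the frequency names used on the command line
# to the averaging operators written to the generated file.
FREQ_TO_OP = {"high": "high_average", "medium": "medium_average", "low": "low_average"}


def parse_args(argv=None):
    """Parse --files/--frequencies and return (operator, path) pairs."""
    parser = argparse.ArgumentParser(
        description="Generate a diagnostics list from per-frequency variable files (sketch)."
    )
    parser.add_argument("--files", nargs="+", required=True,
                        help="text files with one variable name per line")
    parser.add_argument("--frequencies", nargs="+", required=True,
                        choices=sorted(FREQ_TO_OP),
                        help="output frequency for each file, in the same order")
    args = parser.parse_args(argv)
    if len(args.files) != len(args.frequencies):
        # parser.error() prints a usage message and exits with status 2
        parser.error("--files and --frequencies must have the same number of entries")
    # Pair each file with the operator for its frequency, keeping the given order.
    return [(FREQ_TO_OP[freq], path) for path, freq in zip(args.files, args.frequencies)]
```

Option 1 avoids the positional pairing entirely but fixes the number of frequencies at three, while option 3 moves the same pairing into a file that can be version-controlled alongside the spreadsheet.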

An optional argument providing a list of valid variables to include (perhaps an existing diagnostics file that could be parsed to pull out variable names) would also be useful, so that the spreadsheet columns listing the output variables don't need to be pre-sorted into BGC and non-BGC variables. In the example above, suppose var3 is neither computed by MARBL nor based on a MARBL tracer; in that case we'd want the final output to be:

var1 : high_average,low_average
var2 : high_average
var4 : medium_average
var5 : medium_average
var6 : low_average
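One way the optional filter could work, sketched in Python under the assumption that the existing diagnostics file uses the same "var : op1,op2" lines shown above. Lines without a ":" are skipped, which is an assumption about the file format, and the helper names are hypothetical. The filtered lists can then be merged into "var : op" lines exactly as in step 2 of the workflow.

```python
def parse_valid_vars(lines):
    """Return the set of variable names in lines of the form 'var : op1,op2'.

    Accepts any iterable of strings, e.g. an open file object. Lines with
    no ':' (such as blank lines) are skipped.
    """
    valid = set()
    for line in lines:
        name, sep, _ = line.partition(":")
        if sep and name.strip():
            valid.add(name.strip())
    return valid


def filter_freq_lists(freq_lists, valid_vars):
    """Drop variables not in valid_vars from each (operator, [variables]) pair."""
    return [(op, [v for v in variables if v in valid_vars]) for op, variables in freq_lists]


# Example: an existing diagnostics file that has no entry for var3.
existing_lines = [
    "var1 : medium_average",
    "var2 : high_average",
    "",
    "var4 : medium_average",
    "var5 : low_average",
    "var6 : medium_average",
]
freq_lists = [
    ("high_average", ["var1", "var2", "var3"]),
    ("medium_average", ["var4", "var5"]),
    ("low_average", ["var1", "var6"]),
]
print(filter_freq_lists(freq_lists, parse_valid_vars(existing_lines)))
```

With a real file, `with open(path) as f: valid = parse_valid_vars(f)` works the same way, since trailing newlines are stripped along with the surrounding whitespace.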