Revamping time series interface to ease adding new pH analysis

ledm commented 1 year ago

Julien P. has requested that we add a few more fields to the analysis.

While this may seem like a simple task, I think it would be a good opportunity to re-vamp the analysis_timeseries and analysis_compare interfaces. I'm thinking of good ways to do this, and I'll add some ideas to this issue but it might be good to have a chat about it and come up with a plan.

Two sets of files:

replace list (physkeys, etc) with yaml lists of individual model keys
replace individual analysis with a yaml file that contains:
- all the input details
- info about coordinates
- info about obs
- etc

or a single long massive yaml file:

make a master analysis yaml with:
- list of switches to turn on and off individual keys
- list of keys
  - list of details for each key

or ... Something else?

The problem that I had before with this was that you can't include a function in yaml, only dicts. Will explain on the phone.

ledm commented 1 year ago

The fields that Julien wants are:

[ ] "RIV_ALK" : alk flux added,
[ ] "TOT_SHALK" : total alkalinity added (because we want to know how much is added in total)
[ ] "ATM_PCO2" : atmospheric pCO2
[ ] "CO2FLUX" : CO2 air-sea flux
[ ] "ATM_XCO2"
[ ] "OCN_PCO2"

I propose adding these into a new suite called Alkalinity or something

ledm commented 1 year ago

In terms of the interface, it currently works like this.

This is how analysis_timeseries creates a list of analyses to run:

bgcKeys = []
bgcKeys.append('N')  # WOA Nitrate
...
if analysisSuite.lower() in [ 'bgc',]:
    analysisKeys.extend(bgcKeys)

where analysisSuite is provided in the command line.

The details on how to run that analysis is then provided in a long function:

    if 'N' in analysisKeys:
        name = 'Nitrate'
        av[name]['modelFiles'] = listModelDataFiles(jobID, 'ptrc_T',
                                                    paths.ModelFolder_pref,
                                                    annual)
        if annual:
            av[name]['dataFile'] = WOAFolder + '/woa13_all_n00_01.nc'
        else:
            av[name]['dataFile'] = WOAFolder + '/nitrate_monthly_1deg.nc'

        av[name]['modelcoords'] = medusaCoords
        av[name]['datacoords'] = woaCoords

        av[name]['modeldetails'] = {
            'name': name,
            'vars': [
                'DIN',
            ],
            'convert': ukp.NoChange,
            'units': 'mmol N/m^3'
        }
        av[name]['datadetails'] = {
            'name': name,
            'vars': [
                'n_an',
            ],
            'convert': ukp.NoChange,
            'units': 'mmol N/m^3'
        }

        av[name]['layers'] = layerList
        av[name]['regions'] = regionList
        av[name]['metrics'] = metricList  #['mean','median', ]

        av[name]['datasource'] = 'WOA'
        av[name]['model'] = 'MEDUSA'

        av[name]['modelgrid'] = 'eORCA1'
        av[name]['gridFile'] = paths.orcaGridfn
        av[name]['Dimensions'] = 3

The autovivification (av) is not actually passed to timeseriesAnalysis. Instead, it is unpacked:

        tsa = timeseriesAnalysis(
            av[name]['modelFiles'],
            av[name]['dataFile'],
            dataType=name,
            modelcoords=av[name]['modelcoords'],
            modeldetails=av[name]['modeldetails'],
            datacoords=av[name]['datacoords'],
            datadetails=av[name]['datadetails'],
            datasource=av[name]['datasource'],
            model=av[name]['model'],
            jobID=jobID,
            layers=av[name]['layers'],
            regions=av[name]['regions'],
            metrics=av[name]['metrics'],
            workingDir=shelvedir,
            imageDir=imagedir,
            grid=av[name]['modelgrid'],
            gridFile=av[name]['gridFile'],
            clean=clean,
        )

There has got to be a better way to do this.

ledm commented 1 year ago

There are two parts to this:

Write a new interface that moves analysis details out of analysis_timeseries.py
Migrate current contents of analysis_timeseries.py into new interface.

This is actually a kind of massive job, but I think getting it right would make the new user experience a lot better.

valeriupredoi commented 1 year ago

I would argue for a single config file - two or more files complicate things, and don't remove the annoyance from the side of the user; if you structure your config file with sections then some sections can stay unchanged (maybe?). Why do you want to put functions in a config file? Just name the functions and have them being called from Python from various modules, or a single, big. module

ledm commented 1 year ago

I'm not sure about a single file. It's currently 4000 lines, anything that does that in a single yml might be a bit much.

valeriupredoi commented 1 year ago

so even 4 files each of 1000 lines is waaay too much - I suggest we shrink the configuration first

ledm commented 1 year ago

In my defence, it was a lot shorter before the python2 -> python3 conversion, which expanded it a lot.

valeriupredoi commented 1 year ago

how did a Py2->Py3 conversion expand the configuration :rofl:

ledm commented 1 year ago

I'm going ahead with working on a keys interface right now. As I think this is the way forward. I don't think that putting this into a single massive config file is the right way.

ledm commented 1 year ago

how did a Py2->Py3 conversion expand the configuration 🤣

It blew up the total number of lines:

python2:

        av[name]['modeldetails'] = {'name': name, 'vars': ['AEOLIAN', ],'convert': modeldustsum, 'units': 'Gmol Fe/yr'}

python3

        av[name]['modeldetails'] = {
            'name': name,
            'vars': [
                'AEOLIAN',
            ],
            'convert': modeldustsum,
            'units': 'Gmol Fe/yr'
        }

valeriupredoi commented 1 year ago

sounds good - the idea of a config file is to have anything that a user would tune on a regular basis, if there are a lot of config values that stay the same over long periods, we can either let them sit in the code or we can do a config-developer that is not touched by the users many times

valeriupredoi commented 1 year ago

how did a Py2->Py3 conversion expand the configuration rofl

It blew up the total number of lines:

python2:

        av[name]['modeldetails'] = {'name': name, 'vars': ['AEOLIAN', ],'convert': modeldustsum, 'units': 'Gmol Fe/yr'}

python3

        av[name]['modeldetails'] = {
            'name': name,
            'vars': [
                'AEOLIAN',
            ],
            'convert': modeldustsum,
            'units': 'Gmol Fe/yr'
        }

OK but that didn't change the configuration, it just put things neatly in a user-readable format :grin:

valeriupredoi commented 1 year ago

also, if you want to move a lot of text to other files, you can think of CMOR-like files where we store all those variables

ledm commented 1 year ago

My plan is the following code structure:

keys_yml - new directory
- bgc.yml: a file with the bgc keys
- physics.yml: a file with a list of physics keys
- etc
variables_yml: new directory with lots of individual analysis files:
- temperature.yml: single yml file with details for temperatrure analysis
- Ice: single yml file with some ice analysis.
- etc...

So yeah, kinda like cmor in esmvaltool.

(This is why I said it would be good to have a chat - heading to lunch now)

ledm commented 1 year ago

Starting to work in this here: https://github.com/valeriupredoi/bgcval2/tree/improving_the_timeseries_suite_list_interface

valeriupredoi commented 1 year ago

I like it! I'd just not call dirs with _yml suffix :grin:

valeriupredoi commented 1 year ago

open (Draft) PR too, man, so we can debug on the way

ledm commented 1 year ago

Hey, so the AV stuff, I already did that, but in the GMD version of BGC-val: https://github.com/ledm/bgc-val-public/blob/main/ini/ukesm_jasmin.ini

This writes them as a single long list, but I think individual files is the way forward instead, I'm wondering if there's some easy way of bringing those changes over.

The code that reads these files is: https://github.com/ledm/bgc-val-public/blob/78798c774dc2e8c99107c2634c7327227c32d858/bgcvaltools/configparser.py

ledm commented 1 year ago

Bringing in the code from that bgc-val-public:

[ ] Split ukesm_jasmin.ini into multiple single files and convert them to yaml format. Format change is:
- Add --- at the start and brief description
- Remove [name]
- add bgcval2 before functions - new path.
[ ] Bring over functions directory and convert it to python3 and bgcval2 format.
- @valeriupredoi - I'm currently struggling with 2to3 as a conversion for this code. It doesn't seem to work.
[ ] Set up tools, perhaps using bgc-val-public's config parser to load yaml file.

ledm commented 1 year ago

Okay, I've got some progress here for a debug mode. It's running! I'll set up a draft PR next.

valeriupredoi / bgcval2

Revamping time series interface to ease adding new pH analysis #39