valeriupredoi / bgcval2

Package for BGCVal v2.0
3 stars 0 forks source link

Revamping time series interface to ease adding new pH analysis #39

Open ledm opened 1 year ago

ledm commented 1 year ago

Julien P. has requested that we add a few more fields to the analysis.

While this may seem like a simple task, I think it would be a good opportunity to re-vamp the analysis_timeseries and analysis_compare interfaces. I'm thinking of good ways to do this, and I'll add some ideas to this issue but it might be good to have a chat about it and come up with a plan.

Two sets of files:

or a single long massive yaml file:

or ... Something else?

The problem that I had before with this was that you can't include a function in yaml, only dicts. Will explain on the phone.

ledm commented 1 year ago

The fields that Julien wants are:

I propose adding these into a new suite called Alkalinity or something

ledm commented 1 year ago

In terms of the interface, it currently works like this.

This is how analysis_timeseries creates a list of analyses to run:

bgcKeys = []
bgcKeys.append('N')  # WOA Nitrate
...
if analysisSuite.lower() in [ 'bgc',]:
    analysisKeys.extend(bgcKeys)

where analysisSuite is provided in the command line.

The details on how to run that analysis is then provided in a long function:

    if 'N' in analysisKeys:
        name = 'Nitrate'
        av[name]['modelFiles'] = listModelDataFiles(jobID, 'ptrc_T',
                                                    paths.ModelFolder_pref,
                                                    annual)
        if annual:
            av[name]['dataFile'] = WOAFolder + '/woa13_all_n00_01.nc'
        else:
            av[name]['dataFile'] = WOAFolder + '/nitrate_monthly_1deg.nc'

        av[name]['modelcoords'] = medusaCoords
        av[name]['datacoords'] = woaCoords

        av[name]['modeldetails'] = {
            'name': name,
            'vars': [
                'DIN',
            ],
            'convert': ukp.NoChange,
            'units': 'mmol N/m^3'
        }
        av[name]['datadetails'] = {
            'name': name,
            'vars': [
                'n_an',
            ],
            'convert': ukp.NoChange,
            'units': 'mmol N/m^3'
        }

        av[name]['layers'] = layerList
        av[name]['regions'] = regionList
        av[name]['metrics'] = metricList  #['mean','median', ]

        av[name]['datasource'] = 'WOA'
        av[name]['model'] = 'MEDUSA'

        av[name]['modelgrid'] = 'eORCA1'
        av[name]['gridFile'] = paths.orcaGridfn
        av[name]['Dimensions'] = 3

The autovivification (av) is not actually passed to timeseriesAnalysis. Instead, it is unpacked:

        tsa = timeseriesAnalysis(
            av[name]['modelFiles'],
            av[name]['dataFile'],
            dataType=name,
            modelcoords=av[name]['modelcoords'],
            modeldetails=av[name]['modeldetails'],
            datacoords=av[name]['datacoords'],
            datadetails=av[name]['datadetails'],
            datasource=av[name]['datasource'],
            model=av[name]['model'],
            jobID=jobID,
            layers=av[name]['layers'],
            regions=av[name]['regions'],
            metrics=av[name]['metrics'],
            workingDir=shelvedir,
            imageDir=imagedir,
            grid=av[name]['modelgrid'],
            gridFile=av[name]['gridFile'],
            clean=clean,
        )

There has got to be a better way to do this.

ledm commented 1 year ago

There are two parts to this:

  1. Write a new interface that moves analysis details out of analysis_timeseries.py
  2. Migrate current contents of analysis_timeseries.py into new interface.

This is actually a kind of massive job, but I think getting it right would make the new user experience a lot better.

valeriupredoi commented 1 year ago

I would argue for a single config file - two or more files complicate things, and don't remove the annoyance from the side of the user; if you structure your config file with sections then some sections can stay unchanged (maybe?). Why do you want to put functions in a config file? Just name the functions and have them being called from Python from various modules, or a single, big. module

ledm commented 1 year ago

I'm not sure about a single file. It's currently 4000 lines, anything that does that in a single yml might be a bit much.

valeriupredoi commented 1 year ago

so even 4 files each of 1000 lines is waaay too much - I suggest we shrink the configuration first

ledm commented 1 year ago

In my defence, it was a lot shorter before the python2 -> python3 conversion, which expanded it a lot.

valeriupredoi commented 1 year ago

how did a Py2->Py3 conversion expand the configuration :rofl:

ledm commented 1 year ago

I'm going ahead with working on a keys interface right now. As I think this is the way forward. I don't think that putting this into a single massive config file is the right way.

ledm commented 1 year ago

how did a Py2->Py3 conversion expand the configuration 🤣

It blew up the total number of lines:

python2:

        av[name]['modeldetails'] = {'name': name, 'vars': ['AEOLIAN', ],'convert': modeldustsum, 'units': 'Gmol Fe/yr'}

python3

        av[name]['modeldetails'] = {
            'name': name,
            'vars': [
                'AEOLIAN',
            ],
            'convert': modeldustsum,
            'units': 'Gmol Fe/yr'
        }
valeriupredoi commented 1 year ago

sounds good - the idea of a config file is to have anything that a user would tune on a regular basis, if there are a lot of config values that stay the same over long periods, we can either let them sit in the code or we can do a config-developer that is not touched by the users many times

valeriupredoi commented 1 year ago

how did a Py2->Py3 conversion expand the configuration rofl

It blew up the total number of lines:

python2:

        av[name]['modeldetails'] = {'name': name, 'vars': ['AEOLIAN', ],'convert': modeldustsum, 'units': 'Gmol Fe/yr'}

python3

        av[name]['modeldetails'] = {
            'name': name,
            'vars': [
                'AEOLIAN',
            ],
            'convert': modeldustsum,
            'units': 'Gmol Fe/yr'
        }

OK but that didn't change the configuration, it just put things neatly in a user-readable format :grin:

valeriupredoi commented 1 year ago

also, if you want to move a lot of text to other files, you can think of CMOR-like files where we store all those variables

ledm commented 1 year ago

My plan is the following code structure:

So yeah, kinda like cmor in esmvaltool.

(This is why I said it would be good to have a chat - heading to lunch now)

ledm commented 1 year ago

Starting to work in this here: https://github.com/valeriupredoi/bgcval2/tree/improving_the_timeseries_suite_list_interface

valeriupredoi commented 1 year ago

I like it! I'd just not call dirs with _yml suffix :grin:

valeriupredoi commented 1 year ago

open (Draft) PR too, man, so we can debug on the way

ledm commented 1 year ago

Hey, so the AV stuff, I already did that, but in the GMD version of BGC-val: https://github.com/ledm/bgc-val-public/blob/main/ini/ukesm_jasmin.ini

This writes them as a single long list, but I think individual files is the way forward instead, I'm wondering if there's some easy way of bringing those changes over.

The code that reads these files is: https://github.com/ledm/bgc-val-public/blob/78798c774dc2e8c99107c2634c7327227c32d858/bgcvaltools/configparser.py

ledm commented 1 year ago

Bringing in the code from that bgc-val-public:

ledm commented 1 year ago

Okay, I've got some progress here for a debug mode. It's running! I'll set up a draft PR next.