Open ledm opened 1 year ago
The fields that Julien wants are:
I propose adding these into a new suite called Alkalinity
or something
In terms of the interface, it currently works like this.
This is how analysis_timeseries
creates a list of analyses to run:
bgcKeys = []
bgcKeys.append('N') # WOA Nitrate
...
if analysisSuite.lower() in [ 'bgc',]:
analysisKeys.extend(bgcKeys)
where analysisSuite is provided in the command line.
The details on how to run that analysis is then provided in a long function:
if 'N' in analysisKeys:
name = 'Nitrate'
av[name]['modelFiles'] = listModelDataFiles(jobID, 'ptrc_T',
paths.ModelFolder_pref,
annual)
if annual:
av[name]['dataFile'] = WOAFolder + '/woa13_all_n00_01.nc'
else:
av[name]['dataFile'] = WOAFolder + '/nitrate_monthly_1deg.nc'
av[name]['modelcoords'] = medusaCoords
av[name]['datacoords'] = woaCoords
av[name]['modeldetails'] = {
'name': name,
'vars': [
'DIN',
],
'convert': ukp.NoChange,
'units': 'mmol N/m^3'
}
av[name]['datadetails'] = {
'name': name,
'vars': [
'n_an',
],
'convert': ukp.NoChange,
'units': 'mmol N/m^3'
}
av[name]['layers'] = layerList
av[name]['regions'] = regionList
av[name]['metrics'] = metricList #['mean','median', ]
av[name]['datasource'] = 'WOA'
av[name]['model'] = 'MEDUSA'
av[name]['modelgrid'] = 'eORCA1'
av[name]['gridFile'] = paths.orcaGridfn
av[name]['Dimensions'] = 3
The autovivification (av) is not actually passed to timeseriesAnalysis
. Instead, it is unpacked:
tsa = timeseriesAnalysis(
av[name]['modelFiles'],
av[name]['dataFile'],
dataType=name,
modelcoords=av[name]['modelcoords'],
modeldetails=av[name]['modeldetails'],
datacoords=av[name]['datacoords'],
datadetails=av[name]['datadetails'],
datasource=av[name]['datasource'],
model=av[name]['model'],
jobID=jobID,
layers=av[name]['layers'],
regions=av[name]['regions'],
metrics=av[name]['metrics'],
workingDir=shelvedir,
imageDir=imagedir,
grid=av[name]['modelgrid'],
gridFile=av[name]['gridFile'],
clean=clean,
)
There has got to be a better way to do this.
There are two parts to this:
analysis_timeseries.py
analysis_timeseries.py
into new interface. This is actually a kind of massive job, but I think getting it right would make the new user experience a lot better.
I would argue for a single config file - two or more files complicate things, and don't remove the annoyance from the side of the user; if you structure your config file with sections then some sections can stay unchanged (maybe?). Why do you want to put functions in a config file? Just name the functions and have them being called from Python from various modules, or a single, big. module
I'm not sure about a single file. It's currently 4000 lines, anything that does that in a single yml might be a bit much.
so even 4 files each of 1000 lines is waaay too much - I suggest we shrink the configuration first
In my defence, it was a lot shorter before the python2 ->
python3 conversion, which expanded it a lot.
how did a Py2->Py3 conversion expand the configuration :rofl:
I'm going ahead with working on a keys interface right now. As I think this is the way forward. I don't think that putting this into a single massive config file is the right way.
how did a Py2->Py3 conversion expand the configuration 🤣
It blew up the total number of lines:
python2:
av[name]['modeldetails'] = {'name': name, 'vars': ['AEOLIAN', ],'convert': modeldustsum, 'units': 'Gmol Fe/yr'}
python3
av[name]['modeldetails'] = {
'name': name,
'vars': [
'AEOLIAN',
],
'convert': modeldustsum,
'units': 'Gmol Fe/yr'
}
sounds good - the idea of a config file is to have anything that a user would tune on a regular basis, if there are a lot of config values that stay the same over long periods, we can either let them sit in the code or we can do a config-developer that is not touched by the users many times
how did a Py2->Py3 conversion expand the configuration rofl
It blew up the total number of lines:
python2:
av[name]['modeldetails'] = {'name': name, 'vars': ['AEOLIAN', ],'convert': modeldustsum, 'units': 'Gmol Fe/yr'}
python3
av[name]['modeldetails'] = { 'name': name, 'vars': [ 'AEOLIAN', ], 'convert': modeldustsum, 'units': 'Gmol Fe/yr' }
OK but that didn't change the configuration, it just put things neatly in a user-readable format :grin:
also, if you want to move a lot of text to other files, you can think of CMOR-like files where we store all those variables
My plan is the following code structure:
keys_yml - new directory
variables_yml: new directory with lots of individual analysis files:
So yeah, kinda like cmor in esmvaltool.
(This is why I said it would be good to have a chat - heading to lunch now)
Starting to work in this here: https://github.com/valeriupredoi/bgcval2/tree/improving_the_timeseries_suite_list_interface
I like it! I'd just not call dirs with _yml
suffix :grin:
open (Draft) PR too, man, so we can debug on the way
Hey, so the AV stuff, I already did that, but in the GMD version of BGC-val: https://github.com/ledm/bgc-val-public/blob/main/ini/ukesm_jasmin.ini
This writes them as a single long list, but I think individual files is the way forward instead, I'm wondering if there's some easy way of bringing those changes over.
The code that reads these files is: https://github.com/ledm/bgc-val-public/blob/78798c774dc2e8c99107c2634c7327227c32d858/bgcvaltools/configparser.py
Bringing in the code from that bgc-val-public:
---
at the start and brief description[name]
bgcval2
before functions
- new path.Okay, I've got some progress here for a debug mode. It's running! I'll set up a draft PR next.
Julien P. has requested that we add a few more fields to the analysis.
While this may seem like a simple task, I think it would be a good opportunity to re-vamp the
analysis_timeseries
andanalysis_compare
interfaces. I'm thinking of good ways to do this, and I'll add some ideas to this issue but it might be good to have a chat about it and come up with a plan.Two sets of files:
or a single long massive yaml file:
or ... Something else?
The problem that I had before with this was that you can't include a function in yaml, only dicts. Will explain on the phone.