oceanhackweek / ohw21-proj-cmip-ard

Repository to support the 2021 OceanHackweek project CMIP analysis ready data (ARD) workflow: turning big climate projection data into useful inputs for modelling or analysis

Use Case 1 Region Timeseries #16

Open rwegener2 opened 3 years ago

rwegener2 commented 3 years ago

Description


Issue background

From the README: Use case 1: SST & surface nitrate projections for Arabian Sea ecological modelling

A hypothetical ecological model of the Arabian Sea includes basin-averaged SST and surface nitrate as inputs. Researchers would like to explore ecosystem responses to future climate change projections under both SSP1-2.6 and SSP5-8.5 scenarios. A monthly timeseries through 2100, averaged over the Arabian Sea, is required from a range of CMIP6 models to start the effort, and output in a useful format for import into R is required. Another useful final product would be regridding all models onto a common grid for further comparison.
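One way to get the timeseries into a format R can ingest directly is a tidy (long-format) CSV. A minimal sketch with pandas, where the column names and the two example rows are purely illustrative (the real values would come from the CMIP6 processing discussed in this issue):

```python
import pandas as pd

# Hypothetical monthly basin-mean SST records for two scenarios;
# in practice these come from the processed CMIP6 output.
records = [
    {"time": "2015-01", "scenario": "ssp126", "model": "CESM2", "tos_mean": 27.1},
    {"time": "2015-01", "scenario": "ssp585", "model": "CESM2", "tos_mean": 27.3},
]
df = pd.DataFrame(records)

# to_csv() with no path returns the CSV text; pass a filename
# (e.g. "arabian_sea_sst.csv") to write a file for R's read.csv().
csv_text = df.to_csv(index=False)
```

A long-format table like this loads cleanly in R with `read.csv()` and works directly with tidyverse grouping by model and scenario.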

See also

#17, #15

jbusecke commented 3 years ago

Nice! Do you envision this to be performed on the native grids or after #17?

Sumanshekhar17 commented 3 years ago

I am not familiar with the term 'native grids'. @jbusecke, can you point me in the right direction? I can take it from there.

jbusecke commented 3 years ago

To select a 'smaller than basin' region like the Arabian Sea, you might want to have a look at regionmask, in particular the natural earth ocean basins.
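As a minimal sketch of the idea behind such a region mask, here is a numpy-only version using a rough bounding box instead of regionmask's Natural Earth polygons. The Arabian Sea bounds below (roughly 45–78°E, 0–28°N) are assumed for illustration, not taken from the thread:

```python
import numpy as np

# Toy 1-degree global grid; a real CMIP6 native grid would supply 2-D lon/lat.
lon = np.arange(0.5, 360.5)           # 0.5 ... 359.5
lat = np.arange(-89.5, 90.5)          # -89.5 ... 89.5
lon2d, lat2d = np.meshgrid(lon, lat)  # shape (180, 360)

# Very rough Arabian Sea bounding box (assumed bounds);
# regionmask would give a proper basin polygon instead.
mask = (lon2d >= 45) & (lon2d <= 78) & (lat2d >= 0) & (lat2d <= 28)

# Fake SST field, warm at the equator, cold at the poles.
sst = 20 + 10 * np.cos(np.deg2rad(lat2d))
basin_mean = sst[mask].mean()  # unweighted basin average
```

regionmask's `mask()` method produces the same kind of boolean/labelled array from real basin polygons, so the selection step afterwards is identical.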

jbusecke commented 3 years ago

> I am not familiar with the term 'native grids'. @jbusecke, can you point me in the right direction? I can take it from there.

Ah, sorry for the jargon. The 'native' grid is the grid the ocean model is run on. These are usually not just regularly divided into lon/lat intervals, but can have quite complex geometry (example). Many variables have an additional grid_label (e.g. gr), which is the same output regridded onto a regular lon/lat grid. Does that make sense?
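The distinction shows up as the `grid_label` column of the intake-esm catalog dataframe (`col.df` elsewhere in this thread). A toy sketch with pandas, where the models and rows are invented for illustration only:

```python
import pandas as pd

# Toy stand-in for the intake-esm catalog dataframe; entries are illustrative.
df = pd.DataFrame({
    "source_id":  ["MODEL-A", "MODEL-A", "MODEL-B"],
    "variable_id": ["tos", "tos", "tos"],
    "grid_label": ["gn", "gr", "gn"],  # gn = native grid, gr = regridded
})

native = df[df.grid_label == "gn"]     # the model's own (possibly curvilinear) grid
regridded = df[df.grid_label == "gr"]  # same output on a regular lon/lat grid
```

Note that not every model publishes a `gr` version of every variable, so filtering on `grid_label='gr'` can shrink the model list.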

Sumanshekhar17 commented 3 years ago

> I am not familiar with the term 'native grids'. @jbusecke, can you point me in the right direction? I can take it from there.

> Ah, sorry for the jargon. The 'native' grid is the grid the ocean model is run on. These are usually not just regularly divided into lon/lat intervals, but can have quite complex geometry (example). Many variables have an additional grid_label (e.g. gr), which is the same output regridded onto a regular lon/lat grid. Does that make sense?

Yeah. Just a small question here: I am assuming that every climate model uses the same grid, and that they use interpolation packages to regrid their output. I just want a clear picture in my mind.

Sumanshekhar17 commented 3 years ago

> Nice! Do you envision this to be performed on the native grids or after #17?

Now coming to your question: right now we are postprocessing on the native grid, as you showed us in the notebook, but we are planning to regrid so that we have a common platform for analyzing the timeseries data. Please correct me if I make a mistake anywhere; I am still an undergrad and learning things.

Thomas-Moore-Creative commented 3 years ago

Use case 1 is hopefully designed to give opportunities to try these different steps in a typical workflow.

> A monthly timeseries through 2100, averaged over the Arabian Sea, is required from a range of CMIP6 models to start the effort, and output in a useful format for import into R is required. Another useful final product would be regridding all models onto a common grid for further comparison.

So to my mind the first part could be timeseries generated for each model in each of their "native" grids. But another goal product would require regridding onto a common grid.
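For the per-model timeseries on a native grid, the basin average should be weighted by grid-cell area (in CMIP6 this comes from the `areacello` variable), since native-grid cells are not equal in size. A minimal numpy sketch with made-up values:

```python
import numpy as np

# Fake SST and cell areas for a tiny 2x2 native-grid patch;
# in CMIP6 the areas would come from the areacello variable.
sst = np.array([[26.0, 27.0],
                [28.0, 29.0]])
area = np.array([[1.0, 1.0],
                 [2.0, 2.0]])  # cells shrink toward the poles in reality

# Area-weighted basin mean: sum(sst * area) / sum(area)
weighted_mean = (sst * area).sum() / area.sum()

# Compare with the naive unweighted mean
naive_mean = sst.mean()
```

Here the weighted mean (~27.83) differs from the naive mean (27.5) because the larger cells pull the average toward their values; on a real curvilinear ocean grid the difference can be substantial.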

Sumanshekhar17 commented 3 years ago

I am facing an issue with importing models that have both tos and no3os. Inspired by @isugiura's code, I have defined a function that takes the intersection between the lists using the intersect1d() method from the numpy package:

```python
def model_list(experiment_id1, experiment_id2, variable_id1, variable_id2):
    # all source_id available for both experiments
    list1 = col.df[col.df.experiment_id == experiment_id1].source_id.unique()
    list2 = col.df[col.df.experiment_id == experiment_id2].source_id.unique()
    elist = np.intersect1d(list1, list2)

    # all source_id available for both variables (e.g. 'no3os' and 'tos')
    vlist1 = col.df[col.df.variable_id == variable_id1].source_id.unique()
    vlist2 = col.df[col.df.variable_id == variable_id2].source_id.unique()
    vlist = np.intersect1d(vlist1, vlist2)

    # models satisfying both the experiment and variable requirements
    return np.intersect1d(vlist, elist)
```

Now applying the search method just for these models:

```python
models = model_list('historical', 'ssp585', 'tos', 'no3os')

cat = col.search(
    source_id=models,
    grid_label='gn',
    table_id='Omon',
    member_id=['r2i1p1f1', 'r3i1p1f1', 'r2i1p2f1', 'r3i1p2f1'],
)
cat.df.groupby("source_id")[
    ["experiment_id", "variable_id", "table_id", "member_id"]
].nunique()
```

[screenshot: nunique table of the search results]

You can see that there are three models somehow making their way into the dataset. I don't know where the issue is; can anyone please help me out here?

isugiura commented 3 years ago

> You can see that there are three models somehow making their way into the dataset. I don't know where the issue is; can anyone please help me out here?

Interesting. When I run your notebook on my JupyterLab, it seems that I'm getting different results. Do you know what is happening?

[screenshot: nunique table from isugiura's run, 2021-08-05]

Sumanshekhar17 commented 3 years ago

@isugiura sorry, there was a little mistake in the code.

Use this to select the data; it gives the result I mentioned in my earlier comment:

```python
models = model_list('historical', 'ssp585', 'tos', 'no3os')  # filter models from the catalog

cat = col.search(
    experiment_id=['historical', 'ssp585'],
    variable_id=['tos', 'no3os'],
    source_id=models,
    grid_label='gn',
    table_id='Omon',
    member_id=['r2i1p1f1', 'r3i1p1f1', 'r2i1p2f1', 'r3i1p2f1'],
)
cat.df.groupby("source_id")[
    ["experiment_id", "variable_id", "table_id", "member_id"]
].nunique()
```

jbusecke commented 3 years ago

Try dropping the member_id search requirement! That was just a way to restrict the datasets for the example. You could also try additionally dropping the grid_label (giving you both regridded and native output). Does that yield more results?
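A toy illustration of why the hard member_id filter can silently drop models: ensemble-member labels differ between modelling centres, so a model whose members don't match the requested list disappears entirely. The catalog rows below are invented for illustration, not real CMIP6 entries:

```python
import pandas as pd

# Toy catalog: MODEL-B only published member r1i1p1f1, so a search
# requiring r2/r3 members drops it entirely. Entries are illustrative.
df = pd.DataFrame({
    "source_id": ["MODEL-A", "MODEL-A", "MODEL-B"],
    "member_id": ["r2i1p1f1", "r3i1p1f1", "r1i1p1f1"],
})

strict = df[df.member_id.isin(["r2i1p1f1", "r3i1p1f1"])]
relaxed = df  # dropping the member_id constraint keeps every model

print(sorted(strict.source_id.unique()))   # MODEL-B is gone
print(sorted(relaxed.source_id.unique()))  # both models survive
```

The same logic applies to grid_label: every extra constraint intersects away models that don't publish that exact combination.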