oceanhackweek / ohw21-proj-cmip-ard

Repository to support the 2021 OceanHackweek project CMIP analysis ready data (ARD) workflow: turning big climate projection data into useful inputs for modelling or analysis

Use Case 1 Region Timeseries #16

Open rwegener2 opened 3 years ago

rwegener2 commented 3 years ago

Description


Issue background

From the README: Use case 1: SST & surface nitrate projections for Arabian Sea ecological modelling

A hypothetical ecological model of the Arabian Sea includes basin-averaged SST and surface nitrate as inputs. Researchers would like to explore ecosystem responses to future climate change projections under both SSP1-2.6 and SSP5-8.5 scenarios. A monthly timeseries through 2100, averaged over the Arabian Sea, is required from a range of CMIP6 models to start the effort, and output in a useful format for import into R is required. Another useful final product would be regridding all models onto a common grid for further comparison.
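One way to get the timeseries into a format R can ingest directly is a tidy (long-format) CSV. A minimal sketch with pandas, where the column names and the two example rows are purely illustrative (the real values would come from the CMIP6 processing discussed in this issue):

```python
import pandas as pd

# Hypothetical monthly basin-mean SST records for two scenarios;
# in practice these come from the processed CMIP6 output.
records = [
    {"time": "2015-01", "scenario": "ssp126", "model": "CESM2", "tos_mean": 27.1},
    {"time": "2015-01", "scenario": "ssp585", "model": "CESM2", "tos_mean": 27.3},
]
df = pd.DataFrame(records)

# to_csv() with no path returns the CSV text; pass a filename
# (e.g. "arabian_sea_sst.csv") to write a file for R's read.csv().
csv_text = df.to_csv(index=False)
```

A long-format table like this loads cleanly in R with `read.csv()` and works directly with tidyverse grouping by model and scenario.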

See also

#17, #15

jbusecke commented 3 years ago

Nice! Do you envision this to be performed on the native grids or after #17?

Sumanshekhar17 commented 3 years ago

I am not familiar with the term 'native grids'. @jbusecke, can you point me in the right direction? I can take it from there.

jbusecke commented 3 years ago

To select a 'smaller than basin' region like the Arabian Sea, you might want to have a look at regionmask, in particular the natural earth ocean basins.
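As a minimal sketch of the idea behind such a region mask, here is a numpy-only version using a rough bounding box instead of regionmask's Natural Earth polygons. The Arabian Sea bounds below (roughly 45–78°E, 0–28°N) are assumed for illustration, not taken from the thread:

```python
import numpy as np

# Toy 1-degree global grid; a real CMIP6 native grid would supply 2-D lon/lat.
lon = np.arange(0.5, 360.5)           # 0.5 ... 359.5
lat = np.arange(-89.5, 90.5)          # -89.5 ... 89.5
lon2d, lat2d = np.meshgrid(lon, lat)  # shape (180, 360)

# Very rough Arabian Sea bounding box (assumed bounds);
# regionmask would give a proper basin polygon instead.
mask = (lon2d >= 45) & (lon2d <= 78) & (lat2d >= 0) & (lat2d <= 28)

# Fake SST field, warm at the equator, cold at the poles.
sst = 20 + 10 * np.cos(np.deg2rad(lat2d))
basin_mean = sst[mask].mean()  # unweighted basin average
```

regionmask's `mask()` method produces the same kind of boolean/labelled array from real basin polygons, so the selection step afterwards is identical.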

jbusecke commented 3 years ago

> I am not familiar with the term 'native grids'. @jbusecke, can you point me in the right direction? I can take it from there.

Ah, sorry for the jargon. The 'native' grid is the grid the ocean model is run on. These are usually not just regularly divided into lon/lat intervals, but can have quite complex geometry (example). Many variables have an additional grid_label (e.g. gr), which is the same output regridded onto a regular lon/lat grid. Does that make sense?
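The distinction shows up as the `grid_label` column of the intake-esm catalog dataframe (`col.df` elsewhere in this thread). A toy sketch with pandas, where the models and rows are invented for illustration only:

```python
import pandas as pd

# Toy stand-in for the intake-esm catalog dataframe; entries are illustrative.
df = pd.DataFrame({
    "source_id":  ["MODEL-A", "MODEL-A", "MODEL-B"],
    "variable_id": ["tos", "tos", "tos"],
    "grid_label": ["gn", "gr", "gn"],  # gn = native grid, gr = regridded
})

native = df[df.grid_label == "gn"]     # the model's own (possibly curvilinear) grid
regridded = df[df.grid_label == "gr"]  # same output on a regular lon/lat grid
```

Note that not every model publishes a `gr` version of every variable, so filtering on `grid_label='gr'` can shrink the model list.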

Sumanshekhar17 commented 3 years ago

> I am not familiar with the term 'native grids'. @jbusecke, can you point me in the right direction? I can take it from there.

> Ah, sorry for the jargon. The 'native' grid is the grid the ocean model is run on. These are usually not just regularly divided into lon/lat intervals, but can have quite complex geometry (example). Many variables have an additional grid_label (e.g. gr), which is the same output regridded onto a regular lon/lat grid. Does that make sense?

Yeah. Just a small question here: I am assuming that every climate model uses the same grid, and that they use interpolation packages to regrid their output. I just want a clear picture in my mind.

Sumanshekhar17 commented 3 years ago

> Nice! Do you envision this to be performed on the native grids or after #17?

Now coming to your question: right now we are postprocessing on the native grid, as you showed us in the notebook, but we are planning to regrid so that we have a common platform for analyzing the timeseries data. Please correct me if I make a mistake anywhere; I am still an undergrad and learning things.

Thomas-Moore-Creative commented 3 years ago

Use case 1 is hopefully designed to give opportunities to try these different steps in a typical workflow.

> A monthly timeseries through 2100, averaged over the Arabian Sea, is required from a range of CMIP6 models to start the effort, and output in a useful format for import into R is required. Another useful final product would be regridding all models onto a common grid for further comparison.

So to my mind the first part could be timeseries generated for each model in each of their "native" grids. But another goal product would require regridding onto a common grid.
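For the per-model timeseries on a native grid, the basin average should be weighted by grid-cell area (in CMIP6 this comes from the `areacello` variable), since native-grid cells are not equal in size. A minimal numpy sketch with made-up values:

```python
import numpy as np

# Fake SST and cell areas for a tiny 2x2 native-grid patch;
# in CMIP6 the areas would come from the areacello variable.
sst = np.array([[26.0, 27.0],
                [28.0, 29.0]])
area = np.array([[1.0, 1.0],
                 [2.0, 2.0]])  # cells shrink toward the poles in reality

# Area-weighted basin mean: sum(sst * area) / sum(area)
weighted_mean = (sst * area).sum() / area.sum()

# Compare with the naive unweighted mean
naive_mean = sst.mean()
```

Here the weighted mean (~27.83) differs from the naive mean (27.5) because the larger cells pull the average toward their values; on a real curvilinear ocean grid the difference can be substantial.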

Sumanshekhar17 commented 3 years ago

I am facing an issue with importing models that have both tos and no3os. Inspired by @isugiura's code, I have defined a function that takes the intersection between the lists using the intersect1d() method from the numpy package:

```python
def model_list(experiment_id1, experiment_id2, variable_id1, variable_id2):
    # all source_id available for both experiments
    list1 = col.df[col.df.experiment_id == experiment_id1].source_id.unique()
    list2 = col.df[col.df.experiment_id == experiment_id2].source_id.unique()
    elist = np.intersect1d(list1, list2)

    # all source_id available for both variables (e.g. 'no3os' and 'tos')
    vlist1 = col.df[col.df.variable_id == variable_id1].source_id.unique()
    vlist2 = col.df[col.df.variable_id == variable_id2].source_id.unique()
    vlist = np.intersect1d(vlist1, vlist2)

    # models satisfying both the experiment and variable requirements
    return np.intersect1d(vlist, elist)
```

Now applying the search method just for these models:

```python
models = model_list('historical', 'ssp585', 'tos', 'no3os')

cat = col.search(
    source_id=models,
    grid_label='gn',
    table_id='Omon',
    member_id=['r2i1p1f1', 'r3i1p1f1', 'r2i1p2f1', 'r3i1p2f1'],
)
cat.df.groupby("source_id")[
    ["experiment_id", "variable_id", "table_id", "member_id"]
].nunique()
```

[screenshot: nunique table of the search results]

You can see that there are three models somehow making their way into the dataset. I don't know where the issue is; can anyone please help me out here?

isugiura commented 3 years ago

> You can see that there are three models somehow making their way into the dataset. I don't know where the issue is; can anyone please help me out here?

Interesting. When I run your notebook on my JupyterLab, it seems that I'm getting different results. Do you know what is happening?

[screenshot: nunique table from isugiura's run, 2021-08-05]

Sumanshekhar17 commented 3 years ago

@isugiura sorry, there was a little mistake in the code.

Use this to select the data; it gives the result I mentioned in my earlier comment:

```python
models = model_list('historical', 'ssp585', 'tos', 'no3os')  # filter models from the catalog

cat = col.search(
    experiment_id=['historical', 'ssp585'],
    variable_id=['tos', 'no3os'],
    source_id=models,
    grid_label='gn',
    table_id='Omon',
    member_id=['r2i1p1f1', 'r3i1p1f1', 'r2i1p2f1', 'r3i1p2f1'],
)
cat.df.groupby("source_id")[
    ["experiment_id", "variable_id", "table_id", "member_id"]
].nunique()
```

jbusecke commented 3 years ago

Try dropping the member_id search requirement! That was just a way to restrict the datasets for the example. You could also try additionally dropping the grid_label (giving you both regridded and native output). Does that yield more results?
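A toy illustration of why the hard member_id filter can silently drop models: ensemble-member labels differ between modelling centres, so a model whose members don't match the requested list disappears entirely. The catalog rows below are invented for illustration, not real CMIP6 entries:

```python
import pandas as pd

# Toy catalog: MODEL-B only published member r1i1p1f1, so a search
# requiring r2/r3 members drops it entirely. Entries are illustrative.
df = pd.DataFrame({
    "source_id": ["MODEL-A", "MODEL-A", "MODEL-B"],
    "member_id": ["r2i1p1f1", "r3i1p1f1", "r1i1p1f1"],
})

strict = df[df.member_id.isin(["r2i1p1f1", "r3i1p1f1"])]
relaxed = df  # dropping the member_id constraint keeps every model

print(sorted(strict.source_id.unique()))   # MODEL-B is gone
print(sorted(relaxed.source_id.unique()))  # both models survive
```

The same logic applies to grid_label: every extra constraint intersects away models that don't publish that exact combination.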