dgergel opened this issue 3 years ago
Thanks @dgergel! I'll take a look at this before our next meeting.
@dgergel, as a follow-up to our conversation yesterday, I'm noting here the inputs available on S3 that match your specified criteria. https://github.com/pangeo-forge/cmip6-pipeline/pull/18 includes the utility class `CMIPS3Search`, which I've used to retrieve these matches.
The 18 matches on S3 are collectively 82 GB in size:
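```python
# CMIPS3Search is the utility class from pangeo-forge/cmip6-pipeline PR #18
variables = ["tasmax", "tasmin", "pr"]
# one dot-delimited pattern per variable: experiment ssp585, member r1i1p1f1, daily table
datasets = [f".ssp585.r1i1p1f1.day.{v}." for v in variables]
ssp585 = CMIPS3Search(datasets, variables)
ssp585.print_sizes()  # report the sizes of the matching datasets
```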
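The internals of `CMIPS3Search` aren't shown in this thread, so the following is only a hypothetical sketch of how substring patterns like those in `datasets` could select dataset ids and total their sizes. The helper names, the dotted id format, and the sizes are all made up for illustration; they are not the class's actual implementation or real S3 data.

```python
from typing import Dict, List


def match_datasets(dataset_ids: List[str], patterns: List[str]) -> List[str]:
    """Return the dataset ids containing any of the given substrings (hypothetical helper)."""
    return [d for d in dataset_ids if any(p in d for p in patterns)]


def total_size_gb(sizes_bytes: Dict[str, int], dataset_ids: List[str]) -> float:
    """Sum the sizes of the given datasets in GB (10**9 bytes)."""
    return sum(sizes_bytes[d] for d in dataset_ids) / 1e9


variables = ["tasmax", "tasmin", "pr"]
datasets = [f".ssp585.r1i1p1f1.day.{v}." for v in variables]

# Hypothetical dot-delimited dataset ids and sizes, for illustration only.
sizes = {
    "CMIP6.ScenarioMIP.NOAA-GFDL.GFDL-CM4.ssp585.r1i1p1f1.day.tasmax.gr1": 4_000_000_000,
    "CMIP6.ScenarioMIP.NOAA-GFDL.GFDL-CM4.ssp585.r1i1p1f1.day.pr.gr1": 3_000_000_000,
    "CMIP6.ScenarioMIP.NOAA-GFDL.GFDL-CM4.ssp585.r1i1p1f1.day.prsn.gr1": 2_000_000_000,
    "CMIP6.ScenarioMIP.NOAA-GFDL.GFDL-CM4.ssp245.r1i1p1f1.day.tasmax.gr1": 4_000_000_000,
}

matches = match_datasets(list(sizes), datasets)
print(len(matches), total_size_gb(sizes, matches))  # 2 7.0
```

The trailing dot in each pattern is what keeps `.day.pr.` from also matching `prsn` in this sketch; without it, a bare `pr` substring would pull in every variable whose name starts with "pr".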
The `return_inputs` method of `CMIPS3Search` instances returns a dictionary that maps each dataset's 6-tuple identifier to a list of its source URLs. `CMIPS3Search` instances also have a `tuples` attribute, which is a list of all the matching 6-tuple identifiers on S3:
```python
inputs = ssp585.return_inputs()
inputs[ssp585.tuples[0]]
```

```
['s3://esgf-world/CMIP6/ScenarioMIP/NOAA-GFDL/GFDL-CM4/ssp585/r1i1p1f1/day/tasmax/gr1/v20180701/tasmax_day_GFDL-CM4_ssp585_r1i1p1f1_gr1_20150101-20341231.nc',
 's3://esgf-world/CMIP6/ScenarioMIP/NOAA-GFDL/GFDL-CM4/ssp585/r1i1p1f1/day/tasmax/gr1/v20180701/tasmax_day_GFDL-CM4_ssp585_r1i1p1f1_gr1_20350101-20541231.nc',
 's3://esgf-world/CMIP6/ScenarioMIP/NOAA-GFDL/GFDL-CM4/ssp585/r1i1p1f1/day/tasmax/gr1/v20180701/tasmax_day_GFDL-CM4_ssp585_r1i1p1f1_gr1_20550101-20741231.nc',
 's3://esgf-world/CMIP6/ScenarioMIP/NOAA-GFDL/GFDL-CM4/ssp585/r1i1p1f1/day/tasmax/gr1/v20180701/tasmax_day_GFDL-CM4_ssp585_r1i1p1f1_gr1_20750101-20941231.nc',
 's3://esgf-world/CMIP6/ScenarioMIP/NOAA-GFDL/GFDL-CM4/ssp585/r1i1p1f1/day/tasmax/gr1/v20180701/tasmax_day_GFDL-CM4_ssp585_r1i1p1f1_gr1_20950101-21001231.nc']
```
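Since a recipe will concatenate these files along time, it may help to group the URLs per dataset and confirm the date chunks line up. Below is a minimal sketch of that, assuming the six key fields are (source_id, experiment_id, member_id, table_id, variable_id, grid_label); that choice is a guess and may not match the 6-tuple `CMIPS3Search` actually uses. The helper names are hypothetical; the URLs are the five listed above.

```python
import re
from collections import defaultdict
from datetime import date, datetime, timedelta
from typing import Dict, List, Tuple

# ASSUMPTION: these six DRS fields are a guess at CMIPS3Search's 6-tuple.
Key = Tuple[str, ...]
DATE_RANGE = re.compile(r"_(\d{8})-(\d{8})\.nc$")


def url_key(url: str) -> Key:
    """(source_id, experiment_id, member_id, table_id, variable_id, grid_label)
    from a DRS-style path ending in .../<grid_label>/<version>/<file>."""
    return tuple(url.split("/")[-8:-2])


def time_range(url: str) -> Tuple[date, date]:
    """Parse the YYYYMMDD-YYYYMMDD range at the end of a CMIP6 filename."""
    m = DATE_RANGE.search(url)
    if m is None:
        raise ValueError(f"no date range in {url!r}")
    start, end = (datetime.strptime(s, "%Y%m%d").date() for s in m.groups())
    return start, end


def group_inputs(urls: List[str]) -> Dict[Key, List[str]]:
    """Group URLs by key, sorting each group by the start date of its file."""
    groups: Dict[Key, List[str]] = defaultdict(list)
    for url in urls:
        groups[url_key(url)].append(url)
    return {k: sorted(v, key=lambda u: time_range(u)[0]) for k, v in groups.items()}


def is_contiguous(urls: List[str]) -> bool:
    """True if each file starts the day after the previous one ends.
    Uses Gregorian date arithmetic: fine for year-end boundaries such as
    20341231 -> 20350101, but not valid for every model calendar."""
    ranges = [time_range(u) for u in urls]
    return all(nxt[0] == prev[1] + timedelta(days=1)
               for prev, nxt in zip(ranges, ranges[1:]))


base = ("s3://esgf-world/CMIP6/ScenarioMIP/NOAA-GFDL/GFDL-CM4/ssp585/r1i1p1f1/"
        "day/tasmax/gr1/v20180701/tasmax_day_GFDL-CM4_ssp585_r1i1p1f1_gr1_")
spans = ["20150101-20341231", "20350101-20541231", "20550101-20741231",
         "20750101-20941231", "20950101-21001231"]
urls = [f"{base}{s}.nc" for s in reversed(spans)]  # deliberately out of order

inputs = group_inputs(urls)
key = ("GFDL-CM4", "ssp585", "r1i1p1f1", "day", "tasmax", "gr1")
print(inputs[key][0].rsplit("_", 1)[-1])  # 20150101-20341231.nc
print(is_contiguous(inputs[key]))         # True
```

Sorting on the parsed start date rather than on the raw URL string keeps the ordering correct even if a listing returns the chunks out of order.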
Once you link to your code for crawling the full ESGF catalog, I can incorporate those inputs into this evolving recipe as well.
@cisaacstern this looks great. I just created a PR in pangeo-forge/cmip6-pipeline with my refactored code. I haven't worked on this since winter, so some of it may be a bit out of date. I commented on this in the PR as well, and I'm hoping @naomi-henderson can look it over to see whether any functions should be deprecated and replaced with newer ones.
I'm envisioning that the functions in `cmip6-cloud/esgf.py` could be used to create a similar `CMIPESGFSearch` utility class, perhaps with flags for any ESGF node known to be down (which therefore shouldn't be searched). Alternatively, there could be a "priority" node, with the other nodes tried only if that one is down. Naomi might have thoughts on this too.
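To make the two options concrete, here is a minimal sketch that combines them: nodes are tried in priority order, and any node flagged as down is skipped. Everything in it is hypothetical: `search_with_fallback`, `NodeUnavailable`, the injected `search_fn` callable, and the `*.example.org` node names are placeholders, not functions from `cmip6-cloud/esgf.py` or real ESGF hosts.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set


class NodeUnavailable(Exception):
    """Raised by a (hypothetical) search function when a node can't be reached."""


def search_with_fallback(
    query: dict,
    nodes: Iterable[str],
    search_fn: Callable[[str, dict], List[str]],
    known_down: Optional[Set[str]] = None,
) -> List[str]:
    """Try nodes in priority order, skipping any flagged as down, and return
    the first successful result (an empty list still counts as success)."""
    known_down = known_down or set()
    errors: Dict[str, Exception] = {}
    for node in nodes:
        if node in known_down:
            continue  # flagged as down: don't query it at all
        try:
            return search_fn(node, query)
        except NodeUnavailable as exc:
            errors[node] = exc  # remember the failure, fall through to the next node
    raise NodeUnavailable(f"no ESGF node answered; failures: {sorted(errors)}")


# Usage with a fake search function standing in for a real ESGF query.
def fake_search(node: str, query: dict) -> List[str]:
    if node == "node-a.example.org":
        raise NodeUnavailable(node)
    return [f"{node}:{query['variable_id']}"]


result = search_with_fallback(
    {"variable_id": "tasmax"},
    ["node-a.example.org", "node-b.example.org", "node-c.example.org"],
    fake_search,
    known_down={"node-b.example.org"},
)
print(result)  # ['node-c.example.org:tasmax']
```

Passing the search function in, rather than hard-coding an HTTP client, keeps the fallback policy testable without network access.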
Following up on our CMIP6-in-the-cloud collaboration meeting last week, I wanted to include some specs that it would be useful to test the CMIP6 recipe with:
- `member_id`: `r1i1p1f1` if available, otherwise an analogous ensemble member (e.g. `r2i1p1f1`)
- `experiment_id`: `ssp585`
- `variable_id`: `tasmax`, `tasmin`, `pr`
- `table_id`: `day`
- `activity_id`: `ScenarioMIP`
- Models: all available with the above specs
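The `member_id` rule leaves "analogous" open. One possible reading, sketched here purely as an assumption, is: keep `r1i1p1f1` when a model provides it; otherwise prefer members with the same i/p/f indices and the lowest realization number; otherwise take the numerically lowest member overall. `parse_member` and `choose_member` are hypothetical helpers, not part of any existing search class.

```python
import re
from typing import Iterable, List, Optional, Tuple

MEMBER_RE = re.compile(r"^r(\d+)i(\d+)p(\d+)f(\d+)$")


def parse_member(member: str) -> Optional[Tuple[int, ...]]:
    """'r2i1p1f1' -> (2, 1, 1, 1); None if the label doesn't match the pattern."""
    m = MEMBER_RE.match(member)
    return tuple(int(g) for g in m.groups()) if m else None


def choose_member(available: Iterable[str], preferred: str = "r1i1p1f1") -> Optional[str]:
    """Return `preferred` if present, else the closest analogue; None if nothing parses."""
    members = list(available)
    if preferred in members:
        return preferred
    target = parse_member(preferred)

    parsed: List[Tuple[Tuple[int, ...], str]] = []
    for m in members:
        idx = parse_member(m)
        if idx is not None:
            parsed.append((idx, m))
    if not parsed:
        return None

    def rank(item: Tuple[Tuple[int, ...], str]) -> Tuple[bool, Tuple[int, ...]]:
        idx, _ = item
        same_ipf = target is not None and idx[1:] == target[1:]
        # False sorts before True, so same-i/p/f members come first;
        # ties are broken numerically, so r2 ranks ahead of r10.
        return (not same_ipf, idx)

    return min(parsed, key=rank)[1]


print(choose_member(["r3i1p1f1", "r2i1p1f1", "r1i1p1f2"]))  # r2i1p1f1
```

Comparing parsed integers instead of raw strings avoids the lexicographic trap where `"r10i1p1f1"` would sort before `"r2i1p1f1"`.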
cc @cisaacstern @naomi-henderson