Open jbusecke opened 6 months ago
Ok so in the newest run over at #145 I am able to ingest pretty much whatever I want from the file level like this:
import xarray as xr
ds = xr.open_dataset("gs://leap-scratch/data-library/cmip6-pr-copied/8979323652_1/CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.zarr", engine='zarr')
ds.attrs['pangeo_forge_file_data']
{'checksum': [['6619b7522b9595714ea5c502d2681357e5f913431950fcfe1146289e144b350e'],
['a52dca9f7b0e3453f2d3c6bcc0a9437632d6207df624f729b4260a535a3cd23c'],
['e1c9f8ceb133b2bace66fcc51b03df1308f7bb31328af22709f0b2a0cbdc9032'],
['412d05bdcfe8aaeaa0d45ca7771e50cbb71adb628cf5900aebd9fe2456925b98'],
['2475a1bea861704589bfde45bbbf3072d6c809319d9492621375f5140bed9940'],
['62f83110ffc9ad8e03352dd500040864ab1994d98f155a3db441b1973e6f76c5'],
['289c4613e27ca0f1b98d9a71d063acf51f7a9294bdc40001557747c776e6401c'],
['a99f40333c6816933349ef0cd564432cab789a03ca303a265b199f389ed03fe9'],
['fc580bce392e3aff0e6ed182da6871c2f02907c7760ad6c3eb93ddc1a0698e71'],
['665b97fe0b5e2215cc1414520874c44eae556474612ae0fcab16f67d1a341de3']],
'checksum_type': [['SHA256'],
['SHA256'],
['SHA256'],
['SHA256'],
['SHA256'],
['SHA256'],
['SHA256'],
['SHA256'],
['SHA256'],
['SHA256']],
'citation_url': [['http://cera-www.dkrz.de/WDCC/meta/CMIP6/CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.json'],
['http://cera-www.dkrz.de/WDCC/meta/CMIP6/CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.json'],
['http://cera-www.dkrz.de/WDCC/meta/CMIP6/CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.json'],
['http://cera-www.dkrz.de/WDCC/meta/CMIP6/CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.json'],
['http://cera-www.dkrz.de/WDCC/meta/CMIP6/CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.json'],
['http://cera-www.dkrz.de/WDCC/meta/CMIP6/CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.json'],
['http://cera-www.dkrz.de/WDCC/meta/CMIP6/CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.json'],
['http://cera-www.dkrz.de/WDCC/meta/CMIP6/CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.json'],
['http://cera-www.dkrz.de/WDCC/meta/CMIP6/CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.json'],
['http://cera-www.dkrz.de/WDCC/meta/CMIP6/CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.json']],
'data_node': ['esgf.ceda.ac.uk',
'esgf.ceda.ac.uk',
'esgf.ceda.ac.uk',
'esgf.ceda.ac.uk',
'esgf.ceda.ac.uk',
'esgf.ceda.ac.uk',
'esgf.ceda.ac.uk',
'esgf.ceda.ac.uk',
'esgf.ceda.ac.uk',
'esgf.ceda.ac.uk'],
'further_info_url': [['https://furtherinfo.es-doc.org/CMIP6.CMCC.CMCC-CM2-VHR4.highres-future.none.r1i1p1f1'],
['https://furtherinfo.es-doc.org/CMIP6.CMCC.CMCC-CM2-VHR4.highres-future.none.r1i1p1f1'],
['https://furtherinfo.es-doc.org/CMIP6.CMCC.CMCC-CM2-VHR4.highres-future.none.r1i1p1f1'],
['https://furtherinfo.es-doc.org/CMIP6.CMCC.CMCC-CM2-VHR4.highres-future.none.r1i1p1f1'],
['https://furtherinfo.es-doc.org/CMIP6.CMCC.CMCC-CM2-VHR4.highres-future.none.r1i1p1f1'],
['https://furtherinfo.es-doc.org/CMIP6.CMCC.CMCC-CM2-VHR4.highres-future.none.r1i1p1f1'],
['https://furtherinfo.es-doc.org/CMIP6.CMCC.CMCC-CM2-VHR4.highres-future.none.r1i1p1f1'],
['https://furtherinfo.es-doc.org/CMIP6.CMCC.CMCC-CM2-VHR4.highres-future.none.r1i1p1f1'],
['https://furtherinfo.es-doc.org/CMIP6.CMCC.CMCC-CM2-VHR4.highres-future.none.r1i1p1f1'],
['https://furtherinfo.es-doc.org/CMIP6.CMCC.CMCC-CM2-VHR4.highres-future.none.r1i1p1f1']],
'id': ['CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201501010000-201501311800.nc|esgf.ceda.ac.uk',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201502010000-201502281800.nc|esgf.ceda.ac.uk',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201503010000-201503311800.nc|esgf.ceda.ac.uk',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201504010000-201504301800.nc|esgf.ceda.ac.uk',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201505010000-201505311800.nc|esgf.ceda.ac.uk',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201506010000-201506301800.nc|esgf.ceda.ac.uk',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201507010000-201507311800.nc|esgf.ceda.ac.uk',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201508010000-201508311800.nc|esgf.ceda.ac.uk',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201509010000-201509301800.nc|esgf.ceda.ac.uk',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201510010000-201510311800.nc|esgf.ceda.ac.uk'],
'instance_id': ['CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201501010000-201501311800.nc',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201502010000-201502281800.nc',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201503010000-201503311800.nc',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201504010000-201504301800.nc',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201505010000-201505311800.nc',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201506010000-201506301800.nc',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201507010000-201507311800.nc',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201508010000-201508311800.nc',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201509010000-201509301800.nc',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.highres-future.r1i1p1f1.6hrPlevPt.psl.gn.v20190509.psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201510010000-201510311800.nc'],
'pid': [['hdl:21.14100/bb2e98c8-f461-3adc-95b2-d6666ce904ff'],
['hdl:21.14100/bb2e98c8-f461-3adc-95b2-d6666ce904ff'],
['hdl:21.14100/bb2e98c8-f461-3adc-95b2-d6666ce904ff'],
['hdl:21.14100/bb2e98c8-f461-3adc-95b2-d6666ce904ff'],
['hdl:21.14100/bb2e98c8-f461-3adc-95b2-d6666ce904ff'],
['hdl:21.14100/bb2e98c8-f461-3adc-95b2-d6666ce904ff'],
['hdl:21.14100/bb2e98c8-f461-3adc-95b2-d6666ce904ff'],
['hdl:21.14100/bb2e98c8-f461-3adc-95b2-d6666ce904ff'],
['hdl:21.14100/bb2e98c8-f461-3adc-95b2-d6666ce904ff'],
['hdl:21.14100/bb2e98c8-f461-3adc-95b2-d6666ce904ff']],
'title': ['psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201501010000-201501311800.nc',
'psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201502010000-201502281800.nc',
'psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201503010000-201503311800.nc',
'psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201504010000-201504301800.nc',
'psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201505010000-201505311800.nc',
'psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201506010000-201506301800.nc',
'psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201507010000-201507311800.nc',
'psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201508010000-201508311800.nc',
'psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201509010000-201509301800.nc',
'psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201510010000-201510311800.nc'],
'tracking_id': [['hdl:21.14100/17d1228c-4821-41c0-b81c-bcc885f22674'],
['hdl:21.14100/67a26fb7-f72e-439f-a65a-f9a51a21827f'],
['hdl:21.14100/d99b9ada-bf5f-4e15-9d35-9600f0df8161'],
['hdl:21.14100/b2a063be-9759-4a9b-8a85-f62261966df1'],
['hdl:21.14100/db3191c0-8027-4c34-9bdd-435f1806c404'],
['hdl:21.14100/2b50a6f3-8018-4a47-ae2b-90f99948ee4e'],
['hdl:21.14100/1743b1e0-1d5d-4584-a8ae-a9f153bafdea'],
['hdl:21.14100/79ee082d-21d2-42a7-b129-822ea800b847'],
['hdl:21.14100/6aa7761b-3408-4bf9-8711-bae94dda2674'],
['hdl:21.14100/4ce1c9ab-acaa-48bc-8404-cf57e37a7011']],
'url': ['https://esgf.ceda.ac.uk/thredds/fileServer/esg_cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-VHR4/highres-future/r1i1p1f1/6hrPlevPt/psl/gn/v20190509/psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201501010000-201501311800.nc',
'https://esgf.ceda.ac.uk/thredds/fileServer/esg_cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-VHR4/highres-future/r1i1p1f1/6hrPlevPt/psl/gn/v20190509/psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201502010000-201502281800.nc',
'https://esgf.ceda.ac.uk/thredds/fileServer/esg_cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-VHR4/highres-future/r1i1p1f1/6hrPlevPt/psl/gn/v20190509/psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201503010000-201503311800.nc',
'https://esgf.ceda.ac.uk/thredds/fileServer/esg_cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-VHR4/highres-future/r1i1p1f1/6hrPlevPt/psl/gn/v20190509/psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201504010000-201504301800.nc',
'https://esgf.ceda.ac.uk/thredds/fileServer/esg_cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-VHR4/highres-future/r1i1p1f1/6hrPlevPt/psl/gn/v20190509/psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201505010000-201505311800.nc',
'https://esgf.ceda.ac.uk/thredds/fileServer/esg_cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-VHR4/highres-future/r1i1p1f1/6hrPlevPt/psl/gn/v20190509/psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201506010000-201506301800.nc',
'https://esgf.ceda.ac.uk/thredds/fileServer/esg_cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-VHR4/highres-future/r1i1p1f1/6hrPlevPt/psl/gn/v20190509/psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201507010000-201507311800.nc',
'https://esgf.ceda.ac.uk/thredds/fileServer/esg_cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-VHR4/highres-future/r1i1p1f1/6hrPlevPt/psl/gn/v20190509/psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201508010000-201508311800.nc',
'https://esgf.ceda.ac.uk/thredds/fileServer/esg_cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-VHR4/highres-future/r1i1p1f1/6hrPlevPt/psl/gn/v20190509/psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201509010000-201509301800.nc',
'https://esgf.ceda.ac.uk/thredds/fileServer/esg_cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-VHR4/highres-future/r1i1p1f1/6hrPlevPt/psl/gn/v20190509/psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201510010000-201510311800.nc']}
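The attribute above is a dict of parallel lists, one entry per source file, with some values wrapped in single-element lists. For per-file checks it may be easier to regroup it into one record per file. A minimal sketch, assuming every key holds exactly one entry per file as in the output above; `per_file_records` and `_unwrap` are hypothetical helpers, not part of pangeo-forge or xarray:

```python
def _unwrap(value):
    """Unwrap single-element lists such as ['SHA256'] to 'SHA256'."""
    if isinstance(value, list) and len(value) == 1:
        return value[0]
    return value


def per_file_records(file_data):
    """Turn a dict of equal-length per-file lists into a list of per-file dicts."""
    lengths = {len(values) for values in file_data.values()}
    if len(lengths) != 1:
        # Also raised for an empty dict, since there is nothing to regroup.
        raise ValueError(f"per-file lists have differing lengths: {sorted(lengths)}")
    n_files = lengths.pop()
    return [
        {key: _unwrap(values[i]) for key, values in file_data.items()}
        for i in range(n_files)
    ]


# Two-file excerpt of the attribute shown above.
file_data = {
    "checksum_type": [["SHA256"], ["SHA256"]],
    "data_node": ["esgf.ceda.ac.uk", "esgf.ceda.ac.uk"],
    "tracking_id": [
        ["hdl:21.14100/17d1228c-4821-41c0-b81c-bcc885f22674"],
        ["hdl:21.14100/67a26fb7-f72e-439f-a65a-f9a51a21827f"],
    ],
}
records = per_file_records(file_data)
print(records[0]["tracking_id"])
```

With the real store this would be called as `per_file_records(ds.attrs['pangeo_forge_file_data'])`.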
I have gone through too many cycles of thinking that I have isolated the core/required attributes to extract from the API response. In my latest approach I am literally taking everything I got back from the ESGF API (for both the dataset and the files) and injecting it into the attributes:
ds.attrs['pangeo_forge_api_responses']
gives something like this now:
Not exactly subtle, but this will leave me with the most flexibility to implement tests based on this additional metadata (#99, which should help identify #53, probably avoid #30?)
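As an illustration of the kind of test this metadata enables (not necessarily what #99 has in mind), the file titles carry a `YYYYMMDDHHMM-YYYYMMDDHHMM` time range that can be checked for ordering and overlap without opening any data. A minimal sketch, assuming the title pattern seen in the output above; `find_time_range_problems` is a hypothetical helper:

```python
import re

# Matches the trailing time range of a CMIP6 file name,
# e.g. "..._gn_201501010000-201501311800.nc".
_TIME_RANGE = re.compile(r"_(\d{12})-(\d{12})\.nc$")


def find_time_range_problems(titles):
    """Return human-readable problems found in the ordered list of file titles."""
    problems = []
    previous_end = None
    for title in titles:
        match = _TIME_RANGE.search(title)
        if match is None:
            problems.append(f"no time range found in {title!r}")
            continue
        start, end = match.groups()
        # Fixed-width timestamps sort chronologically as plain strings.
        if start > end:
            problems.append(f"start after end in {title!r}")
        if previous_end is not None and start <= previous_end:
            problems.append(f"{title!r} overlaps or precedes the previous file")
        previous_end = end
    return problems


titles = [
    "psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201501010000-201501311800.nc",
    "psl_6hrPlevPt_CMCC-CM2-VHR4_highres-future_r1i1p1f1_gn_201502010000-201502281800.nc",
]
print(find_time_range_problems(titles))
```

This only catches misordered or overlapping files. Detecting gaps between files would additionally need the calendar and time frequency, which the sketch does not attempt.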
Just looking over the original docs for the zarr stores here: https://pangeo-data.github.io/pangeo-cmip6-cloud/overview.html#zarr-storage-format
We should add the handle_id concatenation to our recipes.
This should also be coordinated with https://github.com/jbusecke/esgf-virtual-zarr-data-access/issues/6 so that the output looks exactly the same to the user!
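For the handle_id concatenation mentioned above, a minimal sketch of what a recipe step could do with the per-file `tracking_id` entries. The newline separator is an assumption on my part and should be checked against the linked pangeo-cmip6-cloud docs; `concat_tracking_ids` is a hypothetical helper:

```python
def concat_tracking_ids(tracking_ids, sep="\n"):
    """Join per-file tracking ids into one string, keeping file order.

    Accepts either plain strings or single-element lists per file,
    since the ESGF API response above wraps each id in a list.
    """
    flat = []
    for entry in tracking_ids:
        if isinstance(entry, str):
            flat.append(entry)
        else:
            flat.extend(entry)
    return sep.join(flat)


# First two entries of the 'tracking_id' list shown above.
tracking_ids = [
    ["hdl:21.14100/17d1228c-4821-41c0-b81c-bcc885f22674"],
    ["hdl:21.14100/67a26fb7-f72e-439f-a65a-f9a51a21827f"],
]
combined = concat_tracking_ids(tracking_ids)
print(combined)
```

The result could then be written to the store's attributes (for example under `tracking_id`), using the same key and separator as the virtual zarr output so both look identical to the user.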