pangeo-forge / cmip6-feedstock

A Pangeo Forge Feedstock for cmip6.
Apache License 2.0

Dynamically detect netcdf version #9

Closed: jbusecke closed this issue 2 years ago

jbusecke commented 2 years ago

In #8 we noticed that some of the datasets seem to be netcdf3 files.

This can easily be resolved by passing a keyword argument to file_pattern. But, as with other kwargs, the sheer number of recipes this will ultimately generate makes setting it by hand for each one extremely tedious.
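If the format can be detected per file (for example from the file header), the per-recipe kwarg could be derived once from those results instead of being written by hand. Below is a minimal sketch of that step. It assumes the per-file formats are already known as the strings "netcdf3" or "netcdf4", and it assumes the relevant keyword is `file_type`, as in recent pangeo-forge-recipes `FilePattern`. Check that name against the installed version. The helper `recipe_pattern_kwargs` is hypothetical.

```python
from typing import Iterable


def recipe_pattern_kwargs(file_formats: Iterable[str]) -> dict:
    """Collapse per-file formats into one kwargs dict for a recipe's file pattern.

    Hypothetical helper: assumes the keyword is `file_type` and that each
    entry of `file_formats` is "netcdf3" or "netcdf4".
    """
    kinds = set(file_formats)
    if not kinds:
        raise ValueError("no files to inspect")
    # A single recipe should not mix formats; fail loudly rather than guess.
    if len(kinds) > 1:
        raise ValueError(f"mixed file formats in one recipe: {sorted(kinds)}")
    (kind,) = kinds
    if kind not in ("netcdf3", "netcdf4"):
        raise ValueError(f"unsupported file format: {kind!r}")
    return {"file_type": kind}
```

This design raises an error on mixed formats instead of picking one, so that inconsistent datasets surface during recipe generation rather than at run time.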

I would like to explore whether the netcdf version can be determined automatically, either 1) from the ESGF metadata, or 2) directly from a given url.

I explored 1) with this:

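import requests

# ESGF search API endpoint (LLNL node)
url = "https://esgf-node.llnl.gov/esg-search/search"

# Query for the latest, non-retracted, original (non-replica) files
params = {
    "type": "File",
    "retracted": "false",
    "replica": "false",
    "format": "application/solr+json",
    "latest": "true",
    "limit": 500,
    "variable_id": "thetao",
    "source_id": "GFDL-ESM4",
}
resp = requests.get(url=url, params=params)
resp.raise_for_status()

# Print every metadata field of the first result whose key or value mentions "cdf"
for k, v in resp.json()["response"]["docs"][0].items():
    if ("cdf" in str(k)) or ("cdf" in str(v)):
        print(k)
        print(v)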

But the only field that matches is the file url itself.
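Because the `url` field is what comes back, a direct HTTP link can at least be pulled out of it so that the file itself can be inspected. Below is a minimal sketch. It assumes that ESGF File documents store `url` as a list of pipe-delimited `<url>|<mime type>|<service>` strings, and that the direct download entry has the service name `HTTPServer`. The function name and the sample document in the usage line are illustrative.

```python
from typing import Optional


def http_url_from_esgf_doc(doc: dict) -> Optional[str]:
    """Return the HTTPServer download link from one ESGF File search result.

    Assumes each entry of doc["url"] looks like "<url>|<mime type>|<service>".
    Returns None if no HTTPServer entry is present.
    """
    for entry in doc.get("url", []):
        parts = entry.split("|")
        if len(parts) == 3 and parts[2] == "HTTPServer":
            return parts[0]
    return None


# Illustrative document (made-up host), shaped like one item of resp.json()["response"]["docs"]
sample = {
    "url": [
        "https://example.org/thredds/dodsC/a.nc.html|application/opendap-html|OPENDAP",
        "https://example.org/thredds/fileServer/a.nc|application/netcdf|HTTPServer",
    ]
}
print(http_url_from_esgf_doc(sample))
```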

Is there a lightweight way to query the dataset and determine the version from the header?
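One lightweight option is to fetch only the first few bytes of the file with an HTTP Range request and compare them against the format signatures. NetCDF3 files begin with `CDF` followed by a version byte: 1 for classic, 2 for 64-bit offset, and 5 for 64-bit data (CDF-5). NetCDF4 files are HDF5 files, and those begin with the 8-byte HDF5 signature. Below is a minimal sketch that uses only the standard library. It assumes the server honours Range requests and that the HDF5 signature sits at offset 0, which is the usual case for netCDF4 files, though HDF5 also allows it at 512, 1024, and so on. The function names are illustrative.

```python
import urllib.request

# 8-byte HDF5 signature; netCDF4 files are HDF5 files underneath.
# Note: any HDF5 file matches, not only netCDF4 ones.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

# Version byte that follows b"CDF" in netCDF3-family files
NETCDF3_VERSIONS = {
    1: "netcdf3-classic",
    2: "netcdf3-64bit-offset",
    5: "netcdf3-64bit-data",  # CDF-5
}


def detect_netcdf_format(header: bytes) -> str:
    """Classify a file from its leading bytes (8 are enough)."""
    if header.startswith(HDF5_SIGNATURE):
        return "netcdf4"
    if len(header) >= 4 and header[:3] == b"CDF":
        return NETCDF3_VERSIONS.get(header[3], "unknown")
    return "unknown"


def fetch_header(url: str, nbytes: int = 8, timeout: float = 30.0) -> bytes:
    """Download only the first `nbytes` bytes of `url` via an HTTP Range request."""
    req = urllib.request.Request(url, headers={"Range": f"bytes=0-{nbytes - 1}"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # If the server ignores Range (200 instead of 206), this still reads
        # only nbytes before the connection is closed.
        return resp.read(nbytes)
```

Usage would be something like `detect_netcdf_format(fetch_header(http_url))` for each file's HTTP link. For opening the data with xarray, the classic and 64-bit-offset variants can be read with the `scipy` engine and netCDF4/HDF5 with `h5netcdf`. CDF-5 generally needs the `netcdf4` engine.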

jbusecke commented 2 years ago

Closed via #8