In #8 we noticed that some of the datasets seem to be netcdf3 files.
This can be resolved easily by passing a keyword argument to `file_pattern`, but as with other kwargs, the sheer volume of recipes this will ultimately generate makes doing that by hand extremely tedious.
I would like to explore whether the netCDF version can be determined automatically, either 1) from the ESGF metadata, or 2) directly from a given url.
I explored 1) with this:
```python
import requests

url = "https://esgf-node.llnl.gov/esg-search/search"
params = {
    "type": "File",
    "retracted": "false",
    "replica": "false",
    "format": "application/solr+json",
    "latest": "true",
    "limit": 500,
    "variable_id": "thetao",
    "source_id": "GFDL-ESM4",
}
resp = requests.get(url=url, params=params)
for k, v in resp.json()["response"]["docs"][0].items():
    if ("cdf" in str(k)) or ("cdf" in str(v)):
        print(k)
        print(v)
```
But that only returns the actual URL, not the format.
Is there a lightweight way to query the dataset and determine the netCDF version from the file header?
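One possible approach for 2): netCDF files start with well-known magic bytes (`CDF\x01`/`\x02`/`\x05` for the netCDF3 variants, the HDF5 signature `\x89HDF` for netCDF4), so an HTTP range request for just the first few bytes would be enough to classify a file without downloading it. This is only a sketch; it assumes the data node honors `Range` headers (not guaranteed for every ESGF node), and the `classify_magic`/`sniff_netcdf_format` names are hypothetical. It uses stdlib `urllib` here, but the same idea works with `requests`.

```python
import urllib.request

# Magic bytes at the start of the file identify the on-disk format.
# netCDF-4 files are HDF5 files, hence the HDF5 signature prefix.
_MAGIC = {
    b"CDF\x01": "NETCDF3_CLASSIC",
    b"CDF\x02": "NETCDF3_64BIT_OFFSET",
    b"CDF\x05": "NETCDF3_64BIT_DATA",
    b"\x89HDF": "NETCDF4",
}

def classify_magic(first_bytes: bytes) -> str:
    """Map the leading bytes of a file to a netCDF format name."""
    for magic, fmt in _MAGIC.items():
        if first_bytes.startswith(magic):
            return fmt
    return "UNKNOWN"

def sniff_netcdf_format(url: str) -> str:
    """Fetch only the first 8 bytes via an HTTP range request and classify them.

    Assumes the server supports Range requests; if it ignores the header,
    this still only reads 8 bytes off the response body before returning.
    """
    req = urllib.request.Request(url, headers={"Range": "bytes=0-7"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return classify_magic(resp.read(8))
```

If this works against the ESGF data nodes, it could be run once per source url and used to set the `file_pattern` kwarg automatically instead of hardcoding it per recipe.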