jbusecke / pangeo-forge-esgf

Using queries to the ESGF API to generate URLs and keyword arguments for recipe generation in pangeo-forge
Apache License 2.0

API Issue: multiple versions returned #17

Closed: jbusecke closed this issue 11 months ago

jbusecke commented 11 months ago

I just encountered some unexpected behavior with the code from the #15 commit.

from pangeo_forge_esgf import get_urls_from_esgf

iid = "CMIP6.ScenarioMIP.MIROC.MIROC6.ssp245.r47i1p1f1.day.psl.gn.v20210917"
# get_urls_from_esgf is a coroutine; top-level await works in Jupyter/IPython
url_dict = await get_urls_from_esgf([iid])
urls = url_dict[iid]
urls

This raised:

ValueError: Duplicate files found. This sometimes happens when the API returns multiple versions. Got 
[
'CMIP6.ScenarioMIP.MIROC.MIROC6.ssp245.r47i1p1f1.day.psl.gn.v20200623.psl_day_MIROC6_ssp245_r47i1p1f1_gn_20150101-20241231.nc', 
'CMIP6.ScenarioMIP.MIROC.MIROC6.ssp245.r47i1p1f1.day.psl.gn.v20200623.psl_day_MIROC6_ssp245_r47i1p1f1_gn_20250101-20341231.nc', 
'CMIP6.ScenarioMIP.MIROC.MIROC6.ssp245.r47i1p1f1.day.psl.gn.v20200623.psl_day_MIROC6_ssp245_r47i1p1f1_gn_20350101-20391231.nc', 
'CMIP6.ScenarioMIP.MIROC.MIROC6.ssp245.r47i1p1f1.day.psl.gn.v20210917.psl_day_MIROC6_ssp245_r47i1p1f1_gn_20150101-20241231.nc', 
'CMIP6.ScenarioMIP.MIROC.MIROC6.ssp245.r47i1p1f1.day.psl.gn.v20210917.psl_day_MIROC6_ssp245_r47i1p1f1_gn_20250101-20341231.nc', 
'CMIP6.ScenarioMIP.MIROC.MIROC6.ssp245.r47i1p1f1.day.psl.gn.v20210917.psl_day_MIROC6_ssp245_r47i1p1f1_gn_20350101-20391231.nc', 
'CMIP6.ScenarioMIP.MIROC.MIROC6.ssp245.r47i1p1f1.day.psl.gn.v20210917.psl_day_MIROC6_ssp245_r47i1p1f1_gn_20400101-20491231.nc', 
'CMIP6.ScenarioMIP.MIROC.MIROC6.ssp245.r47i1p1f1.day.psl.gn.v20210917.psl_day_MIROC6_ssp245_r47i1p1f1_gn_20500101-20591231.nc', 
'CMIP6.ScenarioMIP.MIROC.MIROC6.ssp245.r47i1p1f1.day.psl.gn.v20210917.psl_day_MIROC6_ssp245_r47i1p1f1_gn_20600101-20691231.nc', 
'CMIP6.ScenarioMIP.MIROC.MIROC6.ssp245.r47i1p1f1.day.psl.gn.v20210917.psl_day_MIROC6_ssp245_r47i1p1f1_gn_20700101-20791231.nc', 
'CMIP6.ScenarioMIP.MIROC.MIROC6.ssp245.r47i1p1f1.day.psl.gn.v20210917.psl_day_MIROC6_ssp245_r47i1p1f1_gn_20800101-20891231.nc', 
'CMIP6.ScenarioMIP.MIROC.MIROC6.ssp245.r47i1p1f1.day.psl.gn.v20210917.psl_day_MIROC6_ssp245_r47i1p1f1_gn_20900101-20991231.nc', 
'CMIP6.ScenarioMIP.MIROC.MIROC6.ssp245.r47i1p1f1.day.psl.gn.v20210917.psl_day_MIROC6_ssp245_r47i1p1f1_gn_21000101-21001231.nc'].

Note that the same filenames appear under two different versions (v20200623 and v20210917). Ultimately this seems to be an error on the ESGF side: we currently always query with 'latest', effectively ignoring the version, so this should never happen.
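To make the failure mode concrete, here is a minimal sketch of a duplicate check like the one that raised the error. It assumes each returned id is a CMIP6 dataset id (10 dot-separated facets, the last being `vYYYYMMDD`) followed by the filename, as in the listing above; the helper names are hypothetical and the package's actual check may differ.

```python
from collections import defaultdict

# Assumption: CMIP6 dataset ids have 10 dot-separated facets, the last one being
# the version ('vYYYYMMDD'); a file id appends '.<filename>' to the dataset id.
N_DATASET_FACETS = 10


def split_file_id(file_id: str) -> tuple[str, str, str]:
    """Return (dataset_id, version, filename) for a CMIP6 ESGF file id (hypothetical helper)."""
    parts = file_id.split(".")
    dataset_id = ".".join(parts[:N_DATASET_FACETS])
    version = parts[N_DATASET_FACETS - 1]
    filename = ".".join(parts[N_DATASET_FACETS:])
    return dataset_id, version, filename


def find_duplicate_filenames(file_ids: list[str]) -> dict[str, list[str]]:
    """Map every filename that occurs more than once to the versions it appears under."""
    versions_by_filename = defaultdict(list)
    for file_id in file_ids:
        _, version, filename = split_file_id(file_id)
        versions_by_filename[filename].append(version)
    return {
        name: sorted(versions)
        for name, versions in versions_by_filename.items()
        if len(versions) > 1
    }
```

Fed the first and fourth ids from the error message, this reports `psl_day_MIROC6_ssp245_r47i1p1f1_gn_20150101-20241231.nc` under both `v20200623` and `v20210917`, which is exactly the collision described above.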

I'll try to fix this by querying the specific version; otherwise I will have to fix this manually.
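If the query itself cannot be fixed, one manual fallback is to filter the returned ids client-side so that only files belonging to the requested instance id survive. This is a hedged sketch with a hypothetical helper name, not the package's code; it relies only on the file ids being prefixed by the full versioned instance id, as in the listing above.

```python
def keep_requested_version(file_ids: list[str], instance_id: str) -> list[str]:
    """Keep only file ids under the exact (versioned) instance id (hypothetical helper)."""
    # The trailing dot prevents e.g. 'v2021091' from matching 'v20210917'.
    prefix = instance_id + "."
    kept = [fid for fid in file_ids if fid.startswith(prefix)]
    # After filtering to a single version, every filename should be unique.
    filenames = [fid[len(prefix):] for fid in kept]
    if len(filenames) != len(set(filenames)):
        raise ValueError(f"Duplicate files remain for {instance_id}")
    return kept
```

Applied to the 13 ids in the error message with `iid` ending in `v20210917`, this would drop the three `v20200623` entries and keep the ten files of the requested version.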

jbusecke commented 11 months ago

OK, this is fixed by not using 'latest' in the search.
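For illustration, a file search pinned to the requested version (instead of `latest=true`) could be parameterized roughly as below. This is a sketch under assumptions: the facet names follow the ESGF CMIP6 search index as I understand it, the `version` facet is assumed to be indexed without the leading `v`, and the helper name is hypothetical; the exact query issued by pangeo-forge-esgf is not shown in this issue.

```python
# Assumed ESGF CMIP6 facet names, in instance-id order (excluding the trailing version).
CMIP6_FACETS = [
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label",
]


def build_file_search_params(instance_id: str) -> dict[str, str]:
    """Build ESGF search parameters that pin the dataset version (hypothetical helper)."""
    *facet_values, version = instance_id.split(".")
    if len(facet_values) != len(CMIP6_FACETS) or not version.startswith("v"):
        raise ValueError(f"Not a versioned CMIP6 instance id: {instance_id}")
    params = dict(zip(CMIP6_FACETS, facet_values))
    # Pin the exact version (assumed stored without the 'v') rather than latest=true.
    params["version"] = version[1:]
    params["type"] = "File"
    params["format"] = "application/solr+json"
    return params
```

The resulting dict could be passed as `params=` to an HTTP client such as `requests.get` against an ESGF node's search endpoint; because `latest` is never set, the index cannot mix in files from another version.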