When we transfer CMIP6 data from Globus, the transfers pipeline selects the most recent version date to build the manifest. (See this part of the esgf_holdings.py script.)
However, nothing removes pre-existing older versions from the CMIP6 data directory. Each new run of the transfers pipeline may introduce new versions of a dataset without removing the older ones. This is not necessarily a problem, but it needs to be handled during regridding (which version of the dataset should be regridded?) and QC (which version of the dataset is really the source of the data being QC'd?). Some QC errors have popped up because the comparison between regridded and source datasets used the wrong source dataset version.
Ideas:
Do not allow multiple versions: write a script to purge older versions as part of the transfers pipeline.
Allow multiple versions: this requires explicitly choosing the most recent version in the regridding pipeline, and using that same version as the source in QC.
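
Either idea needs a way to tell which version directory is the newest. Here is a minimal sketch, assuming each dataset directory holds one subdirectory per version, named in the ESGF `vYYYYMMDD` style. The helper name `split_versions` and the directory layout are assumptions for illustration, not taken from the existing pipeline code:

```python
import re
from pathlib import Path

# Assumption: version directories are named like "v20190815" (vYYYYMMDD).
VERSION_RE = re.compile(r"^v\d{8}$")


def split_versions(dataset_dir):
    """Return (latest, older) version directories found under dataset_dir.

    latest is None and older is empty if no version directories exist.
    """
    versions = sorted(
        (
            p
            for p in Path(dataset_dir).iterdir()
            if p.is_dir() and VERSION_RE.match(p.name)
        ),
        # Fixed-width vYYYYMMDD names sort chronologically as plain strings.
        key=lambda p: p.name,
    )
    if not versions:
        return None, []
    return versions[-1], versions[:-1]
```

With the first idea, the transfers pipeline could delete everything in `older` once a transfer completes. With the second, regridding and QC could both call the same helper and use `latest`, so they always agree on the source version.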