ua-snap / cmip6-utils

Pipelines and utilities for working with CMIP6 data

Deal with multiple versions of same dataset #49

Open Joshdpaul opened 3 months ago

Joshdpaul commented 3 months ago

When we transfer CMIP6 data from Globus, the transfers pipeline selects the most recent version date to build the manifest. (See the relevant section of the esgf_holdings.py script.)
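For reference, here is a minimal sketch of "pick the most recent version" logic. It assumes version directories follow the CMIP6 "vYYYYMMDD" convention (e.g. v20190710). The helper name latest_version is hypothetical and is not the code in esgf_holdings.py.

```python
import re
from datetime import datetime

# CMIP6 version directories use the "vYYYYMMDD" convention, e.g. "v20190710".
VERSION_PATTERN = re.compile(r"^v(\d{8})$")


def latest_version(versions):
    """Return the most recent version string from an iterable of version names.

    Hypothetical helper for illustration only. Names that do not match
    the vYYYYMMDD pattern (e.g. "latest") are ignored.
    """
    dated = []
    for v in versions:
        match = VERSION_PATTERN.match(v)
        if match:
            dated.append((datetime.strptime(match.group(1), "%Y%m%d"), v))
    if not dated:
        raise ValueError("no valid vYYYYMMDD version strings found")
    # Tuples compare by date first, so max() returns the newest version.
    return max(dated)[1]


print(latest_version(["v20190710", "v20200528", "v20191115"]))  # → v20200528
```

Parsing the date explicitly (instead of relying on string order) makes malformed names fail loudly rather than sorting silently into the wrong place.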

However, nothing removes older versions that already exist in the CMIP6 data directory. Each new run of the transfers pipeline may add new versions of a dataset without removing the older ones. This is not necessarily a problem in itself, but it must be handled during regridding (which version of the dataset should be regridded?) and QC (which version of the dataset is really the source of the data being QC'd?). Some QC errors have already appeared because the comparison between regridded and source datasets used the wrong source dataset version.
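As an illustration of how the affected datasets could be found, here is a minimal sketch that scans the CMIP6 data directory and reports every dataset directory holding more than one version subdirectory. It assumes the ESGF DRS layout, where each dataset directory (ending in the grid label) contains one or more vYYYYMMDD subdirectories. The function name find_multiversion_datasets is hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Version subdirectories follow the CMIP6 "vYYYYMMDD" convention.
VERSION_PATTERN = re.compile(r"^v\d{8}$")


def find_multiversion_datasets(cmip6_root):
    """Map each dataset directory to its version names, newest first.

    Hypothetical helper: assumes the ESGF DRS directory layout.
    Only datasets with more than one version are returned.
    """
    versions = defaultdict(list)
    for path in Path(cmip6_root).rglob("v*"):
        if path.is_dir() and VERSION_PATTERN.match(path.name):
            versions[path.parent].append(path.name)
    # Zero-padded vYYYYMMDD names sort chronologically as plain strings.
    return {
        dataset: sorted(names, reverse=True)
        for dataset, names in versions.items()
        if len(names) > 1
    }


if __name__ == "__main__":
    for dataset, (newest, *older) in find_multiversion_datasets(".").items():
        print(f"{dataset}: use {newest}, older: {', '.join(older)}")
```

Regridding and QC could both read the first entry for each dataset so they agree on the source version; the script only reports duplicates and does not delete anything.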

Ideas: