roocs / 34e-mngmt

Management of 34e
BSD 2-Clause "Simplified" License

Make sure all repos use a mapping function between path and ds_id within roocs-utils #75

Closed: agstephens closed this issue 3 years ago

agstephens commented 3 years ago

We need this function because a dataset ID beginning with "CMIP6." does not, on its own, tell us whether the dataset belongs to c3s-cmip6 or the usual cmip6.

We need to check that all repos use this function.

ellesmith88 commented 3 years ago

Currently c3s-cmip6 is hardcoded in rook/director/director.py on line 19:

https://github.com/roocs/rook/blob/9633147373b6f3a45d250951fb25448defeb506a/rook/director/director.py#L19

ellesmith88 commented 3 years ago

I've been looking at this and currently can't find a way of working out whether the project is c3s-cmip6 or the usual cmip6, other than looking in the inventory. The file path is the same for both, and the project name (mip_era) within the datasets is the same. In the case of an xarray dataset, the same applies to cmip5 and cordex.

Did you have a method in mind, @agstephens?

agstephens commented 3 years ago

My thoughts are:

  1. We should change the c3s-cmip6 inventory so that mip_era is c3s-cmip6.
  2. We need a generic Mapping class in roocs_utils to simplify the mapping process. It should take in either a dataset ID or a data path (and maybe other inputs later, e.g. a dictionary of facets?).

Rough sketch of the class could be:

class ProjectDataset:   # <-- feel free to give this a better name

    def __init__(self, dset, project=None):
        self._project = project
        self._parse(dset)

    def _deduce_project(self, dset):
        # TODO: work out the project from dset when none was given
        ...

    def _parse(self, dset):
        # - branch on the form of dset: contains '/', contains '.', starts with a base dir
        # - call self._deduce_project(...)
        # - set the properties: datapath, dsid, basedir (and maybe facets? and maybe files?)
        ...

    @property
    def datapath(self):
        ...

    @property
    def dsid(self):
        ...

    @property
    def basedir(self):
        ...

    @property
    def facets(self):
        ...

    @property
    def files(self):
        ...

# You could imagine some utility functions that wrap the ProjectDataset class.
def derive_dset(dset):
    return ProjectDataset(dset)

def open_xr_dataset(dset):
    ...

def datapath_to_dset(datapath):
    ...

def dset_to_datapath(dset):
    ...

agstephens commented 3 years ago

One problem is:

Could solve this with:

In the roocs.ini file inside rook, we can ensure that the defaults are the c3s-... versions of the projects.
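
To make the idea of config-driven defaults concrete, here is a minimal sketch. It is not the roocs_utils or rook implementation. The [defaults] section, its keys, the helper names (load_defaults, deduce_project, dsid_to_datapath, datapath_to_dsid) and the /data base directory are all hypothetical, and it assumes that facets contain no dots and that a data path is just the base directory followed by the facets.

```python
import configparser

# Hypothetical config fragment: the section and key names are illustrative
# only and are not taken from the real roocs.ini.
EXAMPLE_INI = """
[defaults]
# first facet of a dataset ID (lower-cased) = project assumed when none is given
cmip6 = c3s-cmip6
cmip5 = c3s-cmip5
cordex = c3s-cordex
"""


def load_defaults(ini_text):
    """Read the hypothetical [defaults] section into a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return dict(parser["defaults"])


def deduce_project(dsid, defaults, project=None):
    """Return the explicit project if given, else the configured default."""
    if project is not None:
        return project
    first_facet = dsid.split(".", 1)[0].lower()
    try:
        return defaults[first_facet]
    except KeyError:
        raise ValueError(f"Cannot deduce project for dataset ID: {dsid!r}")


def dsid_to_datapath(dsid, base_dir):
    """Map 'A.B.C' to '<base_dir>/A/B/C' (assumes facets contain no dots)."""
    return base_dir.rstrip("/") + "/" + dsid.replace(".", "/")


def datapath_to_dsid(datapath, base_dir):
    """Inverse of dsid_to_datapath for paths under base_dir."""
    prefix = base_dir.rstrip("/") + "/"
    if not datapath.startswith(prefix):
        raise ValueError(f"{datapath!r} is not under {base_dir!r}")
    return datapath[len(prefix):].strip("/").replace("/", ".")


defaults = load_defaults(EXAMPLE_INI)
print(deduce_project("CMIP6.CMIP.MOHC", defaults))
print(deduce_project("CMIP6.CMIP.MOHC", defaults, project="cmip6"))
print(dsid_to_datapath("CMIP6.CMIP.MOHC", "/data"))
```

With this arrangement, a caller that knows the project passes it explicitly, and everything else falls back to the c3s-... version, so an ambiguous "CMIP6." ID resolves the same way in every repo.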