Variables with hard-coded dimension labels are not very flexible

benbovy commented 6 years ago

Some processes are generic enough to be re-used in contexts where the (space) domain of a model even differs by its number of dimensions. Take for example the following process class:

@xs.process
class SurfaceTopography(object):
    """Update a surface topography as resulting from
    multiple processes either rising or lowering elevation.
    """
    elevation = xs.variable(dims=('y', 'x'), intent='inout',
                            description='surface topography elevation')

    elevation_up_vars = xs.group('elevation_up')
    elevation_down_vars = xs.group('elevation_down')

    def run_step(self, *args):
        elevation_up = sum((v for v in self.elevation_up_vars))
        elevation_down = sum((v for v in self.elevation_down_vars))
        self.elevation_change = elevation_up - elevation_down

    def finalize_step(self):
        self.elevation += self.elevation_change

It would be nice to just re-use this class for driving the evolution of either a 1-d profile (e.g., river channel), a 2-d regular grid (raster DEM) or an irregular mesh. Unfortunately, the class above doesn't allow that flexibility because of the hard-coded dims=('y', 'x') for the elevation variable.

There are different options to overcome this limitation:

1. Use class inheritance

Create an inherited process class for each case and redefine dims for the relevant variables , e.g.,

@xs.process
class ProfileTopography(SurfaceTopography):
    """Update a profile topography as resulting from
    multiple processes either rising or lowering elevation.
    """
    elevation = xs.variable(dims='x', intent='inout',
                            description='profile topography elevation')

This has the advantage to be explicit. We can adapt the class docstrings and variable description depending on the context. A downside is that we need to do create a new process class for every case and every generic process. We might end up with a lot of classes. Also, in the example above we only need to redefine one variable but there might be processes where many variables would need to be redefined. We might end up with a lot of code duplication.

2. Set `dims` so that it allows all possible dimension labels

For example:

@xs.process
class SurfaceTopography(object):
    """Update a surface topography as resulting from
    multiple processes either rising or lowering elevation.
    """
    elevation = xs.variable(dims=['x', ('y', 'x')], ...)

    ...

This is straightforward (nothing specific to implement), but also confusing and error-prone. It doesn't tell which of these dimension(s) labels are actually used in the model where the process is included.

3. Set `dims` so that dimension labels are defined at model creation using a mapping

For example:

SPACE_DIMS = xs.DimMapping('space',
                           default=('y', 'x'),
                           choices=['x', 'nodes', ('y', 'x')])

@xs.process
class SurfaceTopography(object):
    """Update a surface topography as resulting from
    multiple processes either rising or lowering elevation.
    """
    elevation = xs.variable(dims=SPACE_DIMS, ...)

    ...

when (an instance of) the SurfaceTopography process is attached to a model, xs.DimMapping('space') will try to get and return the actual dimension labels from the 'space' key of a mapping that is somehow stuck to a Model instance. This could be done, e.g., by adding an argument to Model(), e.g.,

model_grid2d = Model({'topo': SurfaceTopography}, map_dims={SPACE_DIMS: ('y', 'x')})
model_profile1d = Model({'topo': SurfaceTopography}, map_dims={SPACE_DIMS: 'x'})
model_mesh = Model({'topo': SurfaceTopography}, map_dims={SPACE_DIMS: 'nodes'})

map_dims is optional (empty mapping by default). If 'space' is not found in mapping keys, the default value set for the xs.DimMapping object is returned. If no default value is set, then a KeyError is raised.

It might make sense to somehow stick such a mapping to a specific process class, e.g., the one that defines the grid, so that when this class is included in a model, the latter automatically use it.

This option has several advantages:

it scales well and it is less error-prone (just define the labels once and use them everywhere), although obviously we don't really need DimMapping for that
it lets the user choosing arbitrary/alternative labels when creating a model
for process classes that use this feature but are maintained in different code bases, it would allow something like this:
```
model_grid2d = Model({...}, map_dims={SPACE_DIMS: ('y', 'x'), OTHER_DIMS: ('y', 'x')})
```

possible issues:

How to propagate the mapping when building a new model from another, e.g., with Model.update_processes?

benbovy commented 6 years ago

Alternative names for DimMapping:

ModelDim
SurrogateDim

benbovy commented 6 years ago

Another option:

4. A class factory that replaces one or more dimension labels declared in a process class.

Such a function would take a process class + a mapping old->new dimension labels and would return a new process class with replaced labels (looking at all declared variables).

model_grid2d = Model({'topo': SurfaceTopography})

model_profile1d = Model({
    'topo': xs.replace_dims(SurfaceTopography, {('y', 'x'): 'x'})
})

model_mesh = Model({
    'topo': xs.replace_dims(SurfaceTopography, {('y', 'x'): 'nodes'})
})

This is very simple and elegant from the API point of view. To avoid broken process dependencies , the returned class may just be a subclass of the input class (if #45 is merged). This option thus looks like option 1 but with less code to write.

benbovy commented 4 years ago

With option 4, xs.replace_dims could also accept a Model object (and return another Model object), for replacing dimensions throughout all the processes in a model:

model_profile1d = xs.replace_dims(model_grid2d, {('y', 'x'): 'x'})

benbovy commented 4 years ago

Closing this as I'm not very satisfied by any of the options above. #44 is a better approach IMO, that lefts users do what they want (except that it wouldn't work well with auto-generated docstrings #3).

If anyone else reads this and wants to share thoughts, feel free to comment, I can reopen it later.

xarray-contrib / xarray-simlab