mcgibbon / sympl

A toolkit for building planetary/Earth system models in Python
http://sympl.readthedocs.io

dims_like should allow multiple targets #14

Closed mcgibbon closed 6 years ago

mcgibbon commented 6 years ago

Currently there's a deficiency in the dims_like attribute of property dictionaries. Say you have an output Z which is the result of something like X*Y, with array broadcasting. If X has shape (a,) and Y has shape (a, b), it is insufficient to say Z has dims_like X, and likewise insufficient to say Z has dims_like Y when the shapes are reversed. What you would really want to say is that Z has dims like the broadcast of X and Y.
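To make the broadcasting case concrete, here is a minimal NumPy sketch. The sizes a=3 and b=4 are arbitrary values chosen for illustration, and the explicit new axis stands in for whatever alignment the component would do internally.

```python
import numpy as np

a, b = 3, 4  # arbitrary sizes, chosen only for illustration

# Case 1: X is (a,), Y is (a, b). The output has the dims of Y, not X.
X = np.ones((a,))
Y = np.ones((a, b))
Z1 = X[:, np.newaxis] * Y  # align X's only axis with Y's first axis

# Case 2: shapes reversed. Now the output has the dims of X, not Y.
X2 = np.ones((a, b))
Y2 = np.ones((a,))
Z2 = X2 * Y2[:, np.newaxis]

print(Z1.shape, Z2.shape)
```

In both cases the output takes the broadcast shape, so neither "dims_like X" nor "dims_like Y" is correct for every input state.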

In other words, dims_like currently says that wildcard dimensions should be matched the way they are in a single target. What it should say is that they are matched like either one target or a list of targets. When a list is given, it should defer to the targets that actually have a match for the wildcard dimension, and require that if multiple targets have such a match then the matches are all the same.

This change is backwards-compatible, since it only implies new functionality when a list of inputs is given in dims_like.

Some examples:

If X has shape ('mid_levels',), and Y has shape ('column', 'mid_levels') in the state, and the component requires ('*', 'z') for both, then dims_like ['X', 'Y'] would be ('column', 'mid_levels').

If X has shape ('mid_levels',), and Y has shape ('mid_levels',) in the state, and the component requires ('*', 'z') for both, then dims_like ['X', 'Y'] would be ('mid_levels',).

If X has shape ('mid_levels',), and Y has shape ('column', 'mid_levels') in the state, and the component requires ('*', 'z') for Y and ('z',) for X, then an exception is raised on dims_like ['X', 'Y'].

If X has shape ('column_edge', 'mid_levels'), and Y has shape ('column_center', 'mid_levels') in the state, and the component requires ('*', 'z') for both, then an exception is raised.

Additionally, an exception is raised if the output does not actually have the required shape. For example, if dims_like implies output shape of ('column', 'mid_levels') but the output quantity does not have the right dimension size for 'column', or that dimension is missing, then an exception is raised.
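The rules in these examples can be sketched roughly as below. This is not Sympl code: resolve_dims_like, Z_DIM_NAMES, and the dictionaries passed in are hypothetical names made up for illustration, and the sketch assumes the only non-wildcard direction is 'z'.

```python
# Hypothetical set of state dimension names that count as a 'z' match.
Z_DIM_NAMES = {"mid_levels", "interface_levels"}


def resolve_dims_like(targets, required, state):
    """Return output dims implied by a list of dims_like targets.

    targets:  list of quantity names given in dims_like
    required: name -> dims the component requests, e.g. ('*', 'z')
    state:    name -> dims the quantity actually has in the state
    """
    matches = []
    for name in targets:
        if "*" not in required[name]:
            raise ValueError(f"{name} declares no wildcard dimension")
        # Everything that is not a 'z' match is absorbed by the wildcard.
        match = tuple(d for d in state[name] if d not in Z_DIM_NAMES)
        if match:  # defer to targets that actually have a wildcard match
            matches.append(match)
    if len(set(matches)) > 1:
        raise ValueError(f"conflicting wildcard matches: {sorted(set(matches))}")
    wildcard = matches[0] if matches else ()
    z_dims = tuple(d for d in state[targets[0]] if d in Z_DIM_NAMES)
    return wildcard + z_dims


both_wild = {"X": ("*", "z"), "Y": ("*", "z")}
first = resolve_dims_like(
    ["X", "Y"], both_wild,
    {"X": ("mid_levels",), "Y": ("column", "mid_levels")},
)
second = resolve_dims_like(
    ["X", "Y"], both_wild,
    {"X": ("mid_levels",), "Y": ("mid_levels",)},
)
print(first, second)
```

The third and fourth examples raise ValueError in this sketch: the third because X requests ('z',) with no wildcard, the fourth because 'column_edge' and 'column_center' are different wildcard matches. Checking the output array against the resolved dims is left out here.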

mcgibbon commented 6 years ago

After thinking about this more, I see serious flaws in what I have written above, but I don't have them sorted out enough to clear them up right now.

This all stems from a use case I have where, say, 'height', 'air_temperature' and 'air_pressure' all have dims [' * ', 'z'], and in the state 'height' has dims ['mid_levels'] while 'air_temperature' and 'air_pressure' have dims ['column', 'mid_levels']. I'd like to ensure 'air_temperature' and 'air_pressure' have the same ' * ' dimension, even though they don't match with 'height' (since it's missing any ' * ' matches). The solution may be to treat unmatched wildcards differently, and internally chain any dims_like relations...

Note: Spaces around * to avoid interpreting them as formatting markers.

mcgibbon commented 6 years ago

We may be able to resolve this by removing dims_like and instead using named wildcards, where all instances of the named wildcard must be compatible with one another.

mcgibbon commented 6 years ago

Right now my idea is to allow a dims keyword to be used instead of dims_like, where any wildcards in the dims keyword are assumed to be the amalgamation of any matches for that wildcard in the inputs. This should error for "*" if and only if there are different inputs for which the same matched dimensions appear in different orders, since in that case one cannot construct a single consistent dimension order. For x, y, and z this should also error if more than one dimension name is matched.

This check should actually be applied earlier, when converting the input state to numpy arrays. All instances of a given wildcard should be broadcast to the same dimensions before collapsing into a single wildcard dimension, and an error should occur if x, y, or z have multiple matches among the inputs. This guarantees that all inputs with the same dims actually have the same shape.
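A rough sketch of that amalgamation step, using only the standard library (graphlib needs Python 3.9+). The function names merge_wildcard_dims and single_direction_match are hypothetical, not part of Sympl, and the inputs are assumed to be the per-input tuples of dimension names already matched to a wildcard.

```python
from graphlib import CycleError, TopologicalSorter


def merge_wildcard_dims(matches):
    """Merge per-input '*' matches into one ordered tuple of dim names.

    Raises ValueError if two inputs list the same dims in different
    orders, since no single consistent ordering exists in that case.
    """
    sorter = TopologicalSorter()
    for dims in matches:
        for i, dim in enumerate(dims):
            # Each dim must come after every dim that precedes it in this input.
            sorter.add(dim, *dims[:i])
    try:
        return tuple(sorter.static_order())
    except CycleError as err:
        raise ValueError(f"inconsistent wildcard dimension order: {matches}") from err


def single_direction_match(direction, names):
    """Return the one dim name matched for 'x', 'y' or 'z', or None if unmatched."""
    unique = set(names)
    if len(unique) > 1:
        raise ValueError(f"'{direction}' matched multiple dims: {sorted(unique)}")
    return unique.pop() if unique else None


merged = merge_wildcard_dims([("lat",), ("lat", "lon"), ()])
z_name = single_direction_match("z", ["mid_levels", "mid_levels"])
print(merged, z_name)
```

Once the merged dims are known, every input can be broadcast to them before the wildcard dimensions are collapsed, which is what guarantees equal shapes. Dims that never appear together in any input have no defined relative order, so this sketch places them in whatever valid order the sorter yields.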

It is theoretically possible one might want to have multiple "*" that actually mean different things within a single component, but I can't imagine a real use case for this. Down the line we may want to add the ability to name "*" wildcards (e.g. "*a", "*b") so that this broadcasting operation only occurs among wildcards with the same name. I propose we don't do this right away, but add it later if an actual use case shows up.

@JoyMonteiro

mcgibbon commented 6 years ago

No longer necessary due to #28 and closed by #26.