mcgibbon / sympl

A toolkit for building planetary/Earth system models in Python
http://sympl.readthedocs.io

sympl interface spec for arrays/units #46

Open JoyMonteiro opened 5 years ago

JoyMonteiro commented 5 years ago

I would like to try these ideas out on a fork if that makes more sense, and merge it later.

Currently, sympl assumes that the arrays inside the state dictionary are instances of DataArray. While this made sense initially, I'm continually coming up against performance issues (like https://github.com/mcgibbon/sympl/issues/43).

For instance,

These issues really come to the fore when writing models that work with a single column of data, which is currently the major use case, for climt at least.

While it is desirable to keep the DataArray interface, it would be really helpful downstream if sympl described an API that any array object must implement. This will require rewriting some internal code which assumes that the arrays are DataArrays, but in the end it will allow more performant array representations like unyt to be used seamlessly in sympl components.
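As a rough sketch of what such an interface might look like: the names `StateArray`, `PlainArray`, `raw` and `check_state` below are hypothetical and not part of sympl, and the choice of `dims`, `units` and a raw-buffer accessor as the required members is an assumption based on what a DataArray already exposes. In practice `raw()` would probably return a `numpy.ndarray`; plain lists are used here only to keep the example dependency-free.

```python
from dataclasses import dataclass
from typing import Any, Protocol, Tuple, runtime_checkable


@runtime_checkable
class StateArray(Protocol):
    """Hypothetical minimal interface for state arrays (not a sympl API)."""

    @property
    def dims(self) -> Tuple[str, ...]: ...

    @property
    def units(self) -> str: ...

    def raw(self) -> Any:
        """Return the underlying data buffer."""
        ...


@dataclass
class PlainArray:
    """Lightweight container that satisfies StateArray without xarray."""

    data: Any
    dims: Tuple[str, ...]
    units: str

    def raw(self) -> Any:
        return self.data


def check_state(state: dict) -> None:
    """Raise TypeError if any state entry lacks the required members."""
    for name, value in state.items():
        if not isinstance(value, StateArray):
            raise TypeError(f"{name!r} does not implement StateArray")
```

Any container exposing these three members would pass `check_state`, so a DataArray wrapper and a unyt-backed array could in principle sit side by side in the same state dictionary.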

This might also require sympl to allow an implementing library to replace functions like get_numpy_array with custom versions.
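One way this could work is to let a component take the extraction function as an argument. The sketch below is only illustrative: `Component`, `default_extract` and `compute` are hypothetical names rather than sympl's actual classes, and `default_extract` merely stands in for `get_numpy_array` by reading a `.values` attribute.

```python
from typing import Any, Callable, Dict


def default_extract(array: Any) -> Any:
    """Stand-in for a DataArray-style extractor: read the `.values` attribute."""
    return array.values


class Component:
    """Hypothetical component with a replaceable extraction hook."""

    def __init__(self, extract: Callable[[Any], Any] = default_extract):
        self._extract = extract

    def __call__(self, state: Dict[str, Any]) -> Dict[str, Any]:
        # Convert every state entry to a raw array using the supplied hook.
        raw_state = {name: self._extract(value) for name, value in state.items()}
        return self.compute(raw_state)

    def compute(self, raw_state: Dict[str, Any]) -> Dict[str, Any]:
        # Placeholder for the actual physics; here it just echoes its input.
        return raw_state
```

An implementing library could then pass its own function, for example `Component(extract=lambda a: a)` for arrays that are already raw buffers, without subclassing.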

In general, it might be good to specify a set of functions that an implementing library must provide, which would replace the logic that currently resides within __call__ of any sympl component. This would make it easy to add functionality without having to build custom subclasses of the base sympl components, an approach I would rather avoid.

IMO this also makes sense since sympl is a framework, and it need not be opinionated about what kind of arrays are used, or how the validation of these arrays and their dimensions is done. sympl could register callbacks based on the type of the input array formats and use them for validation and reshaping.
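A minimal sketch of type-based callback registration, using the standard library's `functools.singledispatch` as the registry. This is a swapped-in mechanism chosen for brevity, not something sympl provides; `validate_dims` and `DimArray` are hypothetical names.

```python
from functools import singledispatch
from typing import Any, Tuple


@singledispatch
def validate_dims(array: Any, expected: Tuple[str, ...]) -> None:
    """Fallback used when no validator is registered for the array type."""
    raise TypeError(f"no validator registered for {type(array).__name__}")


class DimArray:
    """Hypothetical array type that carries its own dimension names."""

    def __init__(self, data, dims):
        self.data = data
        self.dims = tuple(dims)


@validate_dims.register(DimArray)
def _validate_dim_array(array, expected):
    # Callback selected automatically for DimArray inputs.
    if array.dims != tuple(expected):
        raise ValueError(f"expected dims {tuple(expected)}, got {array.dims}")


@validate_dims.register(list)
def _validate_list(array, expected):
    # A plain list has no dimension names, so only check that it is used as 1-D.
    if len(expected) != 1:
        raise ValueError("a plain list can only represent a 1-D quantity")
```

Calling `validate_dims(array, expected)` dispatches on `type(array)`, so a downstream library can register a validator for its own array class without touching the framework code.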