pySTEPS / pysteps

Python framework for short-term ensemble prediction systems.
https://pysteps.github.io/
BSD 3-Clause "New" or "Revised" License
441 stars 160 forks source link

Evaluate the work needed to finally implement the xarray data model throughout pysteps. #373

Closed ladc closed 1 month ago

ladc commented 1 month ago

Already since 2018 (see #12, #216 and https://gist.github.com/kmuehlbauer/645e42a53b30752230c08c20a9c964f9 ) there has been talk of moving to xarray. At the time, it was not yet mature, but now it would be very beneficial for the code performance, readability, maintainability, and usability. This requires adaptations throughout the code, rigorous testing, potentially requiring the creation of new unit tests to ensure that all functions still function as expected. More importantly, this would introduce many breaking changes so it's not a minor update; it would instead constitute a significant part of the much-anticipated pysteps v2 (#216). Your insights welcome! @dnerini @RubenImhoff

ladc commented 1 month ago

When we tackle this topic, we can create new issues to divide the work in more digestible bits.

RubenImhoff commented 1 month ago

And see also PR #240 and #254 for some inspiration and examples to get started.

ladc commented 1 month ago

@mats-knmi and Anshul @gjm174 will tackle it - Mats will start by rewriting all the importers which will break all the unit tests 👏 . He will then fix all the tests. The rewritten functions will break the rest of the code like dominoes and this will be addressed bit by bit. Anshul focuses on profiling.

The current metadata properties are described here: https://pysteps.readthedocs.io/en/stable/pysteps_reference/io.html#pysteps-io-importers

ladc commented 1 month ago

@dnerini We are discussing how we should approach this. These changes will break a lot. @mats-knmi proposes to create a v2 branch after all, so these changes can still be reviewed and included incrementally into v2 without totally breaking the master banch. What do you think?

dnerini commented 1 month ago

I'm happy with it, but maybe it's also worth looking at what was already done in the older pysteps-v2 branch? I created a draft PR to have a closer look: https://github.com/pySTEPS/pysteps/pull/383

ladc commented 1 month ago

@dnerini Mats had a look at it and it has diverged significantly from master. He has done some similar work already but the main difference is that there's no legacy support in his current revised version. This would result in much cleaner code but would break compatibility with v1.

The current methods are rewritten to support xarray but only by exporting the required metadata from the dataset, processing and inserting it back into the data array.

Some room for improvement probably remains (using the xarray functionality directly) but most of the time it's unavoidable to simply extract the np arrays to do the processing in the various functions.

dnerini commented 1 month ago

also, some work done in https://github.com/pySTEPS/pysteps/pull/383 is probably still relevant.

ladc commented 1 month ago

The gallery / examples also needs to be adapted, they serve as integration tests. We will maintain a v2 branch in parallel to the master, but add the other improvements to the master branch. CI should be set up for the v2 branch. Any help is welcome! E.g. getting the nowcasting working with xarray. In a later step we can add fancy features such as the accessors that have been proposed earlier.

mats-knmi commented 1 month ago

I finshed the conversion of the transformation.py and conversion.py to xarray in the original branch I created (https://github.com/pySTEPS/pysteps/tree/use-xarray-as-data-model).

From now on I will consider that branch to be the "main" branch of the xarray conversion and open separate pull requests to this branch for every next module that I convert, the first of these pull requests can be found here: https://github.com/pySTEPS/pysteps/pull/391.

I invite everone who wants to help to follow this same way of working.

ladc commented 1 month ago

The evaluation has been done and the work has started!

ladc commented 1 month ago

Outcome: a new project was created by @mats-knmi https://github.com/orgs/pySTEPS/projects/4 with a list of issues to be picked up - welcome to create any new issues