opendatacube / odc-algo

Apache License 2.0
5 stars 1 forks source link

Add rust geomedian impl #14

Open omad opened 2 months ago

omad commented 2 months ago

This is very work in progress, and I actually wish I'd taken a different approach.

The intention was to take a NaN aware, Python/NumPy Geomedian implementation written in Rust several years ago, and merge it into odc-algo as a seamless replacement for the hdstats Cython implementation.

The hdstats implementation works well, but the published source code doesn't match the version that is being used within DEA. binaries source

Existing Implementation The code in this repo handles taking a dask/xarray object, partitioning and reshaping it, and then calling hdstats. As far as I can tell, it's only used from odc-stats. This makes it all a bit annoying to test and work on, odc-stats -> odc-stats GM Plugin -> odc-algo -> hdstats

This Work Instead of modifying what's here, I wish I'd added a new Python module for calling the new Rust code, along with a new plugin within odc-stats. This would have made testing and comparing the implementations much easier, and is what I'd recommend doing rather than merging this PR as it stands.

robbibt commented 3 weeks ago

Hey @omad, just saw this bit:

As far as I can tell, it's only used from odc-stats.

We make use of xr_geomedian and int_geomedian from this repo throughout DEA Notebooks and in our upcoming DEA Intertidal Composites products - with this proposal ("I wish I'd added a new Python module for calling the new Rust code, along with a new plugin within odc-stats") would we still be able to do that?

omad commented 3 weeks ago

Hi @robbibt

We make use of xr_geomedian and int_geomedian from this repo throughout DEA Notebooks and in our upcoming DEA Intertidal Composites products - with this proposal ("I wish I'd added a new Python module for calling the new Rust code, along with a new plugin within odc-stats") would we still be able to do that?

Absolutely! I made this PR for visibility, not to merge.

I'm going to create a new PR, that exposes the new code from a different Python module, and leave the current modules as they are.

robbibt commented 3 weeks ago

Awesome - we'd love to switch to the new code though - perhaps we can create equivalent funcs in the new module instead