inbo / niche_vlaanderen

Python package to run the NICHE Vlaanderen model
https://inbo.github.io/niche_vlaanderen/
MIT License

zonal_statistics (and calibration) are slow #284

Closed johanvdw closed 2 years ago

johanvdw commented 2 years ago

Because we have approximately 28 bands, zonal statistics and calibration (which relies on zonal_statistics) are very slow.

This is due to a design choice in rasterstats: the shapes are rasterized again for each grid. See e.g. https://github.com/perrygeo/python-rasterstats/issues/124, where I propose a solution.
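The cost described above can be illustrated without rasterstats. This is a minimal sketch, assuming the zones have already been burned into a single label grid that is then reused for every band, so the rasterization step is paid once instead of once per band. Plain Python lists stand in for raster arrays, and all names (`zone_grid`, `build_zone_index`, `zonal_means`) are hypothetical; this is not the rasterstats API.

```python
from collections import defaultdict


def build_zone_index(zone_grid):
    """Map each zone id to the list of (row, col) cells it covers.

    This stands in for the rasterization step: it runs once,
    regardless of how many bands are processed afterwards.
    """
    index = defaultdict(list)
    for r, row in enumerate(zone_grid):
        for c, zone_id in enumerate(row):
            if zone_id is not None:  # None marks cells outside every zone
                index[zone_id].append((r, c))
    return dict(index)


def zonal_means(bands, zone_index):
    """Return {band_name: {zone_id: mean}}, reusing one precomputed zone index."""
    result = {}
    for name, band in bands.items():
        result[name] = {
            zone_id: sum(band[r][c] for r, c in cells) / len(cells)
            for zone_id, cells in zone_index.items()
        }
    return result


# Hypothetical 2x3 example: two zones ("a", "b") and two bands.
zone_grid = [["a", "a", "b"],
             ["a", None, "b"]]
bands = {
    "band_1": [[1, 1, 0], [1, 0, 0]],
    "band_2": [[0, 2, 4], [1, 9, 6]],
}

zone_index = build_zone_index(zone_grid)  # done once
means = zonal_means(bands, zone_index)    # reused for every band
```

With roughly 28 bands, the index is built once and the per-band loop only does lookups and sums.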

We might be able to gain a bit by running in parallel: https://github.com/csc-training/geocomputing/blob/master/python/zonal_stats/zonal_stats_parallel.py
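A rough sketch of what running in parallel could look like, assuming the features can be split into independent chunks. It is not taken from the linked script: `chunks`, `stats_for_chunk` and `parallel_stats` are hypothetical names, and the worker is a placeholder that only averages a list of numbers. Threads are used here to keep the example short; they only help when the worker releases the GIL, and a process pool is the usual choice for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous, roughly equal slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def stats_for_chunk(features):
    """Placeholder worker: a real one would run zonal statistics on these features."""
    return [
        {"id": f["id"], "mean": sum(f["values"]) / len(f["values"])}
        for f in features
    ]


def parallel_stats(features, workers=4):
    """Process feature chunks concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(stats_for_chunk, chunks(features, workers))
        return [row for part in parts for row in part]


# Hypothetical input: ten features, each with two pixel values.
features = [{"id": i, "values": [i, i + 2]} for i in range(10)]
results = parallel_stats(features, workers=3)
```

`pool.map` yields the chunk results in submission order, so the output rows line up with the input features.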

johanvdw commented 2 years ago

For calibration it is an option to use only the vegetation types that are present in the polygon. I'm trying that approach first.
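A minimal sketch of that idea, assuming each calibration polygon carries the list of vegetation types observed in it and that there is one band per vegetation type. The names `observed` and `bands_to_evaluate` are hypothetical and not part of niche_vlaanderen.

```python
def bands_to_evaluate(observed, all_types):
    """Return, per polygon, only the vegetation types that need zonal statistics."""
    known = set(all_types)
    return {
        polygon_id: sorted(set(types) & known)
        for polygon_id, types in observed.items()
    }


all_types = range(1, 29)  # assumption: 28 bands, one per vegetation type
observed = {
    "polygon_1": [3, 7],
    "polygon_2": [12],
    "polygon_3": [7, 40],  # 40 is not a modelled type, so it is dropped
}

todo = bands_to_evaluate(observed, all_types)

# Number of (polygon, band) combinations with and without the filter.
n_selected = sum(len(types) for types in todo.values())
n_all = len(observed) * len(all_types)
```

In this toy input the filter leaves 4 polygon/band combinations out of 84; the real saving depends on how many types occur per polygon.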

johanvdw commented 2 years ago

Before optimization, on a small subset of the data: 24 s per run.

theroggy commented 2 years ago

Sorry to barge in here, but I got here via the issue/comment you posted regarding multiband zonal stats calculation in rasterstats.

Even though multi-band calculation sounds interesting, I suppose any way of improving the performance is welcome. If so, it might be worth checking out this issue: https://github.com/perrygeo/python-rasterstats/issues/256

johanvdw commented 2 years ago

Switching from rasterstats 0.16 to 0.17 solves this issue. The calculation is now approximately 14x faster.