mmaelicke / scikit-gstat

Geostatistical variogram estimation expansion in the scipy style
https://mmaelicke.github.io/scikit-gstat/
MIT License

Performance issues with ordinary kriging with dense sampling within variogram's range #127

Open Duke-of-Lizard opened 2 years ago

Duke-of-Lizard commented 2 years ago

Hello @mmaelicke, I am a colleague of @redhog. We use scikit-gstat a lot in our workflows, and I am looking to make my first contribution to this library. :)

I often work with densely (albeit irregularly) sampled data, with a spacing on the order of 10-30 m, while the quantities I am interpolating often have a variogram range of 300-600 m. I find that using only the nearest 20-30 data points is sufficiently accurate for my purposes. However, even in sparse mode, OrdinaryKriging.transform still calculates distances between all pairs within the variogram range, so I'm computing roughly 10x the number of distance pairs that I need. This slows down performance substantially.

https://github.com/mmaelicke/scikit-gstat/blob/126fe86e77896528b43176ae00df0f6f22aaf46b/skgstat/Kriging.py#L297

I'd like to optimize this step so that self.transform_coords is filled only as far as needed to satisfy OrdinaryKriging's min_points and max_points parameters.
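To make the idea concrete, here is a minimal sketch of the neighbour selection I have in mind. It is not how Kriging.py works today: `select_neighbors` and all of its arguments are hypothetical names, and it assumes a plain Euclidean metric so that `scipy.spatial.cKDTree` can be used for the query.

```python
import numpy as np
from scipy.spatial import cKDTree


def select_neighbors(data_coords, target_coords, max_range, min_points, max_points):
    """Hypothetical helper (not part of scikit-gstat).

    For each target point, find at most `max_points` nearest data points
    that lie within `max_range`, and flag targets with fewer than
    `min_points` such neighbours.
    """
    tree = cKDTree(data_coords)
    # Missing neighbours are reported with distance inf and index tree.n.
    dist, idx = tree.query(target_coords, k=max_points,
                           distance_upper_bound=max_range)
    # query() drops the neighbour axis when k == 1, so restore it.
    dist = dist.reshape(len(target_coords), -1)
    idx = idx.reshape(len(target_coords), -1)
    found = np.isfinite(dist)                    # True where a neighbour exists
    usable = found.sum(axis=1) >= min_points     # enough neighbours to krige
    return dist, idx, found, usable


# Synthetic example: 2000 samples on a 1 km x 1 km area, 450 m range.
rng = np.random.default_rng(42)
data = rng.uniform(0, 1000, size=(2000, 2))
targets = rng.uniform(0, 1000, size=(50, 2))
dist, idx, found, usable = select_neighbors(data, targets, 450.0, 5, 25)
```

With this approach the number of distances per target is bounded by max_points rather than by the number of samples inside the variogram range.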

I was going to take this challenge on sometime in the coming months, but I welcome suggestions and input from any potential collaborators.

mmaelicke commented 2 years ago

Hey @Duke-of-Lizard, great to hear about your upcoming contributions! Since it is already clear which combinations of points will be taken into consideration by the time OrdinaryKriging.transform is called, I guess it should be conceptually pretty straightforward to use a sparse matrix inside MetricSpace and only calculate what's needed. MetricSpace was built around sparse matrices by @redhog anyway.
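Just to illustrate what "only calculate what's needed" could look like, here is a rough sketch that stores only the nearest-neighbour distances in a `scipy.sparse` matrix. It does not touch MetricSpace itself; `sparse_neighbor_distances` is a made-up name, a Euclidean metric is assumed, and the comparison at the end uses synthetic coordinates.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial import cKDTree


def sparse_neighbor_distances(data_coords, target_coords, max_range, max_points):
    """Hypothetical helper (not the MetricSpace API).

    Returns a (n_targets, n_data) CSR matrix that stores, per target,
    only the distances to its `max_points` nearest data points in range.
    """
    tree = cKDTree(data_coords)
    dist, idx = tree.query(target_coords, k=max_points,
                           distance_upper_bound=max_range)
    dist = dist.reshape(len(target_coords), -1)
    idx = idx.reshape(len(target_coords), -1)
    valid = np.isfinite(dist)            # drop the "no neighbour" fill values
    rows = np.nonzero(valid)[0]          # target index of every stored entry
    return csr_matrix((dist[valid], (rows, idx[valid])),
                      shape=(len(target_coords), tree.n))


rng = np.random.default_rng(0)
data = rng.uniform(0, 1000, size=(2000, 2))
targets = rng.uniform(0, 1000, size=(50, 2))

nearest_only = sparse_neighbor_distances(data, targets, 450.0, 25)

# For comparison: every target/data pair within the range.
in_range = cKDTree(targets).sparse_distance_matrix(
    cKDTree(data), 450.0, output_type="coo_matrix")

print(nearest_only.nnz, in_range.nnz)
```

One caveat with any sparse layout: an exact zero distance (a target on top of a sample) is indistinguishable from a missing entry once explicit zeros are eliminated, so that case would need separate handling.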

Yeah, so a first draft is more than welcome, and I am happy to contribute and comment wherever I can be helpful. In light of potential upcoming enhancements, it might be worth discussing (e.g. over Zoom) other enhancements related to OrdinaryKriging, as there is still some stuff on the table. For example, #101 and #124 both mention issues with multi-processing.