Combining theoretical variogram models has been requested a couple of times. It would definitely be a good addition for the community.
My first approach would be to start with the `variogram` decorator in `skgstat.models`:
https://github.com/mmaelicke/scikit-gstat/blob/01e13feea26b9d22312516d2e1167b3d9881ad00/skgstat/models.py#L9
Then, the user would combine two models into a new model function and pass that on to the `Variogram` instance, which can handle custom model functions anyway. I never tested that workflow in depth, so we should check that everything really works.
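A minimal sketch of that workflow (the combined model below is made up for illustration; whether the automatic fit can cope with the extra parameters is exactly what would need testing):

```python
import numpy as np
from skgstat import Variogram
from skgstat.models import variogram, spherical, exponential

@variogram
def spherical_plus_exponential(h, r1, c1, r2, c2):
    # hypothetical combined model: the sum of a spherical and an
    # exponential structure with independent ranges and sills
    return spherical(h, r1, c1) + exponential(h, r2, c2)

# the decorated function handles scalar and array lags alike
lags = np.linspace(0, 50, 25)
gamma = spherical_plus_exponential(lags, 40., 1.0, 10., 0.5)

# passing the custom model on to a Variogram instance is the untested
# part -- the current fitting code likely cannot handle the four
# parameters yet (see the fitting rework discussed below):
# V = Variogram(coords, values, model=spherical_plus_exponential)
```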
I am not that deep into the topic myself. Does the combined model come with extra parameters, i.e. to weight the 'contribution' of each model?
Then, we need to rework the fitting procedure, which currently hard-codes the number of model parameters in a large if-else that checks for the stable and Matérn models. Here, I would switch to `inspect` to get the number of model parameters. The current models all follow the same signature and parameter order: lag, range, sill, [optional: shape, smoothness], and the last parameter is always the nugget.
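For illustration, a sketch of how `inspect` could determine the parameter count, assuming the `@variogram` decorator keeps the original signature reachable via `functools.wraps` (which `inspect.signature` unwraps by default):

```python
import inspect
from skgstat.models import spherical, stable

def n_parameters(model_func):
    # all arguments except the first one (the lag h) are model
    # parameters; decorated models are unwrapped automatically
    sig = inspect.signature(model_func)
    return len(sig.parameters) - 1

print(n_parameters(spherical))  # 3: range, sill, nugget
print(n_parameters(stable))     # 4: range, sill, shape, nugget
```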
For the second part, I am not sure I got you right. Basically, it is about calculating many experimental variograms from a large sample to estimate the uncertainty underlying the estimation, right?
I am planning to add an uncertainty sub-module anyway. There, my approach would be to implement functions that perform the uncertainty propagation/estimation and finally return a `Variogram`. So, the variogram would be capable of holding uncertainty information, but would not itself implement the means to estimate it. The estimating function could then calculate all the experimental variograms and combine them into the 'Result'. For this, we need to make sure that one can reliably prevent the `Variogram` from fitting a model every time, because the fit is only needed for the final one.
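A rough sketch of such an estimating function (everything here is hypothetical: the function name, the sub-sampling strategy, and the assumption that `fit_method=None` reliably suppresses fitting, which is exactly the open point):

```python
import numpy as np
from skgstat import Variogram

def variogram_with_uncertainty(coords, values, n_draws=50, frac=0.5, **vario_kws):
    # coords and values are assumed to be numpy arrays; consistent
    # binning across draws must be enforced through vario_kws
    # (e.g. by passing maxlag and n_lags)
    rng = np.random.default_rng()
    experimentals = []
    for _ in range(n_draws):
        idx = rng.choice(len(values), size=int(frac * len(values)), replace=False)
        # fit_method=None is assumed to skip the model fit for the draws
        v = Variogram(coords[idx], values[idx], fit_method=None, **vario_kws)
        experimentals.append(v.experimental)
    # only the final Variogram is fitted, conditioned on the
    # per-lag spread of the sub-sampled experimental variograms
    spread = np.std(experimentals, axis=0)
    return Variogram(coords, values, fit_sigma=spread, **vario_kws)
```

The per-lag spread doubles as `fit_sigma`, which also ties in with the conditioning workflow described further down.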
I started something similar a while ago, which is largely undocumented and maybe not the best approach:
https://github.com/mmaelicke/scikit-gstat/blob/01e13feea26b9d22312516d2e1167b3d9881ad00/skgstat/util/uncertainty.py#L12
I hope you get the idea in terms of implementation (although your suggestion goes in a different direction). If you agree, we can maybe open a new issue to discuss implementation details.
Thanks for all the input! Best
Mirko
Amazing, thanks for the detailed response! To answer your questions:
I'll aim to start on the combination of several models after the summer holidays!
Just to let you know: there is also the https://github.com/hydrocode-de/skgstat_uncertainty project, along with this publication: https://doi.org/10.1016/j.spasta.2023.100737, which deals mainly with observation uncertainty in variograms, as a small part of the puzzle. The uncertainty project is mainly a web application (built in Streamlit), and I am planning to extract the core features for processing uncertainty out of that repo and either have a core package that integrates into both the web app and SciKit-GStat, or even make it part of SciKit-GStat. That would enable users to use the methods developed there and described in the publication without the overhead of the Streamlit app and the overly complicated data model behind it.
Maybe we can streamline this and find an interface/API for defining and handling uncertainty that suits both your and my already developed code.
Hi again @mmaelicke,
Do you think allowing the fit of a combination (sum, product) of variogram models could be a functionality that has its place in `scikit-gstat`? We use that a lot for analyzing multiple correlation ranges in satellite data (for example, https://github.com/GlacioHack/xdem/blob/main/xdem/spatialstats.py#L1622). I could try to add the functionality here! :smile:
It would combine well with `RasterMetricSpace` (for which I need to push some minor fixes + add better documentation, at least for the API; it's been a long time coming). We also tend to sample multiple experimental variograms to get an idea of the standard error in the experimental variance at each lag (as we have billions of pairwise samples), and use that to condition the fit (by passing it automatically as `fit_sigma`). Not sure how that'd fit here... What do you think?