Combining theoretical variogram models has been requested a couple of times. It would definitely be a good addition for the community.
My first approach would be to start with the `variogram` decorator in `skgstat.models`:
https://github.com/mmaelicke/scikit-gstat/blob/01e13feea26b9d22312516d2e1167b3d9881ad00/skgstat/models.py#L9
Then, the user would combine two models into a new model function and pass that on to the `Variogram` instance, which can handle custom model functions anyway. I never tested that workflow in depth, so we should check that everything really works.
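A minimal sketch of that workflow (the combined model below is made up for illustration; whether the automatic fit can cope with the extra parameters is exactly what would need testing):

```python
import numpy as np
from skgstat import Variogram
from skgstat.models import variogram, spherical, exponential

@variogram
def spherical_plus_exponential(h, r1, c1, r2, c2):
    # hypothetical combined model: the sum of a spherical and an
    # exponential structure with independent ranges and sills
    return spherical(h, r1, c1) + exponential(h, r2, c2)

# the decorated function handles scalar and array lags alike
lags = np.linspace(0, 50, 25)
gamma = spherical_plus_exponential(lags, 40., 1.0, 10., 0.5)

# passing the custom model on to a Variogram instance is the untested
# part -- the current fitting code likely cannot handle the four
# parameters yet (see the fitting rework discussed below):
# V = Variogram(coords, values, model=spherical_plus_exponential)
```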
I am not that deep into the topic myself. Does the combined model come with extra parameters, i.e. to weight the 'contribution' of each model?
Then, we need to rework the fitting procedure, which currently hard-codes the number of model parameters in a large if-else that checks for the stable and Matérn models. Here, I would switch to `inspect` to get the number of model parameters. The current models all follow the same signature and parameter order: lag, range, sill, [optional: shape, smoothness], and the last parameter is always the nugget.
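For illustration, a sketch of how `inspect` could determine the parameter count, assuming the `@variogram` decorator keeps the original signature reachable via `functools.wraps` (which `inspect.signature` unwraps by default):

```python
import inspect
from skgstat.models import spherical, stable

def n_parameters(model_func):
    # all arguments except the first one (the lag h) are model
    # parameters; decorated models are unwrapped automatically
    sig = inspect.signature(model_func)
    return len(sig.parameters) - 1

print(n_parameters(spherical))  # 3: range, sill, nugget
print(n_parameters(stable))     # 4: range, sill, shape, nugget
```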
For the second part, I am not sure I got you right. Basically, it is about calculating many experimental variograms from a large sample to estimate the uncertainty underlying the estimation, right?
I am planning to add an uncertainty sub-module anyway. There, my approach would be to implement functions that perform the uncertainty propagation/estimation and finally return a `Variogram`. So, the variogram would be capable of holding uncertainty information, but would not itself implement the means to estimate it. The estimating function could then calculate all the experimental variograms and combine them into the 'Result'. For this, we need to make sure that one can reliably prevent the `Variogram` from fitting a model every time, because the fit is only needed for the final one.
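A rough sketch of such an estimating function (everything here is hypothetical: the function name, the sub-sampling strategy, and the assumption that `fit_method=None` reliably suppresses fitting, which is exactly the open point):

```python
import numpy as np
from skgstat import Variogram

def variogram_with_uncertainty(coords, values, n_draws=50, frac=0.5, **vario_kws):
    # coords and values are assumed to be numpy arrays; consistent
    # binning across draws must be enforced through vario_kws
    # (e.g. by passing maxlag and n_lags)
    rng = np.random.default_rng()
    experimentals = []
    for _ in range(n_draws):
        idx = rng.choice(len(values), size=int(frac * len(values)), replace=False)
        # fit_method=None is assumed to skip the model fit for the draws
        v = Variogram(coords[idx], values[idx], fit_method=None, **vario_kws)
        experimentals.append(v.experimental)
    # only the final Variogram is fitted, conditioned on the
    # per-lag spread of the sub-sampled experimental variograms
    spread = np.std(experimentals, axis=0)
    return Variogram(coords, values, fit_sigma=spread, **vario_kws)
```

The per-lag spread doubles as `fit_sigma`, which also ties in with the conditioning workflow described further down.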
I started something similar a while ago, which is largely undocumented and maybe not the best approach:
https://github.com/mmaelicke/scikit-gstat/blob/01e13feea26b9d22312516d2e1167b3d9881ad00/skgstat/util/uncertainty.py#L12
I hope you get the idea in terms of implementation (although your suggestion goes in a different direction). If you agree, we can maybe open a new issue to discuss implementation details.
Thanks for all the input! Best
Mirko
Amazing, thanks for the detailed response! To answer your questions:
I'll aim to start on the combination of several models after the summer holidays!
Just to let you know: there is also the https://github.com/hydrocode-de/skgstat_uncertainty project, along with this publication: https://doi.org/10.1016/j.spasta.2023.100737, which deals mainly with observation uncertainty in variograms, as a small part of the puzzle. The uncertainty project is mainly a web application (built in Streamlit), and I am planning to extract the core features for processing uncertainty out of that repo and either have a core package that integrates into both the web app and SciKit-GStat, or even make it part of SciKit-GStat. That would enable users to use the methods developed there and described in the publication without the overhead of the Streamlit app and the overly complicated data model behind it.
Maybe we can streamline this and find an interface/API for defining and handling uncertainty that suits both your and my already developed code.
Hi again @mmaelicke,
Do you think allowing the fit of a combination (sum, product) of variogram models could be a functionality that has its place in `scikit-gstat`? We use that a lot for analyzing multiple correlation ranges in satellite data (for example, https://github.com/GlacioHack/xdem/blob/main/xdem/spatialstats.py#L1622). I could try to add the functionality here! :smile:
It would combine well with `RasterMetricSpace` (for which I need to push some minor fixes + add better documentation, at least for the API; it's been a long time coming). We also tend to sample multiple experimental variograms to get an idea of the standard error in the experimental variance at each lag (as we have billions of pairwise samples), and use that to condition the fit (by passing it automatically as `fit_sigma`). Not sure how that'd fit here... What do you think?