Codecov report: 2 lines in your changes are missing coverage. Please review. Comparison is base (c2d63d1) 89.75% compared to head (9690d39) 90.74%.
Hi @rhugonnet,
Nice, thanks for all the effort! I am currently on vacation, but will be back next Tuesday. Is that early enough for you, if I only add a detailed response then?
A few quick thoughts: passing p0 and bounds might be a viable extension for the user in any case. As you can also manually fit a variogram model using fit_range on init, or range on fit, I think we can add these parameters like that. I would use the kwargs on init for that.
Maybe an option to define a sum-combination of pre-defined functions would be to add something like: model='spherical+gaussian'. That would be convenient and does not break the current semantics, as the parameter accepts strings for pre-defined models anyway. It would need some internal refactoring, as the Variogram.model property returns a function wrapped with models.variogram, i.e. to set the right __name__ attribute on the function object (to be used in describe() and plotting). We would need some wrapper here that can construct the actual model and fill the __name__ property with something meaningful.
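To make that idea a bit more concrete, a minimal sketch of such a wrapper could look like the following (a hypothetical helper, not existing SciKit-GStat code; it assumes two parameters per model and ignores the nugget for brevity):

```python
from skgstat import models

def wrap_sum(name="spherical+gaussian"):
    # Construct the actual summed model and give it a meaningful __name__,
    # so describe() and the plotting code can still report something sensible.
    def summed(h, r1, c1, r2, c2):
        return models.spherical(h, r1, c1) + models.gaussian(h, r2, c2)
    summed.__name__ = name
    return summed
```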
I would like to think about the parameters for a moment longer, and give you some thoughts on how to handle them then. I think it would be problematic to change the structure of Variogram.parameters, as a number of interfaces, e.g. to gstools.CovModel or gstatsim.Interpolate, rely on that list. But I am sure we can come up with something useful here. Have to think about that after my vacation... :)
Thanks again and you will hear from me soon, Mirko
Sounds good, I like all ideas!
No problem, priority to the holiday! I'm busy with other things next week anyway, so I will continue the week after next :slightly_smiling_face: Enjoy your vacation @mmaelicke! :wink:
Sorry, was still working on your comments!
Just finalized the logic to support model names such as "spherical+gaussian" across the Variogram methods that used _model.__name__. Now I need to add more tests + doc; for now all existing tests pass. I'll do that tomorrow!
Alright, I think this PR is now at a good stage! :smile:
In short: Variogram now fully supports model sums such as "spherical+gaussian", as well as custom models from a callable!
To make this work for custom models: you've already seen 80% of the changes in the first draft, I just had to add a couple of other exceptions in describe() and parameters, and now it all works!
To make this work for sums: I introduced a _model_name class attribute, and added two non-public functions: _get_argpos_sum_models to get argument positions and _build_sum_models to create the summed model. Those are then used by set_model(), describe(), fit() and parameters to properly set/get the arguments of each model in the sum.
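Just to illustrate the bookkeeping (this is not the actual code of _get_argpos_sum_models, only a sketch of the idea, assuming two fitted parameters per pre-defined model):

```python
def argpos_sketch(n_models, n_params_per_model=2):
    # Each sub-model owns a contiguous slice of the flat parameter list that
    # fit() optimizes, e.g. for "spherical+gaussian":
    # argpos_sketch(2) -> [slice(0, 2), slice(2, 4)]
    return [slice(i * n_params_per_model, (i + 1) * n_params_per_model)
            for i in range(n_models)]
```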
By default, describe() returns a dict with effective_range_1, sill_1, nugget_1, effective_range_2, and so on, to make it "human-readable". And parameters returns a list of those in the order given by the model name ("spherical+gaussian" = spherical first, gaussian second).
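To show the behaviour on a small example (random data, purely illustrative; the exact fitted values will of course differ):

```python
import numpy as np
from skgstat import Variogram

rng = np.random.default_rng(42)
coords = rng.uniform(0, 100, size=(300, 2))
values = rng.normal(size=300)

V = Variogram(coords, values, model="spherical+gaussian")

# dict with per-model keys as described above, e.g.
# effective_range_1, sill_1, ..., effective_range_2, sill_2, ...
print(V.describe())

# flat list of the same values, spherical parameters first, then gaussian
print(V.parameters)
```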
Wrote a lot of tests, and added descriptions in the function docstrings!
Points left to finalize:
- use_nugget for a sum of models? The user might only want to use nuggets for a subset of them. Should we then allow a boolean iterable to be passed for use_nugget? (Right now untouched; all nugget or none.)
- Failures related to scipy.curve_fit (see https://github.com/mmaelicke/scikit-gstat/actions/runs/5933108779/job/16088046636). Any idea how to fix those?
- An issue in test_variogram when calling the stable model (divide by zero). Will investigate this...
Hi,
I really like your changes! I think it would be helpful if you could add an example, or maybe even a little tutorial notebook to the docs for others.
Concerning all the other points: I will think them through and give you my opinion on them.
The Py 3.9 and 3.10 failures are happening for all branches right now; I will investigate what is going on there. Maybe that is related to bugfixes in scipy, which might change the parameters estimated by curve_fit. Maybe we should make the unit tests much fuzzier, as curve_fit is not really SciKit-GStat's business. Not sure yet.
I'll come back to this soon
Thanks for the feedback, will start on the tutorial and wait for the other points!
- curve_fit error on Python>3.9: they're fixing it in SciPy, see https://github.com/mmaelicke/scikit-gstat/issues/161#issuecomment-1689154412. I guess it's good if dependent packages like SciKit-GStat have tests relying on it, so we can report to them when this happens (which hopefully should be quite rare... they really need to improve their test suite, that was quite a big error!).
- mean_residual in Python 3.6: it looks like a floating-precision error due to an old NumPy adjustment that happened between 1.19 (Python 3.6) and 1.21 (Python 3.7). I couldn't track down what change exactly. We could simply change the assertAlmostEqual precision to 3 decimals here and for the 2 other statistical tests: https://github.com/mmaelicke/scikit-gstat/blob/main/skgstat/tests/test_variogram.py#L989?
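For reference, relaxing the assertion would essentially mean something like this (toy numbers, just to show the places argument of unittest):

```python
import unittest

class PrecisionSketch(unittest.TestCase):
    def test_mean_residual_less_precise(self):
        # Passes at 3 decimal places, would fail at unittest's default of 7
        self.assertAlmostEqual(0.1234567, 0.1234999, places=3)

if __name__ == "__main__":
    unittest.main()
```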
Hi @mmaelicke! Hope you had a great summer.
I finally finished this PR today: the use_nugget parameter can stay a boolean, and the sum-of-models function can ignore all nuggets except the one of the last model to keep the logic in fit() working! :smile: Added some tests to ensure this works as intended. Everything is passing on SciPy 1.11.1 except this fix: #162; still waiting for them to release 1.11.3 to resolve the other failures. :sweat_smile:
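In other words, for "spherical+gaussian" the summed model behaves roughly like this sketch (assuming the usual (h, r, c0, b) signature of the pre-defined models; not the actual generated code):

```python
from skgstat import models

def spherical_plus_gaussian(h, r1, c1, r2, c2, b=0.0):
    # Only the last component carries the nugget b, so a single nugget
    # is fitted and applied once to the summed variogram.
    return models.spherical(h, r1, c1, 0.0) + models.gaussian(h, r2, c2, b)
```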
I'm thinking I could also add a tutorial/technical note on fitting multiple ranges using both RasterMetricSpace + sum of models. But maybe in a different PR where we also add RasterMetricSpace to the API?
I will add my review as soon as possible.
> I'm thinking I could also add a tutorial/technical note on fitting multiple ranges using both RasterMetricSpace + sum of models. But maybe in a different PR where we also add RasterMetricSpace to the API?
That sounds great! And yes, I would also do that in a separate PR.
Just did some tests in this PR changing the requirements.txt, and it looks like the tests that fail for the stable_entropy binning come from updates in NumPy 1.25.
Additionally, it looks like the fix in SciPy 1.11.3 did not fix the issue for some of our cases... We still need 1.11.1 to pass. They're discussing some additional issues introduced in 1.11.2 here: https://github.com/scipy/scipy/issues/19309.
I'll investigate and report a reproducible example of our issue in SciPy if I can get one!
Otherwise all passing with NumPy 1.24 and SciPy 1.11.1! :sweat_smile:
For the SciPy issue persisting with 1.11.3, I managed to make a reproducible example and opened an issue on SciPy directly, see: https://github.com/mmaelicke/scikit-gstat/issues/161#issuecomment-1749490016
Investigating the NumPy one...
It looks like the NumPy issue started with 1.25, but I couldn't reproduce it locally or track down why exactly... Maybe we could simply reduce the floating precision of the stable_entropy bin estimation to 0 decimals? It's not very far off
> It looks like the NumPy issue started with 1.25, but I couldn't reproduce it locally or track down why exactly... Maybe we could simply reduce the floating precision of the stable_entropy bin estimation to 0 decimals? It's not very far off
I was also not successful in finding the exact problem. I guess we go for less precision, but open an issue to remind ourselves to test again with newer numpy versions?
Yes, I agree! I removed the NumPy version pinning, opened #166 and changed the test precision to 0 decimals.
For SciPy: they are going to revert the changes to how it was in 1.11.1 and before, and release a 1.11.4, but it might take some time. In the meantime, should we merge this PR with the version pinning in requirements.txt to <=1.11.1 and open an issue to unpin once they make the release? It would allow testing the other open PRs without having this issue in the CI!
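For reference, the pin in requirements.txt would be something like the single line below (to be relaxed again once the fixed release is out):

```
scipy<=1.11.1
```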
I agree. Thanks for all the input, I am really glad that you are aware of all the changes happening in scipy.
Hey @rhugonnet,
The pre-commit is now merged into main with #167. I just resolved the merge conflicts. There are some minor fixes that pre-commit complains about. I can't push to your fork, so could you fix the pre-commit stuff? Then we can finally merge and have a working repo again.
Thanks for all your effort
@mmaelicke Perfect, thanks! All done, we can merge :slightly_smiling_face: I'll open an issue to remind ourselves to unpin SciPy when their bug is fixed.
If you want to work directly on the PRs I opened, I think normally you can push on someone's fork (maybe depends on repo params? but usually works for me):
```bash
git remote add rhugonnet https://github.com/rhugonnet/scikit-gstat.git
git fetch rhugonnet --prune
git checkout rhugonnet/combine_models
# Will go into "detached HEAD" mode
# Make the changes and commit them
git push rhugonnet HEAD:combine_models
```
> If you want to work directly on the PRs I opened, I think normally you can push on someone's fork (maybe depends on repo params? but usually works for me):
Ah cool, I was unaware of that. Nice to see all the checks green again. I'll merge and go home :)
Early draft! Stopped here to discuss the way forward, I prefer having your point of view before adjusting too many things in Variogram! :slightly_smiling_face:

For now:
- Variogram._is_model_custom is defined during set_model(). This allows separating the definition of bounds and p0 in Variogram.fit(), defined based on the length of args from inspect.

However, doing it the current way, bounds and p0 will be quite poorly defined (at this point the script cannot tell which parameter matches which dimension). The user would likely have to pass them manually to reach a good fitting performance. Additionally, the current implementation will raise issues in Variogram.describe(), which then passes on to Variogram.parameters.

One way we could move forward would be to divide into 2 cases:
1. Combinations of the pre-defined models from skgstat/models, which would allow ._get_bounds(), .describe() and .parameters to function normally and make it easy for the user;
2. Fully custom models, where the user would likely need to provide bounds and p0 to reach a good fitting performance (in case those aren't passed, we can only fall back on rough defaults).

To implement 1., however, I see several questions:
- How do we pass the combination through the model argument? I'm not sure of the best option. A dictionary such as model={"op": np.sum, "models": ["spherical", "gaussian", "matern"]} could do the job? But maybe this will get overly complex. Thinking about it, for something more complex than a sum, I'm not sure we'll be able to benefit much from knowing the nature of the models to infer "bounds" and "p0" efficiently anyway. We could simply leave it to case number 2, and add support just for the sum for convenience, for which we can easily pass a list ["spherical", "gaussian", "matern"]?
- What to do with .parameters? Right now it's a list; should we simply concatenate into a longer one, always keeping the same parameter order? Or switch to a dictionary?
- For describe(), I guess we can simply detect that it is a combination of models and return a more detailed list of parameters.

To implement 2.:
- Allow bounds and p0 to be passed by the user. I guess this should be both in Variogram.fit() and __init__()? We could add fit_bounds and fit_p0 in Variogram.__init__. But there are already a lot of arguments, so they could also go in kwargs. What do you think?

That's all I noticed for now, but there might be a few additional things down the line! :sweat_smile:
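Just to make the proposal in 2. concrete, usage could look roughly like this (fit_bounds / fit_p0 are only the names floated above, nothing is implemented this way yet at this point of the draft, and the custom model is a toy example):

```python
import numpy as np
from skgstat import Variogram

def my_model(h, r, c0):
    # toy custom model passed as a callable (exponential-like shape)
    return c0 * (1 - np.exp(-h / r))

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(200, 2))
values = rng.normal(size=200)

V = Variogram(
    coords, values, model=my_model,
    # user-provided bounds and initial guess forwarded to scipy's curve_fit,
    # since they cannot be inferred automatically for an arbitrary callable
    fit_bounds=([0, 0], [100, 5]),
    fit_p0=[20, 1],
)
```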
Resolves #159