mmaelicke / scikit-gstat

Geostatistical variogram estimation expansion in the scipy style
https://mmaelicke.github.io/scikit-gstat/
MIT License

ValueError: Each lower bound must be strictly less than each upper bound. #34

frigusgulo closed this issue 4 years ago

frigusgulo commented 4 years ago

There seems to be an issue with the variogram fitting procedure. In "scipy/optimize/minpack.py", the call to "least_squares" raises "ValueError: Each lower bound must be strictly less than each upper bound". The inputs are randomly selected points from a digital elevation model.
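The message comes from scipy's bound validation, which runs before any optimization starts. A minimal sketch can reproduce it. It assumes a hypothetical exponential model and all-zero semivariances, and it is not the library's own code. It calls scipy.optimize.curve_fit with a sill upper bound equal to its lower bound:

```python
import numpy as np
from scipy.optimize import curve_fit


def exponential(h, r, c0):
    # Hypothetical exponential variogram model: effective range r, sill c0
    return c0 * (1.0 - np.exp(-3.0 * h / r))


lags = np.array([10.0, 20.0, 30.0, 40.0])
gamma = np.zeros_like(lags)  # all-zero semivariances, e.g. a flat DEM patch

# Lower bound 0 for both parameters, upper bounds taken from the data:
# the sill's upper bound becomes max(gamma) == 0, equal to its lower bound.
bounds = (0.0, [lags.max(), gamma.max()])

try:
    curve_fit(exponential, lags, gamma, p0=[lags.mean(), 0.0], bounds=bounds)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```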

mmaelicke commented 4 years ago

Hey Frankie,

Thanks for submitting this issue. I will need some more information on this. Do you have a minimal example to reproduce the error? Can you also please submit a copy or a screenshot of the traceback?

What I can say is that the fitting procedure of the Variogram class will set initial guesses for the variogram parameters from the samples. But I cannot imagine how the lower bound can be set larger than the upper bound, unless all observations used (within maxlag) are NaN.

As a first shot in the dark, you can try switching the fitting method to fit_method='lm', which uses Levenberg-Marquardt least squares.
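In scipy, Levenberg-Marquardt is an unconstrained method, so no bounds are passed and the bound check never runs on that path. The sketch below illustrates this with scipy.optimize.curve_fit(method='lm'). It is not skgstat's internal code. It uses a hypothetical exponential model and synthetic, noise-free semivariances:

```python
import numpy as np
from scipy.optimize import curve_fit


def exponential(h, r, c0):
    # Hypothetical exponential variogram model: effective range r, sill c0
    return c0 * (1.0 - np.exp(-3.0 * h / r))


lags = np.linspace(5.0, 60.0, 12)
gamma = exponential(lags, 30.0, 2.0)  # synthetic semivariances, range 30, sill 2

# method='lm' is unconstrained: no bounds are passed, so none can be invalid.
# Passing bounds together with method='lm' would itself raise a ValueError.
popt, _ = curve_fit(exponential, lags, gamma, p0=[20.0, 1.0], method='lm')
print(popt)
```

The trade-off is that nothing keeps the fitted range or sill non-negative, so the result is worth sanity-checking.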

Best, Mirko

frigusgulo commented 4 years ago

[Screenshot from 2020-04-28 14-58-21: traceback of the error]

frigusgulo commented 4 years ago

This is the screenshot. I filtered out NaN values from the response value inputs, but it is likely that the response vector is all zeros. I will switch to Levenberg-Marquardt and see how it fares. Thanks for this excellent library!

mmaelicke commented 4 years ago

Hey, do you have a code example for reproducing the error?

What I can see from your traceback is that you are passing maxlag=int(m/2). I can't see what m is, but note that maxlag is the maximum distance at which point pairs are considered for variogram estimation. If m is 4 or more, which would be true if m is the number of samples or something like that, maxlag is larger than 1. Following the [documentation of Variogram.__init__](https://mmaelicke.github.io/scikit-gstat/reference/variogram.html#skgstat.Variogram.__init__), it is then interpreted as an absolute maximum distance (in the units of your coordinates).

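If that is the case and maxlag is smaller than the distance between any two points, no point pairs are found within maxlag. The default fitting procedure sets initial guesses for the parameters in __get_fit_bounds (https://github.com/mmaelicke/scikit-gstat/blob/566b1bbfa635484ce6f58ee1100c76f214877e64/skgstat/Variogram.py#L1166).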

This looks like:

 bounds = [np.nanmax(x), np.nanmax(y)]

Here, x = Variogram.bins[~np.isnan(Variogram.experimental)] and y = Variogram.experimental[~np.isnan(Variogram.experimental)]. Therefore, if all bins or experimental values are NaN because no point pairs were found within maxlag, the fit uses a lower and upper bound of (0, np.NaN), which results in the error shown.
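To see when the quoted bound construction degenerates, it can be mirrored in a few lines. This is a sketch, not the library's code. fit_bounds and bounds_valid are hypothetical helpers. The sketch assumes a lower bound of 0 for every parameter, as described in this thread, and upper bounds from the quoted expression:

```python
import numpy as np


def fit_bounds(bins, experimental):
    # Hypothetical helper mirroring the quoted expression:
    # upper bounds = [max lag (range), max semivariance (sill)], lower bound 0.
    mask = ~np.isnan(experimental)
    x, y = bins[mask], experimental[mask]
    lower = np.zeros(2)
    upper = np.array([np.nanmax(x), np.nanmax(y)])
    return lower, upper


def bounds_valid(lower, upper):
    # scipy's least_squares requires lower < upper for every parameter
    return bool(np.all(lower < upper))


bins = np.array([10.0, 20.0, 30.0, 40.0])

ok = fit_bounds(bins, np.array([0.5, 1.2, 1.8, 2.0]))
flat = fit_bounds(bins, np.zeros(4))  # all observations share one value

print(ok, bounds_valid(*ok))      # upper = [40, 2]: valid
print(flat, bounds_valid(*flat))  # upper = [40, 0]: sill bounds collapse to (0, 0)
```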

To check whether something like this is happening, use scipy's pdist function to calculate the distance matrix (assuming the same variable names as in your traceback):

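from scipy.spatial.distance import pdist

# coords and m are the variables from your script (see the traceback)
dm = pdist(coords)  # condensed vector of all pairwise point distances
maxlag = int(m/2)
print(maxlag)

if (dm > maxlag).all():
  print('No point pairs can be found')
# count the pairs that are actually within maxlag (dm <= maxlag)
print('Total : %d     within: %d' % (len(dm), (dm <= maxlag).sum()))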
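The same check can first be tried on synthetic data. This is a sketch under stated assumptions. The 30 m grid of sample locations and m = 10 are hypothetical. effective_maxlag is a hypothetical helper that encodes our reading of the Variogram docs: maxlag above 1 is an absolute distance, and 0 < maxlag < 1 is a fraction of the largest distance:

```python
import numpy as np
from scipy.spatial.distance import pdist


def effective_maxlag(dm, maxlag):
    # Hypothetical helper, assuming the documented rule: 0 < maxlag < 1 is a
    # fraction of the largest distance, larger values are absolute distances.
    if 0 < maxlag < 1:
        return maxlag * dm.max()
    return maxlag


# Hypothetical sample locations on a 30 m grid (e.g. DEM pixel centres)
xx, yy = np.meshgrid(np.arange(5) * 30.0, np.arange(5) * 30.0)
coords = np.column_stack([xx.ravel(), yy.ravel()])
dm = pdist(coords)

m = 10
for maxlag in (int(m / 2), 0.5):
    limit = effective_maxlag(dm, maxlag)
    within = int((dm <= limit).sum())
    print('maxlag=%s -> limit %.1f, pairs within: %d of %d'
          % (maxlag, limit, within, len(dm)))
```

With an absolute maxlag of 5 on a 30 m grid, no pair qualifies, which is exactly the situation described above.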

That should show us whether there are enough point pairs within maxlag. I hope this helps,

Mirko

frigusgulo commented 4 years ago

Mirko, I will take a proper screenshot that reproduces the error and shows the inputs as well. I am iterating over a raster with fairly large swaths of 0 values, and my suspicion is that all response values are the same (I will add Gaussian noise to solve this problem).
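The noise idea can be sketched as follows. jitter is a hypothetical helper. The noise scale is an assumption and would need to be small relative to the DEM's vertical precision, so that it only breaks ties rather than adding signal:

```python
import numpy as np


def jitter(values, scale=1e-3, seed=None):
    # Hypothetical helper: add small zero-mean Gaussian noise so that a
    # window of identical values (e.g. all zeros) no longer has zero variance.
    rng = np.random.default_rng(seed)
    return values + rng.normal(0.0, scale, size=values.shape)


window = np.zeros(50)  # e.g. a flat patch of the raster
noisy = jitter(window, scale=1e-3, seed=42)

print(window.max() - window.min(), noisy.max() - noisy.min() > 0)
```

Skipping windows whose values are all identical is an alternative that leaves the data untouched, at the cost of gaps in the output.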


mmaelicke commented 4 years ago

Ahhh, yeah. If all inputs are 0, all experimental semivariances are 0, so the fit method passes a lower and upper bound of (0, 0) and the error is raised.

These are exactly the cases you don't think of while developing. As soon as you can confirm that this was actually happening and a bit of noise solves the problem, I will close the issue. I will nevertheless consider adding a warning if all inputs have the same value.
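A check of the kind mentioned here could look like the sketch below. warn_if_constant is a hypothetical function and not part of skgstat's API. It only illustrates testing the input values before any fit is attempted:

```python
import warnings

import numpy as np


def warn_if_constant(values):
    # Hypothetical pre-fit check: if all values are NaN, or every non-NaN
    # value is identical, all semivariances are 0 (or missing) and the
    # fit bounds degenerate, so warn instead of failing inside scipy.
    valid = np.asarray(values, dtype=float)
    valid = valid[~np.isnan(valid)]
    if valid.size == 0 or np.all(valid == valid[0]):
        warnings.warn('All input values are NaN or identical; '
                      'the variogram cannot be fitted.', UserWarning)
        return True
    return False


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    flagged = warn_if_constant([0.0, 0.0, np.nan, 0.0])

print(flagged, len(caught))
```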