tBuLi / symfit

Symbolic Fitting; fitting as it should be.
http://symfit.readthedocs.org
MIT License

symfit stdev are NaN #340

Open anna-zk opened 2 years ago

anna-zk commented 2 years ago

Hi, I have a problem with a global fit of two data sets (with four fitting parameters) using the symfit Fit function: it gives me fitting results, but all the standard deviations are nan. Here is my code:


from symfit import parameters, variables, Fit, Model, exp
import numpy as np

xdata = np.array([0.25, 0.5, 0.8, 1, 1.5, 2.5, 4, 6])
ydata = np.array([0.013086325798209482, 0.010144832077553744, 0.00807480228676198, 0.007560555533483845, 0.0061854497435148105, 0.005099733409299026, 0.0038812123438669768, 0.0031145317486463066])
ydata1 = np.array([0.012624136137849127, 0.00884329432480036, 0.006747803674437807, 0.006389813737979039, 0.004918439951078863, 0.0035903310403010594, 0.002554518199002905, 0.0020277651510502066])

Ra, Rb, Ka, Kb = parameters('Ra, Rb, Ka, Kb')
L0, Rof, Rof1 = variables('L0, Rof, Rof1')

s = 1.3
P0 = 0.01

##Alfa
a = s*Kb + 2*Ka + (Ka**2/s*Kb)
b = -(L0*s*Kb + P0*s*Kb + Ka*P0 + Ka*L0 + Ka*Kb + Ka*Kb*s)
c = P0*L0*s*Kb
delta = b**2 - 4*a*c
sdelta = delta ** (1/2)
ca = (-b - sdelta)/(2*a)

##Beta
a1 = Ka + 2*Kb*s + (Kb**2*s**2)/Ka
b1 = -(L0*Ka + P0*Ka + Kb*s*P0 + Kb*s*L0 + Ka*Kb*s*(1+(1/s)))
c1 = P0*L0*Ka
delta1 = b1**2 - 4*a1*c1
sdelta1 = delta1 ** (1/2)
cb = (-b1 - sdelta1)/(2*a1)

model1 = Model({
        Rof: Ra * ca / L0,
        Rof1: Rb * cb / L0,
})

fit = Fit(model1, L0=xdata, Rof=ydata, Rof1=ydata1) 
fit_result = fit.execute()
print(fit_result)

(I know that my model is a bit complicated...)

And here are the results that I get:

RuntimeWarning: invalid value encountered in sqrt
  return np.sqrt(self.variance(param))

Parameter Value        Standard Deviation
Ka        8.842217e-01 nan
Kb        7.191712e-01 nan
Ra        2.699504e+00 nan
Rb        2.496971e+00 nan
Status message         Optimization terminated successfully.
Number of iterations   41
Objective              <symfit.core.objectives.LeastSquares object at 0x7f926fc8aa90>
Minimizer              <symfit.core.minimizers.BFGS object at 0x7f926fc3c8d0>

Goodness of fit qualifiers:
chi_squared            6.016628549760758e-06
objective_value        3.008314274880379e-06
r_squared              0.9634526939817522

I will be grateful for any help!

pckroon commented 2 years ago

There could be many causes. Did you do a visual inspection of the resulting fit?

Does the resulting fit have a covariance matrix (fit_result.covariance_matrix)? Can you get a Hessian from the objective (hess = fit_result.objective.eval_hessian(**fit_result.params))? Can you invert that Hessian matrix (np.linalg.inv(hess))? This is probably where things start going wrong :) If your Hessian is singular/non-invertible, there is probably something wrong with your model. See also, for example, https://stats.stackexchange.com/questions/181515/what-does-hessian-is-singular-mean-in-sas-proc-nlin/181528
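
Putting those checks together, a minimal diagnostic sketch (assuming the fit_result from the snippet above; the condition-number check is just an extra sanity check, not something symfit requires):

import numpy as np

cov = fit_result.covariance_matrix
print('covariance matrix:\n', cov)

# Hessian of the least-squares objective at the found optimum
hess = fit_result.objective.eval_hessian(**fit_result.params)
print('condition number of the Hessian:', np.linalg.cond(hess))

# A huge condition number (or a LinAlgError on inversion) means the Hessian
# is numerically singular and the standard deviations cannot be trusted.
print('inverse Hessian:\n', np.linalg.inv(hess))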

anna-zk commented 2 years ago

Indeed, there was a problem with my model; I found a mistake there, but after correcting it the error remains. Here is the current code:

from symfit import parameters, variables, Fit, Model, exp
import numpy as np

xdata = np.array([0.25, 0.5, 0.8, 1, 1.5, 2.5, 4, 6])
ydata = np.array([0.01389280763166483, 0.010951313911009092, 0.008881284120217327, 0.008367037366939192, 0.006991931576970159, 0.005906215242754372, 0.004687694177322324, 0.003921013582101654])
ydata1 = np.array([0.013222900783408978, 0.009442058970360209, 0.007346568319997657, 0.006988578383538889, 0.005517204596638713, 0.004189095685860909, 0.003153282844562755, 0.002626529796610057])

Rbounda, Rboundb, Ka, Kb = parameters('Rbounda, Rboundb, Ka, Kb')
L0, Robsa, Robsb = variables('L0, Robsa, Robsb')

s = 1.3
P0 = 0.01
Rfreea = 0.0008064818334553477
Rfreeb = 0.0005987646455598504

a = s*Kb + 2*Ka + (Ka**2/(s*Kb))
b = -(L0*s*Kb + P0*s*Kb + Ka*P0 + Ka*L0 + Ka*Kb + Ka*Kb*s)
c = P0*L0*s*Kb
delta = b**2 - 4*a*c
sdelta = delta ** (1/2)
ca = (-b - sdelta)/(2*a)
La = s*((L0-ca*(1+(Ka/(s*Kb))))/(1+s))
Lb = La/s
cb = (Ka*ca)/(Kb*s)

model1 = Model({
        Robsa: Rfreea * La/(La+ca) + Rbounda * ca/(La+ca),
        Robsb: Rfreeb * Lb/(Lb+cb) + Rboundb * cb/(Lb+cb),
})

fit = Fit(model1, L0=xdata, Robsa=ydata, Robsb=ydata1) 
fit_result = fit.execute()

print(fit_result)

print('covariance_matrix', fit_result.covariance_matrix)
hess = fit_result.objective.eval_hessian(**fit_result.params)
print('hess', hess)
print('invert_hess', np.linalg.inv(hess))

And the current result, together with the covariance matrix, the Hessian and its inverse, is below:

Parameter Value        Standard Deviation
Ka        4.494177e-01 5.808837e-02
Kb        2.406486e+04 nan
Rbounda   7.840503e-01 7.301278e-02
Rboundb   3.627573e+04 nan
Status message         Optimization terminated successfully.
Number of iterations   143
Objective              <symfit.core.objectives.LeastSquares object at 0x7f504ead96d0>
Minimizer              <symfit.core.minimizers.BFGS object at 0x7f5040539750>

Goodness of fit qualifiers:
chi_squared            5.875779424169528e-06
objective_value        2.937889712084764e-06
r_squared              0.9643082655120222
covariance_matrix [[ 3.37425824e-03  7.56349353e+04  3.41761415e-03  1.14015517e+05]
 [ 7.56349353e+04 -4.95053731e+11  7.70199754e+04 -7.49602721e+11]
 [ 3.41761415e-03  7.70199754e+04  5.33086667e-03  1.16104018e+05]
 [ 1.14015517e+05 -7.49602721e+11  1.16104018e+05 -1.13501425e+12]]
hess [[ 1.23903953e-03 -1.44730853e-08 -7.99195853e-04  9.60123981e-09]
 [-1.44730853e-08  6.40321239e-13  1.48229755e-13 -4.24344173e-13]
 [-7.99195853e-04  1.48229755e-13  7.85776782e-04  0.00000000e+00]
 [ 9.60123981e-09 -4.24344173e-13  0.00000000e+00  2.81214768e-13]]
invert_hess [[ 2.29706257e+03  5.14892952e+10  2.32657756e+03  7.76172884e+10]
 [ 5.14892952e+10 -3.37013149e+17  5.24321761e+10 -5.10300109e+17]
 [ 2.32657756e+03  5.24321761e+10  3.62904478e+03  7.90390580e+10]
 [ 7.76172884e+10 -5.10300109e+17  7.90390580e+10 -7.72673148e+17]]

I can see that some diagonal elements of the covariance matrix are negative; this is strange, right? Also, the result itself is strange: Ka and Kb differ by 5 orders of magnitude, while I would expect them to be similar (up to a factor of 10, maybe).
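
A quick sanity check (a small sketch reusing the hess computed above): the eigenvalues of the Hessian show whether it is positive definite, and if it is not, its inverse can indeed have negative diagonal entries.

import numpy as np

# eigenvalues of the (symmetric) Hessian computed above
print(np.linalg.eigvalsh(hess))
# Any eigenvalue <= 0 (or vanishingly small) means the Hessian is not
# positive definite / is nearly singular, which is consistent with the
# negative "variances" and the nan standard deviations.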

anna-zk commented 2 years ago

Well, I don't know why inserting the code does not work for me now, I hope you will understand it anyway.

pckroon commented 2 years ago

Did you do a visual inspection of this fit? :) ALWAYS look at fits; don't be complacent and settle for e.g. R^2 [1]. It's most likely a numerical or convergence issue. The variance (the diagonal of the covariance matrix) being negative is indeed a very big red flag.
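
For the visual check, something along these lines should do (a sketch, assuming matplotlib is available; symfit models can be called with the independent variable and the fitted parameter values):

import numpy as np
import matplotlib.pyplot as plt

# dense grid of L0 values for smooth model curves
L0_grid = np.linspace(0.25, 6, 200)
out = model1(L0=L0_grid, **fit_result.params)

plt.plot(xdata, ydata, 'o', label='Robsa data')
plt.plot(xdata, ydata1, 's', label='Robsb data')
plt.plot(L0_grid, out.Robsa, '-', label='Robsa fit')
plt.plot(L0_grid, out.Robsb, '--', label='Robsb fit')
plt.xlabel('L0')
plt.legend()
plt.show()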

Given that you are surprised by Ka and Kb having such different values, do you have a ballpark idea of the values they should have? If so, set those as initial values (Ka.value=1e-2), and do the same for the other two parameters. That should help your fit converge. Symfit comes with an interactive guess contrib module which might help. If that doesn't help, also consider adjusting the tolerance of the fit (execute(tol=1e-16)). If all that doesn't help, consider multiplying your model and y data by a factor to get the values between 0.1 and 1. That will affect your covariances though...
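
Concretely, something like this (a sketch; the starting values below are placeholders to replace with your own ballpark estimates):

# rough starting values (placeholders -- substitute your own estimates)
Ka.value = 0.5
Kb.value = 0.5
Rbounda.value = 1.0
Rboundb.value = 1.0

fit = Fit(model1, L0=xdata, Robsa=ydata, Robsb=ydata1)
fit_result = fit.execute(tol=1e-16)  # tighter tolerance for the minimizer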

See https://stackoverflow.com/questions/28702631/scipy-curve-fit-returns-negative-variance for more details.

[1] https://en.wikipedia.org/wiki/Anscombe%27s_quartet

PS. Code should be wrapped in so-called backticks (`), the key next to the number 1 on your keyboard.

anna-zk commented 2 years ago

Here are the plots of the experimental points together with the fits: file:///nhome/anna/Figure_1.png

The orange fit looks pretty fine, but the blue one deviates from the data for larger 'x'. Setting both Ka and Kb initially equal to 0.5 almost doesn't change the results:

Parameter Value        Standard Deviation
Ka        4.571113e-01 6.023765e-02
Kb        1.294094e+04 nan
Rbounda   7.918653e-01 7.541881e-02
Rboundb   1.932618e+04 nan

Kb is still 10^5 times larger than Ka. I get similar results when setting tol=1e-16 (then I also get the message "Desired error not necessarily achieved due to precision loss."). Multiplying the data (y) and the model by a factor of 50 (so that my y values are now between 0.1 and 0.7) also does not change the results.

pckroon commented 2 years ago

Your initial guesses are not good (enough):

Initial values:
Ka: 0.75
Kb: 0.45
Rbounda: 2.0
Rboundb: 0.86

Parameter Value        Standard Deviation
Ka        4.528586e-01 1.697666e-01
Kb        7.983618e+03 3.629505e+06
Rbounda   7.875582e-01 1.820861e-01
Rboundb   1.198447e+04 5.449942e+06
Status message         Optimization terminated successfully.
Number of iterations   150
Objective              <symfit.core.objectives.LeastSquares object at 0x7fd83b92bca0>
Minimizer              <symfit.core.minimizers.BFGS object at 0x7fd83b92bf40>

Goodness of fit qualifiers:
chi_squared            5.874081979620764e-06
objective_value        2.937040989810382e-06
r_squared              0.9643185764402871
covariance_matrix [[ 2.88207112e-02 -4.25726822e+05  3.00104367e-02 -6.39425043e+05]
 [-4.25726822e+05  1.31733071e+13 -4.55215425e+05  1.97805923e+13]
 [ 3.00104367e-02 -4.55215425e+05  3.31553461e-02 -6.83706772e+05]
 [-6.39425043e+05  1.97805923e+13 -6.83706772e+05  2.97018707e+13]]
hess [[ 1.22108942e-03 -4.30622959e-08 -7.91110193e-04  2.86863288e-08]
 [-4.30622959e-08  5.79446561e-12  1.34790073e-12 -3.85987459e-12]
 [-7.91110193e-04  1.34790073e-12  7.78868903e-04  0.00000000e+00]
 [ 2.86863288e-08 -3.85987459e-12  0.00000000e+00  2.57118317e-12]]
invert_hess [[ 1.96256786e+04 -2.89901860e+11  2.04358310e+04 -4.35421259e+11]
 [-2.89901860e+11  8.97046189e+18 -3.09982344e+11  1.34697421e+19]
 [ 2.04358310e+04 -3.09982344e+11  2.25773806e+04 -4.65575234e+11]
 [-4.35421259e+11  1.34697421e+19 -4.65575234e+11  2.02257107e+19]]

Note that the errors I get for Kb and Rboundb are both very large, which should prompt you to take a critical look at your model. If you want to limit parameters to specific value ranges (e.g. Rboundb <= 10), set minimum and maximum values (e.g. Rboundb.max = 10), as in the sketch below.
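
A minimal sketch of that (the upper bound of 10 follows the example above; the lower bound of 0 is just an assumption that the parameter is non-negative):

# constrain Rboundb to a plausible range before fitting
Rboundb.min = 0
Rboundb.max = 10

fit = Fit(model1, L0=xdata, Robsa=ydata, Robsb=ydata1)
fit_result = fit.execute()
print(fit_result)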