tBuLi / symfit

Symbolic Fitting; fitting as it should be.
http://symfit.readthedocs.org
MIT License
234 stars 17 forks source link

ValueError for fit of ODEModel with additional data for t0 #339

Open MrAnimaniac opened 2 years ago

MrAnimaniac commented 2 years ago

I might have found a bug likely associate with simfit.ODEModel combined with fitting. More precisely, I assume it might be connected to ODEModel's the initial -dict. The issue can be reproduced using the first example provided in the simfit docs, “4.5 ODE Fitting”.

from symfit import Parameter, variables, Fit, D, ODEModel
import matplotlib.pyplot as plt
import numpy as np

tdata = np.array([0, 0, 10, 26, 44, 70, 120])
adata = 10e-4 * np.array([54, 54, 44, 34, 27, 20, 14])

a, b, t = variables('a, b, t')
k = Parameter('k', 0.1)

model_dict = {
    D(a, t): - k * a**2,
    D(b, t): k * a**2,
}

ode_model = ODEModel(model_dict, initial={t: tdata[0], a: adata[0], b: 0.0})

fit = Fit(ode_model, t=tdata, a=adata, b=None)
fit_result = fit.execute()

# Generate some data
tvec = np.linspace(0, 500, 1000)

A, B = ode_model(t=tvec, **fit_result.params)
plt.plot(tvec, A, label='[A]')
plt.plot(tvec, B, label='[B]')
plt.scatter(tdata, adata)
plt.legend()
plt.show()

will raise following error: ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (8,) and requested shape (7,)

However, it will behave as expected if you change tdata = np.array([0, 10, 10, 26, 44, 70, 120]). In this example, adata[0] will be repeatedly handled over to the algorithm. This issue, however, rises independently.

Furthermore, the difference between original (m) and requested shape (n) mentioned in the error message is proportional to the frequency of repeated t0-values (f): m - n = f-1.

pckroon commented 2 years ago

Interesting. What happens if you have no repeated values? I assume it then "just works". I think I found the cause, in ODEModel.eval_components. We split the time points in point larger and smaller than the initial points (because the integrator does not like an initial time point not at beginning), and integrate those bits separately [1]. So my guess is that one of the t=0 points gets duplicated in this step (paraphrasing):

t_bigger = t_like[t_like >= t_initial]
t_smaller = t_like[t_like <= t_initial][::-1]
t_total = np.concatenate((t_smaller[::-1][:-1], t_bigger))

Do you have for me the full traceback? Could you try with tdata = np.array([0, 10, 10, 26, 44, 70, 120]) and t_initial=10?

[1] TODO: only do the second integration if needed

MrAnimaniac commented 2 years ago

Could you try with tdata = np.array([0, 10, 10, 26, 44, 70, 120]) and t_initial=10?

yields to the same error.

What happens if you have no repeated values?

Yep. Solely important is that the t0-value defined in ODEModel’s initial-dict must be repeated in tdata more than once. For every further repetition, the [original->remapped]-shape grows by one. It is unimportant if the t0-value is the leading value of tdata. Your explanation sounds good to me.

I added a txt with my traceback. Please not, however, that I assume you could reproduce the same behaviour with the above mentioned example. traceback.txt