tBuLi / symfit

Symbolic Fitting; fitting as it should be.
http://symfit.readthedocs.org
MIT License

Slow Model building #271

Closed ValentinQ closed 4 years ago

ValentinQ commented 4 years ago

Hi, I'm trying to create a global fitting model with 51 PL decays that have to be fitted with shared parameters (data.shape = (9219, 51)). I have noticed that the construction of the model becomes exponentially slower as the number of equations increases. As a quick demo:

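import time

import matplotlib.pyplot as plt
import sympy as sy
from symfit import Model, Parameter, Variable, parameters, variables

# `data` is the measured array, with data.shape == (9219, 51)
n_curves = data.shape[1]
ys = variables(', '.join('y{}'.format(i) for i in range(n_curves)))
x = Variable('x')
As = parameters(', '.join('a{}'.format(i) for i in range(n_curves)))
Bs = parameters(', '.join('b{}'.format(i) for i in range(n_curves)))
# Decay times shared by all curves
t1 = Parameter('t1', value=300, min=200, max=600)
t2 = Parameter('t2', value=2000, min=700, max=3000)

timings = []
n_equations = range(1, 15)
for e in n_equations:
    start = time.time()
    # Build a model from the first e biexponential decays
    model = Model({
        y: a * sy.exp(-(x / t1)) + b * sy.exp(-(x / t2))
        for y, a, b in zip(ys[:e], As[:e], Bs[:e])
    })
    elapsed = time.time() - start
    timings.append(elapsed)
    print(elapsed)
# Plot construction time against the actual number of equations (1 to 14)
plt.plot(n_equations, timings, '*')
plt.xlabel('number of equations')
plt.ylabel('time (s)')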

This results in the following plot.

Is this expected behaviour?

I remember that the previous release wasn't that slow.

[Figure: plot1, model construction time (s) versus number of equations]

tBuLi commented 4 years ago

Interesting issue, this is definitely unforeseen if it became worse with the new release. Would it be possible for you to make the same plot in 0.4.x for comparison?

As a general comment, if it is possible to convert your model into a matrix equation then that will definitely be faster, so that might be worth trying in your case, see cell 4 here for an example of how to define a matrix-based model. More examples of this should be added in the future.
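To illustrate the idea of a matrix equation, here is a minimal NumPy sketch. It is not symfit's matrix-model API. It assumes the biexponential model from the original post and uses made-up sizes and amplitudes. All 51 curves are written as a single product Y = E C, where E holds the two shared exponentials and C holds the per-curve amplitudes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_curves = 200, 51           # illustrative sizes, not the real data
x = np.linspace(0.0, 5000.0, n_points)
t1, t2 = 300.0, 2000.0                 # shared decay times (assumed values)
a = rng.uniform(0.5, 1.5, n_curves)    # per-curve amplitudes (made up)
b = rng.uniform(0.5, 1.5, n_curves)

# Basis matrix E, shape (n_points, 2): one column per shared exponential
E = np.column_stack([np.exp(-x / t1), np.exp(-x / t2)])
# Amplitude matrix C, shape (2, n_curves): one column per curve
C = np.vstack([a, b])

# One matrix product evaluates every curve at once
Y_matrix = E @ C

# Reference: the same model written as one equation per curve
Y_loop = np.column_stack(
    [a[i] * np.exp(-x / t1) + b[i] * np.exp(-x / t2) for i in range(n_curves)]
)
```

Both arrays hold the same values. The matrix form simply replaces 51 separate expressions with a single one.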

pckroon commented 4 years ago

Quadratic growth I would immediately believe, due to the calculation of a Hessian, which will be n_components*n_parameters**2 in size. In your case both the number of components and the number of parameters grow with the number of curves n, so something that grows as n**3 is reasonable/expected. Could you have a look at memory use as you run this? If you run out of RAM and your computer starts swapping to disk, a drastic slowdown is to be expected.
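The size estimate can be checked with a few lines of arithmetic. This sketch assumes the model from the original post, where n curves give n components and 2n + 2 parameters (one a and one b per curve, plus the shared t1 and t2). The helper name `hessian_entries` is made up for illustration.

```python
def hessian_entries(n_curves):
    """Number of second derivatives for n_curves biexponential components."""
    n_components = n_curves
    n_parameters = 2 * n_curves + 2   # a_i and b_i per curve, plus shared t1, t2
    return n_components * n_parameters ** 2


for n in (1, 5, 14, 51):
    print(n, hessian_entries(n))
```

For large n the count approaches 4 * n**3, which is consistent with cubic rather than exponential growth in the number of symbolic derivatives.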

EDIT: formatting

ValentinQ commented 4 years ago

> Interesting issue, this is definitely unforeseen if it became worse with the new release. Would it be possible for you to make the same plot in 0.4.x for comparison?
>
> As a general comment, if it is possible to convert your model into a matrix equation then that will definitely be faster, so that might be worth trying in your case, see cell 4 here for an example of how to define a matrix-based model. More examples of this should be added in the future.

I will try that. Indeed, the last run I did was on 0.4.6 and it was definitely faster: I managed to fit 20 curves without too much trouble and in a reasonable amount of time.

tBuLi commented 4 years ago

This is also related to #267. The Hessian is definitely the problem, and since most minimizers don't even support it, this behavior should be changed. I'm thinking of symbolically evaluating Jacobians and Hessians lazily, instead of by default like now. That way, only users who actually want to use a Hessian minimizer have to pay this price.
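A rough sketch of what lazy evaluation could look like, using SymPy and `functools.cached_property`. The `LazyModel` class is hypothetical and is not symfit's implementation. It only shows derivatives being computed on first access instead of at construction time.

```python
from functools import cached_property

import sympy as sp


class LazyModel:
    """Hypothetical illustration; not symfit's actual Model class."""

    def __init__(self, exprs, params):
        # Construction only stores the expressions; nothing is differentiated
        self.exprs = list(exprs)
        self.params = list(params)

    @cached_property
    def jacobian(self):
        # First derivatives, computed on first access and then cached
        return [[sp.diff(expr, p) for p in self.params] for expr in self.exprs]

    @cached_property
    def hessian(self):
        # Second derivatives, only computed if a minimizer asks for them
        return [[[sp.diff(d, q) for q in self.params] for d in row]
                for row in self.jacobian]


x, a, b, t1, t2 = sp.symbols('x a b t1 t2')
model = LazyModel([a * sp.exp(-x / t1) + b * sp.exp(-x / t2)], [a, b, t1, t2])
```

Building `model` is cheap. The cost of the Hessian is paid only when `model.hessian` is first read, and the cached result is reused afterwards.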