scipy / scipy

SciPy library main repository
https://scipy.org
BSD 3-Clause "New" or "Revised" License

Add methods to compute derivatives to Scipy (Trac #1510) #2035

Open scipy-gitbot opened 11 years ago

scipy-gitbot commented 11 years ago

Original ticket http://projects.scipy.org/scipy/ticket/1510 on 2011-09-05 by trac user deil, assigned to unknown.

It would be nice if scipy included numerical differentiation methods. I'm not sure where it would fit best, maybe in scipy.optimize or in scipy.misc?

scipy.optimize already contains an approx_fprime method, which approximates the gradient. Having a second method, approx_hessian, to compute the Hessian would be very useful, because the covariance matrix can be computed as the inverse of the Hessian, and having this in scipy would make it possible to compute fit parameter errors for optimizers other than scipy.optimize.leastsq.
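For reference, a minimal sketch of how approx_fprime is called today (the test function and step size here are illustrative only):

import numpy as np
from scipy import optimize

def f(x):
    # Scalar test function of two variables.
    return x[0] ** 2 + 3.0 * x[1]

# Gradient of f at (1, 1); the third argument is the finite-difference step.
grad = optimize.approx_fprime(np.array([1.0, 1.0]), f, 1e-8)
print(grad)  # approximately [2., 3.]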

There are already two numpy-based Python packages that implement several variants of numerical differentiation routines:

http://code.google.com/p/numdifftools/

https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/sandbox/regression/numdiff.py

I am not familiar enough with the differences between the implemented methods to suggest which would be most useful to add to scipy. Maybe the authors can comment?

The need for Hessians came up in this mailing list discussion:

http://mail.scipy.org/pipermail/scipy-user/2011-September/030459.html

scipy-gitbot commented 11 years ago

Attachment added by @pbrod on 2011-09-06: hessian_errors.png

scipy-gitbot commented 11 years ago

Attachment added by @pbrod on 2011-09-06: hessian_runtimes.png

scipy-gitbot commented 11 years ago

Attachment added by @pbrod on 2011-09-06: gradient_errors.png

scipy-gitbot commented 11 years ago

Attachment added by @pbrod on 2011-09-06: gradient_runtimes.png

scipy-gitbot commented 11 years ago

@pbrod wrote on 2011-09-09

I am the coauthor of numdifftools, and I cannot guarantee that numdifftools is bug-free. There are some situations where the result from numdifftools is not as accurate as requested. You will find one example in this post:

[http://www.mail-archive.com/numpy-discussion@scipy.org/msg28633.html]

By default, numdifftools determines the best step size automatically. This is not foolproof, but it works in most situations for well-behaved functions. In the example referred to above, the automatic step-size selection broke down, and the final step size was too large to give an accurate answer. (Please note that numdifftools now calculates the derivatives correctly in the above-mentioned example, but one can always construct examples where numdifftools will not calculate the correct derivative.)

In the same post [http://www.mail-archive.com/numpy-discussion@scipy.org/msg28642.html] Sebastian Walter made me aware of automatic differentiation tools, which are more robust and more stable than numerical differentiation. The downside (as it seems to me) is that the call syntax is more complicated. Here is an overview of available AD software libraries:

[http://en.wikipedia.org/wiki/Automatic_differentiation#Software]

In numdifftools I have made an easy-to-use interface to ScientificPython and algopy [http://code.google.com/p/numdifftools/source/browse/trunk/numdifftools/nd_algopy.py] [http://code.google.com/p/numdifftools/source/browse/trunk/numdifftools/nd_scientific.py]

and compared them to numdifftools on the following function, for problem dimensions N ranging from 1 to 100:

import numpy as np

class F:
    def __init__(self, N):
        A = np.arange(N * N, dtype=float).reshape((N, N))
        self.A = np.dot(A.T, A)

    def __call__(self, xi):
        x = np.array(xi)
        # Equivalent to: np.array([(x * self.A[i]).sum() for i in range(self.A.shape[0])])
        tmp = np.dot(self.A, x)
        return 0.5 * np.dot(x * x, tmp)
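For reference, derivatives of this function can be computed with numdifftools roughly like this (a minimal sketch using the class F above; N and the evaluation point are arbitrary):

import numpy as np
import numdifftools as nd

f = F(N=3)
x = np.ones(3)

# Adaptive finite-difference gradient and Hessian of f at x.
grad = nd.Gradient(f)(x)
hess = nd.Hessian(f)(x)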

The accuracy and speed results are shown in the attached files.

scipy-gitbot commented 11 years ago

trac user deil wrote on 2011-09-09

I think very few authors would claim that they can guarantee their software is bug-free. :-) It's better to have a method that gives inaccurate results in certain complicated cases than to have no method to compute derivatives in scipy at all.

You mentioned that automatic differentiation is more robust and stable than numeric differentiation.

Another criterion to consider for inclusion is code size: numdifftools has about 1300 lines (plus 300 for the nice algopy wrapper) and algopy has about 5000 (including comments and whitespace):

numdifftools
       7 __init__.py
    1308 core.py
      84 info.py
     299 nd_algopy.py

algopy
      68 __init__.py
     111 base_type.py
     306 exact_interpolation.py
     229 globalfuncs.py
     212 utils.py
    1003 tracer/tracer.py
    1708 utpm/algorithms.py
    1938 utpm/utpm.py

And an even more important consideration is ease of use. It seems to me that numdifftools is dead simple to use (because Jacobian etc. can be used as if they were functions), whereas algopy requires a few minutes to understand (because one has to instantiate a complicated-sounding object first and then extract the derivative), although, as you show in your nd_algopy wrapper, it is possible to use algopy through the same simple interface as numdifftools.

Actually, why not include both methods in scipy and let the user choose (similar to how scipy.optimize offers many different optimizers)?

result = scipy.diff.gradient(function, method="numeric")
result = scipy.diff.gradient(function, method="automatic")

One point I'd like to make is that I am just a Scipy user who was complaining that there was no method to compute derivatives in scipy. Maybe one of the scipy developers can comment on whether derivative methods, as available now in the BSD-licensed numdifftools and/or algopy packages, would be welcome in Scipy!? And if so, where in Scipy would they fit best, and do they require any API changes or further tests before inclusion?

Christoph

scipy-gitbot commented 11 years ago

trac user deil wrote on 2011-09-09

I've changed the ticket summary, component and keywords to reflect the fact that this enhancement request has expanded.

It's now basically: please add a scipy.diff package to compute derivatives.

The existing scipy.misc.derivative is only for 1D and could be replaced by a more feature-complete scipy.diff.

scipy-gitbot commented 11 years ago

Title changed from "Add Hessian method to Scipy" to "Add methods to compute derivatives to Scipy" by trac user deil on 2011-09-09

scipy-gitbot commented 11 years ago

@rgommers wrote on 2011-10-30

I agree that it would be useful to have better tools in scipy to compute derivatives. This probably deserves its own discussion on scipy-dev, the previous mention got lost in the scipy.optimize interface thread.

Looking at the code, it seems to me that algopy is the more polished package and probably also the more robust and performant one. On the other hand, it's a little harder to use. So there may be a place for both in scipy, if the authors of both packages agree. The packages may benefit from more users. I think numdifftools would require some work before it could be included.

I'm not so worried about code size, both packages are small and pure Python. Code quality, documentation and test coverage matter more than size.

scipy-gitbot commented 11 years ago

trac user deil wrote on 2011-10-30

Both authors replied that they would welcome the inclusion of their package into scipy, and that while they don't have time to actively work on the integration, they are available for questions and possibly some maintenance.

Ralf, if you agree that a separate scipy.diff module would be a good solution, I would try to fork scipy, include both packages there, and give them a uniform calling interface. Once that works I could send an email to scipy-dev to start a discussion.

scipy-gitbot commented 11 years ago

@rgommers wrote on 2011-10-30

I would start the discussion earlier if I were you. It would be a shame if you put in quite some effort, then at the end find out that there are objections or people want separate interfaces, or ...

scipy-gitbot commented 11 years ago

@rgommers wrote on 2011-10-30

Oh, and: great you are willing to work on this!

scipy-gitbot commented 11 years ago

@rgommers wrote on 2012-02-19

I'll note here that this was discussed in the thread http://permalink.gmane.org/gmane.comp.python.scientific.devel/15790, with positive feedback.

scipy-gitbot commented 11 years ago

@rgommers wrote on 2012-02-19

If this happens, gh-1091 should be closed and misc.derivative removed.

cdeil commented 10 years ago

In case someone has the time / skill to undertake this project of adding a scipy.diff ... here's another license-compatible package that looks promising: https://pypi.python.org/pypi/ad

I'll write an email to the author to make him aware ... just in case he's interested.

cc @b45ch1

jseabold commented 10 years ago

FWIW, we have some numerical differentiation routines that we're using that would be fairly easy to drop in.

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tools/numdiff.py
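For reference, a rough sketch of how those routines are called (this assumes the statsmodels.tools.numdiff module layout; the test function is illustrative):

import numpy as np
from statsmodels.tools import numdiff

def f(x):
    return x[0] ** 2 * x[1]

x = np.array([2.0, 3.0])
print(numdiff.approx_fprime(x, f))  # approximately [12., 4.]
print(numdiff.approx_hess(x, f))    # approximately [[6., 4.], [4., 0.]]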

cdeil commented 10 years ago

The ad package and @tisimst are on github: https://github.com/tisimst/ad

rgommers commented 10 years ago

@argriffing looks like you have some interest in this discussion as well. And I see you're an expert, having made a lot of algopy commits :)

josef-pkt commented 10 years ago

My main questions about including automatic differentiation in scipy are: How mature are those packages now? Are they used by any other packages? Is there enough common ground between these packages (algopy, ad, and the openopt packages) to have a design that fits in scipy, given scipy's backwards compatibility policy?

I recently looked at the packages by tisimst, including ad, and I thought it is time that we start to use AD in statsmodels, to see how working with those tools compares to numerical differencing and analytical derivatives. As far as I remember, several other (commercial) statistics packages mention that they use AD.

argriffing commented 10 years ago

@argriffing looks like you have some interest in this discussion as well. And I see you're an expert, having made a lot of algopy commits :)

Yes, I've made some algopy commits, although I wouldn't call myself an expert :) The executive summary of my opinion: if the less-than-sophisticated finite-differences implementation is to be moved out of the misc module, and if we still want derivative approximation in scipy, then it would make sense to use a more sophisticated, mature, BSD-licensed finite-differences package, probably as a separate scipy module. I think that numdifftools is good enough, small enough, and mature enough to include, but I am too lazy to do this soon myself.

Just to make sure everyone's on the same page here's a list of some ways to do derivatives on the computer with varying generality and accuracy, with short descriptions:

  1. Provide 1st order derivatives of some functions, like jv and jvp in scipy.special.
  2. For most univariate functions, use a keyword argument to specify that you want the nth derivative, or possibly all derivatives of order <= n. This is not implemented in numpy or scipy, but I've coded this myself and it is not so bad. The most annoying part is the argument order and naming inconsistencies in numpy and scipy functions, for example the out= keyword argument.
  3. Symbolic derivatives. This is like what you would do with sympy. The idea is to use a handful of derivative rules, plus the chain rule, and get a symbolic expression of the nth order derivative as output, given a symbolic expression of the function as input.
  4. Finite differences. This family of schemes approximates derivatives from black-box function evaluations. It is also called "numerical differentiation", although that term can be ambiguous: it suggests a dichotomy with "symbolic differentiation", which does not really capture the diversity of derivative evaluation algorithms. This family includes the simple (f(x+eps) - f(x))/eps approximation as well as fancier, higher-order algorithms with special sauce like Richardson extrapolation; numdifftools is an example. (A minimal sketch of the simple schemes follows this list.)
  5. Automatic differentiation (also called algorithmic differentiation or AD). This method is not symbolic differentiation and is not based on finite-differences approximation. It is perhaps less well known than it deserves, probably because its name is so innocent that clever people reasonably but incorrectly guess what it is. Here's the wiki and a rant. Commercial software packages use this, for example in SAS and in the various commercial packages associated with operations research. People I know in those research fields take it for granted that if you can write down an expression for a smooth function then the computer can efficiently and accurately evaluate its derivatives at given points, and under the hood this works by automatic differentiation.
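
As a concrete illustration of (4), here is a minimal sketch of the two simplest finite-difference schemes (the step sizes are illustrative; the fancier algorithms choose them adaptively):

import numpy as np

def forward_diff(f, x, eps=1e-8):
    # First-order forward difference: truncation error is O(eps).
    return (f(x + eps) - f(x)) / eps

def central_diff(f, x, eps=1e-6):
    # Second-order central difference: truncation error is O(eps**2).
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# The derivative of sin at x=1 is cos(1) = 0.5403...
print(forward_diff(np.sin, 1.0))  # error around 1e-8
print(central_diff(np.sin, 1.0))  # error around 1e-10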

Scipy currently includes a less sophisticated implementation of (4) in the misc module, which may not be long for this world, as well as partial implementations of (1) and (2) for some special functions.

I think most people would agree that symbolic derivatives are not in scipy's scope.

Although automatic differentiation is more numerically accurate than methods based on finite differences, I would argue that finite-differences methods are not strictly dominated by automatic differentiation, mainly because finite differences is more robust to black-box function calls (for example to Fortran or C, or even to Cython if the function depends on the low-level memory interface). Such calls break automatic differentiation implementations based on polymorphism, which includes all Python AD implementations I've seen. Therefore it would not be a waste of time to put sophisticated finite-differences methods into scipy, even if automatic differentiation is added at some point in the future, when its packages are more mature and when objects with ndarray-like interfaces are more common and well-supported.
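To illustrate why polymorphism-based AD breaks on black boxes, here is a toy dual-number sketch (this shows only the underlying idea, not how algopy actually works internally):

class Dual:
    # Minimal dual number carrying a value and a derivative.
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def f_pure(x):
    return x * x + 3 * x  # pure Python: every operation is overloadable

print(f_pure(Dual(2.0, 1.0)).der)  # 7.0, which is f'(2) = 2*2 + 3

import math
def f_black_box(x):
    return math.sin(x)  # compiled C code that insists on a real float

# f_black_box(Dual(2.0, 1.0)) raises TypeError: the compiled routine cannot
# propagate the Dual type, so the derivative information is lost.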

tisimst commented 10 years ago

I must agree with argriffing. Although I love the practical usefulness of AD (having authored ad), the truth is, there are just too many practical applications that can't use it because compiled code performs some part of the calculations. For example, all (or nearly all) of the numpy.linalg routines call compiled LAPACK functions written in Fortran. The sad news? AD can't touch them from the outside. Those routines would have to be (and certainly could be) rewritten with a Fortran AD package, but my meager guess is that that would buy us very little for all the effort. Who knows, though...

AD has its place, but I think that finite-difference schemes are a must-have if we want to publish something that is very useful for the Python computing community at large in the most general sense. And, yes, I too agree that symbolic differentiation shouldn't be in scipy's realm.

b45ch1 commented 10 years ago

AD vs FD

I believe the point Alex (argriffing) was trying to make is that this is not a decision of one over the other.

The unique selling propositions of AD are that it evaluates derivatives essentially to machine precision, with no step-size tuning, and that it stays robust as problems grow more complex.

The unique selling proposition of FD is that it treats the function as a black box: only function evaluations are needed, so it also works on code that calls into compiled C or Fortran.

So there are cases when FD is the method of choice, and the other way around. Typically, for more complex problems, FD will not work well, and it really pays off to invest a little more time to apply an AD tool. So I would actually argue that in practice FD fails whereas AD consistently gives good results.

@tisimst The ability to differentiate functions from numpy.linalg is one of the USPs of AlgoPy. So it is certainly possible to differentiate those functions using AD ;)

adding an AD tool to scipy

I think that the current status quo is not that bad. If you need an AD tool, you can easily install one using pip, e.g. pip install algopy. How much easier can it get? What would the advantages of adding an AD tool to scipy be, from a user's point of view?

argriffing commented 10 years ago

Hah, I just saw this and I didn't know misc.derivative was in so urgent need of repair, or I'd forgotten.

>>> import scipy.misc
>>> import numdifftools as nd
>>> def cube(x): return x*x*x
... 
>>> scipy.misc.derivative(cube, 2)
13.0
>>> nd.Derivative(cube)(2)
array([ 12.])

Although it is technically correct that a numerical approximation of the derivative of f(x) = x^3 at x=2 could be said to be 13 when using the default step dx=1 (the central difference gives (f(3) - f(1))/2 = (27 - 1)/2 = 13, while the true derivative is 3*2^2 = 12), this would seem to violate the principle of least surprise, to put it mildly. I think it would make more sense to have dx be a required argument, but I guess this doesn't matter if we are replacing this function anyway.

argriffing commented 10 years ago

Here's a numdifftools PR https://github.com/scipy/scipy/pull/2835. The docs and testing are incomplete (non-existent). I think that the only place misc.derivative was used inside scipy was in the calculation of pdf = derivative(cdf) in distributions.py. I've replaced that one case with a numdifftools derivative call.

BenFrantzDale commented 10 years ago

For numerical differentiation I just use scipy.ndimage.gaussian_filter. I've used that to make grad and Hessian functions before...

But that's not obvious.
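For reference, a minimal 1D sketch of that trick (sigma and the grid are illustrative; the estimate is smoothed and degrades near the edges):

import numpy as np
from scipy.ndimage import gaussian_filter1d

# Sample f(x) = sin(x) on a grid, then estimate f'(x) by convolving with the
# first derivative of a Gaussian (order=1) and dividing by the grid spacing.
x = np.linspace(0, 2 * np.pi, 1000)
dx = x[1] - x[0]
f = np.sin(x)

df = gaussian_filter1d(f, sigma=5, order=1) / dx
print(np.max(np.abs(df[50:-50] - np.cos(x[50:-50]))))  # small away from edges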

rgommers commented 10 years ago

@b45ch1 from a user's point of view it would be easier to obtain (good luck trying to use pip if you're in a company behind a firewall) and more discoverable.

More importantly, it could be used in other parts of scipy and in libraries like statsmodels that now use something home-cooked or bundled. Differentiation is a quite basic tool with wide application, and like it or not, not many libraries are willing to add dependencies on little packages like algopy or numdifftools. Maybe in a perfect world where distutils had been replaced by something robust :)

Scipy needs at least FD, and there are some advantages to having AD as well, although that's perhaps less critical. Of course there's also a maintenance cost, so it's not completely clear-cut.....

b45ch1 commented 10 years ago

Myself, I always have some issues when I try to install scipy ;) E.g., it took me hours to set up Travis CI for AlgoPy, since the version available from apt-get was too outdated: 1) pip install scipy, 2) compile/link error, 3) add dependency, 4) goto 2. In the end it didn't compile for some reason I forgot, so I resorted to an unverified PPA that contained a more recent version of scipy. I would argue that if an AD tool doesn't have scipy as a dependency, it is much more convenient for a user to install it using pip.

That being said, AlgoPy depends on Scipy so the user has to install scipy anyway. So from a users point of view it would actually be better to have both of them in one package.

argriffing commented 10 years ago

I predict that AD is ahead of its time, and that it will become mainstream within the next generation of whatever replaces numpy, and only for 1st and 2nd derivatives.

cdeil commented 10 years ago

Would implementing scipy.diff be a good GSoC project?

To me it seems so, because it's an important missing building block in scipy, it's very clear what to do, and the difficulty and scope seem OK for GSoC. If the student is good, it could be extended with optional parts, e.g. adding a function to scipy.optimize to compute parameter errors for nonlinear fitting (by inverting the matrix of second derivatives of the objective function); a sketch of that idea follows below.
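A minimal sketch of that optional part, assuming a numdifftools-style Hessian (the straight-line model, the data, and the assumed measurement error sigma = 0.1 are made up for illustration):

import numpy as np
import numdifftools as nd
from scipy import optimize

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(50)

def chi2(p):
    # Least-squares objective for the model y = p[0]*x + p[1].
    return np.sum(((y - (p[0] * x + p[1])) / 0.1) ** 2)

res = optimize.minimize(chi2, x0=[1.0, 0.0])
# For a chi-square objective, covariance = 2 * inv(Hessian) at the minimum.
cov = 2.0 * np.linalg.inv(nd.Hessian(chi2)(res.x))
print(np.sqrt(np.diag(cov)))  # 1-sigma errors on slope and intercept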

If yes, I'm very interested to see this realised, and I'd take the time to help with code review and testing / documentation, but I'm not a numerics person, so at least one other person that knows about the intricacies of numerical derivatives (and numdifftools in particular) would have to want to be a mentor.

rgommers commented 10 years ago

It could be a good project. Probably a bit late now. And hopefully we'll get this done before next year's GSoC. @argriffing made a good start already.

argriffing commented 10 years ago

fwiw @pbrod seems to have moved numdifftools development from google code to https://github.com/pbrod/numdifftools

argriffing commented 9 years ago

It could be a good project. Probably a bit late now. And hopefully we'll get this done before next year's GSoC.

IIRC the technical challenge in adding this to scipy is dealing with backward compatibility with the existing finite-differences derivative estimation in some stats functions. The current finite-differences code is not as sophisticated as numdifftools, but it is broadcastable, or uses broadcasting in a different way.

cdeil commented 9 years ago

I've added scipy.diff as a project idea for GSoC 2015: https://github.com/scipy/scipy/wiki/GSoC-project-ideas#implement-scipydiff

I'm interested in co-mentoring, but I think this project should only be offered if a numerical-derivatives expert (which I'm not) is willing to mentor. If anyone here is interested in doing this this summer, please add your name.

rgommers commented 9 years ago

@cdeil thanks, good idea! I've added your name for now, we can always add more mentors when there is serious interest - one of the Scipy devs definitely also needs to co-mentor, and we can always try to find a domain expert if needed.

argriffing commented 9 years ago

I see that @pbrod has been working on numdifftools recently. This is great! I'd thought that he had more or less abandoned it to the internet.

maniteja123 commented 9 years ago

Hi everyone, I was wondering if someone could guide me on the GSoC idea of implementing a numerical differentiation package: maybe give some examples of how broadcasting behavior is handled elegantly in statsmodels, explain whether the complex-step or the adaptive-step method is better, and point out which issues are critical.
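For context, the complex-step method mentioned above works like this (a minimal sketch; the step h is illustrative, and f must accept complex input):

import numpy as np

def complex_step_derivative(f, x, h=1e-20):
    # f'(x) ~= Im(f(x + i*h)) / h. There is no subtractive cancellation,
    # so h can be tiny and the result is accurate to machine precision.
    return np.imag(f(x + 1j * h)) / h

print(complex_step_derivative(np.sin, 1.0))  # cos(1) to machine precision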

I have also written a mail to the mailing list regarding this http://mail.scipy.org/pipermail/scipy-dev/2015-March/020467.html.

I have just drafted a wiki page, https://github.com/maniteja123/GSoC/wiki, so that I can keep updating it with the issues. Please do have a look at it. As of now, I have just put together the ideas and discussions from various threads and mailing lists. Thanks :)

shoyer commented 7 years ago

I love automatic differentiation, but I'm not sure it's a good project for SciPy (especially not for a GSOC project). There are lots of other great tools out there that implement automatic differentiation in Python, including TensorFlow, Theano and autograd.

rgommers commented 7 years ago

@shoyer I agree, it's better finite differences that we need. That's what this issue, and the GSoC proposal @maniteja123 wrote two years ago, converged to.