stan-dev / docs

Documentation for the Stan language and CmdStan
https://mc-stan.org/docs/

[FR] output which functions/distributions have derivatives #432

Open spinkney opened 3 years ago

spinkney commented 3 years ago

I'm loving the provenance in the new documentation. One thing I think would be helpful is knowing which functions have analytic derivatives and which rely on autodiff. That helps users reason about performance and shows devs where potentially easy performance improvements can be found.

I wonder if just looking in the derivative folders is enough? I'm not sure it captures all the derivatives for overloaded functions, but it's a nice start.

WardBrian commented 3 years ago

I weirdly enjoy doing this kind of tedious doc work, so I can take a look. I'm thinking it would be best to have a dagger or similar symbol mark functions with analytic derivatives, with the default being autodiff?

bob-carpenter commented 3 years ago

All of our functions with real-valued returns, other than RNGs, support reverse-mode autodiff for each of their non-data-qualified arguments. Not all of them support forward-mode (e.g., the solvers). There are three possible cases for each autodiff style.

I believe what @spinkney is asking is whether there is a reverse-mode specialization for a function (or the function it delegates to, which makes looking for things tricky).

The reverse vs. forward is also important for figuring out where we'll be able to use nested Laplace approximations as those require higher-order autodiff. Also, we'll only be able to use autodiff for Hessians if there is forward-mode support.

WardBrian commented 3 years ago

Two questions:

  1. What's the best way to convey this information? A big table somewhere which lists every function? A set of symbols we use on each function signature, with a key in a table somewhere explaining @bob-carpenter's comments above?
  2. Is there anything close to an automatic way of getting (any of) this information?

spinkney commented 3 years ago

For functions

For distributions this is a bit more difficult, though if partials or operands_and_partials is used, then the function is almost certainly calculating an analytical derivative. Maybe we can add tags in these files to make it easier?
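As a rough illustration of the text-scan idea in this comment, here is a minimal sketch in C++. The names `DerivativeHint` and `classify_source` are hypothetical and not part of Stan Math. The only assumption taken from the thread is that a mention of `partials` or `operands_and_partials` in a distribution's source hints at a hand-written derivative.

```cpp
#include <string>

// Hypothetical result type for this sketch.
enum class DerivativeHint { LikelyAnalytic, Unknown };

// Heuristic only: flags source text that mentions "partials", which also
// matches "operands_and_partials". It does not parse C++ and can be fooled
// by comments or unrelated identifiers containing the same substring.
inline DerivativeHint classify_source(const std::string& source) {
  return source.find("partials") != std::string::npos
             ? DerivativeHint::LikelyAnalytic
             : DerivativeHint::Unknown;
}
```

A scan like this could only produce a first-pass list for manual review. A function that delegates to another function with an analytic derivative would still be reported as `Unknown`.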

WardBrian commented 3 years ago

Is it safe to assume that if one signature of a function has analytic [fwd|rev] derivatives that the vectorized versions/overloads will?

spinkney commented 3 years ago

> Is it safe to assume that if one signature of a function has analytic [fwd|rev] derivatives that the vectorized versions/overloads will?

I believe so.

@andrjohns has done a lot of the vectorization work and @SteveBronder may know an easy way to identify analytical derivatives.

bob-carpenter commented 3 years ago

> Is it safe to assume that if one signature of a function has analytic [fwd|rev] derivatives that the vectorized versions/overloads will?

It's possible to write a specialization for foo(var, double) but not for foo(var, var), but I think in cases where we wrote one, we wrote them all.
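To make the overload point concrete, here is a toy sketch in C++. The `var` struct is a placeholder, not `stan::math::var`, and `foo_path` is a hypothetical function that only reports which definition overload resolution selects. The generic template stands in for "differentiate through the primitive operations".

```cpp
#include <string>

// Toy placeholder for an autodiff variable; not stan::math::var.
struct var {
  double val;
};

// Generic definition: in this sketch it stands for plain autodiff through
// whatever primitive operations the function body would use.
template <typename T1, typename T2>
std::string foo_path(const T1&, const T2&) {
  return "autodiff through primitives";
}

// Hand-written overload for exactly one signature, foo(var, double).
inline std::string foo_path(const var&, double) {
  return "analytic specialization";
}
```

With these definitions, `foo_path(var{1.0}, 2.0)` selects the hand-written overload, while `foo_path(var{1.0}, var{2.0})` falls back to the generic template. That is why checking a single signature does not settle the question for the others.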

The bigger deal is that it's possible to further reduce computation when vectorizing operations that each have an analytic derivative. We compact into a single vari that calls all the chain() methods and thus avoid unnecessary virtual function calls. So not all analytic derivatives will be equally efficient.

Also, we've done things like move from reverse-mode autodiffing our ODE solvers, to using coupled systems (still not quite analytic derivatives; we use the solver), and then to adjoint methods (which is a proper reverse-mode type specialization).
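The compaction idea can be sketched with a toy reverse-mode tape in C++. Everything here (`vari`, `Tape`, `exp_vari`, `add_vari`, `sum_exp_vari`) is a simplified stand-in written for this illustration and does not reproduce Stan Math's actual classes or memory management. It computes the sum of `exp(x_i)` two ways: one node per scalar operation, and one compact node whose single `chain()` handles every input.

```cpp
#include <cmath>
#include <memory>
#include <utility>
#include <vector>

// Toy reverse-mode node; leaves keep the default no-op chain().
struct vari {
  double val;
  double adj = 0.0;
  explicit vari(double v) : val(v) {}
  virtual ~vari() = default;
  virtual void chain() {}
};

// Toy tape: owns nodes in creation order and sweeps them in reverse.
struct Tape {
  std::vector<std::unique_ptr<vari>> nodes;
  template <typename Node, typename... Args>
  Node* make(Args&&... args) {
    auto node = std::make_unique<Node>(std::forward<Args>(args)...);
    Node* raw = node.get();
    nodes.push_back(std::move(node));
    return raw;
  }
  void grad(vari* out) {
    out->adj = 1.0;
    for (auto it = nodes.rbegin(); it != nodes.rend(); ++it) (*it)->chain();
  }
};

struct exp_vari : vari {
  vari* a;
  explicit exp_vari(vari* in) : vari(std::exp(in->val)), a(in) {}
  void chain() override { a->adj += adj * val; }  // d/da exp(a) = exp(a)
};

struct add_vari : vari {
  vari* a;
  vari* b;
  add_vari(vari* x, vari* y) : vari(x->val + y->val), a(x), b(y) {}
  void chain() override { a->adj += adj; b->adj += adj; }
};

// Compact node: one virtual chain() call covers all inputs.
struct sum_exp_vari : vari {
  std::vector<vari*> xs;
  explicit sum_exp_vari(std::vector<vari*> in)
      : vari(total(in)), xs(std::move(in)) {}
  static double total(const std::vector<vari*>& in) {
    double s = 0.0;
    for (const vari* x : in) s += std::exp(x->val);
    return s;
  }
  void chain() override {
    for (vari* x : xs) x->adj += adj * std::exp(x->val);
  }
};

// One exp node per input plus one add node per partial sum.
inline vari* sum_exp_elementwise(Tape& t, const std::vector<vari*>& xs) {
  if (xs.empty()) return t.make<vari>(0.0);
  vari* acc = t.make<exp_vari>(xs[0]);
  for (std::size_t i = 1; i < xs.size(); ++i)
    acc = t.make<add_vari>(acc, t.make<exp_vari>(xs[i]));
  return acc;
}

// A single node for the whole vectorized operation.
inline vari* sum_exp_compact(Tape& t, const std::vector<vari*>& xs) {
  return t.make<sum_exp_vari>(xs);
}
```

Both versions yield the same adjoints, but for n inputs the elementwise version adds 2n - 1 nodes to the tape while the compact version adds one. This is the sense in which two implementations can both be "analytic" yet differ in cost.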

In any case, this is just basic guidance for users so we don't want to make this too fine-grained.

analytic forward derivatives (more efficient) to analytic adjoint derivatives (most efficient).

WardBrian commented 2 years ago

Happy to help with this still, but I'm not 100% sure I'm familiar enough with the C++ to do this alone, so I've unassigned myself.