roualdes / bridgestan

BridgeStan provides efficient in-memory access through Python, Julia, and R to the methods of a Stan model.
https://roualdes.github.io/bridgestan
BSD 3-Clause "New" or "Revised" License

remaining autodiff functors #84

Open bob-carpenter opened 1 year ago

bob-carpenter commented 1 year ago

There are a bunch of autodiff functors that are implemented in Stan but not exposed yet in BridgeStan. The two most basic are already done. Most of them other than directional derivatives require forward-mode autodiff. Please feel free to add more requests to the list.

There's no inverse Hessian-vector product in Stan. I'm not sure of the best way to implement it---I think there are a lot of approaches, because the direct way (solving the Hessian against the vector in Stan) is so expensive.
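
To make concrete why the direct route is expensive, here is a rough sketch, not anything in BridgeStan or Stan: it forms the full dense Hessian with the mix-mode hessian functor and then solves against the vector with Eigen's LDLT. The toy log density, the choice of factorization, and the overall shape of the program are illustrative assumptions only.

```cpp
// Rough sketch of the "direct" inverse Hessian-vector product: compute the
// full dense Hessian with Stan math's mix-mode hessian functor, then solve
// H * y = v with Eigen's LDLT. The N nested-autodiff sweeps plus the O(N^3)
// factorization are why this route gets expensive as N grows.
#include <stan/math/mix.hpp>
#include <Eigen/Dense>
#include <iostream>

// Toy stand-in for a model's log density (standard normal up to a constant).
struct toy_lp {
  template <typename T>
  T operator()(const Eigen::Matrix<T, Eigen::Dynamic, 1>& x) const {
    T lp = 0;
    for (int i = 0; i < x.size(); ++i)
      lp += x(i) * x(i);
    return -0.5 * lp;
  }
};

int main() {
  Eigen::VectorXd x(3), v(3);
  x << 0.5, -1.0, 2.0;
  v << 1.0, 0.0, -1.0;

  double fx;
  Eigen::VectorXd grad;
  Eigen::MatrixXd H;
  stan::math::hessian(toy_lp{}, x, fx, grad, H);  // dense N x N Hessian

  // Direct inverse Hessian-vector product: factor H and solve H * y = v.
  Eigen::VectorXd y = H.ldlt().solve(v);
  std::cout << "H^{-1} v =\n" << y << std::endl;
}
```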

WardBrian commented 1 year ago

The doc for grad_hessian requires that

The functor must implement `fvar<fvar<var> > operator()(const Eigen::Matrix<fvar<fvar<var> >, Eigen::Dynamic, 1>&)`

which the model_base log_probs currently do not. We'd need to seek upstream changes similar to what we did for the opt-in fvar Hessians in https://github.com/stan-dev/stan/pull/3144

For the others, they don't document their requirements nearly as nicely, but as far as I can tell they only require fvar<var> (i.e. no deeper nesting of fvars), which should be doable at the moment on the C++ level.

I'm not sure how nicely optional symbols play with the various language interfaces. We currently don't have any functions that exist conditionally, so I'd be curious how that plays out.

bob-carpenter commented 1 year ago

There are two versions of Hessian.

  1. stan/math/mix/functor/hessian.hpp requires fvar<var> and does N evals for an N-dimensional density.

  2. stan/math/fwd/functor/hessian.hpp requires fvar<fvar<double>> and does N^2 evals of an N-dimensional density.

We want to use approach (1). They implement the same interface, so it's just a matter of changing the include.
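
As a concrete illustration of approach (1), here is a minimal sketch of calling the functor from stan/math/mix/functor/hessian.hpp. The toy standard-normal log density is an assumption standing in for the model functor BridgeStan would pass; the point is that the functor only needs to be callable with fvar<var> scalars.

```cpp
// Minimal sketch of approach (1): stan/math/mix/functor/hessian.hpp computes
// value, gradient, and Hessian using fvar<var>, with one sweep per dimension.
#include <stan/math/mix.hpp>
#include <Eigen/Dense>
#include <iostream>

// Toy log density (standard normal up to a constant); BridgeStan would pass
// its model functor here instead. It only has to instantiate with fvar<var>.
struct toy_lp {
  template <typename T>
  T operator()(const Eigen::Matrix<T, Eigen::Dynamic, 1>& x) const {
    T lp = 0;
    for (int i = 0; i < x.size(); ++i)
      lp += x(i) * x(i);
    return -0.5 * lp;
  }
};

int main() {
  Eigen::VectorXd x(2);
  x << 1.0, -2.0;
  double fx;
  Eigen::VectorXd grad;
  Eigen::MatrixXd H;
  stan::math::hessian(toy_lp{}, x, fx, grad, H);
  std::cout << "lp = " << fx << "\ngrad:\n" << grad << "\nHessian:\n" << H << "\n";
}
```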

WardBrian commented 1 year ago

grad_hessian as currently implemented doesn't seem to actually call either version of hessian directly, so it will probably be a bit more invasive a change than just an include. Still an upstream change, just not the one I was originally imagining (and probably an easier one than fvar<fvar<var>> for log_prob).

It seems to me like third-order autodiff should require that extra level of nesting? But I may be imagining this incorrectly.

bob-carpenter commented 1 year ago

Is there any reason it's not calling one of the predefined versions? The finite diff version has the same functional pattern. It should be able to take the same model functor we create for other uses.

> third-order autodiff should require that extra level of nesting?

Third-order autodiff in Stan requires fvar<fvar<var>> (N^2 reverse-mode evals of second derivatives) or fvar<fvar<fvar<double>>> (N^3 forward-mode evals of second derivatives).
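
For comparison with the second-order sketch above, here is a minimal sketch of the third-order functor, assuming the documented grad_hessian interface (f, x, fx, H, grad_H) from stan/math/mix/functor/grad_hessian.hpp. The toy cubic function is an assumption, chosen only so the third derivatives are nonzero.

```cpp
// Minimal sketch of the third-order functor: grad_hessian returns the value,
// the Hessian, and one d(Hessian)/dx_i slice per dimension, and needs the
// functor to instantiate with fvar<fvar<var>> -- one level deeper than the
// fvar<var> needed by the mix-mode hessian.
#include <stan/math/mix.hpp>
#include <Eigen/Dense>
#include <iostream>
#include <vector>

// Toy cubic function so the third derivatives are nonzero.
struct toy_fn {
  template <typename T>
  T operator()(const Eigen::Matrix<T, Eigen::Dynamic, 1>& x) const {
    T out = 0;
    for (int i = 0; i < x.size(); ++i)
      out += x(i) * x(i) * x(i);
    return out;
  }
};

int main() {
  Eigen::VectorXd x(2);
  x << 1.0, 2.0;
  double fx;
  Eigen::MatrixXd H;
  std::vector<Eigen::MatrixXd> grad_H;  // N matrices, each N x N
  stan::math::grad_hessian(toy_fn{}, x, fx, H, grad_H);
  std::cout << "fx = " << fx << ", slices of grad(Hessian) = " << grad_H.size()
            << std::endl;
}
```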

WardBrian commented 1 year ago

We must be talking past each other:

bob-carpenter commented 1 year ago

Yes, I misread "grad Hessian" as "Hessian". Maybe I'm misunderstanding what you mean by "upstream", but the grad_hessian functor with fvar<fvar<var>> should work for any Stan program for which the hessian functor with fvar<var> works.

WardBrian commented 1 year ago

The issue isn't that fewer models would work or anything; it's that at the moment we can only call overloads which exist in the model_base_crtp class, and log_prob does not yet have an overload for fvar<fvar<var>>. It would need to be added behind an #ifdef in stan-dev/stan (this is what https://github.com/stan-dev/stan/pull/3144 did, just for fvar<var>).

bob-carpenter commented 1 year ago

> we can only call overloads which exist in the model_base_crtp

You'd think I'd remember this given that I coded it the first time around. The templated version is defined on the actual run-time class, but I keep forgetting that the class instance is assigned to a variable typed as the base class, which only knows the virtual functions (hence no templates). It'd be easy enough to add these the same way as last time. But you want to be careful because each layer is going to greatly extend the compile time, so we only want to turn these features on when they're going to be used.
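
To illustrate the mechanism being described, here is a self-contained sketch with entirely hypothetical names (model_base_sketch, model_crtp_sketch, SKETCH_ENABLE_NESTED_FVAR); it is not the real stan-dev/stan model_base / model_base_crtp code, and the macro name is not the one from stan-dev/stan#3144. It only shows why a templated log_prob on the concrete model is unreachable through a base-class pointer, and how an opt-in virtual overload behind a macro works around that without slowing every build.

```cpp
#include <iostream>

// Dummy scalar standing in for a nested autodiff type such as
// stan::math::fvar<stan::math::fvar<stan::math::var>>.
struct nested_fvar {
  double val;
};

struct model_base_sketch {
  virtual ~model_base_sketch() = default;
  virtual double log_prob(double x) const = 0;
#ifdef SKETCH_ENABLE_NESTED_FVAR
  // Opt-in overload, compiled only when the macro is defined so the extra
  // template instantiations don't extend every build's compile time.
  virtual nested_fvar log_prob(nested_fvar x) const = 0;
#endif
};

// Stand-in for the CRTP layer: forwards each virtual overload to the
// concrete model's templated implementation.
template <class M>
struct model_crtp_sketch : model_base_sketch {
  double log_prob(double x) const override {
    return static_cast<const M&>(*this).log_prob_impl(x);
  }
#ifdef SKETCH_ENABLE_NESTED_FVAR
  nested_fvar log_prob(nested_fvar x) const override {
    return static_cast<const M&>(*this).log_prob_impl(x);
  }
#endif
};

struct my_model : model_crtp_sketch<my_model> {
  // The fully general templated version lives on the concrete class, but it
  // is invisible through a model_base_sketch pointer, which only sees the
  // virtual (non-template) overloads.
  template <typename T>
  T log_prob_impl(T x) const {
    return x;  // placeholder; a real model computes the log density in T
  }
};

int main() {
  my_model model;
  const model_base_sketch* m = &model;         // base-class pointer, as in BridgeStan
  std::cout << m->log_prob(1.5) << std::endl;  // only virtual overloads reachable
}
```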