Expose stan::math::hessian and stan::math::hessian_times_vector as methods in the stan::model class

rgiordan commented 5 years ago

Summary:

The stan::model class does not currently expose the stan::math::hessian and stan::math::hessian_times_vector of the log probability, though in principle it could easily do so in a manner similar to how the gradient is currently exposed.

Description:

See summary.

Reproducible Steps:

There are no steps to reproduce. I am filing this issue as the first step towards contributing a change as requested here.

Current Output:

N/A

Expected Output:

N/A

Additional Information:

I am working on this.

Current Version:

v2.18.0

bob-carpenter commented 5 years ago

Thanks for filing an issue, @rgiordan.

The gradient isn't currently exposed in the model class, so I'm unclear on what you are proposing. Could you break it down into something with a signature and location? For example, will it be a new method in the model class, a new utility function, or what?

Right now, we just create a functor out of the model class with bound output streams and loggers, then autodiff that using our functionals.

We're putting off most Hessian-related things until someone gets around to testing the matrix forward-mode autodiff.

rgiordan commented 5 years ago

I think probably I didn't express myself clearly enough. As I understand it, the gradient is exposed in stan::model::log_prob_grad. I propose making corresponding functions stan::model::log_prob_hessian_times_vector and stan::model::log_prob_hessian.

@avehtari requested that I finally get around to merging these changes so that users can compute improved ADVI covariances as in our recent paper. If Stan isn't ready to support Hessians in its public API, I should know sooner rather than later! However, even in that case, perhaps you would be willing to merge these functions into the C++ libraries and only expose them through rstan with a warning message or something?

bgoodri commented 5 years ago

Yes, the gradient method is exposed in RStan. At the moment, we are numerically differentiating the autodiffed gradient to get a Hessian at the optimum when necessary but I would agree it would be better to use autodiff throughout, even if it means multiple reverse mode sweeps.

On Thu, Nov 15, 2018 at 4:49 PM Ryan notifications@github.com wrote:

I think probably I didn't express myself clearly enough. As I understand it, the gradient is exposed in stan::model::log_prob_grad. I propose making corresponding functions stan::model::log_prob_hessian_times_vector and stan::model::log_prob_hessian.

@avehtari https://github.com/avehtari requested that I finally get around to merging these changes so that we can compute improved ADVI covariances as in our (recent paper)[ http://www.jmlr.org/papers/volume19/17-670/17-670.pdf]. If Stan isn't ready to support Hessians in its public API, I should know sooner rather than later! However, perhaps you would be willing to merge these functions into the C++ libraries and only expose them through rstan with a warning message or something?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/stan/issues/2692#issuecomment-439202566, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqkat5j77qLXticNw9yFfqvht55NSks5uveFOgaJpZM4YiVT4 .

rgiordan commented 5 years ago

In light of https://github.com/stan-dev/rstan/issues/588, I wonder if I should be basing this change on the develop branch or on the master branch of stan. If it's based on develop, do I have to wait for a release to expose the functions in rstan?

bgoodri commented 5 years ago

Branches for features should be based off of develop but even then it would have to wait for Stan 2.19 to reach the masses. It is not too hard to build a rstan locally that uses the develop branch of Stan Math though.

On Thu, Nov 15, 2018 at 5:08 PM Ryan notifications@github.com wrote:

In light of stan-dev/rstan#588 https://github.com/stan-dev/rstan/issues/588, I wonder if I should be basing this change on the develop branch or on the master branch of stan. If it's based on develop, do I have to wait for a release to expose the functions in rstan?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/stan-dev/stan/issues/2692#issuecomment-439208028, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqkJ6Ol5Urs8-Az_Sbi6RIN3IaJU5ks5uveXIgaJpZM4YiVT4 .

rgiordan commented 5 years ago

Just to be clear, I'd need a branch of rstan that works with stan-dev/stan/develop, not with stan-dev/math/develop.

How would I do that?

bgoodri commented 5 years ago

I'll do it for you once stuff gets merged into Stan. It entails adjusting the headers that get included when making the parser in rstan.

On Thu, Nov 15, 2018 at 5:28 PM Ryan notifications@github.com wrote:

Just to be clear, I'd need a branch of rstan that works with stan-dev/stan/develop, not with stan-dev/math/develop.

How would I do that?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/stan-dev/stan/issues/2692#issuecomment-439213467, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqv666mZ4B9MjBouXVXwOoMcQoC4cks5uvep0gaJpZM4YiVT4 .

bob-carpenter commented 5 years ago

Correct. Stan's forward-mode autodiff has not been tested thoroughly. I don't even think it all compiles for matrix operations. And it definitely won't work for ODE solvers, algebraic solvers, etc.

That doesn't mean we can't release something experimental that uses it. The goal is to eventually get around to testing it, but you can imagine how hard it is to get someone to do this kind of task---everyone wants to build new functionality, not work on existing infrastructure.

Currently, the model concept (not a base class) is being reworked into a base class so that we can cut down on compile time. The plan is to follow C++ idioms and prefer standalone functions to methods in a class. Having said that, this assume the base class function is differentiable. For that, we're going to assume that it's defined for var types. To extend to Hessians, we'd have to do something like conditionally include fvar types.

We can't add new higher-order methods to our base class without dramatically increasing compile times, at least in any way that I know how to do.

That log_prob_grad thing is a backward-compatiblity artifact of my coming from a Java background. It'll almost certainly be going away in the Stan 3 base model class refator. We don't use it in the algorithms, but rather create a functor by binding in writers, loggers, etc., and then autodiff that using methods in lang/utility.

On Nov 15, 2018, at 4:49 PM, Ryan notifications@github.com wrote:

I think probably I didn't express myself clearly enough. As I understand it, the gradient is exposed in stan::model::log_prob_grad. I propose making corresponding functions stan::model::log_prob_hessian_times_vector and stan::model::log_prob_hessian.

@avehtari requested that I finally get around to merging these changes so that we can compute improved ADVI covariances as in our (recent paper)[http://www.jmlr.org/papers/volume19/17-670/17-670.pdf]. If Stan isn't ready to support Hessians in its public API, I should know sooner rather than later! However, perhaps you would be willing to merge these functions into the C++ libraries and only expose them through rstan with a warning message or something?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

rgiordan commented 5 years ago

Hopefully any ambiguity in this issue will be cleared by the PR:

https://github.com/stan-dev/stan/pull/2701

rgiordan commented 5 years ago

@bob-carpenter, I'd be interested in helping out testing forward-mode autodiff if it's something an outsider can handle. Where would one go to get started?

bbbales2 commented 5 years ago

@rgiordan I think this is the most recent post on this: https://discourse.mc-stan.org/t/higher-order-autodiff-testing-framework-rhmc/6543/14

The first thing is just agreeing what needs to be tested or not. Could be as simple as everything. And then the next step is getting a list of what things are and are not tested, and to what extent they need to be tested.

I think even if the tests were pretty minimal (just instantiating the function to be tested and checking a gradient against finite diff or reverse mode) would be sufficient. I think the biggest question is do all the functions instantiate with all combinations of valid arguments. @bob-carpenter @syclik Is there any code that is truly specialized for just forward mode right now? There are the distribution gradients, but those go through ops n' partials.

Then it'd be time to actually write the tests.

I'm all for doing a testing framework thing to make the tests super quick to write. Here's my attempt at one for reverse mode: https://github.com/stan-dev/math/pull/1021 . I'd like to get that in and in use. There are at least a couple other testing frameworks in the stan-math tests though. I think it'd be good to ferret them out if possible (just so there aren't a million of these things).

Anyway extending that the fwd/mixed would make it much easier to write tests for everything. Probably the hardest part of testing autodiff is making sure all combinations of valid arguments work, like all mixes of double vs. fvar types. Bob n' DLee might have more to say.

bbbales2 commented 5 years ago

Also, hello, and welcome aboard :D!

bob-carpenter commented 5 years ago

@bob-carpenter @syclik Is there any code that is truly specialized for just forward mode right now?

Yes, pretty much everything is overloaded (not specialized in the C++ sense) for forward mode, starting with the definition of operator+.

I think this is the most recent post on this: https://discourse.mc-stan.org/t/higher-order-autodiff-testing-framework-rhmc/6543/14

We prefer to have discussions on Discourse and only move to issues when there's a concrete plan.

I think it'd be good to ferret them [autoidiff testing frameworks] out if possible (just so there aren't a million of these things).

There are three:

The tests for unary vectorized functions which tests exhaustively for all autodiff up to fvar<fvar<var>>.
The core tests for operators and core unary functions, which tests exhaustively the same way as (1), but is cleaner.
The PR from @bbbales2, which sorts out the problems the above have with variadic/heterogeneous input/output types, but only tests reverse mode.

Also, hello, and welcome aboard :D!

@rgiordan has been involved for a while now!

rgiordan commented 5 years ago

We can't add new higher-order methods to our base class without dramatically increasing compile times, at least in any way that I know how to do.

@bob-carpenter, now that you can see exactly what I am proposing in https://github.com/stan-dev/stan/pull/2701, does your above comment about compile times mean this PR is a non-starter, even as an experimental feature without testing?

bob-carpenter commented 5 years ago

does your above comment about compile times mean this PR is a non-starter, even as an experimental feature without testing?

Not at all. We just need to make sure we have a way of turning off compilation of higher-order autodiff when it's not necessary. We'll have the same problem rolling out Riemannian HMC, which we hope to do soon-ish.

Also, after we rebuild the compiler in OCaml (should be done in a month or two), we're going to attack the compile time issue by precompiling all the algorithms. We'll also try to precompile a lot of the compute-intensive functions in the math lib (especially the matrix ops) that won't benefit much if at all from inlining.

WardBrian commented 1 year ago

Possible after https://github.com/stan-dev/stan/pull/3144

stan-dev / stan