stan-dev / stan

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License

Functions With Improper Derivatives #731

Closed · betanalpha closed this 9 years ago

betanalpha commented 10 years ago

Right now the Stan autodiff implements functions with improper derivatives (i.e., derivatives that do not exist at certain points), which is formally improper. We have seen issues where people use functions like step, floor, and ceil in their models and the resulting sampling is very poor.

Personally I'd prefer to limit Stan to functions with proper derivatives only, but in any case we should have the discussion.

bob-carpenter commented 10 years ago

I understand.

But I'm not sure what you're proposing and whether it's for the autodiff library or just for the Stan modeling language.
I see two options that could go either way:

Option 1) Just remove the var versions of those functions.

Option 2) Throw exceptions or return NaN at points where they're not defined.

Both have the potential to break a lot of library code (Boost, Eigen). A quick look at Eigen shows they use floor and ceil ops on floating point values, then convert them to integers for loop bounds. So I guess we could also have:

Option 3) Change their signatures to return double values.

This would be super dangerous, I think.

betanalpha commented 10 years ago

It's not breaking library code; it's isolating autodiff to where it is applicable. In your example the rounding to a loop bound is not continuous, and we should not propagate derivatives through it. I'd much rather the autodiff be super anal about where it can be applied and where it can't (blah blah blah folk theorem blah blah blah).

I also agree that having double returns is bad. For example, if we implemented

    template double ceil(const var& v)

then users can do bad things like

    var v1 = x;
    double y = ceil(v1);
    var v2 = y;

which yields an improper expression graph.
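
To make the failure concrete, here is a sketch of that pattern (assuming the current Stan Math reverse-mode type stan::math::var; in 2014 the class lived under stan::agrad, so treat the names as illustrative rather than the code under discussion):

    #include <stan/math.hpp>
    #include <cmath>
    #include <iostream>

    int main() {
      using stan::math::var;
      var x = 2.3;
      var v1 = x;
      double y = std::ceil(v1.val());  // value pulled out as double: graph cut here
      var v2 = y;                      // fresh node with no edge back to x
      var f = v2 * v2;
      f.grad();                        // propagate adjoints down from f
      std::cout << "df/dx = " << x.adj() << "\n";  // prints 0, silently
    }

This compiles and runs without complaint, which is exactly the problem: the zero is silent.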

betanalpha commented 10 years ago

In fact, for this reason alone we probably shouldn't expose as_value.

bob-carpenter commented 10 years ago

On Jul 2, 2014, at 3:38 PM, Michael Betancourt notifications@github.com wrote:

> It's not breaking library code,

If we remove floor, round, abs, etc. from the autodiff library, it will break calls to Eigen and Boost with var arguments. At least it will break them in the sense that they'll no longer compile.

For instance, Eigen uses ceil and floor operations to set bounds on loop iterations in some of its iterative algorithms, like eigendecomposition.

If we get rid of fabs(), we'll no longer be able to compute L1 distances or the double-exponential (aka Laplace) distribution.

And of course, it'll break auto-diff of any user-defined function that uses these functions, and hence make the library problematic for other people to drop in.

> it's isolating autodiff to where it is applicable.

These functions are differentiable everywhere other than the discontinuities. But if we throw exceptions or return NaN rather than just returning an "interpolated" value, it won't solve any of our problems other than lack of propriety.
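
To make "interpolated" concrete, this is roughly the usual convention for a kink like fabs (a sketch, not Stan's actual source):

    // Derivative of fabs: exact away from the kink; an arbitrary finite
    // value (here 0) is substituted at the non-differentiable point x == 0.
    inline double fabs_deriv(double x) {
      if (x > 0) return 1.0;
      if (x < 0) return -1.0;
      return 0.0;  // convention, not a real derivative
    }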

> In your example the rounding to a loop bound is not continuous, and we should not propagate derivatives through it. I'd much rather the autodiff be super anal about where it can be applied and where it can't (blah blah blah folk theorem blah blah blah).
>
> I also agree that having double returns is bad. For example, if we implemented
>
>     template double ceil(const var& v)
>
> then users can do bad things like
>
>     var v1 = x;
>     double y = ceil(v1);
>     var v2 = y;
>
> which yields an improper expression graph.

It's not "improper" in the sense that it will fail to compile or fail to run. You just don't get derivatives propagated from v2 back to x.

bob-carpenter commented 10 years ago

Expose to whom? We haven't put any effort into a clean user-facing interface with proper private/anonymous or anything.

betanalpha commented 10 years ago

> For instance, Eigen uses ceil and floor operations to set bounds on loop iterations in some of its iterative algorithms, like eigendecomposition.

Yup, and I consider this a feature. Differentiating through iterative algorithms is a bad idea (™), and we've already seen how it leads to poor performance if not outright failures.

The derivative of a function != the derivative of its numerical approximation. I'd much, much rather we implement proper derivatives for things like solve, invert, and decompose, as it will be faster and much more stable.
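
As an example of what a proper derivative for solve looks like (a sketch using Eigen, not Stan's implementation; assumes A is invertible): if A x = b, implicit differentiation gives A dx = db - dA x, so the sensitivity costs one extra linear solve instead of a sweep through the solver's iterations.

    #include <Eigen/Dense>

    // Forward-mode sensitivity of x = solve(A, b) via the implicit relation
    // A x = b  =>  dA x + A dx = db  =>  dx = solve(A, db - dA * x).
    Eigen::VectorXd solve_sensitivity(const Eigen::MatrixXd& A,
                                      const Eigen::MatrixXd& dA,
                                      const Eigen::VectorXd& b,
                                      const Eigen::VectorXd& db) {
      Eigen::VectorXd x = A.fullPivLu().solve(b);
      return A.fullPivLu().solve(db - dA * x);
    }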

> If we get rid of fabs(), we'll no longer be able to compute L1 distances or the double-exponential (aka Laplace) distribution.

Which is known to cause issues in Bayesian models. The right answer here is to, for example, implement a smoothed L1 regularization function.
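
One concrete possibility (an assumption about what a smoothed L1 might look like; this is the pseudo-Huber form, not anything currently in Stan):

    #include <cmath>

    // Smooth surrogate for |x|: approaches fabs(x) as eps -> 0 but has
    // continuous derivatives everywhere, so gradient-based samplers behave.
    inline double smooth_abs(double x, double eps = 1e-4) {
      return std::sqrt(x * x + eps * eps) - eps;
    }

    inline double smooth_abs_deriv(double x, double eps = 1e-4) {
      return x / std::sqrt(x * x + eps * eps);
    }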

> And of course, it'll break auto-diff of any user-defined function that uses these functions, and hence make the library problematic for other people to drop in.

Problematic for them to drop in and get a bad answer? I'd say that an algorithm that returns a bad answer without even a warning is far worse than an algorithm that properly restricts inputs to valid functions.

> These functions are differentiable everywhere other than the discontinuities. But if we throw exceptions or return NaN rather than just returning an "interpolated" value, it won't solve any of our problems other than lack of propriety.

It's even worse than that, because the problems don't arise when evaluating functions at discontinuities but rather when passing by discontinuities. The specifics depend on the algorithm, but the point is that just returning the smooth answer and throwing out the occasional NaN will not give a valid implementation.
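
A toy illustration of passing by a discontinuity (an assumed example, not Stan code): with U(x) = floor(x) as the potential, every gradient the integrator sees is 0, so the trajectory drifts straight across the jump at x = 1 without ever evaluating at the discontinuity itself, and the energy the integrator is supposed to conserve silently changes.

    #include <cmath>
    #include <cstdio>

    int main() {
      double x = 0.9, p = 1.0, eps = 0.2;
      std::printf("x = %.2f  U(x) = %.0f\n", x, std::floor(x));
      for (int i = 0; i < 3; ++i) {
        x += eps * p;  // position update; the gradient term is identically 0
        std::printf("x = %.2f  U(x) = %.0f\n", x, std::floor(x));
      }
    }

U jumps from 0 to 1 between the first two evaluations even though the reported gradient was 0 at every point the integrator touched.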

> It's not "improper" in the sense that it will fail to compile or fail to run. You just don't get derivatives propagated from v2 back to x.

Which I would argue is not a proper representation of the expression graph one would naively expect.

> Expose to whom? We haven't put any effort into a clean user-facing interface with proper private/anonymous or anything.

Sorry, I'm thinking of the new autodiff where we can be a little more careful about these things from the beginning.

andrewgelman commented 10 years ago

It’s also tricky because we’ve been talking about putting Metrop and slice sampling into Stan at some point, so we don’t want to go restricting the language where we don’t have to.

betanalpha commented 10 years ago

It’s easy enough to wrap all the autodiff functions in shells that just throw exceptions, so the log prob functions work fine with doubles; we just have to decide whether to be this strict or not.

Of course there will be the problem that not every model will work with every algorithm, and people will complain and be confused by this.
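
A minimal sketch of such a shell (hypothetical name floor_checked; assumes the reverse-mode type stan::math::var, and is not the actual Stan code):

    #include <stan/math.hpp>
    #include <cmath>
    #include <stdexcept>

    // The double overload works as usual, so log prob evaluation is
    // unaffected; the var overload refuses to pretend floor has a
    // useful derivative.
    inline double floor_checked(double x) { return std::floor(x); }

    inline stan::math::var floor_checked(const stan::math::var&) {
      throw std::domain_error(
          "floor: no proper derivative; unavailable to gradient-based algorithms");
    }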

bob-carpenter commented 10 years ago

We already have this problem with just HMC by itself. Not every model fits with it. Then there's optimization, which can "fit" a more restricted set of models, but is more scalable.

RHMC and MML will just make the problem worse with their O(N^3) scalability issues. Then VB and EP will also raise all sorts of problems in terms of which models can be fit and which can't.

Statisticians are used to a grab bag of model-specific techniques. As are computer scientists.

At least until someone invents the one true solver, we'll just have to deal with this issue.

betanalpha commented 10 years ago

I’m not even trying to go that deep — I am just proposing that we ensure our autodiff library doesn’t claim to take derivatives that it can’t. This is relevant to any algorithm that uses derivatives, all of which implicitly assume that the objective function is smooth.

andrewgelman commented 10 years ago

My general preference is to have the algorithm work with arbitrary functions and return an error when the derivative does not exist, rather than to restrict the class of functions we can use.
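
A minimal sketch of that behavior (a hypothetical helper, not Stan's API): evaluate the function and its derivative wherever the derivative exists, and raise an error only at the isolated points where it does not.

    #include <cmath>
    #include <stdexcept>

    // ceil with its derivative: the derivative is 0 everywhere it exists
    // and fails to exist exactly at the integers, where we error out.
    inline double ceil_with_grad(double x, double& dx) {
      if (x == std::ceil(x))
        throw std::domain_error("ceil: derivative does not exist at integer x");
      dx = 0.0;
      return std::ceil(x);
    }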

bob-carpenter commented 10 years ago

From the user perspective, they write down a model and for some reason it doesn't fit.

Case 1: Model is too hard for algorithm X.

Case 2: Algorithm X needs a function not supported by the model.

We already have Case 1 and it's problematic to diagnose. Case 2 is at least easy to diagnose and provide an error message for --- the function gets called and fails. We list which functions are unavailable for which algorithms and point the user to that.

betanalpha commented 10 years ago

> From the user perspective, they write down a model and for some reason it doesn't fit.
>
> Case 1: Model is too hard for algorithm X.
>
> Case 2: Algorithm X needs a function not supported by the model.
>
> We already have Case 1 and it's problematic to diagnose. Case 2 is at least easy to diagnose and provide an error message for --- the function gets called and fails. We list which functions are unavailable for which algorithms and point the user to that.

Right, this is basically what I am advocating for. We implement all the functions for var, but the discontinuous ones throw an exception when used, so they compile (and can be used for simple stuff) but fail when you try to use them for HMC or gradient-based optimization.

Incidentally, this would be the same behavior as when we have a function with only first-order derivatives but want to run a higher-order algorithm. The new stuff has exceptions built in to throw so that the user is aware of (a) which function failed and (b) why.

bob-carpenter commented 10 years ago

OK --- then it's just a discussion about whether we want to eliminate functions that are not everywhere differentiable from HMC.

The other relevant case is conditionals involving parameters. This lets you sneak in non-differentiable functions. For example, with abs:

    real abs_a;
    if (a >= 0)
      abs_a <- a;
    else
      abs_a <- -a;

betanalpha commented 10 years ago

Yeah, they make me uncomfortable, too, although that’s much harder to check. I don’t think we can enforce it from within the autodiff library alone.

bob-carpenter commented 10 years ago

I'm now back in my usual bind. I submitted a pull request with a patch to functions. I think it's OK, but our testing's backed up enough that it'll be days before it can get reviewed, tested, and merged.

In the meantime, there's a new bug I want to fix.

I need to work from the previous fixes, as there's a huge overlap in code and I don't want a merge trainwreck on my hands.

So I ask again, because I forgot the answer from last time: what should I do?

  1. just cool my heels and wait, or
  2. belay the last pull request, roll the issue into the old issue, and fix everything at once, or
  3. branch from the bugfix branch for the last issue.

And if the answer is (3), how do I create the pull request for the fix? Is the target develop or the previous bugfix branch?

On Jul 2, 2014, at 12:36 PM, Jeffrey Arnold notifications@github.com wrote:

> Both the BNF and the parser allow void for argument types, which I don't think makes sense. The parser doesn't throw an error, but an error occurs when the C++ code is compiled.
>
> Example using Stan 2.3.0:
>
>     functions {
>       void foo(void a) {
>         print("hello, world!");
>       }
>     }
>     model {
>       foo();
>     }
>
> produces the following error:
>
>     $ make foo
>     --- Translating Stan model to C++ code ---
>     bin/stanc foo.stan --o=foo.cpp
>     Model name=foo_model
>     Input file=foo.stan
>     Output file=foo.cpp
>     clang++ -I src -isystem lib/eigen_3.2.0 -isystem lib/boost_1.54.0 -Wall -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -pipe -DEIGEN_NO_DEBUG -Wno-unused-function -Wno-uninitialized -Wno-c++11-long-long -c -O3 -o foo.o foo.cpp
>     foo.cpp:40:15: error: cannot form a reference to 'void'
>         foo(const void& a, std::ostream* pstream__) {
>                     ^
>     1 error generated.
>     make: *** [foo.o] Error 1

bob-carpenter commented 10 years ago

It's easy to enforce from the language side. We know where variables are defined and there's a function to check if an expression involves a parameter (transformed parameter, or local variable in the transformed parameter or model blocks).

The one sticking point before was interpolation, but I think the answer there may be to write a special BUGS-like interpolation function.

betanalpha commented 10 years ago

This is exactly what I’ve been advocating for some time — custom spline, etc. functions that exactly enforce the desired smoothness properties.

On Jul 2, 2014, at 9:18 PM, Bob Carpenter notifications@github.com wrote:

> The one sticking point before was interpolation, but I think the answer there may be to write a special BUGS-like interpolation function.
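
For reference, a toy version of the kind of function being proposed (hypothetical, not anything in Stan): cubic Hermite interpolation between knots with user-supplied slopes is C^1 by construction, so the smoothness the samplers rely on is enforced by the function itself rather than checked after the fact.

    #include <cstddef>
    #include <vector>

    // Interpolate at x given knot locations xs, values ys, and slopes ms
    // (assumes xs is sorted and x lies within [xs.front(), xs.back()]).
    double hermite_interp(double x, const std::vector<double>& xs,
                          const std::vector<double>& ys,
                          const std::vector<double>& ms) {
      std::size_t i = 0;
      while (i + 2 < xs.size() && xs[i + 1] < x) ++i;  // locate the interval
      double h = xs[i + 1] - xs[i];
      double t = (x - xs[i]) / h;
      double h00 = (1 + 2 * t) * (1 - t) * (1 - t);
      double h10 = t * (1 - t) * (1 - t);
      double h01 = t * t * (3 - 2 * t);
      double h11 = t * t * (t - 1);
      return h00 * ys[i] + h10 * h * ms[i]
           + h01 * ys[i + 1] + h11 * h * ms[i + 1];
    }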

syclik commented 9 years ago

This issue was moved to stan-dev/math#33