stan-dev / math

The Stan Math Library is a C++ template library for automatic differentiation of any order using forward, reverse, and mixed modes. It includes a range of built-in functions for probabilistic modeling, linear algebra, and equation solving.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License

Functions With Improper Derivatives #33

Closed syclik closed 7 years ago

syclik commented 9 years ago

From @betanalpha on July 2, 2014 13:02

Right now the Stan autodiff implements functions with improper derivatives (i.e., derivatives that do not exist at certain points), which is formally incorrect. We have seen issues where people use functions like step, floor, and ceil in their models and the resulting sampling is very poor.

Personally I'd prefer to limit Stan to functions with proper derivatives only, but in any case we should have the discussion.

Copied from original issue: stan-dev/stan#731

syclik commented 9 years ago

From @bob-carpenter on July 2, 2014 13:28

I understand.

But I'm not sure what you're proposing and whether it's for the autodiff library or just for the Stan modeling language.
I see two options that could go either way:

Option 1) Just remove the var versions of those functions.

Option 2) Throw exceptions or return NaN at points where they're not defined.

Both have the potential to break a lot of library code (Boost, Eigen). A quick look at Eigen shows they use floor and ceil ops on floating point values, then convert them to integers for loop bounds. So I guess we could also have:

Option 3) Change their signatures to return double values.

This would be super dangerous, I think.

syclik commented 9 years ago

From @betanalpha on July 2, 2014 13:38

It's not breaking library code, it's isolating autodiff to where it is applicable. In your example the rounding to a loop bound is not continuous, and we should not propagate derivatives through it. I'd much rather the autodiff be super anal about where it can be applied and where it can't (blah blah blah folk theorem blah blah blah).

I also agree that having double returns is bad. For example, if we implemented

```cpp
double ceil(const var& v);
```

then users can do bad things like

```cpp
var v1 = x;
double y = ceil(v1);
var v2 = y;
```

which yields an improper expression graph.
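
For concreteness, a minimal sketch of that misuse (hypothetical driver code, assuming the stan::math reverse-mode API with var::val(), var::grad(), and var::adj()):

```cpp
#include <stan/math.hpp>
#include <cmath>
#include <iostream>

int main() {
  using stan::math::var;
  var x = 2.7;
  double y = std::ceil(x.val());  // value escapes the expression graph
  var v2 = y;                     // re-enters as a fresh constant node
  var f = v2 * x;
  f.grad();                       // propagate adjoints through the graph
  // Prints 3 (the value of v2): the ceil step contributed nothing,
  // so no derivative information flows from v2 back to x.
  std::cout << x.adj() << std::endl;
  return 0;
}
```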

syclik commented 9 years ago

From @betanalpha on July 2, 2014 13:39

In fact, for this reason alone we probably shouldn't expose as_value.

syclik commented 9 years ago

From @bob-carpenter on July 2, 2014 13:53

On Jul 2, 2014, at 3:38 PM, Michael Betancourt notifications@github.com wrote:

> It's not breaking library code,

If we remove floor, round, abs, etc. from the autodiff library, it will break calls to Eigen and Boost with var arguments. At least it will break them in the sense that they'll no longer compile.

For instance, Eigen uses ceil and floor operations to set bounds on loop iterations in some of its iterative algorithms, like eigendecomposition.

If we get rid of fabs(), we'll no longer be able to compute L1 distances or the double-exponential (aka Laplace) distribution.

And of course, it'll break auto-diff of any user-defined function that uses these functions, and hence make the library problematic for other people to drop in.

> it's isolating autodiff to where it is applicable.

These functions are differentiable everywhere other than the discontinuities. But if we throw exceptions or return NaN rather than just returning an "interpolated" value, it won't solve any of our problems other than lack of propriety.
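
To spell out why: away from its jumps each of these functions is locally constant (or linear), so the derivative that autodiff propagates carries no information about the jump itself. For floor, for instance,

$$\frac{d}{dx}\,\lfloor x \rfloor = 0 \qquad \text{for all } x \notin \mathbb{Z},$$

so the sampler sees a zero gradient everywhere the derivative exists and never feels the discontinuity.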

> In your example the rounding to a loop bound is not continuous, and we should not propagate derivatives through it. I'd much rather the autodiff be super anal about where it can be applied and where it can't (blah blah blah folk theorem blah blah blah).

> I also agree that having double returns is bad. For example, if we implemented `double ceil(const var& v)` then users can do bad things like `var v1 = x; double y = ceil(v1); var v2 = y;` which yields an improper expression graph.

It's not "improper" in the sense that it will fail to compile or fail to run. You just don't get drivatives propagated from v2 back to x.

syclik commented 9 years ago

From @bob-carpenter on July 2, 2014 14:01

Expose to whom? We haven't put any effort into a clean user-facing interface with proper private/anonymous or anything.

On Jul 2, 2014, at 3:39 PM, Michael Betancourt notifications@github.com wrote:

> In fact, for this reason alone we probably shouldn't expose as_value.

syclik commented 9 years ago

From @betanalpha on July 2, 2014 14:10

> For instance, Eigen uses ceil and floor operations to set bounds on loop iterations in some of its iterative algorithms, like eigendecomposition.

Yup, and I consider this a feature. Differentiating through iterative algorithms is a bad idea (™), and we've already seen how it leads to poor performance if not outright failures.

The derivative of a function != the derivative of its numerical approximation. I'd much, much rather we implement proper derivatives for things like solve, invert, and decompose, as it will be faster and much more stable.
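
As one standard example of such a closed form (a textbook matrix-calculus identity, not anything Stan-specific): for an invertible matrix $A(t)$,

$$\frac{\partial A^{-1}}{\partial t} = -A^{-1}\,\frac{\partial A}{\partial t}\,A^{-1},$$

so an inverse can propagate exact derivatives as a single node instead of autodiffing through the iterations that computed it.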

> If we get rid of fabs(), we'll no longer be able to compute L1 distances or the double-exponential (aka Laplace) distribution.

Which is known to cause issues in Bayesian models. The right answer here is to, for example, implement a smoothed L1 regularization function.
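
A minimal sketch of what such a smoothing could look like (hypothetical helper, not an existing Stan function): replace |x| with sqrt(x^2 + eps), which is differentiable everywhere and approaches |x| as eps goes to 0.

```cpp
#include <cmath>

// Hypothetical smoothed absolute value: sqrt(x^2 + eps).
// Differentiable everywhere, including x = 0, and converges
// to |x| as eps -> 0. Templated so T can also be an autodiff
// type such as stan::math::var (sqrt/operator* found via ADL).
template <typename T>
T smooth_abs(const T& x, double eps = 1e-8) {
  using std::sqrt;  // fall back to std::sqrt for plain arithmetic types
  return sqrt(x * x + eps);
}
```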

> And of course, it'll break auto-diff of any user-defined function that uses these functions, and hence make the library problematic for other people to drop in.

Problematic for them to drop in and get a bad answer? I'd say that an algorithm that returns a bad answer without even a warning is far worse than an algorithm that properly restricts itself to valid inputs.

> These functions are differentiable everywhere other than the discontinuities. But if we throw exceptions or return NaN rather than just returning an "interpolated" value, it won't solve any of our problems other than lack of propriety.

It's even worse than that, because the problems don't arise when evaluating functions at discontinuities but rather when passing by discontinuities. The specifics depend on the algorithm, but the point is that just returning the smooth answer and throwing out the occasional NaN will not give a valid implementation.

It's not "improper" in the sense that it will fail to compile or fail to run. You just don't get drivatives propagated from v2 back to x.

Which I would argue is not a proper representation of the expression graph one would naively expect.

> Expose to whom? We haven't put any effort into a clean user-facing interface with proper private/anonymous or anything.

Sorry, I'm thinking of the new autodiff where we can be a little more careful about these things from the beginning.

syclik commented 9 years ago

From @andrewgelman on July 2, 2014 15:55

It’s also tricky because we’ve been talking about putting Metrop and slice sampling into Stan at some point, so we don’t want to go restricting the language where we don’t have to.

syclik commented 9 years ago

From @betanalpha on July 2, 2014 17:49

It's easy enough to wrap all the autodiff functions in shells that just throw exceptions, so the log prob functions work fine with doubles; we just have to make a decision on whether to be this strict or not.

Of course there will be the problem that not every model will work with every algorithm, and people will complain and be confused by this.

syclik commented 9 years ago

From @bob-carpenter on July 2, 2014 18:51

We already have this problem with just HMC by itself. Not every model fits with it. Then there's optimization, which can "fit" a more restricted set of models, but is more scalable.

RHMC and MML will just make the problem worse with their O(N^3) scalability issues. Then VB and EP will also raise all sorts of problems in terms of which models can be fit and which can't.

Statisticians are used to a grab bag of model-specific techniques. As are computer scientists.

At least until someone invents the one true solver, we'll just have to deal with this issue.

syclik commented 9 years ago

From @betanalpha on July 2, 2014 19:48

I'm not even trying to go that deep — I am just proposing that we ensure our autodiff library doesn't claim to take derivatives that it can't. This is relevant to any algorithm that uses derivatives, all of which implicitly assume that the objective function is smooth.

syclik commented 9 years ago

From @andrewgelman on July 2, 2014 19:50

My general preference is to have the algorithm work with arbitrary functions and return an error when the derivative does not exist, rather than to restrict the class of functions we can use.

syclik commented 9 years ago

From @bob-carpenter on July 2, 2014 19:53

From the user perspective, they write down a model and for some reason it doesn't fit.

Case 1: Model is too hard for algorithm X.

Case 2: Algorithm X needs a function not supported by the model.

We already have Case 1 and it's problematic to diagnose. Case 2 is at least easy to diagnose and provide an error message for --- the function gets called and fails. We list which functions are unavailable for which algorithms and point the user to that.

syclik commented 9 years ago

From @betanalpha on July 2, 2014 19:57

> From the user perspective, they write down a model and for some reason it doesn't fit.
>
> Case 1: Model is too hard for algorithm X.
>
> Case 2: Algorithm X needs a function not supported by the model.
>
> We already have Case 1 and it's problematic to diagnose. Case 2 is at least easy to diagnose and provide an error message for --- the function gets called and fails. We list which functions are unavailable for which algorithms and point the user to that.

Right, this is basically what I am advocating for. We implement all the functions for var, but discontinuous functions throw an exception when you try to use them, so they compile (and can be used for simple stuff) but fail when you try to use them for HMC or gradient-based optimization.

Incidentally, this would be the same behavior as for a function that has only first-order derivatives when we want to run a higher-order algorithm. The new stuff has exceptions built in to throw so that the user is aware of (a) which function failed and (b) why.
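
A sketch of that shell (hypothetical code: the name strict_ceil, the namespace-free placement, and the choice of std::domain_error are illustrative, not part of Stan Math):

```cpp
#include <stan/math.hpp>
#include <cmath>
#include <stdexcept>

// Hypothetical wrapper pair: the double version evaluates
// normally, so plain log prob evaluation keeps working, while
// the var overload compiles but fails loudly the moment a
// gradient-based algorithm actually calls it. The message
// reports (a) which function failed and (b) why.
inline double strict_ceil(double x) {
  return std::ceil(x);
}

inline stan::math::var strict_ceil(const stan::math::var& v) {
  throw std::domain_error(
      "strict_ceil: ceil is discontinuous, so it has no proper "
      "derivative and is not available for autodiff variables");
}
```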

syclik commented 9 years ago

From @bob-carpenter on July 2, 2014 20:06

OK --- then it's just a discussion about whether we want to eliminate functions that are not everywhere differentiable from HMC.

The other relevant case is conditionals involving parameters. This lets you sneak in non-differentiable functions. For example, with abs:

```stan
real abs_a;
if (a >= 0)
  abs_a <- a;
else
  abs_a <- -a;
```

syclik commented 9 years ago

From @betanalpha on July 2, 2014 20:15

Yeah, they make me uncomfortable, too, although that’s much harder to check. I don’t think we can enforce it from within the autodiff library alone.

syclik commented 9 years ago

From @bob-carpenter on July 2, 2014 20:16

I'm now back in my usual bind. I submitted a pull request with a patch to functions. I think it's OK, but our testing's backed up enough that it'll be days before it can get reviewed, tested, and merged.

In the meantime, there's a new bug I want to fix.

I need to work from the previous fixes, as there's a huge overlap in code and I don't want a merge trainwreck on my hands.

So, I ask again because I forgot the answer from last time, what should I do?

  1. just cool my heels and wait, or
  2. belay the last pull request, roll the issue into the old issue, and fix everything at once, or
  3. branch from the bugfix branch for the last issue.

And if the answer is (3), how do I create the pull request for the fix? Is the target develop or the previous bugfix branch?

On Jul 2, 2014, at 12:36 PM, Jeffrey Arnold notifications@github.com wrote:

> Both the BNF and the parser allow void for argument types, which I don't think makes sense. The parser doesn't throw an error, but an error occurs when the C++ code is compiled.
>
> Example using stan 2.3.0:
>
> ```stan
> functions {
>   void foo(void a) {
>     print("hello, world!");
>   }
> }
> model {
>   foo();
> }
> ```
>
> produces the following error:
>
> ```
> $ make foo
> --- Translating Stan model to C++ code ---
> bin/stanc foo.stan --o=foo.cpp
> Model name=foo_model
> Input file=foo.stan
> Output file=foo.cpp
> clang++ -I src -isystem lib/eigen_3.2.0 -isystem lib/boost_1.54.0 -Wall -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -pipe -DEIGEN_NO_DEBUG -Wno-unused-function -Wno-uninitialized -Wno-c++11-long-long -c -O3 -o foo.o foo.cpp
> foo.cpp:40:15: error: cannot form a reference to 'void'
>   foo(const void& a, std::ostream* pstream__) {
>               ^
> 1 error generated.
> make: *** [foo.o] Error 1
> ```

syclik commented 9 years ago

From @bob-carpenter on July 2, 2014 20:18

It's easy to enforce from the language side. We know where variables are defined and there's a function to check if an expression involves a parameter (transformed parameter, or local variable in the transformed parameter or model blocks).

The one sticking point before was interpolation, but I think the answer there may be to write a special BUGS-like interpolation function.

syclik commented 9 years ago

From @betanalpha on July 2, 2014 20:33

This is exactly what I've been advocating for some time — custom spline, etc., functions that exactly enforce the desired smoothness properties.

On Jul 2, 2014, at 9:18 PM, Bob Carpenter notifications@github.com wrote:

> The one sticking point before was interpolation, but I think the answer there may be to write a special BUGS-like interpolation function.

syclik commented 7 years ago

I'm closing this due to inactivity; we know that there are discontinuous functions out there and we don't currently have a solution for enforcing smooth functions.