I'd like to add a third proposal: suppress the message, but count the number of rejections. If, after some number of iterations (e.g., warmup or a few hundred), more than a certain percentage of steps are being rejected (more than 10%, maybe?), start reporting the error along with a message indicating how to turn the messages on from the beginning and explaining that the model is probably misspecified.
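To make the shape of that proposal concrete, here is a minimal sketch in C++ of a tally that stays quiet about individual rejections and only warns once a rate threshold is crossed after a minimum number of iterations. The class and method names (rejection_tally, record, should_warn) are invented for illustration and are not part of Stan.

```cpp
// Hypothetical sketch of the "count, then warn" proposal; not Stan code.
#include <iostream>
#include <string>

class rejection_tally {
 public:
  rejection_tally(int min_iterations, double warn_fraction)
      : min_iterations_(min_iterations),
        warn_fraction_(warn_fraction),
        iterations_(0),
        rejections_(0) {}

  // Called once per iteration; rejected is true when the proposal was
  // rejected because the log density or its gradient could not be evaluated.
  void record(bool rejected) {
    ++iterations_;
    if (rejected) ++rejections_;
  }

  // Only warn after enough iterations have been seen and the observed
  // rejection fraction exceeds the configured threshold.
  bool should_warn() const {
    return iterations_ >= min_iterations_
        && rejections_ > warn_fraction_ * iterations_;
  }

  std::string message() const {
    return "A large fraction of proposals were rejected; the model may be "
           "misspecified. Rerun with rejection messages enabled to see the "
           "individual causes.";
  }

 private:
  int min_iterations_;
  double warn_fraction_;
  int iterations_;
  int rejections_;
};

int main() {
  rejection_tally tally(100, 0.10);  // warn after 100 iterations, 10% threshold
  for (int i = 0; i < 200; ++i)
    tally.record(i % 5 == 0);        // pretend one in five iterations rejects
  if (tally.should_warn())
    std::cout << tally.message() << std::endl;
  return 0;
}
```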
My main objection to the third proposal (counting the number of such rejections) is that
- it adds more state to manage for the samplers, and hence more things that can go wrong,
- it requires us to somehow branch on the identity of exceptions, and
- it will complicate all of our doc if we have error messages that only print out under certain conditions, especially if we have ways to control when they're output.
I am against all three options.
Here's the problem: as hard as we work, Stan will never be a black box. Users have to think about both their models and whether their models can fit, and there is no way we will be able to automate that last step in its entirety.
If we ignore it completely then users might think they have valid results when they don’t, which is the worst outcome. If we ignore it during warmup users might have chains that never converged (got stuck in the tails) and not know about it, which is almost as bad. Trying to engineer a threshold is completely arbitrary and will give false confidence when users don’t see a warning.
Have we considered having the message reference a section in the manual where we can explain the issue in depth? It's a dependency, but there's no way we're going to be able to elaborate on the issue well enough in a warning message.
I'm also not a fan of any of solutions 1--3.
The problem is that often we get one or more of these messages during warmup, where they're harmless. Users
(a) see a gazillion of the same message, which makes it look even more severe than it is,
(b) don't trust Stan, and
(c) clog up our users list with complaints that Stan's "crashing".
I think all three of these are more problematic than
(d) users getting wrong answers that they think are right.
What advice would we give them? It's not as if when they get a bunch of these messages, they can conclude their model's buggy or the sampler's giving them the wrong answers.
If we're going to keep the message spew, then I'm for referencing a section of the manual rather than dumping out a huge message, which is even more frightening.
I'm not sure what language to put around it if we're not going to use "error" or "warning"; that question comes up in Daniel's other issue about renaming the message conditions.
Can't there simply be an option "suppress rejection messages", defaulting to FALSE (and maybe even mentioning the section in the manual)? This way, the user gets warned, gets pointed to the appropriate reference, and then has the choice to "live dangerously" if they so choose.
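A rough sketch of what such a switch could look like at the point where the message is emitted; the message_config struct and the suppress_rejection_messages flag are hypothetical names, not an existing Stan or interface option.

```cpp
// Hypothetical sketch of a boolean "suppress rejection messages" option.
#include <iostream>

struct message_config {
  bool suppress_rejection_messages = false;  // default: warn the user
};

// Wherever a rejection is currently reported, output would be gated on the
// flag, and the text would point at the manual rather than explaining inline.
void report_rejection(std::ostream& out, const message_config& cfg,
                      const char* reason) {
  if (cfg.suppress_rejection_messages)
    return;
  out << "Informational Message: the current proposal is being rejected ("
      << reason << "). See the rejection-messages section of the manual.\n";
}

int main() {
  message_config cfg;
  report_rejection(std::cout, cfg, "log probability is -inf");  // printed
  cfg.suppress_rejection_messages = true;
  report_rejection(std::cout, cfg, "log probability is -inf");  // silent
  return 0;
}
```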
It's computer code, so we could, in principle, do just about anything.
What I see happening to software over time is that these little hacks accrue to the point where the whole thing's unmanageable. What I'm resistant to is some wide-ranging change that handles one exception (it's wide-ranging because the exception gets thrown at a low level and then percolates up through sampling, optimization, etc., each of which needs a decision on how to handle it separately from other exceptions).
The problem here is that configuring this one message would require a whole lot of ad hoc plumbing throughout that we could either take the time to engineer properly or just hack. Of course, we could choose to shut off all warning messages. There already is that option, I believe.
No one gets to turn off error messages until they pass a Stan certified statistics workshop. Too many people think they can live dangerously — it’s like everyone is a statistical teenager.
Seriously, though, the problem here is that the software will never be bulletproof. People wanting a complete black-box statistical program will inevitably end up with poor results; you can't do analysis without thinking really hard! I would MUCH rather have Stan be a truthful program that is liberal with information and diagnostics, especially since almost every time this has come up the model has been teetering on being ill-posed.
I completely agree.
And I'd still like to find a way to make this easier to digest for users so they don't freak out when they shouldn't and either clog up our mailing lists or give up on Stan in disgust.
I can live with a compromise of always printing the message, removing "error" or "warning" or any other "scary" word, and then redirecting to the manual. At least that's a consistent interface behavior for all the causes of rejection.
Along these lines (though it's a different issue), only some of our failing functions raise messages. Others (namely any C++ built-in) just return NaN or -infinity and get rejected silently.
Similarly, we don't produce warnings when the Hamiltonian simulation goes off the rails and we reject from the slice sampler inside (or do we, and it just never happens?).
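To illustrate the inconsistency, a toy example (not Stan's actual evaluation code): a check that throws hands the caller a message it can surface, while a raw C++ built-in just yields NaN and the rejection is silent. The functions checked_log, log_prob_checked, and log_prob_silent are made up for the illustration.

```cpp
// Toy illustration of checked vs. silent failure; not Stan's actual code.
#include <cmath>
#include <iostream>
#include <stdexcept>

// A "checked" function in the style of Stan's argument checks: it throws,
// so the caller has a message it can surface to the user.
double checked_log(double x) {
  if (!(x > 0))
    throw std::domain_error("log: argument must be positive");
  return std::log(x);
}

double log_prob_checked(double theta) { return checked_log(theta); }

// A density written against the raw built-in: std::log(-1.0) quietly
// returns NaN, so the proposal is rejected with nothing to report.
double log_prob_silent(double theta) { return std::log(theta); }

int main() {
  double theta = -1.0;  // a proposal outside the support

  try {
    log_prob_checked(theta);
  } catch (const std::domain_error& e) {
    std::cout << "rejected with message: " << e.what() << "\n";
  }

  if (std::isnan(log_prob_silent(theta)))
    std::cout << "rejected silently: log_prob is NaN, nothing to report\n";
  return 0;
}
```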
And I'd still like to find a way to make this easier to digest for users so they don't freak out when they shouldn't and either clog up our mailing lists or give up on Stan in disgust.
If this is all we expect from users then we might as well stop developing now.
Similarly, we don't produce warnings when the Hamiltonian simulation goes off the rails and we do rejections from the slice sampler inside (or do we and it just never happens?).
It gets accumulated in the n_divergent stat. Honestly, this would be a case where I would advocate promotion to a visible warning, as it's another place where people can potentially miss something important.
I don't understand your response. I think it's a lot to expect from users to understand error messages and be able to put them into context. Especially given our lack of doc as to what they mean. That's why I think a pointer to the manual in the error message is a good idea.
To the extent we can make our messages clearer, I think we'll be doing both ourselves and our users a favor.
We're trying to build software for users who are not idiots, but are not necessarily differential geometers, Bayesian modeling experts, or programming gurus.
Many of our users are like me and can be made to understand these issues with enough explanation. Others are in the position of understanding the model they want to build very well, but not understanding issues like arithmetic precision errors due to discretization of the Hamiltonian that cause positive-definiteness checks to fail. I don't think we want to just tell those users to pick up their marbles and go home.
Maybe I'll be the one shouting at the next meeting!
Personally, I'd rather leave the message (well, a message) in as it is now. If we're going to try to remove it, I'd rather not remove it completely, lest it mask some very bad issues. Also, I think a lot of the messages we get from users due to this warning are actually useful. A lot of the time they are caused by an incorrectly specified model. I don't have numbers, but it's not like 90% of the cases could be safely ignored.
So the issue we're really trying to resolve here is that 1) the warning potentially freaks users out and 2) users clog the mailing list with questions. 1) we can't really solve at the end of the day. MCMC is dangerous and we do a disservice if we hide potentially dangerous behaviour from users. 2) we might be able to solve in another way. Part of the problem is that the warning message gives almost no indication about why it was generated. We don't give a model line number and the function and variable names are often incorrect or, at best, misleading.
So given that, what if we focus on making the messages more informative and useful? This should give more info to users so that they can more quickly diagnose the problem themselves.
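As a sketch of what "more informative" could mean in practice, here is a hypothetical message format that carries the model line number and the offending function and variable; none of these fields exist in the current message, and all of the names are invented.

```cpp
// Hypothetical sketch of a richer rejection message; all field names invented.
#include <iostream>
#include <sstream>
#include <string>

struct rejection_info {
  int model_line;        // line in the Stan program where the failure occurred
  std::string function;  // function whose check failed
  std::string variable;  // variable that triggered the check
  std::string reason;    // e.g. "covariance matrix is not positive definite"
};

std::string format_rejection(const rejection_info& info) {
  std::ostringstream msg;
  msg << "Informational Message: proposal rejected at line " << info.model_line
      << " of the model (" << info.function << ", variable '" << info.variable
      << "'): " << info.reason
      << ". Occasional rejections are expected, especially during warmup; "
         "see the manual if this persists.";
  return msg.str();
}

int main() {
  rejection_info info{42, "multi_normal_log", "Sigma",
                      "covariance matrix is not positive definite"};
  std::cout << format_rejection(info) << std::endl;
  return 0;
}
```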
I'm not sure you're getting my point: if users pick up their marbles and go home when they see an error message they do not understand, then they're not putting in the effort to use Stan safely. It should absolutely not be frustrating to try to figure out what the error is, but in many cases the problems are not obvious and non-trivial work has to go into understanding them (again, we should make that understanding as easy as possible). Users unable or unwilling to put in that effort should not be using Stan, because they will not be able to use it safely and may very well get burned by a bad result.
This particular issue is about the frequency of the warnings that the Metropolis-Hastings proposal was rejected. As the error message says now, in and of itself that isn't a bad thing (especially when dealing with highly constrained objects like correlation matrices). What is a problem is when too many iterations reject, correct? If so, instead of bombarding them with all of the rejection warnings, how about keeping track of the percentage/ratio of rejections? If it stays above a threshold for a significant amount of time, pause the simulation and ask the users whether they are sure they want to continue, since they may be playing with fire, pointing them to references from which those of us who are not "differential geometers, Bayesian modeling experts, or programming gurus" could get some modicum of understanding. If they choose to continue, no more messages during the run, and possibly a "did you know that xy.z% of your iterations were rejected?" message after it is finished.
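The end-of-run summary half of this idea is easy to sketch (the pause-and-confirm half is harder, as Bob notes below, since it breaks in non-interactive and parallel runs); the function name and threshold here are invented for illustration.

```cpp
// Hypothetical end-of-run summary of the rejection rate; not Stan code.
#include <iomanip>
#include <iostream>

void print_rejection_summary(std::ostream& out, int iterations, int rejections,
                             double warn_fraction = 0.10) {
  if (iterations <= 0 || rejections <= warn_fraction * iterations)
    return;  // nothing worth flagging
  double pct = 100.0 * rejections / iterations;
  out << "Did you know that " << std::fixed << std::setprecision(1) << pct
      << "% of your iterations were rejected? The model may be misspecified; "
         "see the manual.\n";
}

int main() {
  print_rejection_summary(std::cout, 2000, 350);  // 17.5% of iterations rejected
  return 0;
}
```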
Michael: I'm agreeing with you. We should keep the message and try to point users to an explanation, using as much finesse as we can to let them know that what they're seeing isn't a bug in Stan and should only be of concern if they see it too many times. It's the latter I don't know how to do.
Avraham: As I just replied to Sebastian in a thread on stan-users, I'm generally opposed to complicated behavior in our interfaces (as measured by how hard it is to document, configure, and understand; one issue is that reasonable people vary in their assessments here).
Pausing for user input won't work in parallelized settings, so we'd then have to have config to override any interactive input. I have no idea what ramifications would be on the command-line or for PyStan.
My own feeling is that it's not worth the coding, doc, and user-support effort compared to just spewing the message. The spew at least gets the point across that something needs to be fixed.
And users shouldn't be running 2000 iterations before these issues are sorted out anyway. Maybe we can change the default number of iterations?
Closing this because we've looped around and decided to keep the messages. I'll make sure (at least) one issue exists in the issue tracker for pointing to the manual.
The "Informational Message" that's currently in Stan confuses users.
Proposals so far:
1. Remove or suppress the message entirely.
2. Suppress the message during warmup only.
3. Suppress the message but count the rejections, and start warning if too many iterations are being rejected.