stan-dev / math

The Stan Math Library is a C++ template library for automatic differentiation of any order using forward, reverse, and mixed modes. It includes a range of built-in functions for probabilistic modeling, linear algebra, and equation solving.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License

truncated PRNGs #214

Open bob-carpenter opened 8 years ago

bob-carpenter commented 8 years ago

It'd be nice to be able to match our models and do something like this:

```stan
sigma_sim <- normal_rng(mu, sigma) T[lb, ];
```

to get a truncated form of the normal (and so on for upper and lower bounds and just upper bounds).

What we need in this repo are truncated forms of the PRNGs. Rejection sampling won't be robust enough, and writing a slice sampler seems like a huge pain; I'm not even clear it'll work for fat tails like the Cauchy or for constrained distributions without transforms. Will this have to wait until we have inverse CDFs?

sakrejda commented 8 years ago

On Sat, Dec 5, 2015 at 9:24 PM, Bob Carpenter notifications@github.com wrote:

Not for fat tails, but otherwise slice sampling is pretty generic in 1D. What do you mean by constrained distributions?

avehtari commented 8 years ago

1D slice sampling is one of the simplest and most robust algorithms. It would also be useful 1) to make it easy to sample from a user-defined (log) density function without needing to implement a CDF, and 2) to check new 1D RNG implementations.

Otherwise an inverse CDF would be the better choice. Some truncated distributions (e.g. the truncated normal) have special algorithms that are more efficient.

sakrejda commented 8 years ago

On Dec 6, 2015 4:47 AM, "Aki Vehtari" notifications@github.com wrote:

> 1D slice sampling is one of the simplest and most robust algorithms. It would be useful also 1) to make it easy to sample from a user defined (log) density function, without need to implement CDF, and 2) to check new 1D rng implementations.

I could add Neal's step-out generic slice sampler pretty easily, but it still has two ways of failing: 1) for user-defined log densities I don't know how to calculate a step size generically without adaptation, and I'm concerned about having a nested adaptation that can fail even if the rest of the program runs fine. I imagine there's something we could do with the curvature from AD to get into the ballpark, but I haven't thought it through. 2) for multimodal densities, say a user-defined mixture, it can also get stuck in a single mode. I've implemented a slice sampler for specific densities that used the companion matrix to find all the modes and include them in the step-out procedure, but again I wonder if we couldn't just use AD with some different starting points to find all the modes.

So implementing it is relatively simple, but we should decide if we want 1 and 2 in there as well.
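For concreteness, here is a minimal Python sketch of Neal's step-out slice sampler described above, using nothing beyond the standard library. The function name, the `max_steps` cap, and the default width are all illustrative; the interval width `w` is exactly the step-size tuning problem raised in point 1.

```python
import math
import random

def slice_sample(logpdf, x0, n, w=1.0, max_steps=50, seed=0):
    """1D slice sampler with Neal's step-out procedure.

    logpdf is an unnormalized log density (may return -inf outside the
    support), x0 is a starting point inside the support, and w is the
    initial interval width -- the step size discussed above.
    """
    rng = random.Random(seed)
    samples, x = [], x0
    for _ in range(n):
        # Auxiliary variable: a uniform height under the density, in log space.
        logy = logpdf(x) + math.log(rng.random())
        # Step out: widen [lo, hi] until both ends fall below the slice.
        lo = x - w * rng.random()
        hi = lo + w
        j = max_steps
        while j > 0 and logpdf(lo) > logy:
            lo -= w
            j -= 1
        j = max_steps
        while j > 0 and logpdf(hi) > logy:
            hi += w
            j -= 1
        # Shrinkage: propose uniformly, shrink the interval on each rejection.
        while True:
            x1 = rng.uniform(lo, hi)
            if logpdf(x1) > logy:
                x = x1
                break
            if x1 < x:
                lo = x1
            else:
                hi = x1
        samples.append(x)
    return samples
```

Because the log density may return `-inf`, the same code handles hard truncation directly (e.g. a normal restricted to an interval), which is the use case this issue is about.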

betanalpha commented 8 years ago

If…if only we had some method for efficiently sampling from complex distributions.

Seriously, though, the problem with a slice sampler is that it's a Markov chain, which means there's no guarantee of exact samples, unlike our current RNGs. Without being able to guarantee exact samples, I think our best bet is a smart rejection sampler. In particular, knowledge of the transformed density along with the Jacobian should be enough to admit a good guess at an initial envelope. Hell, we could even do an adaptive rejection sampler on the constrained space directly.
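As a rough illustration of the envelope idea (not Stan Math code; `truncnorm_reject_rng` is a hypothetical name), here is a flat-envelope rejection sampler for a normal truncated to a finite interval, in Python:

```python
import math
import random

def truncnorm_reject_rng(mu, sigma, lb, ub, rng):
    """One exact truncated-normal draw by rejection from a flat envelope.

    The envelope height is the normal density's maximum on [lb, ub]:
    at mu when mu lies inside the interval, at the nearer bound otherwise.
    Working with unnormalized log densities avoids underflow in the tails.
    """
    mode = min(max(mu, lb), ub)                 # argmax of the density on [lb, ub]
    log_m = -0.5 * ((mode - mu) / sigma) ** 2   # log envelope height (unnormalized)
    while True:
        x = rng.uniform(lb, ub)                 # flat proposal over the interval
        log_p = -0.5 * ((x - mu) / sigma) ** 2
        if rng.random() < math.exp(log_p - log_m):
            return x
```

Every accepted draw is exact, but the acceptance rate collapses when the interval is wide or sits far out in a tail, which is the robustness worry raised at the top of the thread; a smarter envelope built from the transformed density would address that.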


sakrejda commented 8 years ago

On Dec 6, 2015 10:34 AM, "Michael Betancourt" notifications@github.com wrote:

> If…if only we had some method for efficiently sampling from complex distributions.

Sure, that's where we're at with high-dimensional distributions, but for 1D distributions the set of problems is much smaller.

> Seriously, though, the problem with a slice sampler is that it's a Markov chain which means there's no guarantee of exact samples unlike our current RNGs.

The Markov chain issue only becomes relevant with fat tails; is a rejection sampler going to solve that somehow?

> Without being able to guarantee the existence of RNGs I think our best bet is a smart rejection sampler. In particular, knowledge of the transformed density along with the Jacobian should be enough to admit a good guess at an initial envelope.

This is the same thing you need for the slice sampler to work well.

> Hell, we could even do an adaptive rejection sampler on the constrained space directly.


betanalpha commented 8 years ago

- Slice sampler: Markov chain
- (Adaptive) rejection sampler: exact samples

In some sense the rejection envelope is similar to an importance distribution or a proposal distribution, but it gives exact samples and is ever so slightly easier to tune.


sakrejda commented 8 years ago

On Dec 6, 2015 11:15 AM, "Michael Betancourt" notifications@github.com wrote:

> Slice sampler: Markov chain

- Slice sampler on U(0,1): exact samples
- Slice sampler on Cauchy(0,1): junk

I think there's enough room in between those two extremes to make it useful, but if there's an automatic way to generate a good generic envelope for rejection sampling, I can see how that would be preferable.


avehtari commented 8 years ago

Of course it would be nice to get independent samples, and it would be good to implement known efficient algorithms for specific distributions (e.g. the truncated normal) and to use the inverse CDF. If the focus is on the built-in distributions, then I guess there is no need for generic adaptive rejection sampling or slice sampling.

bob-carpenter commented 8 years ago

I think I made an issue out of this prematurely.

The advantage of truncated PRNGs over just throwing things into the model is that they don't interact with the rest of sampling, and they can interact with the rest of the generated quantities.

nschiett commented 6 years ago

Hi! I realize this issue is closed, but is there an example available on how to finally do truncated normal rngs?

bgoodri commented 6 years ago

https://discourse.mc-stan.org/t/rng-for-truncated-distributions/3122/7?u=bgoodri


nschiett commented 6 years ago

Thanks! Is it then correct that a doubly truncated normal RNG function would look like this?

```stan
real normal_lb_ub_rng(real mu, real sigma, real lb, real ub) {
  real p1 = normal_cdf(lb, mu, sigma);  // cdf with lower bound
  real p2 = normal_cdf(ub, mu, sigma);  // cdf with upper bound
  real u = uniform_rng(p1, p2);
  return (sigma * inv_Phi(u)) + mu;     // inverse cdf
}
```

bgoodri commented 6 years ago

Yes, although you should expose it to R via rstan::expose_stan_functions() to check that it works correctly.
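For readers without R at hand, a rough Python stand-in for that check: transliterate the function above with `statistics.NormalDist` (the names here are illustrative, not Stan Math or rstan APIs) and compare the sample mean of many draws against the closed-form mean of the truncated normal.

```python
import random
from statistics import NormalDist, fmean

def normal_lb_ub_rng(mu, sigma, lb, ub, rng):
    """Python transliteration of the Stan function above (inverse-CDF draw)."""
    d = NormalDist(mu, sigma)
    u = rng.uniform(d.cdf(lb), d.cdf(ub))  # uniform on the truncated CDF range
    return d.inv_cdf(u)

def truncnorm_mean(mu, sigma, lb, ub):
    """Closed-form mean of N(mu, sigma) truncated to [lb, ub]."""
    std = NormalDist()
    a, b = (lb - mu) / sigma, (ub - mu) / sigma
    z = std.cdf(b) - std.cdf(a)            # mass captured by the interval
    return mu + sigma * (std.pdf(a) - std.pdf(b)) / z

rng = random.Random(214)
draws = [normal_lb_ub_rng(1.0, 2.0, 0.0, 3.0, rng) for _ in range(20000)]
```

With enough draws, the empirical mean of `draws` should sit within a few standard errors of `truncnorm_mean(1.0, 2.0, 0.0, 3.0)`, and every draw should land in [0, 3]. Note the inverse-CDF trick degrades numerically when both bounds lie far out in the same tail, where `cdf(lb)` and `cdf(ub)` collide in double precision.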


nschiett commented 6 years ago

Great, it seems to work just fine indeed. Thanks.