stan-dev / stan

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License

Ensemble Samplers #362

Closed betanalpha closed 7 years ago

betanalpha commented 10 years ago

Admit generic ensemble sampling within Stan's sampler engine.

Allows running multiple chains internally that can communicate with each other, possibly useful during adaptation. Allows ensemble samplers such as DREAM.

betanalpha commented 10 years ago

What’s the architecture being considered here? It looks like you’re having each sampler fill the parallel chains individually, which is a huge duplication of code and not a robust way to approach it statistically.

On Jul 7, 2014, at 7:43 PM, Peter Li notifications@github.com wrote:

Need to change sample to std::vector in ....

src/stan/agrad/rev/matrix/sd.hpp src/stan/agrad/rev/matrix/variance.hpp src/stan/command/print.cpp src/stan/common/command.hpp src/stan/common/init_adapt.hpp src/stan/common/init_nuts.hpp src/stan/common/init_static_hmc.hpp src/stan/common/init_windowed_adapt.hpp src/stan/common/run_markov_chain.hpp src/stan/common/sample.hpp src/stan/common/warmup.hpp src/stan/common/write_error_msg.hpp src/stan/common.hpp src/stan/gm/arguments/arg_method.hpp src/stan/gm/arguments/arg_num_samples.hpp src/stan/gm/arguments/arg_sample.hpp src/stan/gm/arguments/arg_sample_algo.hpp src/stan/gm/arguments/arg_save_warmup.hpp src/stan/gm/arguments/arg_thin.hpp src/stan/gm/ast.hpp src/stan/gm/ast_def.cpp src/stan/gm/generator.hpp src/stan/gm/grammars/statement_2_grammar_def.hpp src/stan/gm/grammars/statement_grammar.hpp src/stan/gm/grammars/statement_grammar_def.hpp src/stan/io/mcmc_writer.hpp src/stan/io/stan_csv_reader.hpp src/stan/math/functions/Phi.hpp src/stan/math/functions/Phi_approx.hpp src/stan/math/matrix/mean.hpp src/stan/math/matrix/sd.hpp src/stan/math/matrix/variance.hpp src/stan/mcmc/base_mcmc.hpp src/stan/mcmc/chains.hpp src/stan/mcmc/covar_adaptation.hpp src/stan/mcmc/fixed_param_sampler.hpp src/stan/mcmc/hmc/base_hmc.hpp src/stan/mcmc/hmc/hamiltonians/base_hamiltonian.hpp src/stan/mcmc/hmc/hamiltonians/dense_e_metric.hpp src/stan/mcmc/hmc/hamiltonians/diag_e_metric.hpp src/stan/mcmc/hmc/hamiltonians/unit_e_metric.hpp src/stan/mcmc/hmc/nuts/adapt_dense_e_nuts.hpp src/stan/mcmc/hmc/nuts/adapt_diag_e_nuts.hpp src/stan/mcmc/hmc/nuts/adapt_unit_e_nuts.hpp src/stan/mcmc/hmc/nuts/base_nuts.hpp src/stan/mcmc/hmc/static/adapt_dense_e_static_hmc.hpp src/stan/mcmc/hmc/static/adapt_diag_e_static_hmc.hpp src/stan/mcmc/hmc/static/adapt_unit_e_static_hmc.hpp src/stan/mcmc/hmc/static/base_static_hmc.hpp src/stan/mcmc/sample.hpp src/stan/mcmc/var_adaptation.hpp src/stan/mcmc.hpp src/stan/prob/distributions/multivariate/continuous/dirichlet.hpp src/stan/prob/distributions/univariate/continuous/beta.hpp src/stan/prob/distributions/univariate/discrete/beta_binomial.hpp src/stan/prob/welford_covar_estimator.hpp src/stan/prob/welford_var_estimator.hpp

src/test/test-models/syntax-only/function_signatures1.stan

src/test/unit/common/command_init_test.cpp src/test/unit/common/do_bfgs_optimize_test.cpp src/test/unit/common/run_markov_chain_test.cpp src/test/unit/common/sample_test.cpp src/test/unit/common/warmup_test.cpp src/test/unit/distribution/multivariate/continuous/dirichlet_test.cpp src/test/unit/gm/parser_test.cpp src/test/unit/io/mcmc_writer_test.cpp src/test/unit/io/stan_csv_reader_test.cpp src/test/unit/io/test_csv_files/blocker.0.csv src/test/unit/io/test_csv_files/blocker_nondiag.0.csv src/test/unit/io/test_csv_files/epil.0.csv src/test/unit/io/test_csv_files/metadata1.csv src/test/unit/io/test_csv_files/metadata2.csv src/test/unit/mcmc/chains_test.cpp src/test/unit/mcmc/hmc/base_hmc_test.cpp src/test/unit/mcmc/hmc/base_nuts_test.cpp src/test/unit/mcmc/hmc/base_static_hmc_test.cpp src/test/unit/mcmc/hmc/derived_nuts_test.cpp src/test/unit/mcmc/hmc/hamiltonians/dense_e_metric_test.cpp src/test/unit/mcmc/hmc/hamiltonians/diag_e_metric_test.cpp src/test/unit/mcmc/hmc/hamiltonians/unit_e_metric_test.cpp src/test/unit/mcmc/hmc/mock_hmc.hpp src/test/unit/mcmc/sample_test.cpp src/test/unit/mcmc/test_csv_files/blocker.1.csv src/test/unit/mcmc/test_csv_files/blocker.2.csv src/test/unit/mcmc/test_csv_files/epil.1.csv src/test/unit/mcmc/test_csv_files/epil.2.csv src/test/unit/prob/distributions/multivariate/continuous/inv_wishart_test.cpp src/test/unit/prob/distributions/multivariate/discrete/multinomial_test.cpp src/test/unit/prob/welford_covar_estimator_test.cpp src/test/unit/prob/welford_var_estimator_test.cpp

bob-carpenter commented 10 years ago

Peter and I just talked about this. I think the simplest thing to do for evaluation is to have each ensemble just return an average. Then it'll all fit into our current framework. The posterior mean estimates will be correct as will their standard errors, but posterior variance will be underestimated (by roughly 1/sqrt(ensemble size)), so we won't really be able to use it like that for inference of anything other than parameter means.

Then, we can run evaluations vs. the ensemble methods.

Then, if they're promising, we can figure out how to dump out all the samples for inference.

I don't see where Michael thinks there's going to be duplicated code.

I believe the outstanding issue is how to compute n_eff for the full set of ensemble states.
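
To make the variance caveat concrete, here is a minimal standalone sketch (illustrative C++, not Stan code; it assumes independent N(0,1) walkers, which real walkers are not): averaging K draws per iteration leaves the mean estimate intact but shrinks the variance of the averaged draws to roughly 1/K.

#include <iostream>
#include <random>

// Illustrative only: K independent N(0,1) "walkers" averaged per
// iteration. The mean of the averaged draws is still ~0, but their
// variance is ~1/K, so the posterior spread is badly underestimated.
int main() {
  const int num_iterations = 100000;
  const int ensemble_size = 10;  // K
  std::mt19937 rng(1234);
  std::normal_distribution<double> target(0.0, 1.0);

  double sum = 0.0, sum_sq = 0.0;
  for (int n = 0; n < num_iterations; ++n) {
    double walker_mean = 0.0;
    for (int k = 0; k < ensemble_size; ++k)
      walker_mean += target(rng);
    walker_mean /= ensemble_size;
    sum += walker_mean;
    sum_sq += walker_mean * walker_mean;
  }
  const double mean = sum / num_iterations;
  const double var = sum_sq / num_iterations - mean * mean;
  std::cout << "mean of averaged draws: " << mean << "\n"      // ~ 0.0
            << "variance of averaged draws: " << var << "\n";  // ~ 0.1
}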

betanalpha commented 10 years ago

a) If we have interacting chains then there will be no way of using all the samples — we’re stuck with ensemble expectations (of means, variances, and maybe some quantiles if we’re being generous).

b) I don’t get what having std::vector as an input does here. The individual samplers should be stateless from the user’s perspective, taking in one state at a time and updating it.

bob-carpenter commented 10 years ago

On Jul 7, 2014, at 4:43 PM, Michael Betancourt notifications@github.com wrote:

a) If we have interacting chains then there will no way of using all the samples — we’re stuck with ensemble expectations (of means, variances, and maybe some quantiles if we’re being generous).

Just to be clear on terminology here, each Markov chain is composed of a number of walkers (in the Goodman/Weare approach). Let's call the walkers "ensemble members". Andrew, of course, wants to run multiple chains.

The plan is to average the walkers (ensemble members) in each chain. Then we have multiple chains, just like before. In particular, their posterior means and std errors should be right. But the variance will be underestimated if you try to use them for inference beyond posterior mean estimation for parameters.

What I don't see is why we can't just use all the walker states in inference. I thought the only problem is that we don't know how to compute n_eff.

b) I don’t get what having std::vector as an input does here. The individual samplers should be stateless from the user’s perspective, taking in one state at a time and updating it.

The plan now is NOT to change any of the interfaces, so maybe we can hold off on this discussion until after the evaluation based on averaging walkers within a chain.

betanalpha commented 10 years ago

If the chains are interacting then you create a single state at each time point by averaging the chains, which you can then use to compute expectations. You do not take expectations of the chains and then average those (again, the problem is the interactions).

The problem is defining a chain that targets the right invariant distribution, and trying to use all the samples does not do that.

If you’re not changing the interfaces then can you be more clear on what the checklist is for?

bob-carpenter commented 10 years ago
  1. I'm not sure what the checklist is for. Peter? We decided that the interfaces should NOT change.
  2. I'm still confused on Michael's use of "chain". There are multiple chains, each composed of an ensemble of walkers.
  3. The plan is to average the ensemble members on a per-iteration basis within a chain to produce a new "chain." That's what we'll use to compute posterior means and std errors.
  4. You get the same expectation no matter how you break the chains of ensemble members up for averaging, right? That is, if x1 and x2 are vectors of length N, then

    avg(avg(x1), avg(x2)) == avg(concatenate(x1,x2))

  5. I still don't see why we wouldn't do inference over all of the ensemble members. Each of them has a stationary distribution equal to the posterior.
betanalpha commented 10 years ago
  2. I'm still confused on Michael's use of "chain". There are multiple chains, each composed of an ensemble of walkers.

Right — I’m not sure why we’re talking about multiple chains in the first place. Let’s agree that we have one multiple-state chain, where each state represents a walker.

  3. The plan is to average the ensemble members on a per-iteration basis within a chain to produce a new "chain." That's what we'll use to compute posterior means and std errors.

That is correct.

  4. You get the same expectation no matter how you break the chains of ensemble members up for averaging, right? That is, if x1 and x2 are vectors of length N, then

avg(avg(x1), avg(x2)) == avg(concatenate(x1,x2))

Only if the expectation you’re taking is the actual state and not some nonlinear transformation. This is why the variance is tricky to calculate (but can be done).
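
A quick numeric check of both points (a minimal sketch, assuming equal-length chains): averaging commutes with concatenation because it is linear, but fails for a nonlinear transform such as squaring.

#include <cassert>
#include <cstddef>
#include <vector>

// avg(avg(x1), avg(x2)) == avg(concat(x1, x2)) when x1 and x2 have
// equal length, because averaging is linear; a nonlinear transform
// like squaring breaks the identity (Jensen's inequality in action).
double avg(const std::vector<double>& x) {
  double s = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) s += x[i];
  return s / x.size();
}

int main() {
  std::vector<double> x1, x2, all;
  x1.push_back(0.0); x1.push_back(2.0);
  x2.push_back(1.0); x2.push_back(3.0);
  all.push_back(0.0); all.push_back(2.0);
  all.push_back(1.0); all.push_back(3.0);
  // Linear: 0.5 * (1.0 + 2.0) == 1.5 == avg(all).
  assert(0.5 * (avg(x1) + avg(x2)) == avg(all));
  // Nonlinear: avg(all)^2 == 2.25, but the average of squares is 3.5.
  std::vector<double> sq;
  for (std::size_t i = 0; i < all.size(); ++i) sq.push_back(all[i] * all[i]);
  assert(avg(all) * avg(all) != avg(sq));
  return 0;
}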

  5. I still don't see why we wouldn't do inference over all of the ensemble members. Each of them has a stationary distribution equal to the posterior.

If the walkers interact then the individual chains do not necessarily have the same stationary distribution.

bob-carpenter commented 10 years ago

On Jul 7, 2014, at 6:04 PM, Michael Betancourt notifications@github.com wrote:

  2. I'm still confused on Michael's use of "chain". There are multiple chains, each composed of an ensemble of walkers.

Right — I’m not sure why we’re talking about multiple chains in the first place. Let’s agree that we have one multiple-state chain, where each state represents a walker.

Andrew insists on doing everything with multiple chains.

And it's also just to clarify what we're talking about in the interfaces.

  3. The plan is to average the ensemble members on a per-iteration basis within a chain to produce a new "chain." That's what we'll use to compute posterior means and std errors.

That is correct.

  4. You get the same expectation no matter how you break the chains of ensemble members up for averaging, right? That is, if x1 and x2 are vectors of length N, then

avg(avg(x1), avg(x2)) == avg(concatenate(x1,x2))

Only if the expectation you’re taking is the actual state and not some nonlinear transformation. This is why the variance is tricky to calculate (but can be done).

Good --- I thought my basic control of algebra was breaking down here. I get that f(avg(x)) != avg(f(x)) in general. I still have to draw little diagrams to figure out which way Jensen's inequality goes.

I realize there are problems with transforms. That's what I was getting at by saying we can look at the posterior means for parameters, but we couldn't use the output for any other inference (unless we declared the target of the inference as a parameter).

  5. I still don't see why we wouldn't do inference over all of the ensemble members. Each of them has a stationary distribution equal to the posterior.

If the walkers interact then the individual chains do not necessarily have the same stationary distribution.

In the Goodman and Weare approach, the individual walkers will each have a stationary distribution corresponding to the posterior.

I think that's the case for differential evolution, too, but I'm not 100% positive.

And doesn't that mean we can use them for posterior inference even if we're not sure how to compute n_eff for the general inferences?

One clunky approach would be to compute posterior sd using all the walkers, then combine per iteration within chains to compute n_eff for the expectations. What a mess!

betanalpha commented 10 years ago

In the Goodman and Weare approach, the individual walkers will each have a stationary distribution corresponding to the posterior.

It’s tricky — if you randomly sample from the walkers then you’re fine, and if you take just one walker then you’re fine, but the entire ensemble of walkers doesn’t have the right target distribution. In other words, the ensemble targets some joint distribution that is incorrect but has the right marginals, which is why it’s hard to compute things like the autocorrelation.

Also notice that Goodman and Weare are careful to do the expectation at each update.

And doesn't that mean we can use them for posterior inference even if we're not sure how to compute n_eff for the general inferences?

Technically, yes, but it’s dangerous to make estimates without some understanding of the variance. Especially when the autocorrelations are large and it’s easy to get deceived by false stationarity.

PeterLi2016 commented 10 years ago

Sorry, I should've been more specific. I thought the idea was to replace sample with std::vector in calls like transition and mcmc_writer so that they work with ensembles. In any case, what I did was just replace such calls so everything still works the same for HMC and all that (those just use a vector of size 1), whereas the ensemble sampler will use vectors of size greater than 1. I'm also modifying the code so that when it does all the writer stuff it takes the average of the vector of samples, so again in the case of HMC it would be the same as before, but for the ensemble samplers we would get the behavior that Bob described above.

As for the expectations, I was going to take the average at each iteration.

betanalpha commented 10 years ago

That's what I thought and I’m not a huge fan. I’d rather see the samplers remain stateless (at least from the user’s perspective) so that an ensemble sampler would do something like

for (int i = 0; i < n_walker; ++i) sampler.transition(walkers[i]);

Having the HMC samplers admit vector arguments just seems to pollute the interface.

PeterLi2016 commented 10 years ago

What do you mean by stateless? Alright. I'll see if I can come up with a way around this. The only issue is that to transition one walker, it requires at least one of the other walkers but we won't know which one ahead of time (and for one of the variations it could potentially involve all of the other walkers).

betanalpha commented 10 years ago

I'm being careless when I say stateless: I mean that samplers have one, and only one, common interface: a transition function that maps a sample object to a new sample object. Of course the samplers will have internal states, like the momenta and step size for HMC, but after configuration these will not be exposed.

If we're adding multiple chains then we'd want to have a vector of samples and then just loop,

for (int i = 0; i < samples.size(); ++i) sampler.transition(samples[i]);

Ensemble samplers are tricky because we have a multi-sample state. Even though we use only the average at each update, we keep all of the samples to generate the next transition. What I think this calls for is for ensemble samplers to define an ensemble_transition(vector& samples) method, being careful to distinguish it from the single-state samplers.

Basically the ensemble stuff shouldn't touch any of the existing samplers -- unless people have different opinions?
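
A rough sketch of that separation (the type and method names here are hypothetical, not existing Stan classes):

#include <vector>

// Hypothetical sketch: illustrative names, not existing Stan types.
struct sample {
  std::vector<double> theta;  // unconstrained parameters
  double log_prob;
};

// Existing contract: one sample in, one sample out.
class base_sampler {
 public:
  virtual ~base_sampler() {}
  virtual sample transition(sample& s) = 0;
};

// Proposed contract for ensembles: update the whole multi-sample state
// in place; each walker's update may read any of the other walkers.
class base_ensemble_sampler {
 public:
  virtual ~base_ensemble_sampler() {}
  virtual void ensemble_transition(std::vector<sample>& walkers) = 0;
};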

PeterLi2016 commented 10 years ago

That would require adding more code to run_markov_chain and mcmc_writer specifically to deal with the ensembles, right? I think that should work then.

betanalpha commented 10 years ago

Yeah -- in particular mcmc_writer may not even be appropriate. Hmmm.

I think the easiest thing to do is this: ensemble_sampler holds its states internally, and it implements the method

sample transition(sample& s)

which ignores s and generates a new sample from its internal state alone. The output is the ensemble average.

I think then we would be able to use the existing run_markov_chain and mcmc_writer as is (and I think the resulting n_eff would be an okay estimate).

I think if we really wanted to be careful we'd have to have some template sample that could be scalar or vector and then implement default vectorization for the samplers, etc, which would be a ton of work.
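
A minimal sketch of that scheme (illustrative names, with the stretch move itself stubbed out; see the stretch-move sketch further down the thread): the sampler owns its walkers, ignores the incoming sample, advances every walker, and returns the per-iteration ensemble average so run_markov_chain and mcmc_writer can be reused as is.

#include <cstddef>
#include <vector>

struct sample {
  std::vector<double> theta;  // unconstrained parameters
  double log_prob;
};

class stretch_ensemble_sampler {
 public:
  explicit stretch_ensemble_sampler(const std::vector<sample>& inits)
      : walkers_(inits) {}

  // Conforms to the existing one-sample interface: the argument is
  // ignored and the internal ensemble alone drives the chain.
  sample transition(sample& /* ignored */) {
    for (std::size_t k = 0; k < walkers_.size(); ++k)
      update_walker(k);
    return ensemble_average();
  }

 private:
  // One stretch-move update of walker k against the others (stubbed).
  void update_walker(std::size_t /* k */) {}

  sample ensemble_average() const {
    sample avg;
    avg.theta.assign(walkers_[0].theta.size(), 0.0);
    for (std::size_t k = 0; k < walkers_.size(); ++k)
      for (std::size_t d = 0; d < avg.theta.size(); ++d)
        avg.theta[d] += walkers_[k].theta[d] / walkers_.size();
    avg.log_prob = 0;  // would need re-evaluation under the model
    return avg;
  }

  std::vector<sample> walkers_;
};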

PeterLi2016 commented 10 years ago

Oh, I see. That sounds like it should work.

PeterLi2016 commented 10 years ago

The only thing is, then how do we pass in the inits?

betanalpha commented 10 years ago

You'll have to define an initialization. For example, you could start each walker out at one initialization and perturb, or start each walker out at one initialization and run a few Random Walk Metropolis updates, or you could randomly initialize each walker individually.

The problem right now is that samplers and the samples are too tightly interwoven so any kind of ensemble architecture is going to be a hack unless we rewrite everything.
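
The first option might look something like this (a hypothetical helper, with the same minimal sample type as in the sketches above; the jitter scale is an assumption):

#include <cstddef>
#include <random>
#include <vector>

struct sample {
  std::vector<double> theta;  // unconstrained parameters
  double log_prob;
};

// Hypothetical initializer: seed every walker from one user-supplied
// init plus small independent jitter on the unconstrained scale.
std::vector<sample> init_walkers(const sample& init, int n_walkers,
                                 std::mt19937& rng, double jitter = 0.1) {
  std::normal_distribution<double> noise(0.0, jitter);
  std::vector<sample> walkers(n_walkers, init);
  for (std::size_t k = 0; k < walkers.size(); ++k)
    for (std::size_t d = 0; d < walkers[k].theta.size(); ++d)
      walkers[k].theta[d] += noise(rng);
  return walkers;
}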

bob-carpenter commented 10 years ago

I agree that the ensemble stuff shouldn't touch the existing samplers. If we need to, we can create a new interface.

I'm in favor of breaking apart interfaces where appropriate rather than trying to create so-called "god" interfaces that can do anything with enough config. See, e.g.,

http://en.wikipedia.org/wiki/God_object

I find these hard to deal with for all the reasons everyone in the software design literature has enumerated ad infinitum, namely that they're not modular and any local change propagates through the whole interface.

Heh heh --- I learned a new term, too, "ravioli code". Or what I grew up thinking of as unix-like.

bob-carpenter commented 10 years ago

On Jul 8, 2014, at 11:01 AM, Michael Betancourt notifications@github.com wrote:

You'll have to define an initialization. For example, you could start each walker out at one initialization and perturb, or start each walker out at one initialization and run a few Random Walk Metropolis updates, or you could randomly initialize each walker individually.

The problem right now is that samplers and the samples are too tightly interwoven so any kind of ensemble architecture is going to be a hack unless we rewrite everything.

Exactly! I'd rather move to a more modular solution.

For now, though I think we should treat this project as experimental.

I would like to keep in mind the problems it will present when thinking about refactoring commands for Stan 3.

PeterLi2016 commented 10 years ago

Ok, so I've got a version of the stretch move implemented in the branch feature/issue-362-... but I'm not sure how to interface it with command.hpp. Right now, I just tried adding the following to command.hpp. What else do I need to do and then how do I call it from CmdStan? (I remember last summer for rwm, I just had to add a flag like -unit_metro to the call).

else if (algo->value() == "stretch_ensemble") {
  // Headers
  writer.write_sample_names(s, sampler_ptr, model);
  writer.write_diagnostic_names(s, sampler_ptr, model);

  std::string prefix = "";
  std::string suffix = "\n";
  NoOpFunctor startTransitionCallback;

  // Sampling
  clock_t start = clock();

  sample<Model, rng_t>(sampler_ptr, 0, num_samples, num_thin,
                       refresh, true,
                       writer,
                       s, model, base_rng,
                       prefix, suffix, std::cout,
                       startTransitionCallback);

  clock_t end = clock();
  sampleDeltaT = (double)(end - start) / CLOCKS_PER_SEC;

  writer.write_timing(warmDeltaT, sampleDeltaT);

  if (sampler_ptr) delete sampler_ptr;
}
betanalpha commented 10 years ago

You’ll have to add a “stretch_ensemble” argument to the gm/arguments and then add that as a valid option to arg_algorithm.hpp. Then you should be able to run the code by calling ./program sample algo=stretch_ensemble or the like.

PeterLi2016 commented 10 years ago

Ok, I've got everything up and running now for the stretch move version of the ensemble sampler. I've tried running it with the blocker model (BUGS Vol. 1) and it gets terrible results (I've tried running up to 100k iterations).

And when I run it with a simple Bernoulli model in the examples

data { 
  int<lower=0> N; 
  int<lower=0,upper=1> y[N];
} 
parameters {
  real<lower=0,upper=1> theta;
} 
model {
  theta ~ beta(1,1);
  for (n in 1:N) 
    y[n] ~ bernoulli(theta);
}

where theta should converge to 0.25, the ensemble samplers get around 0.18 [I still get around this when I vary the 'scale parameter' i.e. the parameter a in the distribution Goodman and Weare suggest to use in generating the movements of the walkers].

And a plot of theta against iterations gives: [trace plot image not preserved]

Any ideas on what could be going on?
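
For reference while debugging, here is a minimal self-contained version of the stretch move as Goodman and Weare describe it (illustrative code, not the branch's implementation; log_prob is assumed supplied and stands in for the model's log density on the unconstrained scale):

#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Assumed supplied elsewhere: the model's log density on the
// unconstrained scale (returns -infinity outside the support).
double log_prob(const std::vector<double>& theta);

// One sweep of the Goodman & Weare (2010) stretch move over all walkers;
// a is the scale parameter (a = 2 is the emcee default).
void stretch_update(std::vector<std::vector<double> >& walkers,
                    std::mt19937& rng, double a = 2.0) {
  std::uniform_real_distribution<double> unif(0.0, 1.0);
  const std::size_t K = walkers.size();
  const std::size_t dim = walkers[0].size();
  for (std::size_t k = 0; k < K; ++k) {
    // Pick a different walker uniformly at random.
    std::size_t j = k;
    while (j == k) j = static_cast<std::size_t>(unif(rng) * K);
    // Draw z ~ g(z) proportional to 1/sqrt(z) on [1/a, a] by inverse CDF:
    // z = (1 + (a - 1) u)^2 / a for u uniform on [0, 1).
    const double u = unif(rng);
    const double z = (1.0 + (a - 1.0) * u) * (1.0 + (a - 1.0) * u) / a;
    // Propose y = x_j + z * (x_k - x_j).
    std::vector<double> y(dim);
    for (std::size_t i = 0; i < dim; ++i)
      y[i] = walkers[j][i] + z * (walkers[k][i] - walkers[j][i]);
    // Accept with probability min(1, z^(dim - 1) * p(y) / p(x_k)).
    const double log_accept = (dim - 1.0) * std::log(z)
                              + log_prob(y) - log_prob(walkers[k]);
    if (std::log(unif(rng)) < log_accept)
      walkers[k] = y;
  }
}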

PeterLi2016 commented 10 years ago

I think both the walk and stretch moves are working in the branch now--the number of walkers might need to be adjusted manually depending on the number of parameters.

The issue right now is returning the mean of the constrained parameters at each iteration. Right now, I'm returning just the unconstrained parameters of the first walker at each iteration, which are then converted back to the constrained parameters (outside of all the ensemble stuff).

Bob suggested converting the unconstrained parameters to the constrained parameters, calculating the mean of these, converting this back to the unconstrained parameters, and then returning this at each iteration so it works with all of the current framework. The issue with this right now is that there isn't any code in the generated model files to convert from unconstrained to constrained (in the generated model files this is all done in the logprob function). This would also be slower (though I'm not sure how much slower), since we'd be converting back and forth between the constrained and unconstrained parameters when we don't really have to, so I'm not sure how good that would be for comparisons to the other samplers. Avoiding this would, I think, require changing the current interfaces to support ensembles. In any case, this is just something to keep in mind when speccing out 3.0.

Another thing is how do we want to determine the number of walkers? It's suggested to have at least twice as many walkers as parameters. For models with few parameters we may want more than twice as many, but as the number of parameters gets larger that would probably be too costly (emcee allows users to specify the number of walkers).

And for the distribution suggested by Goodman and Weare for making the stretch moves, they suggest g(z) ∝ 1/sqrt(z) for z in [1/a, a]. Right now it always sets a=2 (which is also what emcee does). I'm not sure if it is worth tuning this or even varying it at all?

betanalpha commented 10 years ago

The issue right now is returning the mean of the constrained parameters at each iteration. Right now, I'm returning just the unconstrained parameters of the first walker at each iteration which are then converted back the constrained parameters (outside of all the ensemble stuff).

Bob suggested converting the unconstrained parameters to the constrained parameters, calculating the mean of these, converting this back to the unconstrained parameters and then returning this at each iteration so it works with all of the current framework. The issue with this right now is that there isn't any code in the generated model files to convert from unconstrained to constrained (in the generated model files this is all done in the logprob function). This would also be slower (though I'm not sure how much slower) since we'd be converting back and forth between the constrained and unconstrained when we don't really have to so I'm not sure how good that would be for comparisons to the other samplers. But to avoid this I think it would require changing the current interfaces to support ensembles. In any case, this is just something to keep in mind when speccing out 3.0.

Can you not use model::write_array to do it? See, for example, src/stan/io/mcmc_writer::write_sample_params for an example.

Another thing is how do we want to determine the number of walkers? It's suggested to have at least twice as many walkers as parameters but I think if we have fewer parameters we may want to have more than twice as many but as the number of parameters gets larger this would probably be too costly (emcee allows users to specify the number of walkers).

And for the distribution suggested by Goodman and Weare for making the stretch moves, they suggest g(z) ∝ 1/sqrt(z) for z in [1/a, a]. Right now it always sets a=2 (which is also what emcee does). I'm not sure if it is worth tuning this or even varying it at all?

Hey, we have a sampler with no free parameters! Except all those free parameters!

Because it’s easy to let it be tunable, I’d say let it be tunable but don’t bother trying to find an optimal value. Just let a=2 be the default and do any comparisons with the emcee defaults.

bob-carpenter commented 10 years ago

Perhaps the constraining transform shouldn't be named write_array() :-)

The conversion from unconstrained to constrained is going to be necessary at some point. The current proposal takes each iteration, constrains each walker, then averages, then unconstrains the constrained average. There's only 1 unconstraining transform compared to N constraining ones, and we have to do the constraining ones anyway.
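
In code, that pipeline might look like the following sketch (constrain() and unconstrain() are hypothetical stand-ins for model::write_array and the inverse transform):

#include <cstddef>
#include <vector>

// Hypothetical stand-ins: constrain() for model::write_array,
// unconstrain() for the inverse transform.
std::vector<double> constrain(const std::vector<double>& theta_unc);
std::vector<double> unconstrain(const std::vector<double>& theta_con);

// K constraining transforms per iteration, one unconstraining transform
// on the averaged state, which then re-enters the existing writer path.
std::vector<double>
averaged_draw(const std::vector<std::vector<double> >& walkers_unc) {
  const std::size_t K = walkers_unc.size();
  std::vector<double> avg;
  for (std::size_t k = 0; k < K; ++k) {
    std::vector<double> con = constrain(walkers_unc[k]);
    if (avg.empty()) avg.assign(con.size(), 0.0);
    for (std::size_t i = 0; i < con.size(); ++i)
      avg[i] += con[i] / K;
  }
  return unconstrain(avg);
}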

I agree with Michael that we should just use the emcee defaults to start.

It'd also be possible to play with the proposal density, I would think as well as the endpoints, but I agree we should wait on this.

It'd also be nice to compare how well emcee itself works on constrained vs. unconstrained params. Their doc just says to deal with constraints by setting the log prob to -infinity, but I'm assuming that's not exactly an efficient way to deal with constraints.

On Jul 10, 2014, at 5:15 PM, Michael Betancourt notifications@github.com wrote:

The issue right now is returning the mean of the constrained parameters at each iteration. Right now, I'm returning just the unconstrained parameters of the first walker at each iteration which are then converted back the constrained parameters (outside of all the ensemble stuff).

Bob suggested converting the unconstrained parameters to the constrained parameters, calculating the mean of these, converting this back to the unconstrained parameters and then returning this at each iteration so it works with all of the current framework. The issue with this right now is that there isn't any code in the generated model files to convert from unconstrained to constrained (in the generated model files this is all done in the logprob function). This would also be slower (though I'm not sure how much slower) since we'd be converting back and forth between the constrained and unconstrained when we don't really have to so I'm not sure how good that would be for comparisons to the other samplers. But to avoid this I think it would require changing the current interfaces to support ensembles. In any case, this is just something to keep in mind when speccing out 3.0.

Can you not use model::write_array to do it? See src/stan/io/mcmc_writer::write_sample_params for an example. Another question is how we want to determine the number of walkers. The suggestion is to have at least twice as many walkers as parameters, but with few parameters we may want more than twice as many, while as the number of parameters grows that would probably become too costly (emcee lets users specify the number of walkers).

And for the distribution suggested by Goodman and Weare for making the stretch moves, they suggest g(z) ∝ 1/sqrt(z) on z in [1/a, a]. Right now it always sets a=2 (which is also what emcee does). I'm not sure whether it's worth tuning this, or even varying it at all.

Hey, we have a sampler with no free parameters! Except all those free parameters!

Because it’s easy to let it be tunable, I’d say let it be tunable but don’t bother trying to find an optimal value. Just let a=2 be the default and do any comparisons with the emcee defaults. — Reply to this email directly or view it on GitHub.

betanalpha commented 10 years ago

The conversion from unconstrained to constrained is going to be necessary at some point. At each iteration, the current proposal constrains each walker, averages the constrained values, then unconstrains the constrained average. There's only 1 unconstraining transform compared to N constraining ones, and we have to do the constraining ones anyway.

Why does the average need to be unconstrained at the end? Isn’t the next iteration just set by the current state of the walkers on the unconstrained space?

It'd also be nice to compare how well emcee itself works on constrained vs. unconstrained params. Their doc just says to deal with constraints by setting the log prob to -infinity, but I'm assuming that's not exactly an efficient way to deal with constraints.

A shudder goes down my spine every time someone does that. I have a lot of back problems.

PeterLi2016 commented 10 years ago

On Thu, Jul 10, 2014 at 5:44 PM, Michael Betancourt < notifications@github.com> wrote:

The conversion from unconstrained to constrained is going to be necessary at some point. At each iteration, the current proposal constrains each walker, averages the constrained values, then unconstrains the constrained average. There's only 1 unconstraining transform compared to N constraining ones, and we have to do the constraining ones anyway.

Why does the average need to be unconstrained at the end? Isn’t the next iteration just set by the current state of the walkers on the unconstrained space?

It has to be converted to the unconstrained scale and then converted back to the constrained scale because the sample we return from the transition method at each iteration gets converted from unconstrained to constrained by the current framework.

It'd also be nice to compare how well emcee itself works on constrained vs. unconstrained params. Their doc just says to deal with constraints by setting the log prob to -infinity, but I'm assuming that's not exactly an efficient way to deal with constraints.

A shudder goes down my spine every time someone does that. I have a lot of back problems.

— Reply to this email directly or view it on GitHub https://github.com/stan-dev/stan/issues/362#issuecomment-48668681.
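
(A minimal sketch of that reporting path, under the assumptions that walkers holds the unconstrained walker states, model_ and rng are in scope, write_array takes the include-flags of that era, and unconstrain() is a hypothetical helper; building that helper is what the rest of the thread works out.)

    std::vector<int> params_i;  // no integer parameters
    std::vector<double> constrained;
    std::vector<double> mean_constrained;

    for (size_t k = 0; k < walkers.size(); ++k) {
      // unconstrained -> constrained: one constraining transform per
      // walker; flags suppress transformed params and generated quantities
      model_.write_array(rng, walkers[k], params_i, constrained, false, false);
      if (k == 0)
        mean_constrained.assign(constrained.size(), 0.0);
      for (size_t i = 0; i < constrained.size(); ++i)
        mean_constrained[i] += constrained[i] / walkers.size();
    }

    // constrained mean -> unconstrained: the single unconstraining
    // transform; transition() returns this, and the framework then
    // constrains it once more when writing the sample.
    std::vector<double> mean_unconstrained
        = unconstrain(model_, mean_constrained);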

bob-carpenter commented 10 years ago

On Jul 10, 2014, at 11:29 PM, Peter Li notifications@github.com wrote:

On Thu, Jul 10, 2014 at 5:44 PM, Michael Betancourt < notifications@github.com> wrote:

The conversion from unconstrained to constrained is going to be necessary at some point. At each iteration, the current proposal constrains each walker, averages the constrained values, then unconstrains the constrained average. There's only 1 unconstraining transform compared to N constraining ones, and we have to do the constraining ones anyway.

Why does the average need to be unconstrained at the end? Isn’t the next iteration just set by the current state of the walkers on the unconstrained space?

It has to be converted to the unconstrained scale and then converted back to the constrained scale because the sample we return from the transition method at each iteration gets converted from unconstrained to constrained by the current framework.

Exactly.

If we ever got serious about the ensemble approach, we could bypass some of this overhead. The current approach is just an expedient to drop it into the existing CmdStan framework for evaluation.

It'd also be nice to compare how well emcee itself works on constrained vs. unconstrained params. Their doc just says to deal with constraints by setting the log prob to -infinity, but I'm assuming that's not exactly an efficient way to deal with constraints.

A shudder goes down my spine every time someone does that. I have a lot of back problems.

The point is that we want to make a shudder go down everybody's back. We can be the Stephen Kings of the MCMC world :-)

betanalpha commented 10 years ago

It has to be converted to the unconstrained scale and then converted back to the constrained scale because the sample we return from the transition method at each iteration gets converted from unconstrained to constrained by the current framework.

Exactly.

Ah, got it.

A shudder goes down my spine every time someone does that. I have a lot of back problems.

The point is that we want to make a shudder go down everybody's back. We can be the Stephen Kings of the MCMC world :-)

Can’t we be the Orwellian police of the statistical world? “We’re watching you and your bad analyses…”

PeterLi2016 commented 10 years ago

So now the problem I'm having is converting the means back to the unconstrained scale using transform_inits() from the generated model files. It requires

const stan::io::var_context& context

which it uses to run some checks on the parameters, but I'm not sure how to recreate it from within the ensemble sampler.

syclik commented 10 years ago

A simple instantiation is in src/test/unit/io/dump_test.cpp.

On Jul 11, 2014 10:27 AM, "Peter Li" notifications@github.com wrote:

So now the problem I'm having is converting the means back to the unconstrained scale using transform_inits() from the generated model files. It requires

const stan::io::var_context& context

which it uses to run some checks on the parameters, but I'm not sure how to recreate it from within the ensemble sampler.

— Reply to this email directly or view it on GitHub https://github.com/stan-dev/stan/issues/362#issuecomment-48736760.
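
(The pattern in that test is roughly the following; stan::io::dump parses R-dump-format text from any input stream, so the resulting object can serve as the var_context argument.)

    #include <sstream>
    #include <stan/io/dump.hpp>

    // Build a var_context from in-memory R-dump text.
    std::stringstream s("theta <- 0.25");
    stan::io::dump context(s);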

PeterLi2016 commented 10 years ago

I've tried playing around with it, but I'm not sure how to give it the right attributes so that it passes all the checks generated in transform_inits(). I've also tried passing in, as a parameter, the original var_context that is used when the model is first initialized, but the compiler can't find it (see branch feature/issue-362-ensemble-samplers):

stan/src/stan/common/command.hpp:619:54: error: use of undeclared identifier 'init_var_context'; did
      you mean 'data_var_context'?
          sampler_ptr = new sampler(model, base_rng, init_var_context, &std::cout, &std::cout);
                                                     ^~~~~~~~~~~~~~~~
                                                     data_var_context
stan/src/stan/common/command.hpp:130:22: note: 'data_var_context' declared here
      stan::io::dump data_var_context(data_stream);
                     ^
stan/src/stan/common/command.hpp:628:54: error: use of undeclared identifier 'init_var_context'; did
      you mean 'data_var_context'?
          sampler_ptr = new sampler(model, base_rng, init_var_context, &std::cout, &std::cout);
                                                     ^~~~~~~~~~~~~~~~
                                                     data_var_context
stan/src/stan/common/command.hpp:130:22: note: 'data_var_context' declared here
      stan::io::dump data_var_context(data_stream);

bob-carpenter commented 10 years ago

RStan and PyStan must be using transform_inits to do the unconstraining transform, so there's probably existing code in their repos you could use. All you need to do is create a var_context instance programmatically --- you can use the constrained parameters together with get_param_names() and get_dims(), which return the variable names and dimensions in the same order as the variables appear in the parameter vector.

I think it'd be worth encapsulating that function for now in the models as a function unconstrain_params() that takes a model as an argument.

For Stan 3 (or earlier), it'd be nice to rename write_array() to constrain_params(). And generate code to efficiently do the unconstrain without so much copying overhead.

On Jul 11, 2014, at 3:41 PM, Peter Li notifications@github.com wrote:

I've tried playing around with it, but I'm not sure how to give it the right attributes so that it passes all the checks generated in transform_inits(). I've also tried passing in, as a parameter, the original var_context that is used when the model is first initialized, but the compiler can't find it (see branch feature/issue-362-ensemble-samplers):

stan/src/stan/common/command.hpp:619:54: error: use of undeclared identifier 'init_var_context'; did
      you mean 'data_var_context'?
          sampler_ptr = new sampler(model, base_rng, init_var_context, &std::cout, &std::cout);
                                                     ^~~~~~~~~~~~~~~~
                                                     data_var_context
stan/src/stan/common/command.hpp:130:22: note: 'data_var_context' declared here
      stan::io::dump data_var_context(data_stream);
                     ^
stan/src/stan/common/command.hpp:628:54: error: use of undeclared identifier 'init_var_context'; did
      you mean 'data_var_context'?
          sampler_ptr = new sampler(model, base_rng, init_var_context, &std::cout, &std::cout);
                                                     ^~~~~~~~~~~~~~~~
                                                     data_var_context
stan/src/stan/common/command.hpp:130:22: note: 'data_var_context' declared here
      stan::io::dump data_var_context(data_stream);

— Reply to this email directly or view it on GitHub.

PeterLi2016 commented 10 years ago

You mean write something like what's below? (This is just a rough sketch; I haven't finished or tested it yet, I just wanted to make sure this is what you meant.)

    std::vector<std::string> names;
    std::vector<std::vector<size_t> > dims;
    std::stringstream in;  // std::instream doesn't exist; needs <sstream>
    _model.get_param_names(names);
    _model.get_dims(dims);
    size_t count = 0;

    for (size_t i = 0; i < names.size(); i++) {
      if (dims[i].size() == 0) {
        // scalar: name <- value
        in << names[i] << " <- " << _params_mean[count] << "\n";
        count += 1;
      } else if (dims[i].size() == 1) {
        // vector: name <- c(v1, ..., vK)
        in << names[i] << " <- c(";
        for (size_t j = 0; j < dims[i][0]; j++) {
          if (j > 0) in << ",";
          in << _params_mean[count + j];
        }
        in << ")\n";
        count += dims[i][0];
      } else {
        // array/matrix: name <- structure(c(...), .Dim = c(...))
        size_t num_data = 1;
        for (size_t j = 0; j < dims[i].size(); j++)
          num_data *= dims[i][j];
        in << names[i] << " <- structure(c(";
        for (size_t j = 0; j < num_data; j++) {
          if (j > 0) in << ",";
          in << _params_mean[count + j];
        }
        in << "), .Dim = c(";
        for (size_t j = 0; j < dims[i].size(); j++) {
          if (j > 0) in << ",";
          in << dims[i][j];
        }
        in << "))\n";
        count += num_data;
      }
    }
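
(A hypothetical continuation of the sketch above, feeding the generated text through stan::io::dump and into transform_inits; Bob argues against exactly this string round trip just below.)

    stan::io::dump context(in);        // parse the generated R-dump text
    std::vector<int> params_i;
    std::vector<double> params_r;      // unconstrained result
    _model.transform_inits(context, params_i, params_r);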

On Fri, Jul 11, 2014 at 4:12 PM, Bob Carpenter notifications@github.com wrote:

RStan and PyStan must be using transform_inits to do the unconstraining transform, so there's probably existing code in their repos you could use. All you need to do is create a var_context instance programmatically --- you can use the constrained parameters together with get_param_names() and get_dims(), which return the variable names and dimensions in the same order as the variables appear in the parameter vector.

I think it'd be worth encapsulating that function for now in the models as a function unconstrain_params() that takes a model as an argument.

For Stan 3 (or earlier), it'd be nice to rename write_array() to constrain_params(). And generate code to efficiently do the unconstrain without so much copying overhead.

  • Bob

On Jul 11, 2014, at 3:41 PM, Peter Li notifications@github.com wrote:

I've tried playing around with it, but I'm not sure how to give it the right attributes so that it passes all the checks generated in transform_inits(). I've also tried passing in, as a parameter, the original var_context that is used when the model is first initialized, but the compiler can't find it (see branch feature/issue-362-ensemble-samplers):

stan/src/stan/common/command.hpp:619:54: error: use of undeclared identifier 'init_var_context'; did
      you mean 'data_var_context'?
          sampler_ptr = new sampler(model, base_rng, init_var_context, &std::cout, &std::cout);
                                                     ^~~~~~~~~~~~~~~~
                                                     data_var_context
stan/src/stan/common/command.hpp:130:22: note: 'data_var_context' declared here
      stan::io::dump data_var_context(data_stream);
                     ^
stan/src/stan/common/command.hpp:628:54: error: use of undeclared identifier 'init_var_context'; did
      you mean 'data_var_context'?
          sampler_ptr = new sampler(model, base_rng, init_var_context, &std::cout, &std::cout);
                                                     ^~~~~~~~~~~~~~~~
                                                     data_var_context
stan/src/stan/common/command.hpp:130:22: note: 'data_var_context' declared here
      stan::io::dump data_var_context(data_stream);

— Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub https://github.com/stan-dev/stan/issues/362#issuecomment-48777406.

bob-carpenter commented 10 years ago

My suggestion was to go look at RStan and PyStan and see what they did. They must be creating a var_context object from a model in order to do the unconstraining transforms.

I don't see how what you have below will populate a var_context object.

On Jul 11, 2014, at 4:59 PM, Peter Li notifications@github.com wrote:

You mean write something like what's below? (This is just a rough sketch; I haven't finished or tested it yet, I just wanted to make sure this is what you meant.)

    std::vector<std::string> names;
    std::vector<std::vector<size_t> > dims;
    std::stringstream in;  // std::instream doesn't exist; needs <sstream>
    _model.get_param_names(names);
    _model.get_dims(dims);
    size_t count = 0;

    for (size_t i = 0; i < names.size(); i++) {
      if (dims[i].size() == 0) {
        // scalar: name <- value
        in << names[i] << " <- " << _params_mean[count] << "\n";
        count += 1;
      } else if (dims[i].size() == 1) {
        // vector: name <- c(v1, ..., vK)
        in << names[i] << " <- c(";
        for (size_t j = 0; j < dims[i][0]; j++) {
          if (j > 0) in << ",";
          in << _params_mean[count + j];
        }
        in << ")\n";
        count += dims[i][0];
      } else {
        // array/matrix: name <- structure(c(...), .Dim = c(...))
        size_t num_data = 1;
        for (size_t j = 0; j < dims[i].size(); j++)
          num_data *= dims[i][j];
        in << names[i] << " <- structure(c(";
        for (size_t j = 0; j < num_data; j++) {
          if (j > 0) in << ",";
          in << _params_mean[count + j];
        }
        in << "), .Dim = c(";
        for (size_t j = 0; j < dims[i].size(); j++) {
          if (j > 0) in << ",";
          in << dims[i][j];
        }
        in << "))\n";
        count += num_data;
      }
    }

On Fri, Jul 11, 2014 at 4:12 PM, Bob Carpenter notifications@github.com wrote:

RStan and PyStan must be using transform_inits to do the unconstraining transform, so there's probably existing code in their repos you could use. All you need to do is create a var_context instance programmatically --- you can use the constrained parameters together with get_param_names() and get_dims(), which return the variable names and dimensions in the same order as the variables appear in the parameter vector.

I think it'd be worth encapsulating that function for now in the models as a function unconstrain_params() that takes a model as an argument.

For Stan 3 (or earlier), it'd be nice to rename write_array() to constrain_params(). And generate code to efficiently do the unconstrain without so much copying overhead.

  • Bob

On Jul 11, 2014, at 3:41 PM, Peter Li notifications@github.com wrote:

I've tried playing around with it, but I'm not sure how to give it the right attributes so that it passes all the checks generated in transform_inits(). I've also tried passing in, as a parameter, the original var_context that is used when the model is first initialized, but the compiler can't find it (see branch feature/issue-362-ensemble-samplers):

stan/src/stan/common/command.hpp:619:54: error: use of undeclared identifier 'init_var_context'; did
      you mean 'data_var_context'?
          sampler_ptr = new sampler(model, base_rng, init_var_context, &std::cout, &std::cout);
                                                     ^~~~~~~~~~~~~~~~
                                                     data_var_context
stan/src/stan/common/command.hpp:130:22: note: 'data_var_context' declared here
      stan::io::dump data_var_context(data_stream);
                     ^
stan/src/stan/common/command.hpp:628:54: error: use of undeclared identifier 'init_var_context'; did
      you mean 'data_var_context'?
          sampler_ptr = new sampler(model, base_rng, init_var_context, &std::cout, &std::cout);
                                                     ^~~~~~~~~~~~~~~~
                                                     data_var_context
stan/src/stan/common/command.hpp:130:22: note: 'data_var_context' declared here
      stan::io::dump data_var_context(data_stream);

— Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub https://github.com/stan-dev/stan/issues/362#issuecomment-48777406.

— Reply to this email directly or view it on GitHub.

PeterLi2016 commented 10 years ago

Oh. I thought the idea was to format the data the way we require users to in the .data.R files, pass that to stan::io::dump, and then use that as the var_context argument to model.transform_inits().

RStan does something pretty similar to CmdStan, I think.

But PyStan builds a subclass of var_context and uses it in

    std::vector<double> unconstrain_pars(vars_r_t& vars_r, vars_i_t& vars_i) {
      pystan::io::py_var_context par_context(vars_r, vars_i);
      std::vector<int> params_i;
      std::vector<double> params_r;
      model_.transform_inits(par_context, params_i, params_r);
      return params_r;
    }

So should I do what PyStan does and build a subclass of var_context? Or should I try and use stan::io::dump?

bob-carpenter commented 10 years ago

Subclass of var_context. It's horrendously expensive to convert numbers to strings and strings back to numbers. And it's more memory intensive.

On Jul 11, 2014, at 6:56 PM, Peter Li notifications@github.com wrote:

Oh. I thought the idea was to format the data the way we require users to in the .data.R files, pass that to stan::io::dump, and then use that as the var_context argument to model.transform_inits().

RStan does something pretty similar to CmdStan, I think.

But PyStan builds a subclass of var_context and uses it in

std::vector<double> unconstrain_pars(vars_r_t& vars_r, vars_i_t& vars_i) {
  pystan::io::py_var_context par_context(vars_r, vars_i);
  std::vector<int> params_i;
  std::vector<double> params_r;
  model_.transform_inits(par_context, params_i, params_r);
  return params_r;
}

So should I do what PyStan does and build a subclass of var_context? Or should I try and use stan::io::dump?

— Reply to this email directly or view it on GitHub.
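
(For concreteness, a minimal sketch of the subclass route; mean_var_context is a hypothetical name, the container layout is arbitrary, and the exact set of pure-virtual methods depends on the Stan version.)

    #include <map>
    #include <string>
    #include <utility>
    #include <vector>
    #include <stan/io/var_context.hpp>

    // Minimal in-memory var_context over real-valued parameters only.
    class mean_var_context : public stan::io::var_context {
      typedef std::pair<std::vector<double>, std::vector<size_t> > entry_t;
      std::map<std::string, entry_t> vars_r_;
     public:
      // register a parameter's constrained values and dimensions
      void add(const std::string& name, const std::vector<double>& vals,
               const std::vector<size_t>& dims) {
        vars_r_[name] = std::make_pair(vals, dims);
      }
      bool contains_r(const std::string& name) const {
        return vars_r_.count(name) > 0;
      }
      std::vector<double> vals_r(const std::string& name) const {
        return vars_r_.find(name)->second.first;
      }
      std::vector<size_t> dims_r(const std::string& name) const {
        return vars_r_.find(name)->second.second;
      }
      // no integer-valued variables in this context
      bool contains_i(const std::string& name) const { return false; }
      std::vector<int> vals_i(const std::string& name) const {
        return std::vector<int>();
      }
      std::vector<size_t> dims_i(const std::string& name) const {
        return std::vector<size_t>();
      }
      void names_r(std::vector<std::string>& names) const {
        names.clear();
        std::map<std::string, entry_t>::const_iterator it;
        for (it = vars_r_.begin(); it != vars_r_.end(); ++it)
          names.push_back(it->first);
      }
      void names_i(std::vector<std::string>& names) const { names.clear(); }
    };

Filling one of these from get_param_names() and get_dims() plus the constrained means, then calling transform_inits() on it, avoids the string round trip entirely.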

PeterLi2016 commented 10 years ago

Ah, ok.

On Fri, Jul 11, 2014 at 7:19 PM, Bob Carpenter notifications@github.com wrote:

Subclass of var_context. It's horrendously expensive to convert numbers to strings and strings back to numbers. And it's more memory intensive.

  • Bob

On Jul 11, 2014, at 6:56 PM, Peter Li notifications@github.com wrote:

Oh. I thought the idea was to format the data the way we require users to in the .data.R files, pass that to stan::io::dump, and then use that as the var_context argument to model.transform_inits().

RStan does something pretty similar to CmdStan, I think.

But PyStan builds a subclass of var_context and uses it in

    std::vector<double> unconstrain_pars(vars_r_t& vars_r, vars_i_t& vars_i) {
      pystan::io::py_var_context par_context(vars_r, vars_i);
      std::vector<int> params_i;
      std::vector<double> params_r;
      model_.transform_inits(par_context, params_i, params_r);
      return params_r;
    }

So should I do what PyStan does and build a subclass of var_context? Or should I try and use stan::io::dump?

— Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub https://github.com/stan-dev/stan/issues/362#issuecomment-48793724.

PeterLi2016 commented 10 years ago

It looks like it is working now (at least with the bernoulli model). I'll add more tests with different parameter types to make sure it works for all of the parameter types that Stan allows.

bob-carpenter commented 7 years ago

See the discussion in #774.

syclik commented 7 years ago

I'm closing this issue; I think the evaluation Peter did was sufficient.