add automatic thinning of draws

n-kall commented 11 months ago

Summary

After discussions with @avehtari this PR adds automatic thinning of draws based on ESS as suggested by Säilynoja et al. 2022 (appendix).

Thin by = S / min(ess_tail, ess_bulk), where S is the number of draws. For multiple chains, this is ndraws_per_chain / mean_over_chains(min(ess_tail, ess_bulk)).
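As a rough illustration of the multi-chain formula above, here is a minimal base R sketch. The helper name `auto_thin`, its arguments, and the ESS values are hypothetical and are not taken from the PR. The lower bound of 1 is also an assumption, added so that an ESS larger than the number of draws does not give a thinning interval below 1.

```r
# Hypothetical helper, not part of posterior: computes a thinning interval
# from per-chain ESS estimates that the caller supplies.
auto_thin <- function(ndraws_per_chain, ess_bulk, ess_tail) {
  stopifnot(length(ess_bulk) == length(ess_tail))
  # Per chain, take the smaller of bulk and tail ESS
  ess_min <- pmin(ess_bulk, ess_tail)
  # Average over chains, then divide the draws per chain by that average
  thin <- ndraws_per_chain / mean(ess_min)
  # Assumption: never thin by less than 1
  max(1, thin)
}

# Made-up ESS values for 4 chains of 1000 draws each
auto_thin(1000,
          ess_bulk = c(400, 450, 380, 420),
          ess_tail = c(350, 500, 410, 390))
# about 2.55, i.e. 1000 / 392.5
```

The result is generally not an integer, which is why the thread goes on to discuss how to round or otherwise handle non-integer thinning values.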

Addresses issue #127

Copyright and Licensing

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

codecov-commenter commented 11 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (0322b46) 95.14% compared to head (4abb2fb) 95.15%.

:exclamation: Current head 4abb2fb differs from pull request most recent head 3c3d7e8. Consider uploading reports for the commit 3c3d7e8 to get more accurate results

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #326      +/-   ##
==========================================
+ Coverage   95.14%   95.15%   +0.01%
==========================================
  Files          47       47
  Lines        3745     3754       +9
==========================================
+ Hits         3563     3572       +9
  Misses        182      182
```


github-actions[bot] commented 11 months ago

This is how benchmark results would change (along with a 95% confidence interval in relative change) if c25f45291c2d0c1440ba379855cf429e9d4bc9e4 is merged into master:

:ballot_box_with_check:as_draws_array: 102ms -> 102ms [-2.27%, +1.77%]
:ballot_box_with_check:as_draws_df: 33ms -> 32.7ms [-2.83%, +1.04%]
:exclamation::snail:as_draws_list: 188ms -> 192ms [+0.66%, +2.57%]
:ballot_box_with_check:as_draws_matrix: 30.1ms -> 30ms [-1.46%, +0.65%]
:ballot_box_with_check:as_draws_rvars: 160ms -> 164ms [-0.99%, +5.13%]
:ballot_box_with_check:summarise_draws_100_variables: 726ms -> 726ms [-0.47%, +0.56%]
:ballot_box_with_check:summarise_draws_10_variables: 81.1ms -> 81.3ms [-0.48%, +0.91%]

Further explanation regarding interpretation and methodology can be found in the documentation.

github-actions[bot] commented 11 months ago

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 976b950edb1bf7a13acbe358827e84fcfc20eab0 is merged into master:

:ballot_box_with_check:as_draws_array: 100ms -> 100ms [-0.4%, +1.29%]
:ballot_box_with_check:as_draws_df: 31.6ms -> 32.1ms [-0.69%, +3.48%]
:exclamation::snail:as_draws_list: 185ms -> 188ms [+0.36%, +3.32%]
:ballot_box_with_check:as_draws_matrix: 29.3ms -> 29ms [-3.18%, +0.86%]
:ballot_box_with_check:as_draws_rvars: 156ms -> 159ms [-0.99%, +4.66%]
:ballot_box_with_check:summarise_draws_100_variables: 716ms -> 718ms [-0.74%, +1.22%]
:ballot_box_with_check:summarise_draws_10_variables: 79.2ms -> 79.2ms [-1.35%, +1.38%]

Further explanation regarding interpretation and methodology can be found in the documentation.

n-kall commented 11 months ago

@avehtari Should we use ceiling() or round() on the thinning value if it is not an integer? Issue #127 says ceiling, but I don't remember what we decided.

avehtari commented 11 months ago

How about thinning with non-integers? E.g. round(seq(1,100,by=max(1,1.5))) gives valid indices.
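To make the suggestion concrete, the snippet below runs that expression in base R and checks the properties that matter for thinning. It only illustrates the quoted one-liner and is not necessarily what the PR ended up implementing.

```r
thin <- 1.5
# Non-integer step, rounded to whole draw indices
idx <- round(seq(1, 100, by = max(1, thin)))

length(idx)         # number of draws kept: 67
range(idx)          # indices stay within 1..100
anyDuplicated(idx)  # 0 means no draw is selected twice
mean(diff(idx))     # average spacing between kept draws: 1.5
```

The individual gaps alternate between 1 and 2, but their average equals the requested thinning value.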

github-actions[bot] commented 11 months ago

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 126d6dc21ff473ab00f08e94fdb3fcca6c1c2e89 is merged into master:

:ballot_box_with_check:as_draws_array: 101ms -> 101ms [-0.65%, +0.93%]
:ballot_box_with_check:as_draws_df: 32ms -> 32.2ms [-0.98%, +2.04%]
:ballot_box_with_check:as_draws_list: 185ms -> 186ms [-1.04%, +1.68%]
:ballot_box_with_check:as_draws_matrix: 29.6ms -> 29.6ms [-1.8%, +1.79%]
:ballot_box_with_check:as_draws_rvars: 159ms -> 159ms [-1.54%, +1.21%]
:ballot_box_with_check:summarise_draws_100_variables: 719ms -> 719ms [-0.93%, +0.8%]
:ballot_box_with_check:summarise_draws_10_variables: 79.7ms -> 80ms [-0.43%, +1.16%]

Further explanation regarding interpretation and methodology can be found in the documentation.

github-actions[bot] commented 10 months ago

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 98dea553a85c498c932dfbc94959689bd8c950f8 is merged into master:

:ballot_box_with_check:as_draws_array: 102ms -> 102ms [-1.52%, +0.83%]
:ballot_box_with_check:as_draws_df: 33.7ms -> 34.1ms [-0.5%, +3.02%]
:ballot_box_with_check:as_draws_list: 195ms -> 194ms [-2.71%, +1.62%]
:rocket:as_draws_matrix: 32.9ms -> 30.6ms [-9.42%, -4.64%]
:ballot_box_with_check:as_draws_rvars: 169ms -> 171ms [-3.28%, +5%]
:ballot_box_with_check:summarise_draws_100_variables: 746ms -> 749ms [-2.13%, +2.76%]
:ballot_box_with_check:summarise_draws_10_variables: 81.7ms -> 81.1ms [-1.91%, +0.59%]

Further explanation regarding interpretation and methodology can be found in the documentation.

github-actions[bot] commented 10 months ago

This is how benchmark results would change (along with a 95% confidence interval in relative change) if b32dfe8d111e1473aa7a667e271ac169b5ae9aca is merged into master:

:ballot_box_with_check:as_draws_array: 109ms -> 107ms [-4.82%, +1.13%]
:ballot_box_with_check:as_draws_df: 37.3ms -> 37.3ms [-3.06%, +2.91%]
:ballot_box_with_check:as_draws_list: 207ms -> 209ms [-3.44%, +6%]
:exclamation::snail:as_draws_matrix: 32.9ms -> 34.1ms [+0.04%, +7.4%]
:ballot_box_with_check:as_draws_rvars: 177ms -> 178ms [-3.94%, +5.37%]
:ballot_box_with_check:summarise_draws_100_variables: 772ms -> 765ms [-2.8%, +1.01%]
:ballot_box_with_check:summarise_draws_10_variables: 85.3ms -> 85.9ms [-1.51%, +2.83%]

Further explanation regarding interpretation and methodology can be found in the documentation.

github-actions[bot] commented 10 months ago

This is how benchmark results would change (along with a 95% confidence interval in relative change) if c5c1db02df76d7eb3bb27fb131ddff73ecc22ec2 is merged into master:

:ballot_box_with_check:as_draws_array: 101ms -> 102ms [-0.31%, +0.95%]
:ballot_box_with_check:as_draws_df: 33.8ms -> 34.1ms [-1.04%, +3.24%]
:ballot_box_with_check:as_draws_list: 189ms -> 191ms [-1.22%, +3.06%]
:exclamation::snail:as_draws_matrix: 29.9ms -> 30.8ms [+1.27%, +4.65%]
:ballot_box_with_check:as_draws_rvars: 161ms -> 160ms [-2.02%, +0.68%]
:ballot_box_with_check:summarise_draws_100_variables: 731ms -> 730ms [-0.68%, +0.42%]
:ballot_box_with_check:summarise_draws_10_variables: 83.5ms -> 80.3ms [-11.76%, +4.12%]

Further explanation regarding interpretation and methodology can be found in the documentation.

n-kall commented 10 months ago

I've changed the thinning to handle non-integer values. I think this is now ready for further review.

paul-buerkner commented 10 months ago

Looks good to me. @avehtari do you have any comments? Otherwise, I will merge.

avehtari commented 10 months ago

It would be good for the documentation to explicitly mention what happens with non-integer thin values. Even if people never pass a non-integer thin themselves, with automatic thinning they may be surprised by the number of returned iterations unless the behavior of non-integer thin is documented.

n-kall commented 10 months ago

@avehtari How about this for the doc: "thin (numeric) The period for selecting draws. Must be between 1 and the number of iterations. If the value is not an integer, the draws will be selected such that the average interval between them approaches the thin value. If NULL, it will be automatically calculated based on bulk and tail effective sample size as suggested by Säilynoja et al. (2022)."
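A small sketch of how the number of returned iterations depends on `thin` under the proposed wording. The helper `thin_indices` is hypothetical and assumes the rounded-sequence approach suggested earlier in the thread; the package's actual implementation may differ.

```r
# Hypothetical helper illustrating the documented behaviour; not posterior's code.
thin_indices <- function(niterations, thin) {
  # The proposed doc text requires thin to lie between 1 and the number of iterations
  stopifnot(is.numeric(thin), length(thin) == 1,
            thin >= 1, thin <= niterations)
  # Assumption: indices come from a rounded, evenly spaced sequence
  round(seq(1, niterations, by = thin))
}

niter <- 1000
for (thin in c(1, 2, 2.5)) {
  idx <- thin_indices(niter, thin)
  cat(sprintf("thin = %.1f keeps %d of %d iterations\n",
              thin, length(idx), niter))
}
```

Under these assumptions, a thin of 2.5 keeps 400 of 1000 iterations. That figure matches neither thin = 2 nor thin = 3, which is the kind of result the documentation should prepare users for.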

github-actions[bot] commented 10 months ago

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 4abb2fbb1c39cc7e436efddd70c487e4ac37c124 is merged into master:

:ballot_box_with_check:as_draws_array: 101ms -> 102ms [-0.45%, +1.11%]
:ballot_box_with_check:as_draws_df: 33.7ms -> 33.8ms [-2.69%, +3.37%]
:rocket:as_draws_list: 190ms -> 187ms [-3.35%, -0.07%]
:exclamation::snail:as_draws_matrix: 29.6ms -> 30.8ms [+2.09%, +5.88%]
:ballot_box_with_check:as_draws_rvars: 161ms -> 160ms [-2.1%, +1.32%]
:ballot_box_with_check:summarise_draws_100_variables: 723ms -> 726ms [-0.56%, +1.31%]
:ballot_box_with_check:summarise_draws_10_variables: 79.4ms -> 79.3ms [-0.78%, +0.55%]

Further explanation regarding interpretation and methodology can be found in the documentation.

github-actions[bot] commented 10 months ago

This is how benchmark results would change (along with a 95% confidence interval in relative change) if a8b50b58ee14afc6cd01b0bb9ae0f0e1e3c78061 is merged into master:

:ballot_box_with_check:as_draws_array: 99.6ms -> 99.7ms [-0.35%, +0.6%]
:ballot_box_with_check:as_draws_df: 33ms -> 33.5ms [-0.43%, +3.61%]
:ballot_box_with_check:as_draws_list: 183ms -> 182ms [-1.22%, +0.43%]
:exclamation::snail:as_draws_matrix: 28.9ms -> 30.2ms [+3.18%, +5.53%]
:ballot_box_with_check:as_draws_rvars: 155ms -> 157ms [-0.7%, +2.47%]
:ballot_box_with_check:summarise_draws_100_variables: 717ms -> 716ms [-0.75%, +0.55%]
:ballot_box_with_check:summarise_draws_10_variables: 78.2ms -> 78.5ms [-0.19%, +0.83%]

Further explanation regarding interpretation and methodology can be found in the documentation.

n-kall commented 10 months ago

Updated the documentation based on discussions with @avehtari. I think it's ready now.

avehtari commented 10 months ago

OK for me

github-actions[bot] commented 10 months ago

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 6555ac87c5fc6ab348e50adba4e797bf95baf045 is merged into master:

:ballot_box_with_check:as_draws_array: 99.6ms -> 99.8ms [-0.42%, +0.9%]
:ballot_box_with_check:as_draws_df: 32.6ms -> 37.8ms [-12.06%, +44.12%]
:ballot_box_with_check:as_draws_list: 182ms -> 184ms [-1.54%, +4.14%]
:exclamation::snail:as_draws_matrix: 29.1ms -> 30.4ms [+1.84%, +7.63%]
:ballot_box_with_check:as_draws_rvars: 155ms -> 155ms [-1.46%, +1.37%]
:ballot_box_with_check:summarise_draws_100_variables: 715ms -> 716ms [-0.27%, +0.38%]
:ballot_box_with_check:summarise_draws_10_variables: 78.9ms -> 79.2ms [-0.32%, +1.19%]

Further explanation regarding interpretation and methodology can be found in the documentation.

paul-buerkner commented 10 months ago

thanks!

stan-dev / posterior