lizzieinvancouver / ospree

Budbreak review paper database
3 stars 0 forks source link

update priors in phylogeny code #445

Open lizzieinvancouver opened 1 year ago

lizzieinvancouver commented 1 year ago

Lizzie decided the priors are too informative in the phylogeny model and @MoralesCastilla and I agreed to update them.

@MoralesCastilla Here's some updated results:

                         mean     se_mean        sd         2.5%        25%        50%        75%      97.5%
a_z                30.6355678 0.048022913 3.4486572  23.77530319 28.3694823 30.6324975 32.9314840 37.5291566
lam_interceptsa     0.3412973 0.001755927 0.1014318   0.15896232  0.2667138  0.3377008  0.4083225  0.5466940
sigma_interceptsa  15.9418897 0.017593934 1.1428012  13.91100720 15.1249916 15.8968418 16.6749936 18.3607170
b_zf               -5.9309326 0.051273253 2.1725713 -10.09849134 -7.3684668 -5.9591411 -4.5612671 -1.5109412
lam_interceptsbf    0.7153708 0.010943695 0.1870341   0.28665896  0.5964494  0.7440669  0.8658757  0.9816850
sigma_interceptsbf  5.7228763 0.042187303 0.9983676   3.97176202  5.0279124  5.6546159  6.3528317  7.8617460
b_zc               -6.8353466 0.036884377 2.1807432 -10.96629676 -8.2809787 -6.8784845 -5.4383298 -2.3914394
lam_interceptsbc    0.5448730 0.005278042 0.1491919   0.24931360  0.4423494  0.5505829  0.6506744  0.8229247
sigma_interceptsbc  7.0564254 0.028322959 0.8554346   5.53376705  6.4599837  7.0095533  7.5766277  8.9161011
b_zp               -1.3108915 0.021176512 0.7645529  -2.83792983 -1.7912210 -1.3117205 -0.8306633  0.2468662
lam_interceptsbp    0.3804741 0.017250466 0.2364247   0.01834988  0.1901613  0.3577760  0.5495094  0.8762836
sigma_interceptsbp  2.4482465 0.019509419 0.3908509   1.74246012  2.1756638  2.4174613  2.6989308  3.2745220
sigma_y            12.5765770 0.002466067 0.1798002  12.23223427 12.4533337 12.5711019 12.6928642 12.9375163
lizzieinvancouver commented 1 year ago

@MoralesCastilla Looking above, do any of the results change much? I realize they might not, but I feel better with less informative-looking priors. As such, see commit 760854d95ec8a9f457ca76837ba803316ac44f07 where I added two Stan files and also started edits on the R code to call the new Stan file. If you can use this new file I would feel MUCH better. Also, we should keep priors in the Stan code. Not only does it make the Stan file a more complete model packet, but we found out this summer that the models run faster with the priors in the Stan file, so please update that too!

I also showed how I think you can plot the posterior vs. prior.

I also added another version with code that Mike Betancourt helped write. I am currently running it now, so no updates on that yet.

Let me know if you have any questions! And thanks again for your work on this.

lizzieinvancouver commented 1 year ago

Adding in some notes from @MoralesCastilla that I may need someday:

The two pieces of code that I have been using to fit stan models for the phylo models (both with identical priors, coming from an old version that I think Geoff coded) are:

/ospree/analyses/phylogeny/phylo_ospree_compact4.R /ospree/analyses/phylogeny/phylo_ospree_compact3.R

lizzieinvancouver commented 1 year ago

Running the new code by Mike I got similar answers I believe, but slightly less great model diagnostics.

Warning messages:
1: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
https://mc-stan.org/misc/warnings.html#bfmi-low 
2: Examine the pairs() plot to diagnose sampling problems

3: The largest R-hat is 1.06, indicating chains have not mixed.
Running the chains for more iterations may help.

> summary(totallynew, pars = list("a_z", "lambda_a", "tau_a", "b_zf", "lambda_bf", "tau_bf", "b_zc", "lambda_bc", "tau_bc", "b_zp", "lambda_bp", "tau_bp", "sigma_y"))$summary
                mean     se_mean         sd         2.5%        25%        50%        75%      97.5%      n_eff      Rhat
a_z       30.6609163 0.046317151 3.33769538  23.95634682 28.5010915 30.7481164 32.8456834 37.2881760 5192.89708 0.9995800
lambda_a   0.3377752 0.001847287 0.09866929   0.15864799  0.2681852  0.3330143  0.4024822  0.5401110 2852.95804 0.9999700
tau_a     15.9237365 0.020688010 1.14739439  13.96401445 15.1256397 15.8215151 16.6271837 18.3849482 3076.01173 0.9997567
b_zf      -6.0332336 0.056389351 2.21506855 -10.28645959 -7.5062610 -6.0522629 -4.6040607 -1.5875633 1543.05052 1.0023952
lambda_bf  0.6811229 0.011384102 0.19164504   0.25698274  0.5554078  0.7046368  0.8337117  0.9679002  283.39852 1.0112592
tau_bf     5.6360271 0.042889895 0.94820798   4.03609663  4.9640554  5.5551065  6.2152706  7.7711556  488.76179 1.0039632
b_zc      -6.9115799 0.042340308 2.10111913 -11.06380102 -8.2735237 -6.9767670 -5.5070111 -2.7140132 2462.59686 0.9998469
lambda_bc  0.5447945 0.004558514 0.14546925   0.25691488  0.4445376  0.5498117  0.6504146  0.8118499 1018.34700 1.0039005
tau_bc     7.0021445 0.032791311 0.86704422   5.46204534  6.4034941  6.9323746  7.5456707  8.8136223  699.14127 1.0074943
b_zp      -1.2860828 0.022000514 0.75870937  -2.79614474 -1.7622484 -1.2844222 -0.8102541  0.2376088 1189.28309 1.0028611
lambda_bp  0.3824301 0.027144479 0.26003614   0.01675179  0.1656038  0.3425425  0.5627393  0.9451274   91.77077 1.0628525
tau_bp     2.3821174 0.027784707 0.42114739   1.61251008  2.0875751  2.3634719  2.6493096  3.2484771  229.75057 1.0233743
sigma_y   12.5742529 0.002499161 0.17671671  12.23556219 12.4525164 12.5730693 12.6918316 12.9275085 4999.96051 1.0009709

So I think we can stick with uber_threeslopeintercept_modified_cholesky_updatedpriors.stan.

MoralesCastilla commented 1 year ago

@lizzieinvancouver

I have finally re-run all models with the less-informative prior specification. I copy model results below.

I don't think results change much, but still I need to redo the figures and such.

Angiosperms, lambda estimated

                         mean     se_mean         sd         2.5%        25%        50%        75%      97.5%     n_eff      Rhat
a_z                30.5698337 0.045116461 3.29551748  24.00814546 28.4390626 30.5777589 32.7712567 37.0669303 5335.5254 1.0004168
lam_interceptsa     0.3408081 0.001591384 0.09896235   0.16252835  0.2695451  0.3366184  0.4066509  0.5441582 3867.1389 1.0003653
sigma_interceptsa  15.9194325 0.020064023 1.15013909  13.86384886 15.1307935 15.8216493 16.6496890 18.3016999 3285.9783 0.9995509
b_zf               -5.9785888 0.047144237 2.14350056 -10.15515401 -7.3782389 -5.9890429 -4.6557944 -1.6668884 2067.2356 0.9999082
lam_interceptsbf    0.6758743 0.012506339 0.19317987   0.26568149  0.5452259  0.6983239  0.8282835  0.9760406  238.5961 1.0121001
sigma_interceptsbf  5.7897111 0.048241929 0.99520267   4.06988072  5.0616813  5.7189319  6.4164365  7.9513326  425.5727 1.0148572
b_zc               -6.7666514 0.036590039 2.14265064 -10.93965107 -8.1934853 -6.7952248 -5.3958791 -2.5028562 3429.0750 1.0002483
lam_interceptsbc    0.5474053 0.004493938 0.14104932   0.26275240  0.4494224  0.5530051  0.6497245  0.8055448  985.1173 1.0005217
sigma_interceptsbc  7.0709024 0.029210707 0.85017991   5.56182179  6.4846726  7.0204340  7.5860298  8.9123260  847.1055 1.0041661
b_zp               -1.3161851 0.021997504 0.75754529  -2.83795504 -1.7991514 -1.3268101 -0.8505484  0.2628632 1185.9609 1.0028960
lam_interceptsbp    0.3475606 0.014166708 0.22965387   0.01828947  0.1624168  0.3101506  0.5022728  0.8526711  262.7905 1.0249002
sigma_interceptsbp  2.4604110 0.022254232 0.42145808   1.69323230  2.1667801  2.4455523  2.7400241  3.3187662  358.6605 1.0104276
sigma_y            12.5750818 0.002729262 0.17999912  12.22709200 12.4514340 12.5716476 12.6965495 12.9375918 4349.6102 1.0002397

"n_eff / iter looks reasonable for all parameters"
[1] "Rhat looks reasonable for all parameters"
[1] "0 of 4000 iterations ended with a divergence (0%)"
[1] "0 of 4000 iterations saturated the maximum tree depth of 10 (0%)"
[1] "E-BFMI indicated no pathological behavior"

Angiosperms, lambda equal 0

                        mean     se_mean        sd       2.5%       25%       50%        75%      97.5%     n_eff      Rhat
a_z                31.796005 0.015890006 1.2728014  29.344963 30.944199 31.765060 32.6215022 34.2648010 6416.1300 0.9992967
sigma_interceptsa  16.306573 0.015155452 1.0063649  14.480794 15.599613 16.261427 16.9645501 18.3737171 4409.3358 1.0002113
b_zf               -7.350602 0.019886936 0.8468412  -9.003962 -7.908733 -7.350565 -6.8094274 -5.6708022 1813.2941 1.0000730
sigma_interceptsbf  5.114663 0.037194748 0.7605353   3.740357  4.590597  5.089010  5.6021568  6.7369401  418.0955 1.0019363
b_zc               -8.703028 0.014955234 0.8082515 -10.282033 -9.246122 -8.688199 -8.1619645 -7.1246700 2920.8324 1.0005406
sigma_interceptsbc  6.768274 0.027778531 0.8120911   5.223212  6.203654  6.756036  7.2938087  8.4803331  854.6553 1.0006152
b_zp               -1.256429 0.012782953 0.4566615  -2.100051 -1.571497 -1.256470 -0.9346289 -0.3456373 1276.2226 1.0030703
sigma_interceptsbp  2.361540 0.018156489 0.3599157   1.724175  2.109709  2.340033  2.5883584  3.1295219  392.9505 1.0048888
sigma_y            12.563860 0.002333488 0.1812416  12.220998 12.436961 12.562409 12.6843526 12.9257537 6032.6021 1.0000062

"n_eff / iter looks reasonable for all parameters"
[1] "Rhat looks reasonable for all parameters"
[1] "0 of 4000 iterations ended with a divergence (0%)"
[1] "0 of 4000 iterations saturated the maximum tree depth of 10 (0%)"
[1] "E-BFMI indicated no pathological behavior"

Angiosperms, lambda equal 1 (I doubled the iterations)

                        mean     se_mean        sd       2.5%       25%       50%        75%      97.5%      n_eff      Rhat
a_z                30.277335 0.068270771 8.0395915  14.396757 24.924323 30.243591 35.7188868 45.7480596 13867.5058 0.9998248
sigma_interceptsa  40.859234 0.040700206 2.7296836  35.698956 38.971335 40.781634 42.6583932 46.4335659  4498.1237 1.0002791
b_zf               -5.886590 0.037918204 2.9072340 -11.632152 -7.780380 -5.848392 -3.9791943 -0.2073095  5878.4723 1.0002404
sigma_interceptsbf  7.264870 0.056781235 1.2938708   4.998010  6.339600  7.200506  8.0926430  9.9939832   519.2448 1.0047167
b_zc               -6.436208 0.030832564 3.0032475 -12.349818 -8.458071 -6.449134 -4.4281425 -0.5802844  9487.7444 0.9999501
sigma_interceptsbc  7.994588 0.043521996 1.1483549   5.912170  7.192587  7.943586  8.7220321 10.4705199   696.2010 1.0048485
b_zp               -1.249395 0.016754814 1.1750324  -3.566583 -2.022259 -1.255572 -0.4622225  1.0800820  4918.3616 1.0003479
sigma_interceptsbp  2.773545 0.030777993 0.5374969   1.796114  2.401322  2.746072  3.1053093  3.9027831   304.9800 1.0084877
sigma_y            12.733207 0.001737975 0.1775833  12.394766 12.613744 12.731724 12.8527287 13.0915781 10440.4020 0.9998832

"n_eff / iter looks reasonable for all parameters"
[1] "Rhat looks reasonable for all parameters"
[1] "1 of 8000 iterations ended with a divergence (0.0125%)"
[1] "  Try running with larger adapt_delta to remove the divergences"
[1] "2 of 8000 iterations saturated the maximum tree depth of 10 (0.025%)"
[1] "  Run again with max_depth set to a larger value to avoid saturation"
[1] "E-BFMI indicated no pathological behavior"

Gymnosperms, lambda estimated


                         mean     se_mean        sd         2.5%         25%        50%        75%      97.5%     n_eff      Rhat
a_z                30.2256865 0.112785049 7.6388847  15.26603021  25.2367354 29.8998183 35.2264178 45.7535703 4587.2969 0.9994628
lam_interceptsa     0.4911762 0.004493682 0.2565673   0.03539310   0.2852377  0.5022033  0.7066333  0.9216469 3259.8520 0.9999109
sigma_interceptsa  22.2052543 0.131204988 6.8334137  12.49009015  17.3354916 21.0878736 25.7228761 38.5337514 2712.5311 1.0013225
b_zf               -6.8560293 0.111666220 4.9236670 -15.80267733 -10.1228692 -7.1165403 -4.0533101  3.9324037 1944.1677 1.0025051
lam_interceptsbf    0.3331731 0.004691138 0.2247961   0.01240057   0.1411251  0.3061017  0.4954662  0.7996907 2296.2604 1.0003758
sigma_interceptsbf  8.0654389 0.062392454 2.2510811   4.56619669   6.4431193  7.7538309  9.3672958 13.2620790 1301.7218 1.0011611
b_zc               -6.2352533 0.072786465 4.5739017 -15.13680234  -9.1450525 -6.3780387 -3.3689365  3.1884248 3948.8661 0.9997060
lam_interceptsbc    0.3548995 0.005097885 0.2488106   0.01094861   0.1462913  0.3172337  0.5286040  0.8847925 2382.0868 1.0003974
sigma_interceptsbc  8.7971079 0.070267919 2.6125775   4.20000468   6.9039834  8.6243366 10.4326359 14.4879361 1382.3696 1.0015859
b_zp                0.8067194 0.055738756 3.0446422  -5.61411356  -1.1162468  0.9464332  2.8525180  6.3994996 2983.7194 1.0006226
lam_interceptsbp    0.3590925 0.005260629 0.2454962   0.01617594   0.1510806  0.3211276  0.5355478  0.8743978 2177.7821 1.0026101
sigma_interceptsbp  6.0858368 0.075930649 2.1974098   2.24640802   4.5834381  5.8845239  7.4176201 10.9760500  837.5059 1.0032076
sigma_y            17.1881726 0.008561957 0.5452795  16.19043896  16.8094600 17.1721078 17.5378137 18.3120527 4055.9461 0.9997802

"n_eff / iter looks reasonable for all parameters"
[1] "Rhat looks reasonable for all parameters"
[1] "2 of 4000 iterations ended with a divergence (0.05%)"
[1] "  Try running with larger adapt_delta to remove the divergences"
[1] "0 of 4000 iterations saturated the maximum tree depth of 10 (0%)"
[1] "E-BFMI indicated no pathological behavior"

Gymnosperms, lambda equal 0

                        mean     se_mean        sd       2.5%        25%       50%       75%      97.5%    n_eff      Rhat
a_z                25.989206 0.057924182 4.2153416  17.470161 23.2406163 26.059564 28.846347 34.0716500 5295.973 1.0000331
sigma_interceptsa  16.677894 0.059814912 3.7727704  10.766086 13.9778966 16.198837 18.784704 25.3221642 3978.339 1.0009924
b_zf               -6.938407 0.062240865 3.1650509 -12.866728 -9.1266905 -7.060234 -4.920798 -0.2005887 2585.891 1.0015204
sigma_interceptsbf  7.204626 0.039041562 1.9148399   4.179500  5.8002729  6.969676  8.337232 11.7083304 2405.529 1.0010831
b_zc               -7.184062 0.040605900 2.8244163 -12.718080 -9.0740600 -7.171829 -5.279118 -1.7305661 4838.148 0.9997787
sigma_interceptsbc  8.227924 0.046487651 2.2504216   4.147575  6.6675130  8.132408  9.695886 12.9066894 2343.433 1.0006041
b_zp                1.017619 0.040046982 2.2787040  -3.634013 -0.4314259  1.070516  2.586993  5.2657292 3237.697 1.0004010
sigma_interceptsbp  5.527989 0.055109715 1.8883951   2.180981  4.2464600  5.387116  6.693959  9.6029437 1174.166 1.0024006
sigma_y            17.132596 0.005525257 0.5260042  16.159283 16.7646776 17.117660 17.478428 18.2083369 9063.029 1.0000526

"n_eff / iter looks reasonable for all parameters"
[1] "Rhat looks reasonable for all parameters"
[1] "17 of 8000 iterations ended with a divergence (0.2125%)"
[1] "  Try running with larger adapt_delta to remove the divergences"
[1] "0 of 8000 iterations saturated the maximum tree depth of 10 (0%)"
[1] "E-BFMI indicated no pathological behavior"

Gymnosperms, lambda equal 1

                       mean     se_mean        sd       2.5%         25%       50%        75%     97.5%     n_eff      Rhat
a_z                30.677405 0.109505409 9.4118381  11.955816  24.4841526 30.986210 36.8119487 49.025900 7387.1643 1.0007738
sigma_interceptsa  54.220964 0.135417178 9.1873824  38.158017  47.8430795 53.369317 59.9417408 74.542002 4602.9480 1.0008913
b_zf               -5.350638 0.245928453 8.0388053 -20.870916 -10.8468703 -5.435706  0.1963762  9.953317 1068.4777 1.0053484
sigma_interceptsbf 14.121534 0.110392572 2.7358623   9.368467  12.2464793 13.969342 15.9134533 19.749084  614.1985 1.0095934
b_zc               -5.048741 0.075481540 6.2702079 -17.627199  -9.1416852 -4.940118 -1.0382048  7.180208 6900.5290 0.9999788
sigma_interceptsbc 10.327531 0.060376031 2.7093221   5.437332   8.4201542 10.161171 12.0297462 16.155218 2013.6879 1.0008810
b_zp                1.185438 0.056728898 3.3886824  -6.145433  -0.8626672  1.446356  3.4488517  7.338692 3568.2317 1.0020747
sigma_interceptsbp  5.170297 0.130264926 2.7900472   1.066200   3.0066119  4.730851  6.8535121 11.601862  458.7416 1.0135208
sigma_y            17.525284 0.009989987 0.5392013  16.535879  17.1332286 17.504983 17.8848899 18.630119 2913.2121 1.0018391

"n_eff / iter looks reasonable for all parameters"
[1] "Rhat looks reasonable for all parameters"
[1] "762 of 12000 iterations ended with a divergence (6.35%)"
[1] "  Try running with larger adapt_delta to remove the divergences"
[1] "0 of 12000 iterations saturated the maximum tree depth of 10 (0%)"
[1] "E-BFMI indicated no pathological behavior"