Closed: lizzieinvancouver closed this issue 5 years ago.
R3 complained about our use of provenance, and I think R3 had a point. Provenance is really about local adaptation, so if we mix studies with real provenance info and those without it (i.e., where we don't really know where the seed came from), I think we are muddying an important issue in our latitude model.
To that end, I just ran the code at the top of models_stan_lat.R and looked at what studies it uses (@cchambe12, is this the correct code to use?) and reviewed each paper quickly (save for one of the heide93 papers -- I cannot find both, so we should check, but based on the papers from As, I think we are okay).
Okay ...
Less great ...
As you can see, the VAST majority of studies were really clear about provenance.
I am not sure what to do next and could use advice from @cchambe12 ... would it be possible to re-run the lat models excluding these 'less great' studies? Do we deal with this some other way?
Any help appreciated! @AileneKane @dbuona @cchambe12 ... all thoughts and opinions welcome.
Side note: I thought we were more careful about what we put in 'provenance', but not really -- it seems you can guess when it's right if 'population' is also entered, but we should be clear about what we mean here when we post the data to KNB.
@lizzieinvancouver I have two (somewhat opposing) thoughts about this: 1) If it's relatively easy to fit the latitude model with just the "okay" studies, that seems like a useful thing to do. It may change things if lots of species need to be excluded, but even that would be useful to know. 2) I think the model is fine as it is. Our study is not intended to be about population genetics or genetic differences in budburst responses. I think the definition of "provenance latitude" in the meta-data associated with OSPREE is reasonable and useful: "Latitude of samples collected." The problem may be more in the name, which is misleading, than in the way data are listed in the database. My understanding was that, from fairly early on in the OSPREE database development, the intention of this column was not to strictly adhere to "provenance" in the way people traditionally think of it from botanic gardens or population-genetic studies.
@AileneKane I think 1) would be good; re: 2) -- I think we can do that if we must, but I'd prefer not to. When I helped Dan and Tim set up the database, my intention was to treat 'provenance' as it is traditionally done. I think we strayed from that (understandably!), and the provenance column right now is not provenance as used in ecology or evolution, unfortunately. I think we also need to be more careful here -- the studies we have cited about HOW provenance may affect photoperiod have nothing to do with where tissue was grown; they have to do with local adaptation. I'd rather get this right if we can.
The other thing to keep in mind is that provenance latitude (i.e., the latitude at which samples were collected) is used to pull climate data and estimate chilling.
@lizzieinvancouver @AileneKane I will run this new model version today and report back here. Thanks, Lizzie, for going through all of the papers and sorting them for me! This is a tricky situation though because it would mean changing large portions of the data including---as Ailene mentions---the climate data. I toyed with the idea of trying a new model where I pull climate data from the growing latitudes but then I realized we never thoroughly cleaned that column, nor would it capture the crucial field sampling dates. I would be happy to Skype today to discuss this more in depth once I get the new results from the new model.
@lizzieinvancouver In regards to spann04 and spiers74, should those have been weeded out when we removed crops? Do I need to fix that code?
After removing the 'less great' studies, I got the following warnings:
Warning messages:
1: There were 253 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
2: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low
3: Examine the pairs() plot to diagnose sampling problems
4: The largest R-hat is 1.16, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
5: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
6: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess
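For anyone following along, the standard follow-ups before re-tuning are to count where the divergences happen and to look at a pairs plot, as the warnings suggest. A minimal sketch using standard rstan calls (the fit object name matches the model used later in this thread; the pars choice is just illustrative):

```r
library(rstan)

# Count divergent transitions directly from the sampler parameters
sp <- get_sampler_params(m2l.inter, inc_warmup = FALSE)
sum(sapply(sp, function(chain) sum(chain[, "divergent__"])))

# Inspect pairwise posterior geometry around the latitude terms
pairs(m2l.inter, pars = c("mu_b_lat_sp", "sigma_b_lat_sp", "lp__"))
```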
@cchambe12 Thank you for this! Could you push the code so I can see what's up?
Regarding spann04 and spiers74, they do appear to be commercial blueberries ... so maybe we forgot to toss out commercial blueberries in our crop code? Would be great if you could check.
I don't think we need to worry so much about the chilling ... for the most part I think the provenance column should be fine to use for that. At least in the papers I looked at yesterday it seemed okay.
@lizzieinvancouver Great about the chilling!! And yes, I did push it. It's nested within the earlier lat model code: analyses/lat_analysis/models_stan_lat.R
I will definitely check on the blueberries!
@cchambe12 Okay! Sorry, I see it now. Can you let me know what model you're running? I just tried m2l.inter and got only 4 divergent transitions ... I imagine those will go away with a higher number of iterations.
@lizzieinvancouver Weird!!! Me too! I'm rerunning it now on a new computer. I'll keep you posted! Hopefully something was just wonky the first time I ran it...
@lizzieinvancouver Woohoo!! Okay, no idea what was going on, but new warning messages are below:
Warning messages:
1: There were 1 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
2: Examine the pairs() plot to diagnose sampling problems
3: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
4: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess
And check_all_diagnostics:
[1] "n_eff / iter looks reasonable for all parameters" [1] "Rhat looks reasonable for all parameters" [1] "1 of 4000 iterations ended with a divergence (0.025%)" [1] " Try running with larger adapt_delta to remove the divergences" [1] "0 of 4000 iterations saturated the maximum tree depth of 10 (0%)" [1] "E-FMI indicated no pathological behavior"
I'm going to increase the iterations to 4000 with 2000 warmup to see if it cleans things up a bit. But overall I think we're okay! I'll rerun the muplot after the higher-iteration run and push it once it's done.
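(Concretely, that rerun is just the same call with the sampler settings bumped -- a sketch, where the adapt_delta and max_treedepth values are assumptions carried over from the warnings above, not the committed code:)

```r
library(rstan)

# Same model and data, with more iterations and a longer warmup as planned;
# control settings here are an assumption, not necessarily what was committed
m2l.inter <- stan('../lat_analysis/stan/winter_2level_lat.stan',
                  data = datalist.lat,
                  iter = 4000, warmup = 2000,
                  control = list(adapt_delta = 0.99, max_treedepth = 15))
```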
@cchambe12 Thanks, but I need the code you ran (not the warnings).
@lizzieinvancouver I'm just running models_stan_lat.R lines 6-168
I ran the model 4 times; only the first had serious problems, so I think something was off in my R session. I increased the warmup and iterations and then created a new muplot: analyses/lat_analysis/figures/latanalysis_rmdatasets.pdf. It looks very similar to the original plot but with less variation across species.
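(For readers without the repo handy: a muplot is essentially a dot-and-interval plot of the mu_ estimates. A base-R stand-in for the repo's plotting function -- the axis label is an assumption:)

```r
# Posterior means with 95% intervals for the mu_ parameters of a fitted stanfit
post <- summary(m2l.inter)$summary
mus  <- post[grep("mu_", rownames(post)), ]
n    <- nrow(mus)
plot(mus[, "mean"], seq_len(n), xlim = range(mus[, c("2.5%", "97.5%")]),
     yaxt = "n", ylab = "", xlab = "Model estimate (change in days to budburst)",
     pch = 16)
segments(mus[, "2.5%"], seq_len(n), mus[, "97.5%"], seq_len(n))
axis(2, at = seq_len(n), labels = rownames(mus), las = 1)
```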
@cchambe12
Regarding spann04 and spiers74, they do appear to be commercial blueberries ... so maybe we forgot to toss out commercial blueberries in our crop code? Would be great if you could check.
@lizzieinvancouver I cleaned this up in master, should I make the change in bbculdesac as well and then rerun the main model?
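(For the record, the fix amounts to a one-line filter before the model data are built -- a sketch, with the data-frame and column names assumed rather than taken from the repo:)

```r
# Drop the two commercial-blueberry studies flagged above; 'bb.stan' and
# 'datasetID' are assumed names for the cleaned data frame and study-ID column
bb.stan <- subset(bb.stan, !(datasetID %in% c("spann04", "spiers74")))
```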
Okay, I will probably trash my code -- it's running fine for me too (I upped warmup to 3K out of 4K iter) ... just so we have it, here is my output:
> if(use.zscore == TRUE){m2l.inter = stan('../lat_analysis/stan/winter_2level_lat.stan', data = datalist.lat,
+ iter = 4000, warmup=3000, control=list(max_treedepth = 15,adapt_delta = 0.99))}
>
> check_all_diagnostics(m2l.inter)
[1] "n_eff / iter looks reasonable for all parameters"
[1] "Rhat looks reasonable for all parameters"
[1] "0 of 4000 iterations ended with a divergence (0%)"
[1] "0 of 4000 iterations saturated the maximum tree depth of 10 (0%)"
[1] "E-FMI indicated no pathological behavior"
> #pl<- plot(m2l.iter, pars="b_", ci.lvl=0.5)
> launch_shinystan(m2l.inter)
Launching ShinyStan interface... for large models this may take some time.
Listening on http://127.0.0.1:7200
>
> m2l.inter.sum <- summary(m2l.inter)$summary
> m2l.inter.sum[grep("mu_", rownames(m2l.inter.sum)),]
mean se_mean sd 2.5% 25% 50% 75% 97.5%
mu_a_sp 27.513479 0.04861618 2.102037 23.249263 26.1363304 27.519140 28.8971971 31.551261
mu_b_force_sp -6.450839 0.02787499 1.241591 -8.829973 -7.2821422 -6.427563 -5.6690633 -3.932398
mu_b_photo_sp -1.703004 0.03628077 1.600930 -4.806581 -2.7597819 -1.764898 -0.6646035 1.509000
mu_b_chill_sp -9.087727 0.03575551 1.883463 -12.810002 -10.2985321 -9.096760 -7.8370546 -5.419063
mu_b_lat_sp -0.857695 0.07891213 1.901191 -4.330635 -2.1232454 -1.026549 0.2870450 3.176014
mu_b_pl_sp 1.694155 0.05381354 2.018184 -2.385062 0.4137491 1.645537 2.9688960 5.744989
n_eff Rhat
mu_a_sp 1869.4722 1.0017403
mu_b_force_sp 1983.9346 0.9996498
mu_b_photo_sp 1947.1149 1.0014073
mu_b_chill_sp 2774.7793 1.0004809
mu_b_lat_sp 580.4487 1.0112813
mu_b_pl_sp 1406.4962 1.0044025
> m2l.inter.sum[grep("sigma_", rownames(m2l.inter.sum)),]
mean se_mean sd 2.5% 25% 50% 75% 97.5%
sigma_a_sp 8.925233 0.034231239 1.5504495 6.293420 7.843566 8.805375 9.868062 12.255383
sigma_b_force_sp 4.701939 0.025811209 0.9875672 3.078785 4.001529 4.597984 5.285282 6.842125
sigma_b_photo_sp 5.381701 0.046114296 1.3935272 3.134519 4.414569 5.213919 6.133165 8.561501
sigma_b_chill_sp 7.589549 0.030762390 1.3971715 5.266960 6.626086 7.432849 8.369738 10.861380
sigma_b_lat_sp 3.963943 0.167054808 1.9522395 1.003804 2.538256 3.694080 5.147530 8.459794
sigma_b_pl_sp 6.169420 0.068996467 1.8467310 3.205506 4.851891 5.956756 7.214324 10.397314
sigma_y 14.964833 0.004126825 0.2970901 14.412300 14.757743 14.957816 15.164636 15.565699
n_eff Rhat
sigma_a_sp 2051.4929 1.0015096
sigma_b_force_sp 1463.9176 1.0011769
sigma_b_photo_sp 913.1871 1.0020824
sigma_b_chill_sp 2062.8104 1.0020448
sigma_b_lat_sp 136.5678 1.0444707
sigma_b_pl_sp 716.3967 1.0027910
sigma_y 5182.5598 0.9997286
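(Worth noting: sigma_b_lat_sp is the weakest-mixed parameter above, with n_eff around 137 and Rhat around 1.04. A quick way to scan any fit for that, using the same summary matrix -- the thresholds are illustrative:)

```r
# Flag parameters with low effective sample size or elevated Rhat
m2l.inter.sum[m2l.inter.sum[, "n_eff"] < 400 | m2l.inter.sum[, "Rhat"] > 1.03,
              c("n_eff", "Rhat")]
```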
... and:
> if(use.zscore == FALSE){
+ m2l.inter = stan('../lat_analysis/stan/winter_2level_lat.stan', data = datalist.lat.nonz,iter = 4000, warmup=3000, control=list(max_treedepth = 12,adapt_delta = 0.99))
+ }
>
> check_all_diagnostics(m2l.inter)
[1] "n_eff / iter looks reasonable for all parameters"
[1] "Rhat looks reasonable for all parameters"
[1] "0 of 4000 iterations ended with a divergence (0%)"
[1] "0 of 4000 iterations saturated the maximum tree depth of 10 (0%)"
[1] "E-FMI indicated no pathological behavior"
> #pl<- plot(m2l.iter, pars="b_", ci.lvl=0.5)
> #launch_shinystan(m2l.inter)
>
> m2l.inter.sum <- summary(m2l.inter)$summary
> m2l.inter.sum[grep("mu_", rownames(m2l.inter.sum)),]
mean se_mean sd 2.5% 25% 50% 75% 97.5%
mu_a_sp 27.513479 0.04861618 2.102037 23.249263 26.1363304 27.519140 28.8971971 31.551261
mu_b_force_sp -6.450839 0.02787499 1.241591 -8.829973 -7.2821422 -6.427563 -5.6690633 -3.932398
mu_b_photo_sp -1.703004 0.03628077 1.600930 -4.806581 -2.7597819 -1.764898 -0.6646035 1.509000
mu_b_chill_sp -9.087727 0.03575551 1.883463 -12.810002 -10.2985321 -9.096760 -7.8370546 -5.419063
mu_b_lat_sp -0.857695 0.07891213 1.901191 -4.330635 -2.1232454 -1.026549 0.2870450 3.176014
mu_b_pl_sp 1.694155 0.05381354 2.018184 -2.385062 0.4137491 1.645537 2.9688960 5.744989
n_eff Rhat
mu_a_sp 1869.4722 1.0017403
mu_b_force_sp 1983.9346 0.9996498
mu_b_photo_sp 1947.1149 1.0014073
mu_b_chill_sp 2774.7793 1.0004809
mu_b_lat_sp 580.4487 1.0112813
mu_b_pl_sp 1406.4962 1.0044025
> m2l.inter.sum[grep("sigma_", rownames(m2l.inter.sum)),]
mean se_mean sd 2.5% 25% 50% 75% 97.5%
sigma_a_sp 8.925233 0.034231239 1.5504495 6.293420 7.843566 8.805375 9.868062 12.255383
sigma_b_force_sp 4.701939 0.025811209 0.9875672 3.078785 4.001529 4.597984 5.285282 6.842125
sigma_b_photo_sp 5.381701 0.046114296 1.3935272 3.134519 4.414569 5.213919 6.133165 8.561501
sigma_b_chill_sp 7.589549 0.030762390 1.3971715 5.266960 6.626086 7.432849 8.369738 10.861380
sigma_b_lat_sp 3.963943 0.167054808 1.9522395 1.003804 2.538256 3.694080 5.147530 8.459794
sigma_b_pl_sp 6.169420 0.068996467 1.8467310 3.205506 4.851891 5.956756 7.214324 10.397314
sigma_y 14.964833 0.004126825 0.2970901 14.412300 14.757743 14.957816 15.164636 15.565699
n_eff Rhat
sigma_a_sp 2051.4929 1.0015096
sigma_b_force_sp 1463.9176 1.0011769
sigma_b_photo_sp 913.1871 1.0020824
sigma_b_chill_sp 2062.8104 1.0020448
sigma_b_lat_sp 136.5678 1.0444707
sigma_b_pl_sp 716.3967 1.0027910
sigma_y 5182.5598 0.9997286
And info on my datalist:
> str(datalist.lat)
List of 8
$ y : num [1:1437] 60 60 60 60 60 60 60 60 60 60 ...
$ chill: num [1:1437] 1.189 -0.693 0.582 -1.354 1.189 ...
$ force: num [1:1437] 0.146 0.146 0.146 0.146 0.146 ...
$ photo: num [1:1437] -0.293 -0.293 -0.293 -0.293 -0.293 ...
$ lat : num [1:1437] -1.22 -1.22 -1.22 -1.22 -1.22 ...
$ sp : num [1:1437] 14 14 14 14 14 14 14 14 14 14 ...
$ N : int 1437
$ n_sp : int 31
>
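(For context, this list is built from z-scored predictors; a sketch of its shape, with the source data frame 'd' and all of its column names assumed rather than taken from the repo:)

```r
# Hypothetical assembly of datalist.lat from a cleaned budburst data frame 'd';
# every column name here is an assumption matching the str() output above
datalist.lat <- list(
  y     = d$response.days,                     # days to budburst
  chill = as.numeric(scale(d$chill)),          # z-scored chilling
  force = as.numeric(scale(d$force)),          # z-scored forcing temperature
  photo = as.numeric(scale(d$photo)),          # z-scored photoperiod
  lat   = as.numeric(scale(d$provenance.lat)), # z-scored provenance latitude
  sp    = as.numeric(as.factor(d$complex)),    # species index, 1..n_sp
  N     = nrow(d),
  n_sp  = length(unique(d$complex))
)
```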
@cchambe12 @AileneKane I personally would vote to switch to this model because it actually tests what we mean to test (local adaptation of the photoperiod cue). If we stick with the model we currently have, people could discount it because we don't use only true provenance studies, even though the answer actually holds up.
Sounds good! I can add this into the new ms version! If it's possible to easily update your model table, Cat, I'd love it! Otherwise, I can do it tomorrow. Thank you!
@AileneKane I'm going to update the table now unless you're working in the supp!
@AileneKane Table should be all set!
@cchambe12 Thank you, Cat!
Just checked Heide 93a, which has multiple provenances and is correctly entered!
I think we're good to go!
@AileneKane @cchambe12 Thank you both for working on this! I think it's a strong response and hope the reviewer agrees.