Running traitors model for each trait

DeirdreLoughnan commented 3 years ago

@lizzieinvancouver @legault @FaithJones I have been working on the traitors model and could use help working out some kinks and with interpretation.

The general traitors model developed using the test data can be found here.

I started by running the model on sla using this stan code and this Rcode.

For SLA: the model runs and does not produce any error messages, but the n_eff of several of the muSp and muStudy are low (around 750) and the chains mix poorly. Such as for muSp[1]:

I have run the model with more iterations, but neither 8000 or 10000 improved the n_eff. The Rhat's are not bad though. Am I correct that this suggests an issue with the data itself and non-identifiability?

The low n_eff aside, the summary statistics are the following: Screen Shot 2021-08-03 at 8 43 41 PM

My interpretation of these results would be:

All cues advance bb
Chilling is still the strongest, but not as strong as in the bb ms, followed by forcing and photoperiod
The negative values for the betaTraitxCues suggests that increasing SLA leads to advances in bb with each cue
SLA is most responsive to photoperiod cues since it is most negative

FaithJones commented 3 years ago

Progress we need to make on this issue:

[ ] @legault is checking if we can use a substitute for height (stem density)
[ ] Geoff will then run the traitors model for stem height

We also need to:

[ ] Prior predictive check (or some other prior check) on SLA
[ ] run SLA model on real data and check output
[ ] Prior Predictive check (or some other prior check) on LNC
[ ] run LNC on real data and check model fit and output
[ ] Prior Predictive check (or some other prior check) on seedmass
[ ] run seed mass model on real data and check model fit and output

DeirdreLoughnan commented 3 years ago

@legault @FaithJones your comment about the SLA values being odd surprised me and I wanted to quickly check what data files you are using. In the files I was working with SLA ranges 1.4 to 138 and produces this histogram:

Are you using the cleaned data files with the data subsetted for the species of interest?

"input/try_bien_nodups_1.csv"
"input/try_bien_nodups_2.csv"

traitors.sp <- c("Acer_pensylvanicum", "Acer_pseudoplatanus", "Acer_saccharum", "Aesculus_hippocastanum", "Alnus_glutinosa", "Alnus_incana", "Betula_pendula", "Betula_populifolia", "Corylus_avellana", "Fagus_grandifolia","Fagus_sylvatica", "Fraxinus_excelsior", "Juglans_regia", "Populus_tremula", "Prunus_padus", "Prunus_serotina", "Quercus_alba", "Quercus_coccifera", "Quercus_ilex", "Quercus_petraea", "Quercus_robur", "Quercus_rubra", "Quercus_velutina", "Rhamnus_cathartica", "Sorbus_aucuparia", "Ulmus_pumila")

FaithJones commented 3 years ago

@DeirdreLoughnan why do we use this subset of species, again? Are these the species that we have phenology data for too?

legault commented 3 years ago

@DeirdreLoughnan

I don't know much about the traits, so I was curious if it is was odd looking? Do those values look reasonable? Should we expect two modes? Do terrestrial plants have SLAs > 100? If yes, great.

Relatedly, there seems to be a problem with the plant heights. Specifically, multiple rows of apparently different data with identical traitvalues. For example screenshot :

DeirdreLoughnan commented 3 years ago

@FaithJones these are the 26 species that we had a at least one observation for the six traits we decided to explore further in the PCA analysis. OSPREE has data for a lot more species, but we were limited by the completeness of the trait data.

@legault, I could see higher values of SLA for species with thin compound leaves, but that is not the case for the species with high SLA values (Quercus ilex and Acer pensylvanicum). High values close or over 100 would make sense if the units were cm^2/g, but less so for this data which is supposedly in mm^2/mg. There are only four entries with values over 60 mm^2/mg, so it is possible that these are typos.

I recall us discussing ages ago what to do about extreme values in the trait data. Am I correct in remembering that we decided not to exclude any data because our models would pool towards the species means? For this Quercus and Acer species we do have a lot of data (610 and 910 rows of SLA data respectively), the histograms of which is below:

Screen Shot 2021-09-27 at 10 19 31 AM

Regarding the height data, you are right that there is a lot of repetition in the height values. But that issue was in the original datafile as well and not an issue introduced by our cleaning. We do have a lot of height data (~65000 rows) and most of the height data, and the data in the issue you found above, comes from that dataset from Greg Reams. We could discuss moving forward without that dataset, but we would loose height data for three species.

We can discuss how we want to deal with these extreme values and data issues at our meeting Wednesday.

FaithJones commented 3 years ago

@legault is going to ask the Bien team about the duplicated height data. If we don't get answers then we may drop the suspect height data

FaithJones commented 2 years ago

@legault ran the full model with real SLA data, and the model seemed to fit OK. Our next steps are:

[x] I will chose which simulation code to use
[x] @DeirdreLoughnan will run prior predictive checks on the model
[ ] @legault will extract model parameter estimations so we can start interpreting results

FaithJones commented 2 years ago

@legault @DeirdreLoughnan - It looks like the model is generally doing a good job of estimating parameters. The only parameters it isn't doing a great job of is sigmapheno_y and muChillSp. I do think we need to get the model estimating values well using priors that are more systematically chosen though - at the moment the model sets teh priors to all be centred around simulated values.

simPosteriorHist_fullModel.pdf

FaithJones commented 2 years ago

I also ran the model and checked the posterior parameters for the model using wider priors for the phenology section of the model, and found very little difference. The model doesn't get worse with wider priors. simPosteriorHist_fullModel_widePriors.pdf

DeirdreLoughnan commented 2 years ago

Thanks @FaithJones, it is great that the model is working with wider priors!

I have done the prior predictive checks with Geoffs priors. The code can be found here. The plots for the prior predictive checks are in the traits/figures folder.

The plots for the cues are not as good as they were for the mean trait model, estimating almost half the dates to be below zero. As shown for chilling below: chillingPlotPrior_joint

DeirdreLoughnan commented 2 years ago

Thanks @legault for helping run the model for the rest of the traits. I can run the model for SLA and LNC if you have the time to run it for seed mass and SSD. In brainstorming the priors to start with, I came up with the following:

LNC: mu = 20, sigma = 5 SSD: mu = 0.5, sigma = 0.1 Seed mass: mu = 100, sigma = 250

Do these seem reasonable to everyone? I was unsure what a good prior would be for seed mass. The trait is the mass of 1000 seeds and our study includes several species with very large seeds, like chestnut and butternut.

FaithJones commented 2 years ago

@legault @DeirdreLoughnan running the code with twice as many reps caused 12 divergent transitions and no improvement in model fit.

legault commented 2 years ago

Weird. When I run simulation_traitsandmore.R with more replicates, I don't get any divergences.

legault commented 2 years ago

Fits for SeedMass, LNC, and SSD were added in eeb6c6e8

legault commented 2 years ago

Also, the fitted models are all on midge on home/data/Ospree_traits (see *_stan.RDS files)

lizzieinvancouver commented 2 years ago

This seems like pretty great progress! Suggested next steps:

[x] Run the simulated data with no more reps but up spp and study to 25 and see what happens to muchillsp ... also check that running the chains longer doesn't help.
[x] Run the phenology-only model and the traits-only models and plot the posteriors against each other (cue on y, trait on x) so we can get a sense of the data
[ ] Plot the model results! (Some notes on a similar model here, see March 26 image.

FaithJones commented 2 years ago

I ran the model with 25 species and 25 studies, using the wider priors. I got 88 divergent transitions, and the model parameter estimation was not improved.

DeirdreLoughnan commented 2 years ago

@FaithJones @legault @lizzieinvancouver I made some rough plots using the cue estimates from the ospree model and the mean trait values. Forcing vs mean traits Chilling vs mean traits Photoperiod vs mean traits

FaithJones commented 2 years ago

@legault I only see .RDS files for SLA and seed mass on Midge. Could you tell me where the fits are for LNC, and SSD so I can make sure I have a local copy before the retreat?

ALso @legault @DeirdreLoughnan - Has anyone run the model on clean height data? If so, where do I find the model fit data for this model?

DeirdreLoughnan commented 2 years ago

@FaithJones I thought that as a general rule we don't push model output. Is it not too big?

I have not got around to working on the height data yet, but have it on the to-do for today.

FaithJones commented 2 years ago

@DeirdreLoughnan you are right! We don't push output. @legault put some of the output on Midge, where I downloaded it from. It would be helpful if you could also save the height model output on Midge as well so we can all access it if we need next week without running any models. The folder on Midge is home/data/Ospree_traits

DeirdreLoughnan commented 2 years ago

@lizzieinvancouver @FaithJones @legault @md-garner We left the retreat with several more plotting tasks and additional analyses we would like to include. These tasks span several issues, but in general our next steps are:

[x] @legault will get the model running for log seed mass
[x] Once we have the model output for log seed mass, @DeirdreLoughnan will make the plots of the cue responses agains the beta/alpha/difference estimates and run Faith's code to look at the data vs posterior trait estimates
[x] Explore the relationship between early and late species. @DeirdreLoughnan will pull the Harvard Forest data and look for/request PEP data (this was dine before by Dan Flyn. See closed issue 429)
[x] @DeirdreLoughnan will run a basic PGLS on the mean traits vs cue estimates
[x] Start an outline of the results and identifying key references

Our goal is to have these tasks completed by December 3rd

legault commented 2 years ago

Updated seedmass model added in commit fc5e19d8

FYI, if anyone is still receiving n_effective warnings, try saving more iterations in the final stanfit object (e.g., change the value of (iteration - warming) from 1000 -> 1500 or 2000). This won't fix the problem for really inefficient MCMCs, but I have found it useful for the traitors work.

lizzieinvancouver commented 2 years ago

@DeirdreLoughnan Are we all set here? Can we close?

lizzieinvancouver / ospree

Running traitors model for each trait #417