lizzieinvancouver / localadaptclim

1 stars 0 forks source link

finalize models ... #16

Open lizzieinvancouver opened 2 years ago

lizzieinvancouver commented 2 years ago

@alinazeng I am looking at issue #10 .... to help figure what models to run. One big question is how much inference we have to fit garden versus species. Can you make a table of species (row) and garden (column) putting a 1 in each row where the species occurs?

alinazeng commented 2 years ago

Hi @lizzieinvancouver Let me know if this works. Thank you. species_garden.csv

species_garden

lizzieinvancouver commented 2 years ago

Okay, so the issue we're running into and why we have not settled on models is that we don't have a lot of information to estimate both species and garden. It would be nice to estimate both as you can imagine a garden location may somewhat effect the day of year, and species likely should have different inherent leafout dates and slopes with MAT, latitude etc..

Given that most gardens observe just ONE species you can imagine it's pretty hard for a model to know if variation in a garden is due to the garden or the species. I have been running models this weekend (some notes in script_stan_glmer.R) and am impressed that Stan is estimating these effects at all! However, I am not sure I trust these models. So, what to do?

option a) I suggest we focus on estimating species effects, and do not include garden_identifier in our model. I expect a larger effect from species and some of our questions are specific to species so we need species in the model.

option b) We could run model comparisons. For example compare fit2 and fit4 here:

fit2 <- stan_glmer(spring_event~(lat_prov|species), data = d)
fit4 <- stan_glmer(spring_event~(lat_prov|species) + (1|garden_identifier), data = d)

Using loo, which can tell you which model is more predictive. We may need loo later to compare models (with different predictors, such as MAT versus latitude), so it's worth looking into, but I don't think it's what we should do here.

For now I pretty strongly feel that we should go with species and have random slopes (I also checked that it's not dramatically changing our estimates, see R code). That would be something like: fit2 <- stan_glmer(spring_event~(lat_prov|species)*prov_continent, data = d) For all predictors.

However, in running models and thinking on this more, I don't think we should include the prov_continent in the models. Why? Because species (or garden) is nested within continent; that is, a species occurs on one continent only in our data and a garden only occurs on one continent, but the way the model is formulated it means any species (or garden) could be on either. This is not true, and I found no easy way to tell the model otherwise (I also asked a colleague). While this is a common problem in ecology and evolution, that is often ignored, I don't think we should ignore it.

So what to do about this issue? We can run the models without prov_continent and then extract the posteriors and group by continent to get the continent effect. I can add some code on this. For starters though it means our models would be versions of this:

fit <- stan_glmer(spring_event~(lat_prov|species), data = d) But with a separate model for MAT, overlap etc.. And a separate set of models for fall events. Can you also try one model with TWO predictors. fit2 <- stan_glmer(spring_event~((overlap*sd)|species), data = d) Once you have these fits you can plot the lines from the models for EACH species on the new plots you've made (each species will have a unique intercept and slope, you can extract them from rstanarm and feed them into ggplot via attributes or such).

Side note: I realize this leads to some further issues about what to do with studies of species across North America and Europe (on Doug fir?). We should discuss. I think maybe we can just do a small analysis of this species ALONE.

I will work later on code to extract the posteriors so we can estimate the continent-level effects. We can estimate the deciduous versus evergreen effects the same way!

We're definitely getting there.

lizzieinvancouver commented 2 years ago

Check out my notes regarding the above also in commit 3ca14eb0e5a5cc3826243d97c7852641139674ef

And two more small notes:

I think you only need stan_lmer ... you don't need stan_glmer. Wait! I started running your models and it defaults to stan_lmer, so rstanarm is taking care of it for you (and thus using the stan_glmer command is fine).

Also you can run these models faster if you use more cores (depending on how many cores your computer has).

alinazeng commented 2 years ago

@lizzieinvancouver Hi lizzie. Thanks for the code. Very quick question:

  1. do we want to use spring_event (doy) for the models or spring_event_difference (corrected doy)?
  2. is there a quick way to put the values of coef(fit) or summary(fit) into a dataframe? I manually pasted the slope and intercept values into a table and it takes so much time.
  3. "Once you have these fits you can plot the lines from the models for EACH species on the new plots you've made (each species will have a unique intercept and slope, you can extract them from rstanarm and feed them into ggplot via attributes or such)."

the locations of the scatter points would not change, right? I just need to plot in the lines of unique intercepts and slopes -> if so, I plotted the slopes and intercepts derived from stan_glmer(spring_event~(lat_prov|species), data = d) as a trial run.

lat~spring_lines_added_fitA

let me know if this looks right (there's no line going through Quercus or Tsuga... possibly because I ran out of linetypes.

*it be wonderful if you could get back to me today or tomorrow cuz I have tomorrow scheduled to work on the models :))) thanks a lot

lizzieinvancouver commented 2 years ago

@alinazeng Nice work!

Right, the scatterplots do not change. Your plot looks good, save for the missing lines.

And no (oh no!) you should extract the coef ...

coef(fit)$species
test <- as.data.frame(coef(fit)$species)
row.names(test)

As for Question 1: I think the spring_event_difference is the most important one so focus on that. For the paper, I suspect someone will want to know the answer is not dramatically different using spring_event (doy) so you'll want that model output in the paper somewhere.

I aim to get you more code and info on extracting posteriors by the end of the weekend (I hope).

alinazeng commented 2 years ago

Hi Lizzie, @lizzieinvancouver

I have updated plots for the one predictor model fit <- stan_glmer(spring_event~(lat_prov|species), data = d) plotted for lat, mat, overlap percentage and overlap standard deviation, and the same for fall events.

I have saved the intercepts and slopes for each species for each model that I have run so far in this csv

The plots are as follows. I will try to do the model with two predictors later this week when I have time to spare. Let me know how these plots can be improved.

(p.s. IBS presentation is due in two weeks, I plan on drafting the script by May 15).

lat~spring_lines_added_fitA

lat~spring_diffo_lines_added_fitA

lat~fall_lines_added_fitA

lat~fall_diffo_lines_added_fitA

mat~spring_lines_added_fitA

mat~spring_diffo_lines_added_fitA

mat~fall_lines_added_fitA

mat~fall_diffo_lines_added_fitA

overlap~spring_lines_added_fitA

overlap~spring_diffo_lines_added_fitA

overlap~fall_lines_added_fitA

overlap~fall_diffo_lines_added_fitA

sd~spring_lines_added_fitA

sd~spring_diffo_lines_added_fitA

sd~fall_lines_added_fitA

sd~fall_diffo_lines_added_fitA

lizzieinvancouver commented 2 years ago

@alinazeng These look great! I am surprised how the correction works on fall event (it reverses some relationships). Based on this I suggest you focus on presenting the uncorrected DOY for all events (spring and fall) in your talk.

I wrote up sample code so you can calculate the effect of continent and leaf type from these new models. Let me know if you have questions.

## Extracting model fits sidebar ...
# Following here: https://mc-stan.org/rstanarm/reference/as.matrix.stanreg.html

fit5 <- readRDS("notposting_exactly/notposting_exactly.fit5.RDS")
draws <- as.matrix(fit5)
# the full posterior for Alnus rubra slope:
alnrublope <- draws[,"b[lat_prov species:Alnus_rubra]"]
hist(alnrublope)

# To get the estimate for continent or evergreen/deciduous:
# Here I show an incomplete example for evergreen/deciduous,
# but we especially want to do for continent (group species by continent)
# Add the posteriors of all the species in one group ...
# Example ... many ways to do this, I just created different vectors (I made very incomplete lists!)
colnames(draws)
evergreens <- c("Picea_engelmannii",
                "Picea_mariana",
                "Tsuga_heterophylla",
                "Pseudotsuga_menziesii") # put the rest of the species names in ... 
deciduous <- c("Alnus_rubra",
               "Betula_papyrifera",
               "Fagus_sylvatica") # put the rest of the species names in ...

# Get all the species of one type now
evergreensdraws <- matrix(data=NA,
                          nrow=nrow(draws),
                          ncol=length(evergreens))
for (i in c(1:length(evergreens))){
    evergreensdraws[,i] <- draws[,paste("b[lat_prov species:", evergreens[i], "]", sep="")]
    }

deciduousdraws <- matrix(data=NA,
                          nrow=nrow(draws),
                          ncol=length(deciduous))
for (i in c(1:length(deciduous))){
    deciduousdraws[,i] <- draws[,paste("b[lat_prov species:", deciduous[i], "]", sep="")]
    }

# Now you can manipulate these posteriors with basic math ...  
# We want the mean across all of each type -- you need to take the rowMeans beacause the samples are related and so you keep the rows together
evergreenspost <- rowMeans(evergreensdraws)
deciduouspost <- rowMeans(deciduousdraws)

# And then you can plot them as we did before
leaftypeplot <- as.matrix(cbind(evergreenspost, deciduouspost))
mcmc_areas(leaftypeplot)
lizzieinvancouver commented 2 years ago

@alinazeng Also the trends generally look the same or stronger using uncorrected DOY, which makes me think we should focus on that perhaps for the talk and paper and put the 'corrected' DOY as supplements, but let's see how things look in the end with everything.

alinazeng commented 2 years ago

@lizzieinvancouver Morning Lizzie.

fit5 <- readRDS("notposting_exactly/notposting_exactly.fit5.RDS")

did you meant to upload this file to the repo? I dont know where to get this .rds file.

Thank you.

lizzieinvancouver commented 2 years ago

@alinazeng Sorry, I saved some of the output before and am just reading it back in... you could equally do:

fit5 <- stan_glmer(spring_event~(lat_prov|species), data = d) 
alinazeng commented 2 years ago

@lizzieinvancouver thanks lizzie.

I updated the species list and did the same for continents.

Here are the histograms. Do they mean anything? (also, is there a way to make these plots look nicer? adjust the labels and stuff, I do not know base R very well) leaf_type_plot

continentplot

lizzieinvancouver commented 2 years ago

@alinazeng You can interpret them the same way you interpreted these plots when they were model coefficients, so I would say that evergreen and deciduous trees both do not show a strong relationship with lat provenance (if that's the model you're plotting)? Decid does seem to have a weakly positive relationship. I don't know what model you're plotting for sure though so hard to interpret.

Similar for the second plots, but even more similar and more close to 0 (no effect).

I don't know bayesplot, so I cannot help you make better graphs, but there's lots of info online.

alinazeng commented 2 years ago

@lizzieinvancouver Thanks! Is there a way to save the fit model? I dont want to have to run them every time I open Rstudio :))

Here are the updated histograms.

continent_effect_lat_spring continent_effect_lat_fall continent_effect_mat_spring continent_effect_mat_fall continent_effect_percentage_spring continent_effect_percentage_fall continent_effect_sd_spring continent_effect_sd_fall

leaf_effect_lat_spring

leaf_effect_lat_fall leaf_effect_mat_spring leaf_effect_mat_fall leaf_effect_percentage_spring leaf_effect_percentage_fall leaf_effect_sd_spring leaf_effect_sd_fall

lizzieinvancouver commented 2 years ago

Thanks! Is there a way to save the fit model? I dont want to have to run them every time I open Rstudio :))

Yes! Check out:

?saveRDS

This will let you save the model object (e.g., fit5) and reload later with readRDS. You can also save as .Rdata. I am not sure what is better.

lizzieinvancouver commented 2 years ago

@alinazeng Very cool results! I encourage you to work on what you think they mean and I can take a look.

alinazeng commented 2 years ago

@lizzieinvancouver

Hi Lizzie could you help reword/edit this? I have trouble wording my interpretation.

Continental effects in relation to provenance latitude on fall events are way stronger than spring events. There is no effect on North American provenances and gardens, but there is a very small effect on European provenances and gardens. For fall events, North American studies experience stronger effects both in terms of provenance latitude and mean annual temperature. We do not observe much continental effect on spring DOY in relation to climate overlap percentage, but it does seem to be a stronger effect on North American studies regarding fall DOY. While there is no effect associated with climate overlap standard deviation and spring DOY, we do observe a huge effect on fall DOY, with stronger effects in North America than in Europe.

I am not sure how to interpret the leaf type plots: are they suggesting that there are stronger effects associated with gymnosperms?

lizzieinvancouver commented 2 years ago

@alinazeng If you can put the plots with the sentences then I can help, but I would like to check the interpretation at the same time I edit the text. Thank you!

alinazeng commented 2 years ago

@lizzieinvancouver Morning.

Continental effects in relation to provenance latitude on fall events are way stronger than spring events. There is no effect on North American provenances and gardens' spring events, but there is a very small effect on European provenances and gardens.

continent_effect_lat_spring

For fall events, North American studies experience stronger effects both in terms of provenance latitude and mean annual temperature.

continent_effect_lat_fall continent_effect_mat_spring continent_effect_mat_fall

We do not observe much continental effect on spring DOY in relation to climate overlap percentage, but it does seem to be a stronger effect on North American studies regarding fall DOY.

continent_effect_percentage_spring continent_effect_percentage_fall

While there is no effect associated with climate overlap standard deviation and spring DOY, we do observe a huge effect on fall DOY, with stronger effects in North America than in Europe.

continent_effect_sd_spring continent_effect_sd_fall

I am not sure how to interpret the leaf type plots: are they suggesting that there are stronger effects associated with gymnosperms for all models but the first one (leaf type ~ spring doy in relation to provenance latitude?

leaf_effect_lat_spring

leaf_effect_lat_fall leaf_effect_mat_spring leaf_effect_mat_fall leaf_effect_percentage_spring leaf_effect_percentage_fall leaf_effect_sd_spring leaf_effect_sd_fall

alinazeng commented 2 years ago

@lizzieinvancouver Also, for the models with TWO predictors that you wanted me to try.

fitB_spring_percentage_sd <- stan_glmer(spring_event~((percentage*sd)|species), data = d)
fitB_fall_percentage_sd <- stan_glmer(fall_event~((percentage*sd)|species), data = d)

is that it? Or should I also pair other predictors too?

Thank you.

lizzieinvancouver commented 2 years ago

Also, for the models with TWO predictors that you wanted me to try.

fitB_spring_percentage_sd <- stan_glmer(spring_event~((percentage*sd)|species), data = d)
fitB_fall_percentage_sd <- stan_glmer(fall_event~((percentage*sd)|species), data = d)

is that it? Or should I also pair other predictors too?

Thank you.

Yes! This is what I was thinking of.

lizzieinvancouver commented 2 years ago

@alinazeng Thank you!

You want to explain the biology not the stats, so as much as you can avoid terms like 'slope' or 'effects' etc.. You can't ALWAYS avoid them, just avoid them when you can and START/leaf with the biology.

I will start by telling you how I would interpret each plot from top to bottom. Can you then update the text and I will work on it?

1) Spring events (leafout?) is not related to latitude (in Europe or North America). 2) In contrast, fall events advance (? Check the scatterplots for my interpretations throughout ... it is late here: northern sites budset first?) strongly with latitude. This relationship, however, is driven mostly by North America where fall events advance ~4 days per degree you move north [I would say weak to nonexistent relationship in Europe]. 3) Same as 1) but with MAT. 4) Same as 2) but with MAT and the relationship is positive so ... events delay at warmer sites (again, check with other plots).

Here in your talk, you might transition by talking about coarse metrics such as latitude and MAT and how what we think these stand in for is likely how similar these climates are in times that matter for the events: So for spring, spring months. Explain overlap metric.

5) With this new climate metric, we again see that spring events are not related to climate on average. This is especially clear in Europe, but there is a weak delay (or just divergence?) in spring events as overlap increases in North America ... 6) fall events delay (CHECK ... or just divergence? -- plot the lines on the data) as overlap increases. Again, this is driven by North America.

7) Effects [see, I used it!] of latitude on spring events are similar across angiosperms and gymnosperms. 8) Effects of latitude on fall events are very similar across angiosperms and gymnosperms. 9) Effects of MAT on spring events weakly diverge: suggesting spring events get earlier with latitude in angiosperms (CHECK), and delay with latitude in gymnosperms. 10) Fall events delay in warmer locations for both angiosperms and gymnosperms, but slightly more strongly for gymnosperms. 11) Very weak effects of overlap on spring events are nearly identical similar across angiosperms and gymnosperms. 12) fall events diverge as overlap declines for both angiosperms and gymnosperms, but slightly more strongly for gymnosperms. 13) Effects of latitude on spring events are not related to type. 14) You can try this one?

alinazeng commented 2 years ago

Yes! This is what I was thinking of.

Hi Lizzie. @lizzieinvancouver I am unsure about how to plot the models with two predictors. If Spring DOY is on one axis, what should be on the other axis?

Are you in France at the moment? Thanks for the interpretation, I will update the text and write to you tomorrow.

lizzieinvancouver commented 2 years ago

@alinazeng I am in France! I am in Avignon this week (and working too late).

alinazeng commented 2 years ago

@lizzieinvancouver

Thank you. I have slightly updated the interpretation for the plots. Went and checked the scatter plots and clarified whether there is an effect or a divergence. Lizzie maybe you could look through it again and finalize the wording?

Question: a. If we want to avoid using the term effect, then how should I title each plot? any suggestion? b. I am unsure about how to plot the models with two predictors. If Spring DOY is on one axis, what should be on the other axis?

Let me know if you want me to insert the plots in between text if that's any easier.

  1. Spring events (budburst and leafout) are not related to provenance latitude in Europe or North America.

  2. In contrast, fall events (budset, leaf senescence, leaf abcission) advance strongly with provenancelatitude. The higher the latitude (provenances further to the North), the earlier fall events tend to happen.This relationship, however, is observed mostly in North America where fall events advance ~4 days per degree we move north. In Europe, such relationship is weak (advance ~0.5 days per degree we move north).

  3. Spring events (budburst and leafout) are not related to provenance mean annual temperature in Europe or North America.

  4. In contrast, fall events (budset, leaf senescence, leaf abcission) advance strongly with provenance mean annual temperature. The lower the mean annual temperature, the earlier fall events tend to happen.This relationship is observed mostly in North America where fall events advance ~6 days per degree celsius lower the mean annual temperature gets. In Europe, such relationship is weak (advance ~0.5 days per degree celsius lower the mean annual temperature gets).

^^my esl braincell worded this terribly^^ please help :)))

Coarse metrics such as latitude and MAT are ultimately representing how similar the climates are between the provenances and gardens in times that matter for the events
For spring events, we look at March, April, and May -> spring months -> Overlap metrics will be explained in Methodology section -> spring months -> March April May -> Overlap metrics suggest climate similarity and therefore, similar timing of events?

  1. With this new climate metric, we again see that spring events are not related to climate on average, with significant divergence in both Europe and North America.

  2. Fall events delay as overlap increases in both North America and Europe (we observe more pronounced delay in North America).

  3. Effects of latitude on spring events are similar across angiosperms and gymnosperms.

  4. Effects of latitude on fall events are very similar across angiosperms and gymnosperms.

  5. Effects of MAT on spring events weakly diverge: suggesting spring events get earlier as MAT increases in angiosperms, and delay as MAT increases in gymnosperms (except for Pseudotsuga menziesii).

  6. Fall events delay in warmer locations for both angiosperms (~3.8 days) and gymnosperms (6.2 days), but slightly more strongly for gymnosperms.

  7. Very weak effects of climate overlap on spring events, nearly identical similar across angiosperms and gymnosperms.

  8. Fall events diverge as climate overlap declines for both angiosperms and gymnosperms, but slightly more strongly for gymnosperms.

  9. Effects of latitude on spring events are not related to leaf type.

  10. I am not fully understanding why we are using climate overlap standard deviation as a predictor so I still don't know how to word the results shown in plot 14. Lizzie your wording for plot 13 goes " effects of latitude on spring events", are we treating standard deviation as effects of latitude?

lizzieinvancouver commented 2 years ago

Question: a. If we want to avoid using the term effect, then how should I title each plot? any suggestion? b. I am unsure about how to plot the models with two predictors. If Spring DOY is on one axis, what should be on the other axis?

a) plots don't usually have titles in papers or talks ... also it is okay to use effect, just try not to use it when you can something more biological in the same number of words. b) You have to make two plots. If there is an interaction effect, you have to make interaction plots see here -- since both of yours are continuous you would need to subset one to high and low values (top 20%, bottom 20% for example).

lizzieinvancouver commented 2 years ago

In contrast, fall events (budset, leaf senescence, leaf abcission) advance strongly with provenance latitude. The higher the latitude (provenances further to the North), the earlier fall events tend to happen.

In contrast, fall events (budset, leaf senescence, leaf abcission) advance strongly with provenance latitude, meaning fall events are earlier at higher (more northern) latitudes.

Same change for similar sentences. Also for your paper or slide you can extract the mean estimate of the slope and report that instead of ~4 or such.

lizzieinvancouver commented 2 years ago

Coarse metrics such as latitude and MAT are ultimately representing how similar the climates are between the provenances and gardens in times that matter for the events For spring events, we look at March, April, and May -> spring months -> Overlap metrics will be explained in Methodology section -> spring months -> March April May -> Overlap metrics suggest climate similarity and therefore, similar timing of events?

We hope that coarse metrics such as latitude and MAT ultimately represent how similar the climates are between the provenances and gardens in times that matter for the events. If climates are very similar, then we would expect similar timings. [We could add more here someday.] To this end, we estimated climate overlap in relevant months: For spring events, we considered overlap across March to May.

I am not fully understanding why we are using climate overlap standard deviation as a predictor so I still don't know how to word the results shown in plot 14. Lizzie your wording for plot 13 goes " effects of latitude on spring events", are we treating standard deviation as effects of latitude?

Me neither! That's why I asked for the SD*overlap. I would say we should skip interpreting the SD models alone. It really depends on what overlap and SD are doing together.

Other than above, things look pretty good to me!

alinazeng commented 2 years ago

Thank you! @lizzieinvancouver

Also for your paper or slide you can extract the mean estimate of the slope and report that instead of ~4 or such.

Done.

Questions:

You have to make two plots. If there is an interaction effect, you have to make interaction plots see here -- since both of yours are continuous you would need to subset one to high and low values (top 20%, bottom 20% for example).

  1. Would you say it makes more sense to subset percentage or sd to high and low values? And how do I accomplish that while plotting? I tried to use mdrt.values argument but couldn't get it to work.

plot_model(fitB_spring_percentage_sd, type = "pred", terms = c("percentage", "sd"),mdrt.values = "minmax")

this plot clearly looks wrong :))) two_predictors_testrun

  1. Do you have any critiques for how the other plots look right now (scatterplots with lines and the histograms)? I will get rid of the titles, but is there anything you'd like to change with color, font size, etc.? Are they okay enough for publication purposes?

  2. What title should we use for the paper/talk? since we are no longer talking about just spring events. Currently we have Changes and trends in budburst and leaf flush across Europe and North America: A meta-analysis of local adaptation in spring phenology studies

  3. For the paper/talk, do we want Results & Discussion as ONE section or TWO? I am unsure if I have too much to say for the discussion part.

lizzieinvancouver commented 2 years ago
  1. Sorry, I sent the pages as a guide. However, it may be that there is not a package that will do this for you with Stan models. You may likely need to make the plots yourself. I will put the basic steps below, but you likely need to read more about interactions to understand what you are doing. You can read about them in this book and other places.
  2. We likely need to combine some of the plots! As there too many for a paper (where you can have 4-6 figures, depending on journal). You will have to think about your messages and decide what goes in the main text and supp, but even in the supp, you will need multi-panel figures.
  3. I do not worry about the title now -- you can leave it as it is for the talk, and you can change the title once you have the paper further along. For the talk I guess you can mention in your title slide that you will also look at fall events, but I don't think you need to change the title.
  4. Two sections, as most journals want two sections. You may want to review this book.

Steps for plotting your own interactions:

First, there is no easy way to know if you should subset percentage or sd to high and low values. I generally just try both. I will call these just x1 and x2 below.

  1. Select which you will subset (x1 or x2). For our example, we pick x2.
  2. Label -- say -- the top 25% of x2 as 'high' and low 25% as 'low' (make a new column or such).
  3. Take the mean of low and high, we call them x2high and x2low.
  4. Now you have do the math of your equation: ypredicted = intercept + b1x1 + b2x2 + b3(x1*x2) ... make a set of x1 data from min(x1) to max(x1) with small steps -- so if my min was 5 and max was 25 I might do x1seq <- seq(5, 25, length.out=300)
  5. Now you can calculate what your equation PREDICTS for the sequence of x1 and for high and low x2 .... (you need a loop or apply command). For the first value of x1seq at low x2 it would be: ypredicted = intercept + b1x1[1] + b2x2[x2low] + b3(x1[1]*x2[x2low]) ... you have to extract from your model the intercept, b1, b2, b3. And then you need to plot your ypred for high and low x2 across the data (ypredicted = ypred = predicted y, so you can plot ypredicted ~ x ... with small enough divisions of x1, it will plot as a line).
alinazeng commented 2 years ago

@lizzieinvancouver

Thank you Lizzie! Please bear with my lacking knowledge of interaction models.

Now you have do the math of your equation: ypredicted = intercept + b1x1 + b2x2 + b3(x1*x2) ...

  1. are b1, b2 and b3 the slopes?
  2. does the equation go on after b3? so if I have 15 species the equation would be ypredicted = intercept + b1x1 + b2x2 + b3(x1*x2) ... + b15(I don't know what goes in the parenthesis at this point)
  3. may I confirm if none of intercept, b1, b2, and b3 is a constant? so the code would actually be: df <- as.data.frame(coef(fitB_spring_percentage_sd)$species) ypredicted <- df$intercept +df$slopex1[1] + df$slopex2[x2low] + df$slope(x1[1]*x2[x2low]) +...? I am unsure about where the loop or the apply command comes in.

Here is the .RDS of the spring two predictors model if you could help with the code using it that'd be great.

lizzieinvancouver commented 2 years ago
  1. Yes! Sorry (b_0 or alpha or a are often intercepts and b1, b2 etc. are slopes in the traditional notation, though the nomenclature is not standardized at all).
  2. No. Your b1, b2, b3 are the slopes you get for sd, overlap and sdoverlap -- you probably get these slopes using summary or such (not using coef) -- these are often called the main effects and the interaction. Those coef estimates are for each species relative* to the main effects and interaction and I do not think you need them for this plot.
  3. Yes, once you extract your slope and intercept estimates they're the same for that one equation; as is the x2low (or x2 high). But the x1[1] would need to be in a loop... (for (i in c(1:length(x1seq))){ ypredicted[i] <- df$intercept +df$slopex1[i] + df$slopex2[x2low] + df$slope(x1[i]*x2[x2low])).
alinazeng commented 2 years ago

Hi Lizzie @lizzieinvancouver thanks, I was able to calculate the predicted y.

My plots are very strange though... I am not sure where went wrong. Perhaps you could check my code?

Here are the plots

  1. making a sequence of percentage, and have low and high sd (the two lines are almost overlapping cuz too little difference) spring_sd_highlow_fitB

  2. making a sequence of sd, and have low and high percentage spring_percentage_highlow_fitB

Here is my code: https://github.com/lizzieinvancouver/localadaptclim/blob/main/Analyses/script_two_predictors.R

lizzieinvancouver commented 2 years ago

@alinazeng Could you paste into the issue your summary model output?

I can't check your code for a few days, so I suggest you first look at the numbers of your model output and the math you're doing to get these lines. Before I check your code I need to understand from you what the math says. Perhaps these plots are correct ... you should be able to tell from the math. Take some time and review the texts I sent also.

alinazeng commented 2 years ago

@lizzieinvancouver

Below is the summary output of the spring event model with two predictors.

(I will send your way my draft slides with speaker notes around May 18. It has been very hectic the past few days for me and I hadn't gotten a chance to work on the talk yet.

summary(fitB_spring_percentage_sd)

Model Info: function: stan_glmer family: gaussian [identity] formula: spring_event ~ ((percentage * sd) | species) algorithm: sampling sample: 4000 (posterior sample size) priors: see help('prior_summary') observations: 667 groups: species (15)

Estimates: mean sd 10% 50% 90% (Intercept) 112.0 6.1 104.4 112.0 119.5 b[(Intercept) species:Alnus_rubra] -29.7 7.2 -38.8 -29.8 -20.7 b[percentage species:Alnus_rubra] 0.1 0.1 0.0 0.1 0.2 b[sd species:Alnus_rubra] -0.1 0.6 -0.4 -0.1 0.2 b[percentage:sd species:Alnus_rubra] 0.0 0.0 0.0 0.0 0.0 b[(Intercept) species:Betula_papyrifera] -40.6 7.7 -49.5 -40.9 -32.4 b[percentage species:Betula_papyrifera] 0.1 0.1 -0.1 0.1 0.2 b[sd species:Betula_papyrifera] -0.2 0.8 -0.5 -0.1 0.3 b[percentage:sd species:Betula_papyrifera] 0.1 0.0 0.1 0.1 0.1 b[(Intercept) species:Fagus_sylvatica] 0.9 8.9 -10.4 0.9 12.1 b[percentage species:Fagus_sylvatica] 0.5 0.2 0.3 0.5 0.7 b[sd species:Fagus_sylvatica] 0.0 0.5 -0.3 0.0 0.3 b[percentage:sd species:Fagus_sylvatica] -0.1 0.0 -0.1 -0.1 0.0 b[(Intercept) species:Fraxinus_excelsior] 32.2 9.9 19.7 32.3 44.6 b[percentage species:Fraxinus_excelsior] 0.2 0.2 0.0 0.2 0.4 b[sd species:Fraxinus_excelsior] 0.1 0.7 -0.3 0.1 0.4 b[percentage:sd species:Fraxinus_excelsior] 0.0 0.0 -0.1 0.0 0.0 b[(Intercept) species:Picea_abies] 12.9 10.5 -0.6 12.9 26.3 b[percentage species:Picea_abies] 0.0 0.3 -0.4 0.0 0.4 b[sd species:Picea_abies] 0.1 0.5 -0.2 0.0 0.3 b[percentage:sd species:Picea_abies] 0.0 0.1 -0.1 0.0 0.0 b[(Intercept) species:Picea_engelmannii] -8.5 6.8 -17.1 -8.6 0.3 b[percentage species:Picea_engelmannii] 0.0 0.2 -0.3 0.0 0.3 b[sd species:Picea_engelmannii] 0.0 0.3 -0.2 0.0 0.2 b[percentage:sd species:Picea_engelmannii] 0.0 0.0 -0.1 0.0 0.1 b[(Intercept) species:Picea_mariana] 13.7 11.5 -0.7 13.5 28.7 b[percentage species:Picea_mariana] 0.0 0.2 -0.3 0.0 0.3 b[sd species:Picea_mariana] 0.0 0.3 -0.2 0.0 0.3 b[percentage:sd species:Picea_mariana] 0.0 0.0 0.0 0.0 0.1 b[(Intercept) species:Picea_sitchensis] 9.5 7.5 0.1 9.4 19.2 b[percentage species:Picea_sitchensis] 0.1 0.1 -0.1 0.1 0.3 b[sd species:Picea_sitchensis] 0.0 0.3 -0.1 0.0 0.2 b[percentage:sd species:Picea_sitchensis] 0.0 0.0 -0.1 0.0 0.0 b[(Intercept) species:Pinus_albicaulis] -15.9 7.3 -25.2 -15.9 -6.9 b[percentage species:Pinus_albicaulis] 0.1 0.3 -0.3 0.1 0.4 b[sd species:Pinus_albicaulis] -0.1 0.5 -0.3 0.0 0.2 b[percentage:sd species:Pinus_albicaulis] 0.0 0.1 -0.1 0.0 0.1 b[(Intercept) species:Pinus_ponderosa] 48.3 9.1 37.7 48.5 59.5 b[percentage species:Pinus_ponderosa] 0.1 0.3 -0.2 0.1 0.4 b[sd species:Pinus_ponderosa] 0.2 0.9 -0.4 0.1 0.6 b[percentage:sd species:Pinus_ponderosa] 0.0 0.1 -0.1 0.0 0.1 b[(Intercept) species:Populus_balsamifera] 0.4 13.6 -16.8 0.7 17.3 b[percentage species:Populus_balsamifera] 0.0 0.2 -0.2 0.0 0.3 b[sd species:Populus_balsamifera] 0.0 0.3 -0.2 0.0 0.2 b[percentage:sd species:Populus_balsamifera] 0.0 0.0 0.0 0.0 0.0 b[(Intercept) species:Populus_trichocarpa] -34.5 7.3 -43.6 -34.6 -25.6 b[percentage species:Populus_trichocarpa] 0.0 0.1 -0.1 0.0 0.1 b[sd species:Populus_trichocarpa] -0.1 0.6 -0.4 -0.1 0.3 b[percentage:sd species:Populus_trichocarpa] 0.0 0.0 -0.1 0.0 0.0 b[(Intercept) species:Pseudotsuga_menziesii] 11.3 6.8 2.8 11.2 19.8 b[percentage species:Pseudotsuga_menziesii] 0.0 0.1 -0.2 0.0 0.2 b[sd species:Pseudotsuga_menziesii] 0.0 0.3 -0.1 0.0 0.2 b[percentage:sd species:Pseudotsuga_menziesii] 0.0 0.0 0.0 0.0 0.1 b[(Intercept) species:Quercus_petraea] 0.5 6.7 -8.1 0.3 9.0 b[percentage species:Quercus_petraea] -0.1 0.2 -0.4 -0.1 0.2 b[sd species:Quercus_petraea] 0.0 0.3 -0.2 0.0 0.2 b[percentage:sd species:Quercus_petraea] 0.0 0.1 -0.1 0.0 0.1 b[(Intercept) species:Tsuga_heterophylla] -6.9 8.6 -17.9 -6.8 3.9 b[percentage species:Tsuga_heterophylla] 0.5 0.2 0.3 0.5 0.7 b[sd species:Tsuga_heterophylla] -0.1 0.5 -0.4 0.0 0.3 b[percentage:sd species:Tsuga_heterophylla] -0.1 0.0 -0.1 -0.1 0.0 sigma 6.7 0.2 6.4 6.7 6.9 Sigma[species:(Intercept),(Intercept)] 509.1 199.8 297.3 478.5 761.9 Sigma[species:percentage,(Intercept)] -0.2 2.0 -2.5 -0.2 2.1 Sigma[species:sd,(Intercept)] 0.7 3.4 -3.5 0.9 4.8 Sigma[species:percentage:sd,(Intercept)] -0.2 0.4 -0.7 -0.1 0.3 Sigma[species:percentage,percentage] 0.1 0.1 0.0 0.1 0.2 Sigma[species:sd,percentage] 0.0 0.1 -0.1 0.0 0.1 Sigma[species:percentage:sd,percentage] 0.0 0.0 0.0 0.0 0.0 Sigma[species:sd,sd] 0.3 1.9 0.0 0.0 0.1 Sigma[species:percentage:sd,sd] 0.0 0.0 0.0 0.0 0.0 Sigma[species:percentage:sd,percentage:sd] 0.0 0.0 0.0 0.0 0.0

Fit Diagnostics: mean sd 10% 50% 90% mean_PPD 106.7 0.4 106.3 106.7 107.2

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics mcse Rhat n_eff (Intercept) 0.2 1.0 926 b[(Intercept) species:Alnus_rubra] 0.3 1.0 763 b[percentage species:Alnus_rubra] 0.0 1.0 1261 b[sd species:Alnus_rubra] 0.0 1.0 236 b[percentage:sd species:Alnus_rubra] 0.0 1.0 1497 b[(Intercept) species:Betula_papyrifera] 0.3 1.0 526 b[percentage species:Betula_papyrifera] 0.0 1.0 826 b[sd species:Betula_papyrifera] 0.0 1.0 253 b[percentage:sd species:Betula_papyrifera] 0.0 1.0 751 b[(Intercept) species:Fagus_sylvatica] 0.2 1.0 1329 b[percentage species:Fagus_sylvatica] 0.0 1.0 2788 b[sd species:Fagus_sylvatica] 0.0 1.0 593 b[percentage:sd species:Fagus_sylvatica] 0.0 1.0 2338 b[(Intercept) species:Fraxinus_excelsior] 0.3 1.0 1560 b[percentage species:Fraxinus_excelsior] 0.0 1.0 4042 b[sd species:Fraxinus_excelsior] 0.0 1.0 291 b[percentage:sd species:Fraxinus_excelsior] 0.0 1.0 4155 b[(Intercept) species:Picea_abies] 0.2 1.0 2197 b[percentage species:Picea_abies] 0.0 1.0 4313 b[sd species:Picea_abies] 0.0 1.0 639 b[percentage:sd species:Picea_abies] 0.0 1.0 4903 b[(Intercept) species:Picea_engelmannii] 0.2 1.0 1087 b[percentage species:Picea_engelmannii] 0.0 1.0 1809 b[sd species:Picea_engelmannii] 0.0 1.0 1622 b[percentage:sd species:Picea_engelmannii] 0.0 1.0 2303 b[(Intercept) species:Picea_mariana] 0.2 1.0 2708 b[percentage species:Picea_mariana] 0.0 1.0 3507 b[sd species:Picea_mariana] 0.0 1.0 1079 b[percentage:sd species:Picea_mariana] 0.0 1.0 3667 b[(Intercept) species:Picea_sitchensis] 0.2 1.0 1265 b[percentage species:Picea_sitchensis] 0.0 1.0 5291 b[sd species:Picea_sitchensis] 0.0 1.0 735 b[percentage:sd species:Picea_sitchensis] 0.0 1.0 5306 b[(Intercept) species:Pinus_albicaulis] 0.2 1.0 931 b[percentage species:Pinus_albicaulis] 0.0 1.0 4283 b[sd species:Pinus_albicaulis] 0.0 1.0 310 b[percentage:sd species:Pinus_albicaulis] 0.0 1.0 4516 b[(Intercept) species:Pinus_ponderosa] 0.3 1.0 755 b[percentage species:Pinus_ponderosa] 0.0 1.0 3382 b[sd species:Pinus_ponderosa] 0.1 1.0 236 b[percentage:sd species:Pinus_ponderosa] 0.0 1.0 3460 b[(Intercept) species:Populus_balsamifera] 0.2 1.0 3810 b[percentage species:Populus_balsamifera] 0.0 1.0 3891 b[sd species:Populus_balsamifera] 0.0 1.0 3115 b[percentage:sd species:Populus_balsamifera] 0.0 1.0 4145 b[(Intercept) species:Populus_trichocarpa] 0.3 1.0 775 b[percentage species:Populus_trichocarpa] 0.0 1.0 855 b[sd species:Populus_trichocarpa] 0.0 1.0 309 b[percentage:sd species:Populus_trichocarpa] 0.0 1.0 1243 b[(Intercept) species:Pseudotsuga_menziesii] 0.2 1.0 1029 b[percentage species:Pseudotsuga_menziesii] 0.0 1.0 5516 b[sd species:Pseudotsuga_menziesii] 0.0 1.0 487 b[percentage:sd species:Pseudotsuga_menziesii] 0.0 1.0 5394 b[(Intercept) species:Quercus_petraea] 0.2 1.0 1041 b[percentage species:Quercus_petraea] 0.0 1.0 1899 b[sd species:Quercus_petraea] 0.0 1.0 2439 b[percentage:sd species:Quercus_petraea] 0.0 1.0 1860 b[(Intercept) species:Tsuga_heterophylla] 0.2 1.0 1254 b[percentage species:Tsuga_heterophylla] 0.0 1.0 2891 b[sd species:Tsuga_heterophylla] 0.0 1.0 403 b[percentage:sd species:Tsuga_heterophylla] 0.0 1.0 2790 sigma 0.0 1.0 6336 Sigma[species:(Intercept),(Intercept)] 5.0 1.0 1579 Sigma[species:percentage,(Intercept)] 0.0 1.0 1572 Sigma[species:sd,(Intercept)] 0.1 1.0 3129 Sigma[species:percentage:sd,(Intercept)] 0.0 1.0 2084 Sigma[species:percentage,percentage] 0.0 1.0 1613 Sigma[species:sd,percentage] 0.0 1.0 3497 Sigma[species:percentage:sd,percentage] 0.0 1.0 1631 Sigma[species:sd,sd] 0.1 1.0 230 Sigma[species:percentage:sd,sd] 0.0 1.0 455 Sigma[species:percentage:sd,percentage:sd] 0.0 1.0 1805 mean_PPD 0.0 1.0 3565 log-posterior 0.3 1.0 852

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

lizzieinvancouver commented 2 years ago

@alinazeng Interesting ... this is similar to models I code, but definitely done differently. To get an overall effect of the interaction you would need to add all the posteriors of each species for the interaction (coefficients starting with b[percentage:sd). I also suggest you make a histogram of those mean estimates (that is, the numbers you see here instead of the full posterior) just to see if any species has a large effect. It doesn't look like it from quick glance.

alinazeng commented 2 years ago

@lizzieinvancouver Thanks Lizzie. Could you please elaborate on what this means exactly? Do I take the mean?

To get an overall effect of the interaction you would need to add all the posteriors of each species for the interaction (coefficients starting with b[percentage:sd).

lizzieinvancouver commented 2 years ago

@alinazeng Sorry, I have been using different terms across the issue. This is what we started discussing if you follow this issue up to my post 17 days ago (I wrote, "I will work later on code to extract the posteriors so we can estimate the continent-level effects. We can estimate the deciduous versus evergreen effects the same way!") and then 11 days ago I wrote "I wrote up sample code so you can calculate the effect of continent and leaf type from these new models. Let me know if you have questions" -- this was the code to extract the posteriors and add them ... so you need to follow what we did above (extract posteriors) in a slightly different way.

alinazeng commented 2 years ago

Thanks @lizzieinvancouver

Here are the plots. 1) I haven't done fall ones yet because I wanted to confirm if I should calculate overlap percentage and sd for fall months and if so, which months? 2) for the talk, could I not talk about the Corrected DOY? I fear I won't have time to go over that given the 12 min limit. 3) could you put the main models we are working with into words? I am unsure about how to verbally describe them.

fitA_spring_lat <- stan_glmer(spring_event~(lat_prov|species), data = d)
fitB_spring_percentage_sd <- stan_glmer(spring_event~((percentage*sd)|species), data = d)

Thanks a lot!

leaf_effect_percentage_sd_spring

continent_effect_percentage_sd_spring

lizzieinvancouver commented 2 years ago

@alinazeng

  1. I am looking for SD, percentage and SD:percentage on the Y axis -- for all the species (not just re-running what we did before; we could do that next). However, if easier you can just do these plots, but you need to do three sets for the three predictors in your model: SD, percentage and SD:percentage
  2. You can definitely skip talking about stuff that is not critical to the main message.
  3. [In your talk I would focus on setting up the question, then how you collected data, and very quickly your models, then revisit your questions and show your results. ] We used hierarchical Bayesian models where we partially pooled by species; then estimated effects of continent and functional type from posterior estimates.
alinazeng commented 2 years ago

Thanks! @lizzieinvancouver

I am looking for SD, percentage and SD:percentage on the Y axis -- for all the species (not just re-running what we did before; we could do that next).

Just to confirm, what do you want for x-axis? And should I do separate plots for sd, percentage, and sd:percentage or have all of them on the same plot?

Should I calculate overlap percentage and sd for fall months and if so, which months?

lizzieinvancouver commented 2 years ago

The x axis are the values after you take the mean across your posteriors (as they have been in the other plots you've made) the difference here is that you'll compare different predictors already in the model. So it would be great to have them all on the same plot.

I thought we picked the spring months based on comparing when the climates went above zero or such. Can you remind me of the methods? That will help for selecting fall months ... though fall events are more photoperiod controlled so I am not sure you need that analysis for this.

alinazeng commented 2 years ago

Hi Lizzie, Thank you. The talk recording is due May 23 night, so I actually have a few more days to refine things....

Will you be available to meet briefly May 20, 10 AM Vancouver time (which is 7 pm your time if you are in Avignon still? Please let me know (if this doesn't work, perhaps let me know your availability?))

Here are my draft draft slides. Haven't put much for discussions yet... would like your enlightenment. IBS Presentation Slides Alina Zeng.pdf

Thanks much!