lizzieinvancouver / ospree

Budbreak review paper database
3 stars 0 forks source link

Fix North America (models) #409

Open dbuona opened 3 years ago

dbuona commented 3 years ago

There are a lot of things that need to be fixed about North America, but lets start with our range models. Both stv and gdd2lf models throw 10-20 divergent transitions. Dan thinks its a problem with photoperiod parameter. Code is in popup folder: rangeleadin_osp.R

@lizzieinvancouver will try adding a NCP to our model and kick it to @cchambe12 for help.

lizzieinvancouver commented 3 years ago

@dbuona I think this is fixed (see commit df1c36fdd0e27d5219acebe698c7596c3b0da8d5) -- I at least fixed the one where 10 divergent transitions were mentioned. There was shape in the photoperiod and the forcing, I tried the photo NCP first (being hopeful) but it needed both for 0 div transitions. Please check it out if you can make time!

lizzieinvancouver commented 3 years ago

@dbuona I have no recollection of this issue! But I opened the R file and saw this at the bottom:

# Lizzie playing around with another model
# Can we include continent in our current model so we don't have to split them?
if(FALSE){
# add continent dummy variable to bb.stan ...
bb.stan$continentdummy <- 1
bb.stan$continentdummy[which(bb.stan$continent=="Europe")] <- 0
continentquick <- subset(bb.stan, select=c("complex.wname", "continentdummy"))
continentquick <- continentquick[!duplicated(continentquick), ]

bb.3paramcont.gddlf <- with(bb.stan, 
                      list(yPhenoi = resp, 
                           forcingi = force.z,
                           photoi = photo.z,
                           chillingi = chill.z,
                           species = latbinum,
                           N = nrow(bb.stan),
                           n_spec = length(unique(bb.stan$complex.wname)),
                           climvar=unique(bb.stan$Temp.SD.z),
                           continent=continentquick$continentdummy
                      ))

goober = stan('popUP/stan/joint_climvar_3paramwCont.stan', data=bb.3paramcont.gddlf,
                     iter = 4000, warmup=3000)

goobsumCont <- summary(goober)$summary
goobsumCont[grep("mu", rownames(goobsumCont)),]
goobsumCont[grep("betaTraitx", rownames(goobsumCont)),]
goobsumCont[grep("betaFS", rownames(goobsumCont)),]
goobsumCont[grep("betaPS", rownames(goobsumCont)),]
goobsumCont[grep("betaCS", rownames(goobsumCont)),]

    }

It seems to run okay ... at least on a brief poking around. Overall estimates of cues:

> goobsumCont[grep("mu", rownames(goobsumCont)),]
                mean    se_mean        sd       2.5%        25%        50%         75%      97.5%
muForceSp  -7.822617 0.08303630  1.940208 -11.459133  -9.156003  -7.875380  -6.5541803  -3.861461
muPhotoSp  -1.235230 0.07549334  2.079886  -5.324006  -2.619054  -1.229703   0.1687659   2.854667
muChillSp -12.800070 0.10339721  3.764572 -19.968837 -15.370325 -12.800035 -10.3013112  -5.169555
muPhenoSp  60.593725 1.85395311 30.741882  31.690120  36.535476  42.054593  91.6091632 121.096950
              n_eff     Rhat
muForceSp  545.9599 1.011434
muPhotoSp  759.0353 1.008701
muChillSp 1325.6035 1.002448
muPhenoSp  274.9561 1.019118

Then this thing that sounds important:

> goobsumCont[grep("betaTraitx", rownames(goobsumCont)),]
                       mean   se_mean       sd      2.5%       25%       50%       75%    97.5%
betaTraitxForcing -4.050649 0.1616783 3.746246 -10.93592 -6.682268 -4.102578 -1.561580 3.722677
betaTraitxPhoto    1.925220 0.1378281 3.881603  -5.47800 -0.777388  1.965556  4.527668 9.444354
betaTraitxChill   -2.009869 0.2047362 5.673513 -13.09619 -5.921583 -1.913027  1.847468 9.048912
                     n_eff     Rhat
betaTraitxForcing 536.8949 1.013292
betaTraitxPhoto   793.1336 1.006390
betaTraitxChill   767.9177 1.004256

And then there's betaFS, betaCS, betaPS ... which I think are what you need to add to the betaTraitx to get the North American estimates:

> goobsumCont[grep("betaFS", rownames(goobsumCont)),]
       mean     se_mean          sd        2.5%         25%         50%         75%       97.5% 
  3.7333861   0.2117132   4.8142118  -6.1251240   0.5817259   3.8167975   7.0861640  12.6516991 
      n_eff        Rhat 
517.0762837   1.0139966 
> goobsumCont[grep("betaPS", rownames(goobsumCont)),]
       mean     se_mean          sd        2.5%         25%         50%         75%       97.5% 
 -0.6943716   0.1838669   5.0087648 -10.3556288  -4.0282369  -0.7743164   2.7717651   9.0613890 
      n_eff        Rhat 
742.0858244   1.0073181 
> goobsumCont[grep("betaCS", rownames(goobsumCont)),]
       mean     se_mean          sd        2.5%         25%         50%         75%       97.5% 
  6.1184221   0.2614802   7.1286918  -7.7693951   1.3416197   5.9788705  10.9411490  20.2623842 
      n_eff        Rhat 
743.2622957   1.0033755 

Do the numbers and estimates make sense? This could be an easier all inclusive model to include, if so....

lizzieinvancouver commented 3 years ago

@dbuona Hello! And because this hurts my brain I am also pinging @cchambe12 @AileneKane @MoralesCastilla ... I have deduced that:

  1. I wrote the code that adds continent to the model (joint_climvar_3paramwCont.stan, effectively is days ~ alpha_sp + betaForce + betaChill + betaPhoto where betaForce, betaChill, betaPhoto all decompose into: alphaForcingSp[isp] + betaTraitxForcingclimvar[isp] + betaFSinter_fs[isp] ... and that inter_fs is climvar*continent) back in April.
  2. What's weird about it (which I am sure we discussed before) is that species is effectively nested within continent in real life, but we don't do anything about that in the model. So we estimate the effect across all species ... and I think that might be bad/weird or at least a definite reason that @dbuona would get different answers using teh data subsetted in pieces.
  3. I think that I also did not include a continent-only effect (just the interaction term), which is also weird, but adding a continent-only effect will likely not solve our problem.

I am not sure how to fix this! I don't think this modeling approach is okay as is, do you guys (maybe I am crazy and it's fine)? Other ideas are to run the two datasets separately or ... try to nest species within continent but I do not know how to do that and am not sure it will separate out the effects as we want?

dbuona commented 3 years ago

I can confirm these results are different than what you get running each continent separately.

rowname mean se_mean sd
EU betaTraitxForcing 2.9059298 0.38142453 8.5128219 EU betaTraitxPhoto 0.2297738 0.33291006 8.5513643 EU betaTraitxChill 3.5158592 0.32955781 8.8201720

NA betaTraitxForcing -2.9883411 0.03101558 1.5331734
NA betaTraitxPhoto -0.5529028 0.04248994 1.9151328
NA betaTraitxChill -5.7006044 0.13562867 4.3659333

lizzieinvancouver commented 3 years ago

@dbuona @cchambe12 @MoralesCastilla @AileneKane

Thinking more with the help of @FaithJones @legault ... I think this is sort of an inherent issue in biogeography and our options may be to:

  1. Run the two separate models.
  2. Go back to the old idea of running the model in steps .... so, right now we have this cool joint modeling approach where we divvy the forcing effect into some intrinsic forcing effect and some contribution from climate (of the range). That's cool. But we could also just estimate the species level cues and then ask, does that cue correlate with climate? So ... run the phenology model we have for OSPREE then add a line in the Stan code that is something like betaForce[sp]~climate + climate*continent... and have three lines for each of the cues.
lizzieinvancouver commented 3 years ago

I can confirm these results are different than what you get running each continent separately.

rowname mean se_mean sd EU betaTraitxForcing 2.9059298 0.38142453 8.5128219 EU betaTraitxPhoto 0.2297738 0.33291006 8.5513643 EU betaTraitxChill 3.5158592 0.32955781 8.8201720

NA betaTraitxForcing -2.9883411 0.03101558 1.5331734 NA betaTraitxPhoto -0.5529028 0.04248994 1.9151328 NA betaTraitxChill -5.7006044 0.13562867 4.3659333

@dbuona While it's a bit hacky to run the model together then two separate models, I am not opposed to it anymore (though it may freak reviewers out).

lizzieinvancouver commented 3 years ago

BTW, I don't think we can nest species within continent. Even if we have five continents, species would mostly still be confounded with continent .... that's the intrinsic biogeography issue we cannot get around.

dbuona commented 3 years ago

I can try and code up the above mentioned "sequential" model and see how it compares to the separate data set approach