lizzieinvancouver / pmm

phylogenetic mixed models -- and we all learn together

Structure in sigma_interceptsbf and sigma_interceptsbp #13

Closed lizzieinvancouver closed 3 years ago

lizzieinvancouver commented 3 years ago

I have tried the following (see lessmini_threeslopeintercept.stan and phlyo_opsree_compact.R; there's also phlyo_opsree_serve1.R for running on midge, as these models are SLOW):

try <- stan("stan/lessmini_threeslopeintercept.stan",
    data=list(N=nrow(d), n_sp=nspecies, sp=d$sppnum,
       x1=d$force.z, x2=d$chill.z, x3=d$photo.z,
       y=d$resp, Vphy=vcv(phylo)),
    iter=5000, warmup=3500, chains=4, cores=4)

## This gives 338 divergent transitions
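
For reference, the divergence counts reported in this thread can be pulled from a fit like so (a sketch using rstan's get_sampler_params; try is the fit object from the call above):

sampler_params <- get_sampler_params(try, inc_warmup=FALSE) # one matrix per chain, post-warmup
sum(sapply(sampler_params, function(x) sum(x[, "divergent__"]))) # total divergent transitions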

try <- stan("stan/lessmini_threeslopeintercept.stan",
    data=list(N=nrow(d), n_sp=nspecies, sp=d$sppnum,
       x1=d$force.z, x2=d$chill.z, x3=d$photo.z,
       y=d$resp, Vphy=vcv(phylo)),
    iter=12000, warmup=11000, chains=4, cores=4, control=list(adapt_delta=0.99))

## This gives 24 divergent transitions

try <- stan("stan/lessmini_threeslopeintercept.stan",
    data=list(N=nrow(d), n_sp=nspecies, sp=d$sppnum,
       x1=d$force.z, x2=d$chill.z, x3=d$photo.z,
       y=d$resp, Vphy=vcv(phylo)),
    iter=12000, warmup=11000, chains=4, cores=4, control=list(adapt_delta=0.99))

## This gives 14 divergent transitions (the same settings as above, rerun)

testme8 <- stan("stan/lessmini_threeslopeintercept.stan",
    data=list(N=nrow(d), n_sp=nspecies, sp=d$sppnum,
       x1=d$force.z, x2=d$chill.z, x3=d$photo.z,
       y=d$resp, Vphy=vcv(phylo)),
    iter=20000, warmup=19000, chains=4, cores=4, control=list(adapt_delta=0.999))

## This gives 19 divergent transitions

See phlyo_opsree_compact.R for more notes on other runs.

Here's what a run with iter=9000, warmup=8000, chains=4, cores=4, control=list(adapt_delta=0.99) looks like:

[attached pairs plots: sigmaf, sigmap]

lizzieinvancouver commented 3 years ago

@FaithJones @legault

The priors and diagnostics look okay, but another eye on the priors would be great!

  sigma_interceptsbf ~ normal(0, 1); // model estimate: 0.48
  lam_interceptsbf ~ beta(2, 2); // model estimate: 0.60
  b_zf ~ normal(0, 5); // model estimate: -5.15
  sigma_interceptsbc ~ normal(0, 1); // model estimate: 1.2
  lam_interceptsbc ~ beta(2, 2); // model estimate: 0.31
  b_zc ~ normal(0, 5); // model estimate: -5.95
  sigma_interceptsbp ~ normal(0, 1); // model estimate: 0.21
  lam_interceptsbp ~ beta(2, 2); // model estimate: 0.44
  b_zp ~ normal(0, 5); // model estimate: -1.43
  sigma_interceptsa ~ normal(0, 1); // model estimate: 1.5
  lam_interceptsa ~ beta(2, 2); // model estimate: 0.35
  a_z ~ normal(20, 10); // model estimate: 28.7
  sigma_y ~ normal(0, 10); // model estimate: 13

check_all_diagnostics(fit)
[1] "n_eff / iter looks reasonable for all parameters"
[1] "Rhat looks reasonable for all parameters"
[1] "32 of 4000 iterations ended with a divergence (0.8%)"
[1] "  Try running with larger adapt_delta to remove the divergences"
[1] "0 of 4000 iterations saturated the maximum tree depth of 10 (0%)"
[1] "E-BFMI indicated no pathological behavior"

The pairs plot still doesn't show anything! Any ideas on what to look at would help. https://github.com/lizzieinvancouver/pmm/blob/master/analyses/figures/ospreepairs.pdf

Also, do either of you have nice code to make pairs plots? This is how I did it and it felt clunky:

pdf(file="figures/ospreepairs.pdf")
pairs(~sigma_y + lam_interceptsa + sigma_interceptsa + lam_interceptsbf +  sigma_interceptsbf + 
    lam_interceptsbc +  sigma_interceptsbc +  lam_interceptsbp +  sigma_interceptsbp + 
    b_zf + b_zc + b_zp + a_z + lp__, 
    data=rbind(nondiv_params[pars], div_params[pars]), pch=21, cex=0.8, bg=colz)
dev.off()
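
One tidier option (a sketch, not from this repo; it assumes the bayesplot package and a stanfit object fit) is bayesplot's mcmc_pairs, which marks divergent draws automatically:

library(bayesplot)
np <- nuts_params(fit) # sampler diagnostics, including divergent__
mcmc_pairs(as.array(fit),
    pars=c("sigma_y", "lam_interceptsa", "sigma_interceptsa",
        "lam_interceptsbf", "sigma_interceptsbf", "lp__"),
    np=np) # divergent iterations show up as red points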
lizzieinvancouver commented 3 years ago

Here's the model estimates @legault @FaithJones @MoralesCastilla

> sumfit[which(rownames(sumfit) %in% pars),]
                            mean     se_mean          sd          2.5%           25%           50%           75%         97.5%
sigma_y             1.309226e+01 0.003220159  0.19341720  1.271869e+01  1.295783e+01  1.309134e+01  1.322328e+01  1.347630e+01
lam_interceptsa     3.491009e-01 0.002017819  0.09399209  1.741771e-01  2.841903e-01  3.469415e-01  4.118906e-01  5.392201e-01
sigma_interceptsa   1.465400e+00 0.002187387  0.10660082  1.274079e+00  1.391966e+00  1.458352e+00  1.534572e+00  1.691753e+00
lam_interceptsbf    6.006408e-01 0.009704927  0.17595572  2.346146e-01  4.797236e-01  6.129569e-01  7.285988e-01  9.134766e-01
sigma_interceptsbf  4.833773e-01 0.005585958  0.08372441  3.406178e-01  4.234401e-01  4.783556e-01  5.373665e-01  6.642449e-01
lam_interceptsbc    3.138255e-01 0.001848241  0.09067556  1.445583e-01  2.479237e-01  3.126518e-01  3.756852e-01  4.960912e-01
sigma_interceptsbc  1.219035e+00 0.002587599  0.10155465  1.034409e+00  1.149473e+00  1.213177e+00  1.283561e+00  1.432985e+00
lam_interceptsbp    4.370213e-01 0.013425549  0.19968870  8.809602e-02  2.827428e-01  4.245305e-01  5.801684e-01  8.407862e-01
sigma_interceptsbp  2.076898e-01 0.002191782  0.03551768  1.434399e-01  1.828181e-01  2.056370e-01  2.302719e-01  2.816963e-01
b_zf               -5.159007e+00 0.055477603  1.91888987 -8.838852e+00 -6.426638e+00 -5.194980e+00 -3.944072e+00 -1.240598e+00
b_zc               -5.949112e+00 0.045622676  2.81093627 -1.123390e+01 -7.915563e+00 -5.960093e+00 -4.088093e+00 -2.636374e-01
b_zp               -1.429105e+00 0.024939643  0.80183839 -3.010519e+00 -1.943406e+00 -1.438278e+00 -9.083322e-01  1.486301e-01
a_z                 2.873508e+01 0.056776305  3.70046927  2.132495e+01  2.644504e+01  2.876046e+01  3.126842e+01  3.589489e+01
lp__               -1.102446e+04 3.555151310 47.05369124 -1.111389e+04 -1.105717e+04 -1.102583e+04 -1.099220e+04 -1.093278e+04
                       n_eff      Rhat
sigma_y            3607.7426 1.0007873
lam_interceptsa    2169.7919 1.0034337
sigma_interceptsa  2375.0349 0.9997258
lam_interceptsbf    328.7171 1.0082229
sigma_interceptsbf  224.6513 1.0249223
lam_interceptsbc   2406.9295 0.9999574
sigma_interceptsbc 1540.3018 1.0005693
lam_interceptsbp    221.2294 1.0089693
sigma_interceptsbp  262.5998 1.0106994
b_zf               1196.3678 1.0045307
b_zc               3796.1249 1.0003183
b_zp               1033.6969 1.0024223
a_z                4247.9496 0.9993499
lp__                175.1746 1.0185487
lizzieinvancouver commented 3 years ago

Ideas are:

willpearse commented 3 years ago

Thought I'd stick my nose in here (thanks for this link, @lizzieinvancouver!) - this is a doozy, isn't it?... My thoughts, in an almost stream-of-consciousness fashion, are:

...so, from this, maybe two ideas:

  1. Reduce the step size (e.g., control=list(stepsize=0.5)). Don't run a very, very long chain with this, but keep adapt_delta close to 1 still. Compare with other, shorter chains to see if that helps. (A sketch of such a call follows this list.)
  2. Transform lambda in some way (logit?) so that the search isn't hitting such a hard boundary at 0/1. This is a big problem in ML searches, so it stands to reason it could be a problem here too...
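
A sketch of idea 1 as an rstan call (the control settings follow the suggestion above; the other arguments match the earlier runs in this thread):

try <- stan("stan/lessmini_threeslopeintercept.stan",
    data=list(N=nrow(d), n_sp=nspecies, sp=d$sppnum,
       x1=d$force.z, x2=d$chill.z, x3=d$photo.z,
       y=d$resp, Vphy=vcv(phylo)),
    iter=4000, warmup=3000, chains=4, cores=4,
    control=list(stepsize=0.5, adapt_delta=0.99)) # smaller initial step size, high adapt_delta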
lizzieinvancouver commented 3 years ago

@willpearse Thanks!

I got:

- 26 divergences from iter=6000, warmup=5000, chains=4, cores=4, control=list(stepsize=0.5, adapt_delta=0.99)
- 25 divergences from iter=4000, warmup=3000, chains=4, cores=4, control=list(stepsize=0.5, adapt_delta=0.99)
- 151 divergences from iter=4000, warmup=3000, chains=4, cores=4, control=list(stepsize=0.5, adapt_delta=0.9)

Just to check, I am running one with control=list(stepsize=0.3, adapt_delta=0.99) and one with control=list(stepsize=0.5, adapt_delta=0.999).

@legault and I discussed this, but we think maybe lambda is already transformed under the Stan hood, no? Or maybe I just need more info on how you see the transformation working.

We also tried to think more about how to formulate the model (should the lambdas all be independent, for example?). I think it's fine, and I discussed it a little with Jonathan, who generally agrees ...

lizzieinvancouver commented 3 years ago

@legault Some notes on how correlated traits evolve in phylo stuff. Basically, you can evolve them together and set the correlation between them... see sim.char in geiger package (Continuous character -- multivariate example)

See also ... see 'Now, let's simulate the evolution of two traits on the tree under three different correlation coefficients'

And from Harmon: "To simulate n continuous characters, the program requires an n x n evolutionary variance–covariance matrix. This matrix describes the expected variances and covariances among characters per unit time; characters evolve under a multivariate Brownian motion model described by this matrix."
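
A minimal sketch of that kind of multivariate simulation with geiger's sim.char (the tree, rates, and the 0.7 correlation below are made-up values for illustration):

library(ape)
library(geiger)
tree <- rcoal(50) # hypothetical 50-species tree
rho <- 0.7 # assumed evolutionary correlation between the two traits
vcvevo <- matrix(c(1, rho, rho, 1), nrow=2) # per-unit-time variance-covariance matrix
traits <- sim.char(tree, par=vcvevo, nsim=1, model="BM")[, , 1]
cor(traits[, 1], traits[, 2]) # realized correlation across tips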

Jonathan didn't think these types of correlated analyses of empirical data were super common; here's one I came across while looking for other stuff

willpearse commented 3 years ago

I would agree with the correlated models not being very common (although they really should be more common in my opinion!). Off-hand I can't think how to set up a direct analogue of the above simulation for the estimation. I know you can do it in PCMBase or mvMORPH, for example, but it is a bit trickier to do than the univariate case in geiger.

legault commented 3 years ago

I believe the current 3 slope model (i.e., 3 character model [+ intercept]) assumes that slopes evolve independently across a shared tree. In this case, I think it's possible to generate fake data by simulating the univariate case 3 times (with fastBM), once for each pair of lambda and sig2, and bringing them all together; see the sketch below. Maybe with this simulated data we can see what's going wrong on the Stan side of things.
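
A sketch of that simulation (fastBM is from phytools; geiger's rescale is assumed here for the Pagel's lambda transform; the tree and parameter values are made up):

library(phytools)
library(geiger)
tree <- pbtree(n=40, scale=1) # hypothetical 40-species tree
lambdas <- c(0.6, 0.3, 0.45) # assumed Pagel's lambda for each slope
sig2s <- c(0.25, 1.5, 0.04) # assumed BM rates (sig2) for each slope
slopes <- sapply(seq_along(lambdas), function(i){
    treelam <- rescale(tree, model="lambda", lambdas[i]) # lambda-transform the tree
    fastBM(treelam, a=0, sig2=sig2s[i]) # univariate BM on the transformed tree
})
## each column of slopes is one set of species-level slopes, evolved independently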

lizzieinvancouver commented 3 years ago

- iter=6000, warmup=5000, chains=4, cores=4, control=list(stepsize=0.2, adapt_delta=0.999): 19 divergences
- iter=8000, warmup=7000, chains=4, cores=4, control=list(stepsize=0.2, adapt_delta=0.999): 14 divergences
- iter=6000, warmup=5000, chains=4, cores=4, control=list(stepsize=0.1, adapt_delta=0.999): 3 divergences

willpearse commented 3 years ago

Gaaaaaaarg, so close... I don't suppose you've considered running a couple of these at the same time and keeping the one with no divergences? I have a suspicion that these posteriors are going to be essentially indistinguishable from the ones with many more divergences anyway...


lizzieinvancouver commented 3 years ago

@willpearse I agree! I think we have the model results, just not ones I'd feel okay publishing... I had not considered your idea but can try it next week. For now I sent a few more runs to the server (midge!) with a mix of even smaller step sizes, longer warmup, etc.

willpearse commented 3 years ago

That's great; I'm just glad you've got loads of things sent off, because I think that will crack it. This is a bit weird and frustrating; I am going to try to look into this more when I have time, because there ought to be a general fix for this...

legault commented 3 years ago

In case it's helpful, I've pushed code that simulates 1, 2, and 3 slope models and fits these models to the data; see adfd84d. With tight priors, we see good behavior and get back the true parameter values.

lizzieinvancouver commented 3 years ago

@legault Just confirming that I went back to the ospree code where I wrote out the data and it does look to be z-scored!

legault commented 3 years ago

I just pushed 3bde38c, which includes a successful fit of the three slope (i.e., three trait) model to the OSPREE data. The fitted model can also be downloaded from the data folder on the Wolkovich server (testme.rds).
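
A sketch of loading and summarizing that fit locally (assuming testme.rds has been downloaded, and with pars holding the parameter names as in the earlier comment):

fit <- readRDS("testme.rds") # the saved stanfit object
sumfit <- summary(fit)$summary # matrix of posterior summaries
round(sumfit[which(rownames(sumfit) %in% pars), ], 2)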

lizzieinvancouver commented 3 years ago

Values from @legault model:

                        mean se_mean    sd      2.5%       25%       50%       75%
sigma_y                13.34    0.00  0.19     12.97     13.21     13.34     13.47
lam_interceptsa         0.35    0.00  0.10      0.16      0.27      0.34      0.42
sigma_interceptsa      17.19    0.02  1.30     14.83     16.27     17.11     18.02
lam_interceptsbf        0.67    0.01  0.21      0.22      0.52      0.69      0.84
sigma_interceptsbf      5.67    0.05  1.02      3.87      4.95      5.60      6.31
lam_interceptsbc        0.28    0.00  0.09      0.11      0.21      0.27      0.34
sigma_interceptsbc     13.56    0.03  1.09     11.59     12.81     13.51     14.26
lam_interceptsbp        0.39    0.02  0.24      0.02      0.19      0.37      0.57
sigma_interceptsbp      2.40    0.02  0.42      1.68      2.11      2.35      2.64
b_zf                   -5.59    0.05  2.01     -9.40     -6.92     -5.63     -4.30
b_zc                   -8.46    0.04  2.52    -13.38    -10.12     -8.46     -6.84
b_zp                   -1.48    0.02  0.78     -3.03     -1.99     -1.48     -0.98
a_z                    30.30    0.05  3.58     23.17     27.98     30.38     32.67
lizzieinvancouver commented 3 years ago

@legault This model looks great to me -- thank you! I am adding a few notes below.

@MoralesCastilla ... it looks like we have the PMM running. We should pull the code from here back to the ospree repo, run the model, and then let's write that paper!

lizzieinvancouver commented 3 years ago

A few notes from Geoff we may want ...

It takes some time to run (~2 hours on midge), so you can alternatively download the fitted model from our friendly lab computer's /home/data folder ("testme.rds" 160 MB)

Attached are the estimates. The priors are relatively wide and could probably be widened if desired (see object "phypriors").

...

When I asked, "Wow! How did you get this to work?" Geoff said, "I made/tested a number of changes with simulation code first, then applied it all to the OSPREE data. The most important change, probably, was constraining Pagel's lambda to [0, 1]." It sounds like we (I really am not smart enough to use a beta; I think @willpearse helped us there) had a beta prior in there that was not quite correctly specified for our crazy model.

willpearse commented 3 years ago

Thanks for looking into this, and also for updating me. To be clear, then, it's a beta prior + a hard [0,1] constraint that fixed it? Good to know going forward.


lizzieinvancouver commented 3 years ago

@willpearse Thanks to you for writing up the model! Looking back at the code, I think @legault added the beta distribution; initially it was beta(2,2) and now it is beta(1,1), which means that making it more square (beta(1,1) is flat, i.e., uniform on [0, 1]) made the difference ... I think.

Also, this all applies to our real data; we got the model running on test data with beta(2,2) and normal(0, 1) for lambda.

willpearse commented 3 years ago

This is super useful to know. Well done and thank you!
