Open DeirdreLoughnan opened 1 year ago
@DeirdreLoughnan Thanks for checking this! I am hopeful @FaithJones and @legault might have better insights than I do.
Can you also add:
@lizzieinvancouver @legault @FaithJones
The test data has 20 studies for 20 species and 10 reps.
Here is the scatterplot of the muPhenoSp:
Here is a scatterplot of the muStudy values:
But I am also unsure what you mean by "study level estimates that are part of sigma_sp". It doesn't appear to me as though study contributes to sigma_sp, which we define as 5.
@DeirdreLoughnan Thanks! The muPhenoSp looks good, so it feels we're narrowing in on the problem.
The test data has 20 studies for 20 species and 10 reps.
So does every study have 200 reps in the end? That seems a decent number.
a scatterplot with 1:1 line for the study level estimates that are part of sigma_sp?
Sorry, this was a typo on my part! I meant sigma_study, so I think you plotted what I wanted. There are a few things I personally would check and/or change here. You may have done some of these already, but in case not:
In the future, please add the 1:1 lines and put mdl on the y axis.
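Something like this is what I have in mind (a sketch; `true.vals` and `mdl.est` are placeholder vectors of simulated values and posterior means):

```r
# 1:1 comparison: test-data (true) values on x, model estimates on y
plot(true.vals, mdl.est, xlab = "Test data value", ylab = "Model estimate (mdl)")
abline(0, 1, lty = 2)  # points on this dashed line indicate perfect recovery
```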
@lizzieinvancouver yes that is correct, 200 reps per study.
Great, I thought that is what you meant!
I think we do have the grand mean separated in both the model and test data.
Yes the species level difference (sigma.species) was 5 and greater than the study level difference (sigma.study) which was 2. I reran the model using 10 for both and the output is the following:
```
                       mean          2.5%         97.5%       Rhat  true value
mu_grand        13.19022862   8.106933482   18.65216331  1.0030183          15
sigma_traity     7.98191621   7.814585015    8.16155945  0.9994544           5
sigma_sp        11.58467484   8.794021355   15.10560547  0.9994799          10
sigma_study      0.77632537   0.413696205    1.23839203  1.0008111          10
muForceSp      -13.28390840 -17.218852896   -9.44098944  1.0047887         -15
sigmaForceSp     5.72875735   4.071913140    8.15598279  0.9997479           4
muChillSp       -3.02747266  -6.488389816    0.33995822  1.0055816        -0.6
sigmaChillSp     5.05980338   3.565515897    7.31086702  1.0005934           4
muPhotoSp       -2.92811052  -6.553851083    0.64289864  1.0021419       -0.05
sigmaPhotoSp     5.03572293   3.258500598    7.46892750  0.9997932           4
muPhenoSp       75.50392288  67.613913830   82.81681998  0.9993126          80
sigmaPhenoSp    17.48891700  13.918994708   22.04346324  0.9997798          20
betaTraitxForce  0.22861253   0.002628756    0.46150674  1.0051098         0.3
betaTraitxChill -0.26833743  -0.461583155   -0.06312454  1.0096514        -0.2
betaTraitxPhoto -0.04966157  -0.253066545    0.16680646  1.0017828        -0.4
sigmapheno_y     4.79500320   4.442195385    5.17203850  1.0001405           5
```
The estimates for PhenoSp still look good, but the muStudy estimates still look quite poor:
@DeirdreLoughnan Have you confirmed that the simple model that just estimates trait ~ sp + study works? I would set up test data and a code file to do that. You should have all the code and just need to delete a LOT of it to check the simpler version.
@DeirdreLoughnan @lizzieinvancouver I have taken a look at the issue today. I haven't solved it though, I'm afraid. It looks to me like the model is struggling to tell the difference between the sigma for study and the sigma for general variance. When I ran the model with values of 10 and 10 for species and study, but 1 for the general sigma (sigmaTrait_y), the model swapped the study and general sigmas around. This is why the estimate for the study-level sigma is so small. I therefore second Lizzie's suggestion to check it's working with just the trait part of the model (but check that the sigmas are zero-centred in the traits-only model). Maybe we can discuss why this might happen, and how much of an issue it is if we correctly estimate species-level trait differences, in our upcoming meeting?
@FaithJones @DeirdreLoughnan @legault Thanks for all your efforts on this!
I tried to look at the code, but there is no README in the (I think) relevant folder so it's hard to see what code we're using. @DeirdreLoughnan Could you add a README explaining all the files in the traitors/stan folder?
@lizzieinvancouver @legault @FaithJones
I feel good that I might have found the issue with the model. But for posterity, I am going to replicate what I said in my email here in more detail. The code I am looking at is [here](https://github.com/lizzieinvancouver/ospree/blob/master/analyses/traits/Rfiles/height_simPriorChecks.R).
I think the reason that our model was struggling to estimate sigma_study and sigma_sp was because of how the test data was coded. In Faith's test data each study has perfectly replicated species, with all species occurring in all studies (lines 67, 68, 73):
```r
trt.dat$study <- rep(rep(c(1:Nstudy), each = Nspp), each = Nrep)
trt.dat$species <- rep(rep(c(1:Nspp), times = Nstudy), each = Nrep)
# ....
trt.dat$alphaTraitSp <- rep(rep(alphaTraitSp, times = Nstudy), each = Nrep)
```
This generates rather poor estimates for sigma.study:
But in my test data, I had different species for different studies, which I think better reflects what our actual data looks like.
```r
trt.dat$study <- rep(c(1:Nstudy), each = Nspp)
trt.dat$species <- rep(1:Nspp, Nstudy)
# ....
trt.dat$alphaTraitSp <- rep(alphaTraitSp, times = Nstudy)
```
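One quick way to see which design a given test data set encodes (a sketch, assuming `trt.dat` as built above):

```r
# fully crossed designs fill every species-by-study cell;
# nested designs leave most cells at zero
with(trt.dat, table(species, study))
```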
And this test data doesn't have the same issues estimating sigma.study:
What do others think? Is the model actually fine and the test data the issue?
@DeirdreLoughnan I appreciate your work on this!
All species in all studies would be the best case scenario for the test data (that's what I would personally code by default, and then degrade to less than that). This suggests some other issue.
@lizzieinvancouver To see if we can make some more progress on this issue, I made this Stan Discourse post, but have not yet received a reply.
For my own reference and for those later looking at this (myself and others!) I am adding some excellent info from emails last week here:
I have also had time to look over some of our old code and issues, which others might find helpful too. Of particular relevance is this issue:
https://github.com/lizzieinvancouver/ospree/issues/390
We had the model working on test data in the late summer of 2021, with R code and Stan code. Faith, I am not sure the Stan code is wrong, as the priors for mu_study and mu_sp are zero in the R code, but correct me if you still think it is not centred on zero and an issue. Based on my June 22, 2021 post, this model was doing a pretty good job of estimating all the parameters for the test data.
But then in this issue you say Faith that Geoff found “a bug with the traits only Stan code where the hierarchical effects were not quite non-centered but also not centered” (Sept 22). Is this why we switched to Geoff’s code (I think this is Geoff’s commit for that change)? We were also having issues with low n_eff, right? Somewhere between Sept 22 and Oct 10th we switched models completely and started using phenology_combined.stan, which does not do a great job of estimating the study effects.
Regarding posting to stan discourse, I forgot that we kind of already did, at least regarding the intercept only portion of the model: https://discourse.mc-stan.org/t/simple-intercept-only-model-with-poor-chain-mixing/24335 It doesn’t seem to have been very helpful.
Which leads me to what I have been working on this week.
I started by running the test data mentioned in the above issues. The trait-only test data did produce some issues (2 transitions exceeded max tree depth) but did estimate the true values pretty well.
I compared Geoff's model to mine. As far as I can see, the biggest differences are on lines 88-102, where Geoff defines mu_grand_sp: in calculating betaForceSp (line 96), Geoff uses mu_grand_sp, whereas I just use muSp and do not include the grand mean. The other difference is line 109, where yTraiti is drawn directly (vectorized) from a normal with sigma_traity in Geoff's model, whereas I have a loop:
```stan
for (i in 1:N) {
  yTraiti[i] ~ normal(ymu[i], sigmaTrait_y);
}
```
Would this latter difference be affecting the model's ability to differentiate the error?
And the reply was:
There are several assumptions underlying my use of mu_grand_sp on line 96:
- mu_grand = the average trait value of all species
- mu_sp = the offset from the average trait value (mu_grand) for a particular species
- If mu_sp is an offset, then a species' trait value must be mu_grand + mu_sp (I call this sum mu_grand_sp)
- betaForceSp is determined by a species' actual trait value (as opposed to its offset from a grand trait value), hence it involves mu_grand_sp
Regarding line 109, that is the "observation error" or "random noise" part of the model. It assumes:
- There are trait values associated with particular species and studies, y_hat
- y_hat[i] = mu_grand + species_offset + study_offset
- Observations of these trait values, yTraiti, are imperfect and/or there is random error such that yTraiti ~ Normal(y_hat, sigma_traity)
- Without such structure, all individuals of a particular species and study are assumed to have identical traits
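In Stan terms, the structure above looks roughly like this (a sketch using the identifiers from phenology_combined.stan; declarations omitted):

```stan
transformed parameters {
  vector[N] y_hat;
  for (i in 1:N)
    // expected trait value: grand mean + species offset + study offset
    y_hat[i] = mu_grand + muSp[trait_species[i]] + muStudy[study[i]];
}
model {
  // observations scatter around y_hat with residual error sigma_traity
  yTraiti ~ normal(y_hat, sigma_traity);
}
```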
@DeirdreLoughnan Your investigative work on this has been great and seems like we should be able to figure this out.
I reviewed these emails more closely and stared for a while at the joint_3cue_newprior.stan model and the phenology_combined.stan model, and one contrast I see is that joint_3cue_newprior.stan does not properly define muStudy or muSp. These very relevant lines appear to be missing:
```stan
muSp ~ normal(0, sigma_sp);
muStudy ~ normal(0, sigma_study);
```
If it were me my next step would be one of the following: add those lines to joint_3cue_newprior.stan while checking the test data and see if that fixes things and/or cut down phenology_combined.stan to ONLY a trait version and see how it does on recovering test data.
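Concretely, the first option means adding something like this to the model block of joint_3cue_newprior.stan (a sketch; the rest of the block unchanged):

```stan
model {
  // zero-centred hierarchical priors on the species and study offsets
  muSp ~ normal(0, sigma_sp);
  muStudy ~ normal(0, sigma_study);
  // ... remaining priors and likelihood as before
}
```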
As for some other queries:
One email included:
Faith's code specifies the study and species:
```r
trt.dat$study <- rep(rep(c(1:Nstudy), each = Nspp), each = Nrep)
trt.dat$species <- rep(rep(c(1:Nspp), times = Nstudy), each = Nrep)
trt.dat$alphaTraitSp <- rep(rep(alphaTraitSp, times = Nstudy), each = Nrep)
```
With each study having every species.
Whereas I specify it as:
```r
trt.dat$study <- rep(c(1:Nstudy), each = Nspp)
trt.dat$species <- rep(1:Nspp, Nstudy)
trt.dat$alphaTraitSp <- rep(alphaTraitSp, times = Nstudy)
```
With each study having two unique species.
In comparing the two models, it seems like Faith’s code does not estimate the sigma_study very well (for example estimating 0.26, while the parameter value was 2), while my code does estimate a value of 2 for the same parameter.
```stan
for (i in 1:N) {
  yTraiti[i] ~ normal(ymu[i], sigmaTrait_y);
}
```
I am fairly sure this is identical to `yTraiti ~ normal(y_hat, sigma_traity);`
so I doubt that is a problem.
BTW, when you refer to "your code", what do you mean? We have two different Stan models discussed above; is there a third also? (If so, it would be good for me to start here and figure out these two files before trying to compare to a third version.)
@DeirdreLoughnan One more thing! I forgot to say that whether you use the species offset alone or the offset + muGrand should not matter too much. It will affect your estimated parameters (the intercepts), but it should NOT matter to the model; it's more about what you'd prefer to interpret. But this actually did flag for me my concern about the phenology_combined.stan model, which is this:
```stan
// Traits
for (i in 1:n_spec) {
  mu_grand_sp[i] = mu_grand + muSp[i];
}
for (i in 1:N) {
  y_hat[i] = mu_grand + muSp[trait_species[i]] + muStudy[study[i]];
}
```
I see now why @legault is doing this, but the way it is implemented, it seems to me that you're giving the model TWO separate equations in which mu_grand and muSp appear, so it will try to find an answer that works for both the first equation and the second. (Here you would hope mu_grand and muSp are unaffected by the first equation and all the variation there goes to mu_grand_sp, but the model is written so that they are jointly estimated, I think, instead of being written as things to estimate first and new quantities to build AFTER estimation.) I think this could screw up the model. You definitely can use Stan to define parameters based on other parameters, but you need to be careful about how you implement that (and here my knowledge is limited). Since it doesn't really matter whether you add in mu_grand to get your species estimates, I would also consider reformulating the code to remove this. The proper workflow, though, would be to run this model as trait-only and diagnose whether this is an issue first.
@legault replied in an email:
I think we are building the transformed parameters properly in phenology_combined.stan. Specifically, I think it's ok to use the same parameter (mu_grand, muSp) in two different definitions within the transformed parameters block.
My understanding is that transformed parameters only become tricky/complicated when they appear on the left side of a sampling statement (e.g., `transformed_parameter ~ normal(...)`), which ours don't.
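To illustrate the distinction (a sketch, not our actual code; data block omitted): a transformed parameter is a deterministic function of parameters, so it inherits its distribution from the parameters it is built from and takes no prior of its own:

```stan
parameters {
  real mu_grand;
  vector[n_spec] muSp;
}
transformed parameters {
  // deterministic: no sampling statement, no prior of its own
  vector[n_spec] mu_grand_sp = mu_grand + muSp;
}
model {
  mu_grand ~ normal(0, 10);  // illustrative priors on the parameters themselves
  muSp ~ normal(0, 5);
}
```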
@DeirdreLoughnan posted here to Discourse, though when I asked Will Pearse about this he pointed out that we never say what is actually wrong (we mention it but we don't SHOW it). @DeirdreLoughnan Could you perhaps update with a plot?
@legault any chance you could make a simplified version of your code with the:
```stan
mu_grand_sp[i] = mu_grand + muSp[i];
```
part in it to show it actually works? When I asked Jonathan Auerbach to take a quick glance he noted mu_grand_sp was lacking a constraint/prior which seemed wrong to him.
@DeirdreLoughnan, have you tried running the other version ( joint_3cue_newprior.stan) with the 0s in place of the priors? I also suggested you could update your Discourse post, see:
@DeirdreLoughnan I'm sorry we haven't solved this yet. I can take another look in a few weeks when I'm back from my holiday. But in the meantime, could you clarify a few things for me? I think it will help me a lot when it comes to checking this issue again!
[x] As @lizzieinvancouver mentioned above, there is no README in the Stan folder, so it is very difficult to see which models to focus on. Could you add a README? And could you also say which is "Geoff's model" and which is "your model"? I think we are comparing the joint_3cue_newprior.stan model and phenology_combined.stan, but I can't remember what the difference is. Also, is the same problem present in both joint_3cue_newprior.stan and phenology_combined.stan?
[ ] Did you get the traits only part working OK? And if so, is it the traits part of joint_3cue_newprior.stan model or phenology_combined.stan? Are they the same? I think this could be helpful as we seem to have narrowed the problem down to some issue with parameter or priors in the traits part of the model.
[ ] Above, @lizzieinvancouver mentioned she is concerned that the following lines are missing from joint_3cue_newprior.stan
```stan
muSp ~ normal(0, sigma_sp);
muStudy ~ normal(0, sigma_study);
```
Instead I think we have:
```stan
muSp ~ normal(prior_mu_sp, sigma_sp);
muStudy ~ normal(prior_mu_study, sigma_study);
```
where muSp is centred on a prior value. I think I asked about this earlier, and you said it was unlikely to be the problem because prior_mu_sp is set to 0 in the R code. I still think it is worth setting prior_mu_sp and prior_mu_study to 0 in the Stan code instead, as it seems to be causing us confusion when we are trying to debug.
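Concretely, I mean changing the sampling statements like this (a sketch):

```stan
// before: prior means read in from the R side
// muSp ~ normal(prior_mu_sp, sigma_sp);
// muStudy ~ normal(prior_mu_study, sigma_study);

// after: offsets explicitly centred on zero in the Stan code
muSp ~ normal(0, sigma_sp);
muStudy ~ normal(0, sigma_study);
```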
@FaithJones I am happy to help clarify things.
You are correct that we are comparing those two models. joint_3cue_newprior.stan is my model, and phenology_combined.stan is Geoff's model. The main difference is line 88, where Geoff defines:
```stan
for (i in 1:n_spec) {
  mu_grand_sp[i] = mu_grand + muSp[i];
}
```
But when both models run with the same test data (with every species in every study like you coded) both produce the same issue with estimating the study level variation.
The trait-only version I have been comparing for Geoff's model is phenology_combined_traitonly.stan, which includes:
```stan
muSp ~ normal(0, sigma_sp);
muStudy ~ normal(0, sigma_study);
```
Running only the trait model, the output is as follows:
The study level estimates do not compare with the values in the test data:
@lizzieinvancouver Thanks for asking other people for help with this model.
Unfortunately adding the zeros (and the actual priors) to the code did not make a difference. I have pushed versions of the model code with the values updated.
I can add the above figure and model output to the discourse post. If you have a different plot in mind, let me know.
I also asked Harold for his input and I am hoping to meet with him next week to discuss it in more detail.
@DeirdreLoughnan This is super helpful! Can you point me to R code to run the trait only models?
I can add the above figure and model output to the discourse post. If you have a different plot in mind, let me know.
Yes! I think it would be very helpful to update the post with the fact that you have figured out that a MUCH simpler version of the model is not working, and to post both the overall estimates (your screenshot, but add some intervals beyond just the mean) and the study-level values.
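For the intervals, something like this would do (a sketch, assuming the model was fit with rstan and `fit` is the stanfit object):

```r
# posterior means plus 50% and 95% intervals for the study-level parameters
summary(fit, pars = "muStudy", probs = c(0.025, 0.25, 0.75, 0.975))$summary
```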
@legault any chance you could make a simplified version of your code with the:
```stan
mu_grand_sp[i] = mu_grand + muSp[i];
```
part in it to show it actually works? When I asked Jonathan Auerbach to take a quick glance he noted mu_grand_sp was lacking a constraint/prior which seemed wrong to him.
Sure, I'd be happy to. Shall I do model + simulation, or do we already have simulation code we trust for trait only, phenology only, trait + pheno?
@legault @lizzieinvancouver I think Lizzie might have found the issue in the test data, as seen here on the Discourse post
I think line 77 in height_simPriorChecks.R is the issue.
I ran this with our actual model (phenology_combined.stan) replacing that line with:
```r
trt.dat$muStudy <- rep(muStudy, each = Nrep*Nspp)
```
and it also looks better overall:
```
   Parameter      Test.data.values     Estimate         2.5%       97.5%
1  mu_grand                  15.00  16.01908859   13.5397050  18.4869751
2  sigma_sp                   5.00   5.35412218    3.8968155   7.4898319
3  sigma_study                2.00   1.93231087    1.3732528   2.7625541
4  sigmaTrait_y               5.00   4.98806477    4.8803267   5.1020800
5  mu_forcesp               -15.00 -15.59616857  -22.7232720  -8.4926162
6  mu_chillsp                -0.60  -3.46138315  -10.8339588   3.5252414
7  mu_photosp                -0.05  -2.38646152   -9.5448180   4.3770382
8  mu_phenosp                80.00  74.02744108   65.2840913  82.0617510
9  sigma_forcesp              4.00   4.90133400    3.4118200   7.0754314
10 sigma_chillsp              4.00   5.11622622    3.6135319   7.2797117
11 sigma_photosp              4.00   3.94990738    2.0717647   6.3535937
12 sigma_phenosp             20.00  19.55091346   15.5992742  24.2691799
13 sigma_phenoy               5.00   5.00622719    4.6387458   5.4166076
14 beta_tf                    0.30   0.20834453   -0.2166053   0.6313222
15 beta_tc                   -0.40  -0.22584344   -0.6441839   0.2124457
16 beta_tp                   -0.20  -0.09475335   -0.5032739   0.3364490
```
But the model isn't doing an amazing job of estimating the study-level effects; they are still a little off:
```
             muStudy_param  muStudy_esti        2.5%        97.5%
muStudy[1]       2.2003837     1.3262796   0.2359061   2.44093717
muStudy[2]       3.5901186     2.3643136   1.2883896   3.49366564
muStudy[3]       0.4826985     0.7937980  -0.2745960   1.87449508
muStudy[4]      -0.8945716    -1.7133714  -2.8218982  -0.60596056
muStudy[5]       2.7978355     1.6858797   0.6043657   2.77084509
muStudy[6]       1.4295162     0.6467955  -0.4588590   1.73014157
muStudy[7]      -0.7098701    -1.4228737  -2.5093543  -0.33308995
muStudy[8]       2.9547440     1.8332000   0.7228610   2.93581746
muStudy[9]       0.6409530    -0.3889308  -1.5039259   0.69115912
muStudy[10]      0.5418631    -0.6141944  -1.6879694   0.47972570
muStudy[11]      1.6992508     1.5108284   0.4287605   2.60476951
muStudy[12]     -2.6489345    -3.3812484  -4.5131360  -2.29851085
muStudy[13]     -0.2488066    -1.0464371  -2.1515146   0.05018247
muStudy[14]     -0.9562877    -1.2435484  -2.3309650  -0.14993122
muStudy[15]      2.4035960     0.9069504  -0.1495524   2.01440122
muStudy[16]      3.9322565     2.8746845   1.7979854   4.00503831
muStudy[17]      1.0685968     0.7460977  -0.3411571   1.85181987
muStudy[18]     -0.4173179    -0.9738202  -2.0763313   0.12040927
muStudy[19]     -2.6562617    -3.5237605  -4.6384199  -2.44506186
muStudy[20]      0.4437431    -0.3223652  -1.3949906   0.79097590
```
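A quick way to quantify how far off they are (a sketch; `est` is a hypothetical data frame holding the columns above, with hypothetical column names):

```r
# fraction of true muStudy values falling inside their 95% intervals;
# with 20 studies we would expect roughly 95% if the model is well calibrated
covered <- est$true >= est$lower & est$true <= est$upper
mean(covered)
```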
@DeirdreLoughnan Nice! Can you push updated code?
Also, we should check a few more things while we're here:
@legault Thanks for the offer! If you have time, I'd still like to check it. I think once @DeirdreLoughnan updates the test code we should have good test data, so it may be as simple as a few quick edits to the Stan code to check it's good.
@lizzieinvancouver @legault the files that I have been running are:
ospree/analyses/traits/Rfiles/height_simPriorChecks_review.R
I have added the priors to the Stan file; they are not centred on the test data values. I used the prior values we were using to run the previous models, not the priors we used in our previous test data development.
I agree it is concerning. I re-ran the model with 5 for both sigmaSp and sigmaStudy in generating the test data. The results from that model run are:
```
   Parameter      Test.data.values    Estimate         2.5%       97.5%
1  mu_grand                  15.00  16.7188908   13.9840642  19.3542189
2  sigma_sp                   5.00   3.4857020    2.5374770   4.8901387
3  sigma_study                5.00   5.0252892    3.6716381   6.9352909
4  sigmaTrait_y               5.00   5.0191700    4.9085245   5.1325411
5  mu_forcesp               -10.00 -11.4256336  -20.2137695  -3.0014465
6  mu_chillsp                -0.60  -4.8331040  -14.3886746   4.4395909
7  mu_photosp                -0.05  -4.1468477  -15.5421126   6.1258949
8  mu_phenosp                80.00  78.2028727   70.6213107  84.9466380
9  sigma_forcesp              4.00   3.9881950    2.7713044   5.7742445
10 sigma_chillsp              4.00   4.2675381    2.9130261   6.2975133
11 sigma_photosp              4.00   5.1969329    3.3829690   7.7616732
12 sigma_phenosp             20.00  15.9589350   12.3312508  20.5391411
13 sigma_phenoy               5.00   5.0398670    4.6536010   5.4644334
14 beta_tf                    0.30   0.3952195   -0.1034595   0.8969261
15 beta_tc                   -0.40  -0.1337660   -0.6826459   0.4404821
16 beta_tp                   -0.20   0.0551441   -0.5667491   0.7188601
```
The comparison of the test data and model estimates is below. The trait species-level estimates (muTraitSp) do not include mu_grand, so I included the estimate of mu_grand_sp as well for comparison.
@legault @DeirdreLoughnan I should have a bit of time this week to take another look at this issue, but I don't want to repeat others' efforts. Can I just check that there has been no more work on this question in the last two weeks?
I have not worked on this in the last two weeks. Your help would be greatly appreciated!
Sorry all, I was on vacation last week and have been catching up on work this week. I'll review the two files Deirdre linked above and post my thoughts within the next day or two.
From looking at the latest posts and the Discourse post, I understood that you found the issue (a bug in the simulation code?). But I also saw some discussion of the priors, so I thought I would weigh in here.
I see that we now have the priors in the Stan code rather than the R code, so our original prior checking code in ospree/analyses/traits/Rfiles/height_simPriorChecks.R won't work automatically any more. I went through this code and manually updated the priors in the prior check code to match those in phenology_combined.stan. The prior predictive plots are in figures/priorChecks. My main concern about the priors for Height is whether we are constraining it to too few possible heights. Below is a plot of the predicted mean height values against the predicted values including all sources of error. Perhaps there is not a great enough variation in heights? Do trees often grow more than 60 m high???
While doing this, I noticed a different (but solvable!) problem. Originally we were reading in the priors from R so we could use the same Stan model for the different trait values, despite them having different priors in the traits section. So if we want the priors in the Stan code instead, we will need different Stan models.
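For reference, the original setup looked roughly like this (a sketch; declarations for muSp, muStudy, and the sigmas omitted), with the prior means declared in the data block so one Stan file could serve every trait:

```stan
data {
  real prior_mu_sp;     // prior mean for species offsets, supplied from R per trait
  real prior_mu_study;  // prior mean for study offsets, supplied from R per trait
}
model {
  muSp ~ normal(prior_mu_sp, sigma_sp);
  muStudy ~ normal(prior_mu_study, sigma_study);
}
```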
@FaithJones Yes, we found that the main issue was in the test data. But the model is still not performing as well as we would like. As Lizzie mentioned above, some of our estimates are still off the 1:1 line, suggesting there might still be an issue with the model.
We decided to switch back to having the priors in the Stan code. Dinara found that this made the model run faster and we wanted to be sure there were no additional issues influencing our model run. I think once we are confident the model is working it will be simple to create trait specific Stan models with the corresponding priors.
Thanks for looking at the priors. To clarify, in the above plot muGrandSp is the predicted mean height values and Trait is the predicted values including all sources of error, correct?
For the subset of 37 species that we have, no measurements of the raw data are above 60 m, so I think it is a fair assumption. Below is a histogram of our raw height data for the subset of our species:
This might work better as a Zoom call (I'm free this week), but to clarify: have you tested code where every prior in the Stan model is centered on the true values used in the simulation code? Quite a few priors in the linked files are not centered on the true values, so it's no surprise estimates wouldn't be 1:1.
@DeirdreLoughnan maybe this is what you were alluding to in your earlier comment. Does centering every prior still lead to results you don't trust?
@DeirdreLoughnan @FaithJones @legault Hello! I am just back from a week off ... I have nothing that useful to offer (beyond that the Stan code does seem a little faster with the priors in the Stan code, and it's better for keeping the full model in one file), I just wanted to say how great this was to read. Thank you all!
@legault @FaithJones Lizzie and I looked into this a little more and believe we have zeroed in on the problem.
The model is struggling with the species and study level estimates in the trait portion of the model. It seems as though the model can predict one but not both well. For example, when you run the model for 40 species from 40 studies, the model estimates are slightly off for muSpecies:
But this flips when the number of studies and species are not equal (as with our data), for example running 40 spp with 20 studies shows poor muStudy estimates:
Could you and Faith look at the trait-only code (height_simPriorChecks_review.R) and the Stan code (phenology_combined_traitonly.stan)?
Do you see any errors in either that might explain why our model might be struggling to estimate the two effects in this model? We just want to be sure this is not due to an error in the code or model.
@legault At larger sample sizes (nspp and nstudy = 50) we still see an offset in the estimates:
but it improves the fit at low sample sizes (nspp and nstudy = 20):
@lizzieinvancouver I also looked into whether the estimates change with each run, and yes: when you re-run the model multiple times, the offset switches from being on the study effect to the species effect. It also varies as to whether it over- or underestimates the parameters relative to the simulated values.
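One check that makes the trade-off visible (a sketch, assuming the model was fit with rstan and `fit` is the stanfit object):

```r
post <- rstan::extract(fit, pars = c("sigma_sp", "sigma_study"))
# a strong negative posterior correlation would show the two
# variance components trading off against each other
cor(post$sigma_sp, post$sigma_study)
```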
@legault @FaithJones @lizzieinvancouver I also posted this issue on the Stan Discourse to see what insights they might have. The new post is here.
It would be worth trying a non-centred version of our model, as suggested.
@FaithJones this week, can you write and run a non-centered version of the trait only model?
@DeirdreLoughnan Great, we got a suggestion on Discourse! We already have a traits-only NCP Stan model though, I think: [joint_traitonly_ncp.stan](https://github.com/lizzieinvancouver/ospree/blob/master/analyses/traits/stan/joint_traitonly_ncp.stan). I think it just needs updating with your current priors. I won't be able to work on this for a few weeks (I'm on vacation from tomorrow, and then I have fieldwork) but I will check in again on this issue when I'm back in my office.
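For whoever picks this up first, the non-centred trick for, e.g., the study offsets looks like this (a sketch; identifier names assumed to match our models):

```stan
parameters {
  vector[n_study] muStudy_raw;   // standardized study offsets
  real<lower=0> sigma_study;
}
transformed parameters {
  // implies muStudy ~ normal(0, sigma_study), but the sampler works
  // in the better-conditioned standardized space
  vector[n_study] muStudy = sigma_study * muStudy_raw;
}
model {
  muStudy_raw ~ std_normal();
}
```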
@legault @FaithJones @lizzieinvancouver I pushed a version with the priors.
Running the NCP model with Nspp and Nstudy = 50 does appear to improve the fit between the simulated values and the model estimates, BUT the overall estimates are terrible.
```
  Parameter     Test.data.values   Estimate       2.5%      97.5%
1 mu_grand                    15  14.149382  11.374312  16.981339
2 sigma_sp                     5   2.682284   2.411675   3.000864
3 sigma_study                  5   2.326995   2.091457   2.597030
4 sigmaTrait_y                 2   2.000561   1.977089   2.024073
```
@FaithJones thanks for offering to help with this. When you get back from fieldwork could you take a look at this model and try to figure out why we are now getting such poor estimates for the sigma_sp and sigma_study?
@legault @FaithJones @lizzieinvancouver I also came across another Stan discourse post we made on this topic. I think it is from the same time that I wrote the NCP model Faith linked to above. I am adding it here in case it is useful.
Simple intercept only hierarchical model with two groups deadly slow poor convergence
@FaithJones @legault @lizzieinvancouver I have been having issues with my trait joint model and have been going back over our joint model test code.
In doing so, I felt that our model was not doing a great job of estimating some of our parameters, and I was hoping we could discuss this further. From what I can tell, the most recent test data was pushed by Faith and can be found in the file height_simPriorChecks.R in the Rfiles folder.
When I run this test data, I get OK estimates for some parameters, such as the sigmas, but not for others. Below is the output of my model run with what I think are the test data values.
It seems like the estimates for sigma_study and sigma_sp are pretty poor. Is this not worrisome? Could you remind me why it is not worrisome, or why we were not able to better estimate these parameters?