lizzieinvancouver / pmm

phylogenetic mixed models -- and we all learn together
2 stars 1 forks source link

Family level phylogeny model #16

Open DeirdreLoughnan opened 2 years ago

DeirdreLoughnan commented 2 years ago

@lizzieinvancouver I have added the code for the family level Stan and associated R code to the repo.

The model runs with a few changes to the model code you linked to. I initially had line 33, where famnum is defined in the data block as famnum[Nspp], as you had sitenum[J] in the original code. But this gave the error message that there was a mismatch in the dimensions declared for famnum (dim declared = 1279, dim found = 42558). So I changed it to be declared as famnum[N] and the model runs.

I am currently running this model for and the estimated leapfrog time is 137s. I will post an update when it is finished running.

DeirdreLoughnan commented 2 years ago

@lizzieinvancouver The model ran relatively quickly (less than 2h) but produced 8000 divergent transitions the chains mixed poorly for most parameters. Screen Shot 2022-01-20 at 12 28 07 PM

Screen Shot 2022-01-20 at 12 53 38 PM

I will continue to think on it and see if I can figure out where I am going wrong.

lizzieinvancouver commented 2 years ago

@DeirdreLoughnan Usually that many divergences suggests the code is just wrong -- that the model is totally mis-specified. Re-reading above, where you had dimension issues "(dim declared = 1279, dim found = 42558)", I wonder if we should check that issue first -- I would suggest running the original sites within plots test data R code and stan model (you may need to update the Stan from using <- to using =; please push that) to make sure the correction is necessary there also ... otherwise it suggests we need to adapt here to match that old code.

lizzieinvancouver commented 2 years ago

@DeirdreLoughnan I updated the repo with the plotsinsites code, so please pull! The model had divergent transitions (yikes), and the test data looks messed up (I think that's the problem, not the Stan code, but I ran out of time to work on this). I did not need to change the dimensions to get the Stan code to run, so that could be related to the issue in your model.

I'd like to spend a little time trying to code the family model soon (ideally in the next week); then I can hopefully fix the dimensions issue you're having and/or figure out something we might be not thinking of. To help me with that, can you:

  1. Move the synchrony code for this into a 'sync' or 'sandbox' folder in pmm/analyses. In there please put:

    • [x] the R code and the data file it calls
    • [x] Stan code for phylogeny on the slopes and unpooled intercepts (that you have tested with test data, include the R code with test data and running the Stan code) Then I will try to use the Stan code with the plotsinsites Stan code to make new Stan code and think through it.
  2. Not critical, but could you see if Faith:

    • [x] Has R test data and Stan model code for a simple nested model we could look at. I am not sure the example I gave you is great (I am sorry about that). If she does, please add it to the examples repo (or ask Faith, I added her to that repo also).

Thank you! I know you must be tired of this, but we're making progress.

PS: I should be in Monday afternoon if you want to meet to discuss this.

DeirdreLoughnan commented 2 years ago

@lizzieinvancouver I was looking over the code for this model and your old model again and it occurred to me that the issue with my first attempt might have been the data I inputted into the model for famnum. Instead of simply inputting a list of family names from the full dataset (which would be length N), it occurred to me after looking at your R test code, that I should actually be inputting a list of family names for each species (so it would be Nspp in length).

I have pushed my code to the pmm repo that tries this and will post an update of the model output when it is finished running.

DeirdreLoughnan commented 2 years ago

@lizzieinvancouver I also checked how well my test data was estimating the intercepts like you asked and it looks great!

Screen Shot 2022-01-25 at 11 56 22 AM

DeirdreLoughnan commented 2 years ago

@lizzieinvancouver the issue I thought I found yesterday made essentially not difference and the model still produces 7998 divergent transitions. The pairs plot looks very similar to the one above.

DeirdreLoughnan commented 2 years ago

@lizzieinvancouver I am sorry to hear that you are having a computer nightmare, I hope you can get it sorted quickly!

I asked Faith if she had code for a nested model and the only example should could think of was in: bayes2020/Projects/Faith/rangesModel bayes2020/Projects/Faith/stan/nointer 3levelwpop force&photo ncp FaithExample.stan

It looks promising! I will take another go at getting the model working early next week.

I am planning to be in the office Monday and Wednesday if you are in and have time to chat!

lizzieinvancouver commented 2 years ago

@DeirdreLoughnan Since I am with Mike for a few days I thought I would work on this, if time allows. Can you please check everything in the sandbox is up to date (aka correct for what you're using now)? It should have test data and code.

Can you also add a real data example? Perhaps for birds or some relatively small clade?

Thank you!

DeirdreLoughnan commented 2 years ago

@lizzieinvancouver to clarify, do you want to work with Mike on the family level model we never got working, or on the species level model that I tried to add a grand mean and study level effect to (what I am using now)?

I assumed you mean the model with the grand mean/study, so I did push code/models and a real data example for this today. I had test data for the simple version of the model, but quickly tried to write some test data for a model with a grand mean and study, but it is currently producing a large Rhat and takes a very long time to run. I also wrote example code that focuses on just amphibians (which rungs very quickly) and birds (which still takes a few hours to run).

If this is wrong and you are interested in the family level model, I can work on that as well.

lizzieinvancouver commented 2 years ago

@DeirdreLoughnan I mean working on the family level model we never got working. So no study. Can you please post that (and move whatever you have there to a separate folder)? Thanks.

DeirdreLoughnan commented 2 years ago

Oh ok great! I have separated the code and taken a go and setting up some test data and examples using amphibian and aves data.

The models do not run well and produce divergent transitions. Do you want me to keep working on this and try to fix it? Or would it be better for you to look it over and chat with Mike about it first?

Thanks!

lizzieinvancouver commented 2 years ago

the species level model that I tried to add a grand mean and study level effect to (what I am using now)?

@DeirdreLoughnan Thanks for the new files. Just to clarify -- you did get the grand mean model running, right? I would have thought that would be easy. Adding study might be harder.

DeirdreLoughnan commented 2 years ago

@lizzieinvancouver I ran the model with the grand mean for the species group data, but for some of the models the Rhat's were too high (aves had an Rhat of 1.17 for example). I was running it for the full dataset, but I was getting some weird warnings on midge this week so I stopped the model run and restarted it to see if that helps.

I can try running it again and see, it does seem to take a long time to run though which might be a sign something is not yet right with it.

lizzieinvancouver commented 2 years ago

@DeirdreLoughnan You should get it running on a small (or test) dataset before running the full model. It should run the same as the regular model unless you have a coding error. Can you point me to the phylogeny model you're using (no grand mean, no study effect) that you believe is working and is your current `main' model? You may have already put it in the pmm repo -- if so, just remind me which files.

Thank you!

DeirdreLoughnan commented 2 years ago

@lizzieinvancouver ok, I will keep working on it.

The phylogeny model that I am using currently as the main model is:

pmm/analyses/sync/stan/phylogeny_synchrony_cholesky_geoffmod.stan

lizzieinvancouver commented 2 years ago

@DeirdreLoughnan Okay, I actually made some progress! I just pushed edits, but they need work: