DeirdreLoughnan opened this issue 1 year ago
@DeirdreLoughnan Thanks! If I am understanding your code though, I am not sure this is a good test to compare the models. First, we need multiple runs (that's what I meant when I said 10 reps) to have a sense of what is going on, and second, it looks like you build data under different models for the lambda and no-lambda runs -- is that correct?
Can you update the README with the new R file and mention where you got the model and which Stan model you tried that had divergence issues? Then we have a record.
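Roughly what I mean is something like this (the simulator function and Stan file names below are placeholders, not the actual repo files): simulate one dataset per rep, then fit both the lambda and no-lambda models to that same dataset.
library(rstan)

nrep <- 10
fits <- vector("list", nrep)
for (r in seq_len(nrep)) {
    # hypothetical simulator: one dataset per rep, shared by both models
    simdat <- simulatePmmData(nspp = 40, nind = 10, lambda = 1)
    fitL <- stan("phyloLambda.stan", data = simdat, iter = 4000, chains = 4)
    fitN <- stan("phyloNoLambda.stan", data = simdat, iter = 4000, chains = 4)
    fits[[r]] <- list(lambda = fitL, nolambda = fitN, truth = simdat)
}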
> x <- rnorm(100, 4, 5)
> y <- rnorm(100, 4, 5)
> mymodelinfo <- summary(lm(y~x))
> # str(mymodelinfo)
> mymodelinfo["r.squared"]
$r.squared
[1] 0.001323001
@lizzieinvancouver I am putting a pin in this for today, but I think I have the code we want to run to test this. I started to think about the figures, but did not get very far.
I think I have it set up to make comparisons easier and help with plotting.
The final datasets (sumDatL for the lambda models and sumDatN for the model without lambda) include the true values of the parameters and the test-data values for the slopes and intercepts.
This code is not super sophisticated and currently saves everything (the RDA and CSV summary model output, and the test data used to run the models for each pair of runs). I am currently running it on midge for 40 spp and 10 indiv and can post the two final CSV files with all the model outputs once it is finished running.
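For orientation, each summary data frame is one row per species per rep, structured roughly like this (column names here are illustrative, not necessarily the exact ones in the code):
# illustrative skeleton of the summary output; the *.true columns come from the test data
sumDatL <- data.frame(rep = integer(), species = integer(), lambda = numeric(),
                      int.true = numeric(), int.est = numeric(),
                      slope.true = numeric(), slope.est = numeric())
write.csv(sumDatL, "output/mdlOutLambdaRepped.csv", row.names = FALSE)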
@DeirdreLoughnan Well done on two accounts! It sounds like you have nice code, and I salute you putting this aside on such a lovely day on a holiday weekend.
It would be great to see the output when it is done. I was thinking of no magical figures -- just maybe boxplots that show how the two models compare. I am happy to work on this.
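Nothing fancier than something like this (column names are guesses at what is in the summary datasets):
# absolute error of the species-level slope estimates, lambda vs. no-lambda model
boxplot(list(lambda = abs(sumDatL$slope.est - sumDatL$slope.true),
             nolambda = abs(sumDatN$slope.est - sumDatN$slope.true)),
        ylab = "|estimated - true| species slope")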
@lizzieinvancouver I pushed the two output files to the analyses/output folder (mdlOutNoLambdaRepped.csv and mdlOutLambdaRepped.csv).
I also did a quick check in the code (testLambdaEffect.R) to see if the R-squared values differ when the lambda and no-lambda models' species-level estimates are compared to the known test values, and really there was no difference! But I look forward to seeing what the figures look like! Thanks for your help finishing this!
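The check itself is just along these lines (column names are approximate; see testLambdaEffect.R for the real thing):
# regress each model's species-level estimates on the known test values
r2L <- summary(lm(slope.est ~ slope.true, data = sumDatL))$r.squared
r2N <- summary(lm(slope.est ~ slope.true, data = sumDatN))$r.squared
c(lambda = r2L, nolambda = r2N)  # these came out essentially identical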
@DeirdreLoughnan Thanks!
@DeirdreLoughnan I haven't looked at the data yet, but it just occurred to me that the simple model might do quite well with evenly sampled spp.... anyway, I will let you know what I find. Thanks again!
@DeirdreLoughnan I just pushed some edits, but basically I got the same answer as you. The answers are actually SO similar it's crazy, but I don't see anything wrong in the code. Nice work!
I edited the R file (can you check my edit to the comment on line 76/84 is correct? see here) ... I am not sure the FLAG part I added is working, but you could easily remove it (and make the plotting code just if(FALSE)) and I adjusted nind to be uneven (in a way where it's easy to make it even). When you have time, could you possibly run the updated code on the server (and save the current and new results)? I cannot seem to get on the VPN currently.
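To be concrete about the two edits (variable names here are stand-ins; the repo file may differ):
runplots <- FALSE  # hypothetical flag; set TRUE to make the figures
if (runplots) {
    # plotting code goes here; wrapping it all in if(FALSE) works just as well
}
nspp <- 40
# uneven sampling per species, but easy to flip back to even (e.g., rep(10, nspp))
nind <- rep(c(5, 10, 15, 20), length.out = nspp)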
@lizzieinvancouver That is so interesting, I agree that the values are so similar!
I could not find anything wrong with your edits and the flags worked fine. I just pushed the results using uneven nind (mdlOutLambdaUneven.csv and mdlOutNoLambdaUneven.csv) for 40 sp.
@DeirdreLoughnan Thanks! This is super interesting to me too. One thing I started wondering is how uneven ecological sampling is. So I checked the phylo OSPREE data:
Ran phylo_ospree_LeaveCladeOut_models.R to line 229 then:
obsperspp <- aggregate(d["resp"], d["spps"], FUN=length)
hist(obsperspp$resp, breaks=100, xlab="OSPREE phylo data: n per sp", main="")
And got this plot:
Contrasted with our sim code:
So, I think I picked the wrong distribution for simulating uneven data that looks like ours, but all the same.... I ran the basic comparisons and you can see that at lambda=1, the PMM starts to out-perform the HM a little:
> # Smaller is better (less deviations from true value)
> aggregate(sppDatN["intabstrue"], sppDatN["lambda"], FUN=sum)
lambda intabstrue
1 0.0 5.471262
2 0.2 5.648249
3 1.0 5.387421
> aggregate(sppDatL["intabstrue"], sppDatL["lambda"], FUN=sum)
lambda intabstrue
1 0.0 5.615144
2 0.2 5.739171
3 1.0 4.008772
>
> aggregate(sppDatN["slopeabstrue"], sppDatN["lambda"], FUN=sum)
lambda slopeabstrue
1 0.0 0.5771075
2 0.2 0.5678893
3 1.0 0.5455379
> aggregate(sppDatL["slopeabstrue"], sppDatL["lambda"], FUN=sum)
lambda slopeabstrue
1 0.0 0.5903542
2 0.2 0.5825415
3 1.0 0.3900198
> # Bigger is better (50% interval captures true value more often)
> aggregate(sppDatN["intntrue"], sppDatN["lambda"], FUN=sum)
lambda intntrue
1 0.0 206
2 0.2 190
3 1.0 202
> aggregate(sppDatL["intntrue"], sppDatL["lambda"], FUN=sum)
lambda intntrue
1 0.0 205
2 0.2 193
3 1.0 204
>
> aggregate(sppDatN["slopentrue"], sppDatN["lambda"], FUN=sum)
lambda slopentrue
1 0.0 192
2 0.2 194
3 1.0 196
> aggregate(sppDatL["slopentrue"], sppDatL["lambda"], FUN=sum)
lambda slopentrue
1 0.0 195
2 0.2 192
3 1.0 201
But they're basically neck and neck when lambda is small.
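For the record, the columns summed above are computed roughly like this (a sketch; the interval column names are my guesses at what is in the summary output):
# smaller is better: absolute deviation of the posterior mean from the true value
sppDatN$intabstrue <- abs(sppDatN$int.est - sppDatN$int.true)
# bigger is better: 1 if the 50% posterior interval captures the true value (then summed)
sppDatN$intntrue <- as.numeric(sppDatN$int.true >= sppDatN$int.q25 &
                               sppDatN$int.true <= sppDatN$int.q75)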
@DeirdreLoughnan I suspect as the sampling becomes less even it could change the level at which the lambda model outperforms the HM, which is interesting. But for now I think we have done more than enough on this -- thank you!
@lizzieinvancouver I have been running the phylogeny model with different lambda values to test whether the value of lambda affects the model's performance. In particular, I am comparing model runs with lambda = 0 for both the intercept and slope, with lambda = 0.2, and with lambda = 0.8, against a model and test data generated without lambdas in the model.
There were no issues with this model run, but the estimates for lambda are a bit off. The species-level estimates look okay:
This model runs fine, but again the estimates for lam_interceptsa are not great. The plot of the species-level estimates is similar to the above.
I have pushed the code I used to the pmm repo here, in which I run a model with lambda and one without. I saved the summary output as CSV files and added them to the output folder, and added figures like the one above for each model to the figures folder (although they all look similar).
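For context, test data for a given lambda can be generated roughly like this (a sketch of the usual Pagel's-lambda rescaling of the phylogenetic covariance; the repo code may differ in the details, and the numeric values below are illustrative):
library(ape)   # rtree, vcv
library(MASS)  # mvrnorm

nspp <- 40
tree <- rtree(nspp)
V <- vcv(tree, corr = TRUE)   # phylogenetic correlation matrix

lambda <- 0.2                 # also run with 0 and 0.8
Vlam <- lambda * V
diag(Vlam) <- diag(V)         # lambda rescales only the shared (off-diagonal) part

sigma.int <- 0.5              # illustrative value, not from the repo
a.spp <- mvrnorm(1, mu = rep(0, nspp), Sigma = sigma.int^2 * Vlam)  # species-level intercepts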