lizzieinvancouver / pmm

phylogenetic mixed models -- and we all learn together

Effects of lambda #23

Open DeirdreLoughnan opened 1 year ago

DeirdreLoughnan commented 1 year ago

@lizzieinvancouver I have been running the phylogeny model with different lambda values to test whether the value of lambda affects the model's performance. In particular, I am comparing models run with lambda = 0 for both the intercept and slope, lambda = 0.2, and lambda = 0.8, against a model and test data generated without lambda.
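For reference, the lambda scaling being tested can be sketched roughly as below. This is a minimal hypothetical sketch, not the actual simulation code in the repo; the correlation matrix here is a stand-in for what would come from the phylogeny (e.g. via ape::vcv(tree)).

```r
# Minimal sketch of simulating species-level intercepts under Pagel's lambda.
# The matrix below is a stand-in phylogenetic correlation matrix.
library(MASS)  # for mvrnorm

set.seed(123)
nspp <- 5
vcv <- matrix(0.5, nspp, nspp)  # stand-in for ape::vcv(tree), scaled to correlations
diag(vcv) <- 1

lambda_vcv <- function(vcv, lambda, sigma) {
  # Pagel's lambda scales only the off-diagonal (shared-history) covariances;
  # lambda = 0 gives independent species, lambda = 1 the full phylogenetic model
  out <- lambda * vcv
  diag(out) <- diag(vcv)
  sigma^2 * out
}

# lambda = 0 collapses to a diagonal covariance (a plain hierarchical model)
Sigma0 <- lambda_vcv(vcv, lambda = 0, sigma = 0.2)
a_spp <- mvrnorm(1, mu = rep(4, nspp), Sigma = Sigma0)
```

At lambda = 0 the species intercepts are drawn independently, which is why that run should look most like the no-lambda model below.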

  1. Lambda = 0 for both the intercept and slope
            parameters       mean       X2.5.        X25.        X50.        X75.      X97.5.    n_eff      Rhat testV
1            sigma_y 0.00912269 0.008449808 0.008876798 0.009111302 0.009350369 0.009856429 4247.996 0.9995874  0.01
2    lam_interceptsa 0.15296127 0.003769640 0.043666877 0.106507603 0.220598602 0.538788074 2884.150 1.0002637  0.00
3  sigma_interceptsa 0.24406920 0.189312179 0.219046488 0.238455210 0.263690512 0.329026818 3380.403 0.9993758  0.20
4    lam_interceptsb 0.13804947 0.003561588 0.037966482 0.094608964 0.193878026 0.506469919 2649.433 1.0012650  0.00
5  sigma_interceptsb 0.08757128 0.068345211 0.078908642 0.085611625 0.093913563 0.120093188 3237.615 1.0004979  0.10
46               b_z 0.57701172 0.534653773 0.564018526 0.576767434 0.589453028 0.619730992 2727.220 1.0012232  0.60
87               a_z 3.96074967 3.832045555 3.924414578 3.959967227 3.996341495 4.085511452 3167.207 1.0004212  4.00

There were no issues with this model run, but the lambda estimates are a bit off. The species-level estimates look okay:

[Screenshot (2023-10-04): species-level estimates]
  2. Lambda = 0.2 for both the intercept and slope
          parameters       mean       X2.5.        X25.        X50.       X75.     X97.5.    n_eff      Rhat testV
1            sigma_y 0.00988751 0.009146666 0.009620622 0.009872289 0.01013604 0.01072371 3832.745 1.0013451  0.01
2    lam_interceptsa 0.27683467 0.007458079 0.105120926 0.242997542 0.41764457 0.71925039 2537.579 1.0001190  0.20
3  sigma_interceptsa 0.20384209 0.153666314 0.180776370 0.199138656 0.22129081 0.28303299 2693.728 0.9996709  0.20
4    lam_interceptsb 0.46819047 0.059077751 0.294001560 0.475591544 0.64207687 0.87452337 2896.388 0.9996435  0.20
5  sigma_interceptsb 0.11528848 0.086258060 0.101527129 0.111825915 0.12556136 0.16427702 2880.073 0.9998503  0.10
46               b_z 0.59433007 0.500313434 0.570154177 0.597259190 0.62199951 0.67536819 3999.117 0.9995980  0.60
87               a_z 3.95448998 3.807442501 3.918471252 3.957307388 3.99333556 4.08262300 2801.249 0.9995250  4.00

This model runs fine, but again the estimates for lam_interceptsa are not great. The plot of the species level estimates is similar to the above.

  3. Lambda = 0.8 for both the intercept and slope
          parameters        mean       X2.5.        X25.        X50.       X75.     X97.5.    n_eff      Rhat testV
1            sigma_y 0.009895051 0.009167393 0.009618613 0.009884478 0.01016053 0.01068552 5063.927 0.9999321  0.01
2    lam_interceptsa 0.690164734 0.254476227 0.569894921 0.724267893 0.84461089 0.96097189 3312.784 1.0016373  0.80
3  sigma_interceptsa 0.202106073 0.147294409 0.176556353 0.198380572 0.22342611 0.28022172 2946.042 1.0023932  0.20
4    lam_interceptsb 0.727167841 0.341245620 0.630428848 0.757903639 0.85188542 0.96209512 3434.627 1.0007209  0.80
5  sigma_interceptsb 0.111682771 0.081954343 0.098236839 0.109370769 0.12207386 0.15478700 2748.237 1.0006963  0.10
46               b_z 0.588543277 0.491890186 0.557025821 0.587412048 0.61906415 0.69013382 4498.250 1.0000421  0.60
87               a_z 4.129370214 3.960697926 4.076429311 4.129792149 4.18186132 4.29718818 4848.594 1.0003763  4.00
  4. The model with no lambda:
   parameters       mean       X2.5.        X25.        X50.       X75.    X97.5.    n_eff      Rhat testV
1     sigma_y 0.00985885 0.009131382 0.009582857 0.009838708 0.01013554 0.0106794 4833.199 1.0009398  0.01
2        mu_a 4.03936301 3.972960692 4.018703731 4.039389510 4.06078290 4.1013598 6689.424 0.9996430  4.00
43    sigma_a 0.19930559 0.159369625 0.181845017 0.196604565 0.21358547 0.2543764 5059.139 0.9995760  0.20
44       mu_b 0.61920342 0.592414464 0.610057048 0.619209676 0.62813179 0.6459676 5837.980 1.0002099  0.60
45    sigma_b 0.08886636 0.071302751 0.081085476 0.087903379 0.09519112 0.1123692 5144.533 0.9995502  0.10

I have pushed the code I used to the pmm repo here, in which I run a model with lambda and one without. I saved the summary output as CSV files and added them to the output folder, and added figures like the one above for each model to the figures folder (although they all look similar).

lizzieinvancouver commented 1 year ago

@DeirdreLoughnan Thanks! If I am understanding your code, though, I am not sure this is a good test to compare the models. First, we need multiple runs (that's what I meant when I said 10 reps) to have a sense of what is going on, and second, it looks like you build data under different models for the lambda and no-lambda runs -- is that correct?

Can you update the README with the new R file and mention where you got the model and which Stan model you tried that had divergence issues? Then we have a record.

lizzieinvancouver commented 1 year ago
> x <- rnorm(100, 4, 5)
> y <- rnorm(100, 4, 5)
> mymodelinfo <- summary(lm(y~x))
> # str(mymodelinfo)
> mymodelinfo["r.squared"]
$r.squared
[1] 0.001323001
DeirdreLoughnan commented 1 year ago

@lizzieinvancouver I am putting a pin in this for today, but I think I have the code we want to run to test this. I started to think about the figures, but did not get very far.

I think I have it set up to make comparisons easier and help with plotting.

The final datasets (sumDatL for the lambda models and sumDatN for the model without lambda) include the true values of the parameters and the test data values for the slopes and intercepts.

This code is not super sophisticated and currently saves everything (the RDA and CSV summary model output, and the test data used to run the models for each pair of runs). I am currently running it on midge for 40 spp and 10 indiv, and can post the two final CSV files with all the model outputs once it is finished running.

lizzieinvancouver commented 1 year ago

@DeirdreLoughnan Well done on two accounts! It sounds like you have nice code, and I salute you putting this aside on such a lovely day on a holiday weekend.

It would be great to see the output when it is done. I was not thinking of any magical figures -- just maybe boxplots that show how the two models compare. I am happy to work on this.

DeirdreLoughnan commented 1 year ago

@lizzieinvancouver I pushed the two output files to the analyses/output folder (mdlOutNoLambdaRepped.csv and mdlOutLambdaRepped.csv).

I also did a quick check in the code (testLambdaEffect.R) to see if the R-squared values differ when the lambda and no-lambda models' species-level estimates are compared to the known test values -- and really, there was no difference! But I look forward to seeing what the figures look like! Thanks for your help finishing this!
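The quick check described above can be sketched roughly like this. The data frame and column names here are hypothetical (the real ones live in testLambdaEffect.R and the repo's output files); the idea is just to regress the estimates on the known test values and pull out r.squared, as in the lm() snippet earlier in this thread.

```r
# Sketch of comparing species-level estimates to the known test values via R^2.
# Data frame and column names are made up for illustration.
set.seed(42)
sumDat <- data.frame(trueInt = rnorm(40, 4, 0.2))
sumDat$estInt <- sumDat$trueInt + rnorm(40, 0, 0.02)  # estimates with small error

r2 <- summary(lm(estInt ~ trueInt, data = sumDat))$r.squared
r2  # close to 1 when the estimates track the true values well
```

If both models give essentially the same r2 against the truth, that matches the "really no difference" result above.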

lizzieinvancouver commented 1 year ago

@DeirdreLoughnan Thanks!

lizzieinvancouver commented 1 year ago

@DeirdreLoughnan I haven't looked at the data yet, but it just occurred to me that the simple model might do quite well with evenly sampled spp.... anyway, I will let you know what I find. Thanks again!

lizzieinvancouver commented 1 year ago

@DeirdreLoughnan I just pushed some edits, but basically I got the same answer as you. The answers are actually SO similar it's crazy, but I don't see anything wrong in the code. Nice work!

I edited the R file (can you check that my edit to the comment on line 76/84 is correct? see here) ... I am not sure the FLAG part I added is working, but you could easily remove it (and just wrap the plotting code in if(FALSE)), and I adjusted nind to be uneven (in a way where it's easy to make it even). When you have time, could you possibly run the updated code on the server (and save the current and new results)? I cannot seem to get on the VPN currently.

DeirdreLoughnan commented 1 year ago

@lizzieinvancouver That is so interesting; I agree, the values are so similar!

I could not find anything wrong with your edits and the flags worked fine. I just pushed the results using uneven nind (mdlOutLambdaUneven.csv and mdlOutNoLambdaUneven.csv) for 40 sp.

lizzieinvancouver commented 1 year ago

@DeirdreLoughnan Thanks! This is super interesting to me too. One thing I started wondering is how uneven ecological sampling is. So I checked the phylo OSPREE data:

Ran phylo_ospree_LeaveCladeOut_models.R to line 229, then:

obsperspp <- aggregate(d["resp"], d["spps"], FUN=length)
hist(obsperspp$resp, breaks=100, xlab="OSPREE phylo data: n per sp", main="")

And got this plot:

[Plot (Rplot): histogram of OSPREE phylo data, n per sp]

Contrasted with our sim code:

[Plot (Rplot2): histogram of n per sp from the sim code]

lizzieinvancouver commented 1 year ago

So, I think I picked the wrong distribution for simulating uneven data that looks like ours, but all the same.... I ran the basic comparisons, and you can see that at lambda = 1 the PMM starts to outperform the HM a little:

[Plot (RplotSims): comparison of the two models across lambda values]

> # Smaller is better (fewer deviations from the true value)
> aggregate(sppDatN["intabstrue"], sppDatN["lambda"], FUN=sum)
  lambda intabstrue
1    0.0   5.471262
2    0.2   5.648249
3    1.0   5.387421
> aggregate(sppDatL["intabstrue"], sppDatL["lambda"], FUN=sum)
  lambda intabstrue
1    0.0   5.615144
2    0.2   5.739171
3    1.0   4.008772
> 
> aggregate(sppDatN["slopeabstrue"], sppDatN["lambda"], FUN=sum)
  lambda slopeabstrue
1    0.0    0.5771075
2    0.2    0.5678893
3    1.0    0.5455379
> aggregate(sppDatL["slopeabstrue"], sppDatL["lambda"], FUN=sum)
  lambda slopeabstrue
1    0.0    0.5903542
2    0.2    0.5825415
3    1.0    0.3900198
> # Bigger is better (50% interval captures the true value more often)
> aggregate(sppDatN["intntrue"], sppDatN["lambda"], FUN=sum)
  lambda intntrue
1    0.0      206
2    0.2      190
3    1.0      202
> aggregate(sppDatL["intntrue"], sppDatL["lambda"], FUN=sum)
  lambda intntrue
1    0.0      205
2    0.2      193
3    1.0      204
> 
> aggregate(sppDatN["slopentrue"], sppDatN["lambda"], FUN=sum)
  lambda slopentrue
1    0.0        192
2    0.2        194
3    1.0        196
> aggregate(sppDatL["slopentrue"], sppDatL["lambda"], FUN=sum)
  lambda slopentrue
1    0.0        195
2    0.2        192
3    1.0        201

But they're basically neck and neck when lambda is small.
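For the record, the two per-species metrics summed in the aggregate() calls above could be built roughly like this. The column names match those calls, but the data here are made up for illustration; the real intervals come from the Stan summaries.

```r
# Sketch of the two per-species comparison metrics:
# - intabstrue: absolute deviation of the estimate from the true value (smaller is better)
# - intntrue: 1 if the 50% posterior interval captures the truth, else 0 (bigger is better)
set.seed(1)
true_a <- rnorm(40, 4, 0.2)
est_a  <- true_a + rnorm(40, 0, 0.05)  # stand-in posterior medians
q25    <- est_a - 0.05                 # stand-in 25% quantiles
q75    <- est_a + 0.05                 # stand-in 75% quantiles

intabstrue <- abs(est_a - true_a)
intntrue   <- as.numeric(q25 <= true_a & true_a <= q75)

sum(intabstrue)  # total deviation across species
sum(intntrue)    # how many 50% intervals capture the true value
```

Summing these within each lambda level (via aggregate, as above) is what produces the tables comparing the PMM and HM runs.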

lizzieinvancouver commented 1 year ago

@DeirdreLoughnan I suspect that as the sampling becomes less even, it could change the level at which the lambda model outperforms the HM, which is interesting. But for now I think we have done more than enough on this -- thank you!