uqrmaie1 / admixtools

https://uqrmaie1.github.io/admixtools
71 stars 14 forks source link

Suggestion: Change to p_emp calculation in compare_fits() #75

Closed YourePrettyGood closed 1 month ago

YourePrettyGood commented 1 month ago

Hi Robert, I've been working on some admixture graph inference using find_graphs() following the suggested protocol in your eLife paper. In step 2, while comparing some of the best-fitting graphs at a fixed n_admix using the bootstrap test in compare_fits(), I was curious to see how the bootstrap p-value was calculated, and saw your note about truncating the p-value to avoid p_emp==0.

My suggestion is to change the way that p_emp is evaluated to be consistent with Phipson & Smyth 2010, which is effectively to adjust R/qpgraph.R:695-696 to be:

frac1 = (sum(scorediff > 0) + 1)/(length(scorediff) + 1)
frac2 = (sum(scorediff < 0) + 1)/(length(scorediff) + 1)

As the paper says, these are exact p-values for Monte Carlo tests like the bootstrap that will never evaluate to 0. (Well, probably not exact due to the same argument as in section 5 of the paper, but at least closer to the appropriate size than the unbiased estimator that can evaluate to 0.)

Thanks for all your hard work on ADMIXTOOLS2! It's incredibly useful, and (no offense intended) so much easier to work with than ADMIXTOOLS 1.

Best regards, Patrick Reilly

uqrmaie1 commented 1 month ago

Hi Patrick, thanks for pointing me to this paper! I agree that this is a better way to calculate permutation p-values, and I changed the code accordingly. So p_emp is now p_emp = min(frac1, frac2)*2 with frac1 and frac2 defined as above. Earlier versions, and the current p_emp_nocorr don't add + 1 in the numerator and denominator.