uqrmaie1 / admixtools

https://uqrmaie1.github.io/admixtools

what is the output of compare_fits and what does it mean? #28

Open · anubhabkhan opened this issue 1 year ago

anubhabkhan commented 1 year ago

Hey,

I tried using ADMIXTOOLS 2 and used the function compare_fits to test which qpgraph is a better fit for my data. I used it in the following way:

graph1 = winner$edges[[1]]
graph2 = winner2$edges[[1]]
fits = qpgraph_resample_multi(f2_blocks, list(graph1, graph2), nboot = 100)
compare_fits(fits[[1]]$score_test, fits[[2]]$score_test)

I got the output:

$diff
[1] 69.8988

$se
     [,1]
[1,] 33.6833

$z
     [,1]
[1,] 2.075177

$p
[1] 0.03797017

$p_emp
[1] 0.04

$p_emp_nocorr
[1] 0.04

$ci_low
[1] 5.121868

$ci_high
[1] 144.4687

What does this mean? I am not able to find any explanation for this function in the manuals. It would be very helpful if you could please explain. Thanks so much for the wonderful method.

Anubhab

uqrmaie1 commented 1 year ago

I added some documentation to the function. p_emp is the important item here; the other outputs can mostly be ignored. The function is simply a bootstrap test of whether the median score difference is different from zero. It's a bit awkward because the scores are log-likelihoods, but for this test all that matters is that the input vectors come from two models evaluated on bootstrap-resampled SNP blocks (using the same resamplings for both models). You could also supply worst f-statistic residuals, or any other estimated model parameter, as input.

p_emp is two times the fraction of bootstrap replicates where model 1 has a lower score than model 2 (or vice versa, whichever is less), truncated at 1/(number of bootstrap replicates).

ci_low and ci_high are the 2.5% and 97.5% quantiles of the score difference distribution. And since the input vectors represent bootstrap replicates, those quantiles form a 95% bootstrap confidence interval for the score difference.
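To make that concrete, here is a minimal R sketch of the logic just described. This is illustrative only, not the package's actual implementation, and the function name compare_fits_sketch is made up for this example:

# Illustrative sketch of the logic described above; not the
# actual ADMIXTOOLS 2 implementation of compare_fits.
compare_fits_sketch = function(scores1, scores2) {
  stopifnot(length(scores1) == length(scores2))
  nboot = length(scores1)
  diffs = scores1 - scores2            # same bootstrap resamplings for both models
  frac  = min(mean(diffs < 0), mean(diffs > 0))
  list(
    diff  = mean(diffs),               # estimated score difference (sign shows which model scores lower)
    p_emp = max(2 * frac, 1 / nboot),  # two-sided empirical p-value, truncated at 1/nboot
    ci    = quantile(diffs, c(0.025, 0.975))  # ci_low and ci_high
  )
}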

anubhabkhan commented 1 year ago

Hey,

So if p_emp is 0.05, do I interpret it as I would any other p-value? Would it mean model 1 is significantly better than model 2?

Thanks for the response

Anubhab


uqrmaie1 commented 1 year ago

Yes, you can interpret it like a p-value, but it's a two-sided test, so it tests whether either of the two models is significantly better than the other model.

anubhabkhan commented 1 year ago

I am extremely sorry, but I am not able to understand. What values of p_emp would suggest that model 1 is better than model 2, and what values would suggest the reverse? Could you please give a small example? It would be very helpful.

Thanks again for helping with this.

Anubhab


uqrmaie1 commented 1 year ago

Sorry for not being clear!

Low values of p_emp suggest that the difference in scores between the two models is unlikely to be due to variability across SNP blocks. But p_emp will be low whether model 1 has higher scores than model 2 or the other way round, so on its own it doesn't tell you which model is better.

If you want to know which of the two models is better, you can look at whether the diff value is positive or negative. That's the estimate of the mean difference scores1 - scores2, so positive means model 1 has higher scores (is worse).

That difference will almost always have the same sign as the score difference that you get when evaluating both models the standard way (using all SNPs with qpgraph, not using bootstrapping with qpgraph_resample_multi). If you have already evaluated both models the standard way, you probably already know which of the two models has a lower score, and you just want to test whether that difference is significant. That's why I didn't emphasize the diff output here!
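Putting it together, a hypothetical snippet (reusing the call from the original question; the 0.05 threshold is just the conventional choice, not anything mandated by the function) could look like this:

fits = qpgraph_resample_multi(f2_blocks, list(graph1, graph2), nboot = 100)
cmp = compare_fits(fits[[1]]$score_test, fits[[2]]$score_test)
if (cmp$p_emp >= 0.05) {
  message("no significant difference between the two models")
} else if (cmp$diff > 0) {
  message("model 2 fits significantly better (model 1 has higher scores)")
} else {
  message("model 1 fits significantly better (model 2 has higher scores)")
}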

anubhabkhan commented 1 year ago

This is helpful! Thanks a lot for the prompt responses!

Anubhab


ndussex commented 3 months ago

Thanks for the question and answer, but I may need some clarification for some of my results. One of my models (Nadmix = 1) has a score of 0.438 and the other (Nadmix = 2) has a score of 0.000603. However, when I run the qpgraph_resample_multi and compare_fits functions, I get the following:

$diff
[1] -3.223325

$se
     [,1]
[1,] 6.29504

$z
     [,1]
[1,] -0.5120419

$p
[1] 0.6086217

$p_emp
[1] 0.6

$p_emp_nocorr
[1] 0.6

$ci_low
[1] -22.78158

$ci_high
[1] 2.032134

This suggests to me that neither model is better than the other. Note that I also get a similar result (i.e. no significant p-value) when comparing models with more admixture events (up to 7).

Is my interpretation correct?

Thanks in advance!