Can SMMAT work for small sample size?

jxcao98 commented 10 months ago

Hi,

Thanks for your nice tool!

In your AJHG paper, you noted that the tool might not be suitable for small samples:

"SMMAT p values are computed based on asymptotic distributions, which may not be accurate in small samples, especially for binary traits and heavily skewed continuous traits."

However, I noticed that the GMMAT R package documentation describes the "davies" method used to calculate p-values as an exact method:

"davies" represents an exact method that computes a p-value by inverting the characteristic function of the mixture chisq distribution, with an accuracy of 1e-6. When "davies" p-value is less than 1e-5, it defaults to method "kuonen".

So I'm wondering whether SMMAT works for small sample sizes. I have some cohorts containing no more than 1,000 participants, with both binary and continuous traits. Can I use SMMAT for rare-variant burden tests?

By the way, should PCs still be included as covariates, given that the GRM has already been calculated?

Thanks in advance!

Best regards, Jixin

hanchenphd commented 10 months ago

Hi Jixin,

Thank you for your interest. Whether/how each rare variant test would work for your sample size depends on a lot of factors, such as the distribution of your phenotype, as well as the allele frequencies. If the distribution is not heavily skewed, then the burden test is probably okay, but you will need to check your results carefully.

I always recommend adjusting for top PCs as fixed-effects covariates.

Best, Han

jxcao98 commented 10 months ago

Thanks for your quick and comprehensive reply!

When conducting rare variant burden tests on a cohort of 1,200 individuals with binary traits (the ratio of cases to controls is roughly 1.6:1), I got some atypical Q-Q plots of O.pval (only genes with n.variants ≥ 3 are shown). However, the inflation factors do not indicate deflation; they are even slightly greater than 1 (lambda = 1.008 for synonymous variants). Does this mean that the model is not fitting well enough?
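For readers who want to reproduce the inflation factor mentioned above: lambda is commonly computed as the median of the 1-df chi-square statistics implied by the p-values, divided by the median of the null chi-square distribution (about 0.4549). The following is a minimal Python sketch under that assumption. It is not part of GMMAT; the function name `genomic_inflation` is hypothetical, and it assumes the p-values (for example the O.pval column) have already been exported to a plain list.

```python
from statistics import NormalDist, median

_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = _NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(pvalues):
    """Return lambda = median(observed 1-df chi-square) / null median.

    Hypothetical helper for illustration only, not a GMMAT function.
    P-values must lie in (0, 1].
    """
    pvals = list(pvalues)
    if not pvals:
        raise ValueError("no p-values supplied")
    if any(not (0.0 < p <= 1.0) for p in pvals):
        raise ValueError("p-values must be in (0, 1]")
    # A p-value p maps to a 1-df chi-square statistic z**2, where z is the
    # lower p/2 quantile of the standard normal (the sign vanishes on squaring).
    chi2 = [_NORMAL.inv_cdf(p / 2.0) ** 2 for p in pvals]
    return median(chi2) / CHI2_1DF_MEDIAN
```

A value close to 1 only says that the median of the p-value distribution looks null; it does not describe the tail, so it can sit near 1 even when a Q-Q plot looks unusual at small p-values.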

Are there any other flags that might indicate how good a model fit is?

Thank you once again for your guidance on this matter!

[Attached image: QQPlot]

hanchenphd commented 10 months ago

I would like to clarify that O.pval is not a burden test. If you are interested in the burden test, you should check B.pval instead. See ?SMMAT for details.

jxcao98 commented 10 months ago

I apologize for the confusion in my description. I have been focusing on SKAT-O and mistakenly referred to it as a burden test. I also run burden tests, but only to get the effect sizes.

In my tests, SMMAT is robust and very efficient. However, in some tests for binary traits, the SMMAT SKAT-O results produce Q-Q plots that look deflated, which made me worry that this was due to the small sample size. I also tried the SKAT R package, which by default adjusts for small samples, and I noticed that its Q-Q plot is closer to the diagonal.

So I would like to ask if this Q-Q plot looks normal and if there is any other way to indicate how good the model fit is.

Thank you once again for your valuable guidance!

hanchenphd commented 10 months ago

If you were referring to the 4 QQ plots above, they did not seem deflated to me as most p-values were within the confidence intervals. Just my two cents.
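To make "within the confidence intervals" concrete: under the null hypothesis the p-values are uniform on (0, 1), so each sorted p-value has a known sampling range. The Python sketch below approximates a pointwise band by simulation and reports the share of observed p-values that fall inside it. This is only an illustration under stated assumptions: the function names `qq_null_band` and `fraction_inside` are hypothetical, the band is pointwise rather than simultaneous, and it is not necessarily how the band in the plots above was drawn.

```python
import random


def qq_null_band(n_tests, n_sims=500, level=0.95, seed=0):
    """Simulate a pointwise band for sorted null p-values.

    Returns a list of (lower, upper) pairs, one per rank, smallest first.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Each simulation: n_tests uniform p-values, sorted ascending.
    sims = [sorted(rng.random() for _ in range(n_tests)) for _ in range(n_sims)]
    lo_idx = int((1.0 - level) / 2.0 * (n_sims - 1))
    hi_idx = int((1.0 + level) / 2.0 * (n_sims - 1))
    band = []
    for k in range(n_tests):
        # Empirical distribution of the k-th smallest p-value across simulations.
        column = sorted(sim[k] for sim in sims)
        band.append((column[lo_idx], column[hi_idx]))
    return band


def fraction_inside(pvalues, band):
    """Share of sorted observed p-values that lie inside the band."""
    observed = sorted(pvalues)
    if len(observed) != len(band):
        raise ValueError("band must have one interval per p-value")
    inside = sum(lo <= p <= hi for p, (lo, hi) in zip(observed, band))
    return inside / len(observed)
```

Because the band is pointwise, a few points outside it are expected even under the null, so the returned fraction is a rough guide rather than a formal test.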

jxcao98 commented 10 months ago

So... is there anything other than Q-Q plots to indicate how well the model fits?

hanchenphd commented 10 months ago

The function SMMAT only gives p-values.

jxcao98 commented 10 months ago

Thanks very much for your great tool and your kind guidance. I have no other questions and will close this issue.

hanchenphd / GMMAT

Can SMMAT work for small sample size? #60