About rank-based inverse normal transformation

xihaoli / STAARpipeline-Tutorial

The tutorial for performing single-/multi-trait association analysis of whole-genome/whole-exome sequencing (WGS/WES) studies using FAVORannotator, STAARpipeline and STAARpipelineSummary

GNU General Public License v3.0

Issue #21, closed by hmutanqilong 1 year ago

hmutanqilong commented 1 year ago

Dear Xihao, I have some questions about applying the rank-based inverse normal transformation to my phenotype before fitting the null model. Should the transformation be applied directly to the phenotype, for example with the R package RNOmni? Or do I need two steps: first fit the null model using the original outcome values, then apply the rank-based inverse normal transformation to the residuals and fit the model again? In the second approach, the transformed residuals do not seem to be rescaled. Which approach is recommended?
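For readers unfamiliar with the transformation being asked about, here is a minimal base R sketch of a rank-based inverse normal transformation applied directly to a phenotype. The helper name `rank_int`, the simulated data, and the Blom offset `k = 3/8` are illustrative assumptions, not something taken from this thread.

```r
# Rank-based inverse normal transformation (INT) using base R only.
# Hypothetical helper; the Blom offset k = 3/8 is an assumed default.
rank_int <- function(x, k = 3 / 8) {
  stopifnot(is.numeric(x), !anyNA(x))
  n <- length(x)
  r <- rank(x, ties.method = "average")  # ranks 1..n, ties averaged
  qnorm((r - k) / (n - 2 * k + 1))       # map ranks to normal quantiles
}

set.seed(1)
pheno <- rexp(500)              # simulated, right-skewed phenotype
pheno_dint <- rank_int(pheno)   # transform the phenotype directly
```

The transformation keeps the ordering of the samples and only changes the spacing between values, so the result is approximately standard normal regardless of the original distribution.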

Best, Qilong

xihaoli commented 1 year ago

Hi Qilong,

Thanks for your question. Directly applying the inverse normal transformation to the phenotype is referred to as D-INT, while applying it to the phenotype residuals is referred to as I-INT. Both are valid and are described in more detail in the RNOmni paper. In the STAAR paper, the inverse normal transformation was applied to the phenotype residuals, and the transformed residuals were then rescaled to the original scale of the phenotype before fitting the null (mixed) model. More details can be found in the STAAR paper.
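The residual-based approach with rescaling can be sketched in base R as follows. This is a simplified illustration under stated assumptions: it uses ordinary `lm` without a relatedness matrix, simulated data, a hypothetical `rank_int` helper, and rescaling by the standard deviation of the first-stage residuals. It is not the exact procedure from the STAAR paper.

```r
set.seed(2)
n <- 400
# Simulated covariates and a phenotype with skewed errors (illustrative only).
dat <- data.frame(age = rnorm(n, 50, 10), sex = rbinom(n, 1, 0.5))
dat$pheno <- 0.03 * dat$age + 0.5 * dat$sex + rexp(n)

# Hypothetical helper: rank-based INT with Blom offset k = 3/8 (assumed).
rank_int <- function(x, k = 3 / 8) {
  qnorm((rank(x) - k) / (length(x) - 2 * k + 1))
}

# Stage 1: fit the model on the original outcome and take the residuals.
fit1 <- lm(pheno ~ age + sex, data = dat)
res <- residuals(fit1)

# Stage 2: transform the residuals, rescale them by the residual SD
# (an assumed choice of scale), and refit with the same covariates.
dat$res_int <- rank_int(res) * sd(res)
fit2 <- lm(res_int ~ age + sex, data = dat)
```

Without the `sd(res)` factor the transformed residuals would have unit variance, so effect sizes from downstream tests would no longer be on the scale of the original phenotype.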

Hope this is clear.

Best, Xihao

hmutanqilong commented 1 year ago

Thanks for your explanation. I found that the GENESIS package can also apply the strategy described in the STAAR paper when fitting the null model. I can then use the function genesis2staar_nullmodel to convert the fitted object to STAAR's class.

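library(GENESIS)        # provides fitNullModel
library(STAARpipeline)  # provides genesis2staar_nullmodel
# Two-stage fit: rank-normalize the first-stage residuals, rescale by their SD, then refit.
obj.model.genesis <- fitNullModel(Pheno.df, outcome = "Pheno",
                                  covars = c("Age", "Age2", "Sex", paste0("PC", 1:10)),
                                  cov.mat = Sparse_GRM, family = "gaussian", AIREML.tol = 1e-4,
                                  norm.option = "all", two.stage = TRUE, rescale = "residSD", verbose = TRUE)
# Convert the GENESIS null model to the class STAAR expects.
obj_nullmodel_staar <- genesis2staar_nullmodel(obj.model.genesis)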

Would this be all right?

xihaoli commented 1 year ago

Hi Qilong,

Yes, that's correct and it is one of the two recommended strategies to fit a null model in the STAARpipeline tutorial.

Best, Xihao

hmutanqilong commented 1 year ago

Thank you very much, Xihao! I think this solves my problem.