RESULT: gene centric coding

DamienTan commented 1 year ago

Dear Dr. Li, when I ran the gene_centric_coding analysis, I found that the Q-Q plot looks strange. In the first half of the plot, some of the lines fall below the red line instead of staying close to it. Here are the Manhattan and Q-Q plots from the gene_centric_coding analysis. Is this a bad result? If so, what should I adjust in the analysis steps? I hope to receive your reply.

DamienTan commented 1 year ago

Following up on the question above, is there a parameter to change the colors of the Manhattan plot? It seems to use only blue and orange. Can I set the colors I want through some parameter?

xihaoli commented 1 year ago

Hi Damien,

Thanks for the follow-up. For your questions:

1. Based on your attached Manhattan and Q-Q plots, I suggest you look at the null model fitting step. For example, are there any important covariates you did not include as confounders? Are there any higher-order effects (e.g. age^2) you did not include in the null model? If your phenotype is continuous, did you perform a rank-based inverse normal transformation before fitting the null model (for more details, please see the STAAR paper)? Any of these could make the results look less well calibrated than expected.
2. When you test the goodness-of-fit of another null model, you may first draw the Manhattan and Q-Q plots for the individual (single-variant) analysis. If those plots already look off, then you don't need to proceed with the gene-centric analysis or other rare variant analyses.
3. Yes, it is possible to change the Manhattan plot color. To do this, you may need to fork the STAARpipelineSummary repo and customize your own manhattan_plot() function. In this way, you can change the color of the Manhattan plot, as well as other display options.
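
The two null-model suggestions above (rank-based inverse normal transformation and a higher-order age term) can be sketched in base R. This is only an illustration on simulated data, not the STAARpipeline code: the helper `rank_int()`, the raw phenotype column `PLT`, and the new columns `PLT_int` and `age2` are hypothetical names, and the 0.5 rank offset is one common convention assumed here.

```r
# Rank-based inverse normal transformation (INT), base R only.
# Assumption: offset = 0.5, i.e. qnorm((rank - 0.5) / n); other offsets
# (e.g. 3/8) are also used in practice.
rank_int <- function(x, offset = 0.5) {
  n <- sum(!is.na(x))                                      # non-missing count
  r <- rank(x, na.last = "keep", ties.method = "average")  # NA stays NA
  qnorm((r - offset) / (n - 2 * offset + 1))
}

# Simulated stand-in for the phenotype table (hypothetical columns).
set.seed(1)
phenotype <- data.frame(
  sample.id = paste0("s", 1:200),
  age       = round(runif(200, 20, 80)),
  PLT       = rlnorm(200, meanlog = 5.5, sdlog = 0.4)      # right-skewed trait
)

phenotype$PLT_int <- rank_int(phenotype$PLT)  # transformed outcome
phenotype$age2    <- phenotype$age^2          # higher-order age effect
```

The new columns could then replace the outcome and extend the covariate list in the null model formula. Note that a z-score only shifts and rescales the phenotype, so a skewed trait stays skewed, whereas the transformation above changes the shape of the distribution.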

Best, Xihao

DamienTan commented 1 year ago

Dear Dr. Li, thanks for your reply! I have run the individual analysis, and I have figured out how to change the plot color. Here is the Manhattan plot of the individual analysis; it does not look bad to me. I used a z-score to transform my phenotype data instead of the rank-based inverse normal transformation, and I did not add age^2. Here is the null model call with the covariates I used:

obj_nullmodel <- fit_nullmodel(PLT_zscore~age+sex+first_transfusion+pc1+pc2+pc3+pc4+pc5+pc6+pc7+pc8+pc9+pc10, data=phenotype, kins=sgrm, use_sparse=TRUE, kins_cutoff=0.022, id="sample.id", family=gaussian(link="identity"), verbose=TRUE)
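
One way to put a number on whether single-variant results look "not bad" is the genomic inflation factor, which compares the median observed test statistic with its expected value under the null. The sketch below is a generic base R calculation, not a STAARpipeline function: `genomic_inflation()` is a hypothetical helper, and `p_null` is simulated here in place of the p-values from the individual analysis.

```r
# Genomic inflation factor (lambda) from a vector of p-values.
# Each p-value is converted to a 1-df chi-square statistic, and the
# observed median is divided by the median expected under the null.
genomic_inflation <- function(p) {
  p <- p[!is.na(p)]
  chisq <- qchisq(p, df = 1, lower.tail = FALSE)
  median(chisq) / qchisq(0.5, df = 1)
}

# Simulated null p-values as a stand-in for real results.
set.seed(2)
p_null <- runif(10000)
lambda <- genomic_inflation(p_null)
print(lambda)
```

A value near 1 suggests well-calibrated p-values. Values clearly below 1 would be consistent with points falling under the diagonal of the Q-Q plot, and values clearly above 1 with inflation.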

As for the results of the gene_centric_noncoding analysis, the Q-Q plot does not look bad. Maybe I am just imagining it. ( ⊙‿⊙)

In another issue I submitted, #17, I ran the gene_centric_noncoding_annotation.r script, and I am confused by the result. In noncoding_sig.csv, the top (most significant) gene is UEB3A. There are 9 rare variants in UEB3A, and after annotation I know which variants they are. What confuses me is the difference between the p-value in noncoding_sig.csv and the p-values in the annotation result. Why is the UEB3A gene so significant (p-value=5.6e-10) in noncoding_sig.csv, while the individual rare variants in UEB3A are not all significant? I expected their p-values to all be very small. Please correct my misunderstanding.

Finally, I would like to know whether STAARpipeline can report the phenotypic variance explained (PVE, sometimes described as R^2) after running the rare variant analysis. As I understand it, R^2 is only suitable for traditional GWAS (common and low-frequency SNPs). For rare variant analysis, is there a similar measure of what percentage of the phenotype the rare variants explain? Sorry to ask so many questions and take up so much of your time.

xihaoli commented 1 year ago

Hi Damien,

For your new questions, I think it would be better to discuss via email, as you may or may not want to share additional results here in public.

Please feel free to send an email. Hope this helps.

Best, Xihao

DamienTan commented 1 year ago

Got it. I will send you an email about my new questions and delete some of the pictures I uploaded above. Thanks for the reminder.

xihaoli / STAARpipeline-Tutorial

RESULT: gene centric coding #18