rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.
https://rgcgithub.github.io/regenie

Fraction of genes have no gene_p results #518

Open Mathias0077 opened 6 months ago

Mathias0077 commented 6 months ago

Hi Joelle,

I hope you're well. I'd be grateful if you could please help me with a problem in gene-based burden testing using the --rgc-gene-p option.

The console output reports no problems during the run, but the results file contains gene_p statistics for only a fraction (~60%) of all genes. I have pasted example results for (i) gene SPTLC1, with gene_p, and (ii) gene SUSD3, without gene_p. Since nothing is reported about insufficient data, low rare-allele counts or similar, I wonder what happened, whether there is a way to get more information about the problem, and how to obtain the gene_p test for all genes. We also cannot see a pattern: it is not only genes with few rare variants that lack gene_p. Some genes with more rare variants lack it, while other genes with fewer rare variants have it. We run the analyses for all autosomal genes with a balanced (1:1) case/control outcome.

Any help is highly appreciated.

Thank you, best Mathias

[Attached image: grafik.png]
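To check for a pattern across all genes rather than two examples, one could list every gene that has rows in the results file but no gene_p row. The following is a minimal sketch, not part of regenie. It assumes a whitespace-delimited results file with a header containing an `ID` column, that the gene name is the part of `ID` before the first dot, and that gene_p rows contain the string `GENE_P` in some field. The function name and the toy rows are hypothetical, so adjust them to the actual output layout.

```python
import io


def genes_missing_marker(lines, marker="GENE_P", id_column="ID"):
    """Return sorted gene names that have rows but no row mentioning `marker`.

    Assumptions (not confirmed by the thread): the first non-comment line is
    the header, and the gene name is the ID text before the first dot.
    """
    seen, with_marker = set(), set()
    id_index = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("##"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        if id_index is None:
            id_index = fields.index(id_column)  # header row
            continue
        gene = fields[id_index].split(".", 1)[0]
        seen.add(gene)
        if any(marker in field for field in fields):
            with_marker.add(gene)
    return sorted(seen - with_marker)


# Toy rows for illustration only; these are not real regenie output.
demo = io.StringIO(
    "CHROM GENPOS ID TEST LOG10P\n"
    "9 1 SPTLC1.M1.0.01 ADD 1.2\n"
    "9 1 SPTLC1 ADD-GENE_P 1.5\n"
    "9 2 SUSD3.M1.0.01 ADD 0.3\n"
)
missing = genes_missing_marker(demo)
print(missing)
```

For a real file, pass an open file handle instead of the toy `demo` object. The resulting list can then be compared against per-gene variant counts or mask definitions.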

joellembatchou commented 5 months ago

Hi Mathias,

Can you include the REGENIE log as well as the full sumstats (ie across all masks & tests) for the gene “SUSD3” that has no GENE_P results?

Thank you, Joelle

Mathias0077 commented 5 months ago

Hi Joelle,

Thank you for looking into this. The summary statistics are those posted in my initial message, and I have added the log file to this thread. The genes contain 94 and 35 variants, and the expected information is reported for both genes:

 -reading in genotypes, computing gene-based tests and building masks...done (xxx ms) 
 -computing association tests...done (xx ms) 
 -computing joint association tests...done (x ms) 

The log file does not report any error for this. I hope this information is helpful.

Best, Mathias

tocheck.log

joellembatchou commented 4 months ago

Hi,

Can you re-run just for gene "SUSD3" including the option --debug? Please send both the output from stdout & stderr.
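One way to collect stdout and stderr as two separate files for such a re-run is a small wrapper like the sketch below. It is a generic helper, not part of regenie. The function name and file names are hypothetical, and the demo uses a stand-in Python command because the original regenie arguments are not shown in this thread.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_capture(command, stdout_path, stderr_path):
    """Run `command`, write stdout and stderr to separate files, return the exit code."""
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        completed = subprocess.run(command, stdout=out, stderr=err, check=False)
    return completed.returncode


# Stand-in command for the demo; replace it with the original regenie
# argument list plus "--debug" for the actual re-run.
demo_command = [
    sys.executable,
    "-c",
    "import sys; print('to stdout'); print('to stderr', file=sys.stderr)",
]

with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / "susd3_debug.stdout.txt"
    err_file = Path(tmp) / "susd3_debug.stderr.txt"
    exit_code = run_and_capture(demo_command, out_file, err_file)
    out_text = out_file.read_text()
    err_text = err_file.read_text()

print(exit_code, out_text.strip(), err_text.strip())
```

For the real run, write the two files to a permanent location instead of a temporary directory so they can be attached to the issue. How the run is restricted to SUSD3 depends on the set list in use and is not shown here.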