pmglab / KGGSeq

MIT License

no *.gene.mutationburden.txt in the output #1

Open SheaCheng2000 opened 2 years ago

SheaCheng2000 commented 2 years ago

Hi!

I ran the RUNNER pipeline, but I only got the Excel file of variant annotations, with no qqplot and no *.gene.mutationburden.txt file, both of which are described in https://pmglab.top/kggseq/doc10/UserManual.html#ITER.

Here is my command:

java -jar ./kggseq.jar \
  --runner-gene-coding \
  --runner-coding-gene-cov mu_mis,mu_lof,oe_mis,oe_lof,ExonGC \
  --db-score dbnsfp \
  --disease-causing-predict best \
  --out /mnt/work/research/geyx/RUNNER/cx/example_out_RSAIGEenv_runner_javaenv \
  --vcf-file /mnt/work/research/geyx/RUNNER/kggseqhg38/examples/simu100.coding.vcf.gz \
  --ped-file /mnt/work/research/geyx/RUNNER/kggseqhg38/examples/simu100.ped \
  --excel \
  --nt 6 \
  --buildver hg38 \
  --hwe-all 0.001 \
  --max-allele 4 \
  --gty-qual 20.0 \
  --gty-sec-pl 20 \
  --gty-dp 8 \
  --gty-af-ref 0.05 \
  --gty-af-het 0.25 \
  --gty-af-alt 0.75 \
  --min-obsu-rate 0.9 \
  --min-obsa-rate 0.9 \
  --filter-case-maf-oe 0.1 \
  --db-gene refgene,gencode \
  --gene-feature-in 0,1,2,3,4,5,6 \
  --db-filter gadexome.eas,gadgenome.eas \
  --rare-allele-freq 0.01 \
  --ignore-cnv \
  --min-case-control-freq-ratio 3.0 \
  --gene-freq-score eas \
  --qqplot

And this is my log file example_out_RSAIGEenv_runner_javaenv.log

I noticed that in the log file, it reported 1 error:

ERROR 2022-09-08 22:11:02 - Sorry, I cannot connect to website to update kggseq and relevant resources! Please check your internet configurations!

I am wondering whether the internet connection is the cause, and I don't know how to solve it. Could you please give me some advice?

Thanks a lot!!

Shea

limx54 commented 2 years ago

Sorry, the message type is incorrect. It is just a warning, NOT an error. If KGGSeq has problems downloading the resource data, you can download the package with bundled resource data for the analysis from http://pmglab.top/kggseq/download.htm.

mangoJH commented 2 years ago

The main reason RUNNER did not run successfully is that the number of genes remaining after truncation is too small. It does not meet the required minimum (>500), so the RUNNER analysis was stopped:

INFO 2022-09-08 22:23:19 - The non-zero mutation counts of genes is 322 at truncation point 0. The analysis of RUNNER is stopped.

Please confirm that you are using the appropriate reference genome version (hg38 or hg19). You can also try looser QC filtering to ensure that enough variants and genes are retained.
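A quick way to check a log for this condition is to search for the "non-zero mutation counts" line and compare the reported gene count with the minimum. The following is only a minimal sketch: the function names are hypothetical, the pattern is based solely on the log line quoted in this thread, and the threshold of 500 is taken from the comment above rather than from the KGGSeq source.

```python
import re

# Assumption: minimum gene count (>500) as quoted in this thread.
MIN_GENES = 500

# Assumption: the log line always has the wording quoted in this thread.
PATTERN = re.compile(
    r"The non-zero mutation counts of genes is (\d+) at truncation point (\d+)"
)


def find_gene_counts(log_text):
    """Return (gene_count, truncation_point) pairs found in the log text."""
    return [(int(n), int(t)) for n, t in PATTERN.findall(log_text)]


def below_minimum(log_text, min_genes=MIN_GENES):
    """True if any reported gene count is not greater than min_genes."""
    return any(n <= min_genes for n, _ in find_gene_counts(log_text))


if __name__ == "__main__":
    line = (
        "INFO 2022-09-08 22:23:19 - The non-zero mutation counts of genes "
        "is 322 at truncation point 0. The analysis of RUNNER is stopped."
    )
    print(find_gene_counts(line), below_minimum(line))
```

If the function finds no matching line at all, the log gives no stated reason for the missing output, which is a different situation from the gene count being too small.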

melnel000 commented 2 years ago

Hi

I am experiencing the same issue that this user describes, i.e. no qqplot, no *.gene.mutationburden.txt in the output.

For example:

INFO 2022-10-17 12:50:02 - 37109 variant(s) are retained after filtering by gene features.
INFO 2022-10-17 12:50:02 - 29297 variant(s) exist in gnomad.exome.
INFO 2022-10-17 12:50:02 - 29622 variant(s) exist in gnomad.genome.
INFO 2022-10-17 12:50:02 - 36876 variant(s) with minor allele frequency [0, 0.01) in the reference datasets above are retained!
INFO 2022-10-17 12:50:02 - 36162 coding nonsynonymous variants are assigned functional prediction scores.
INFO 2022-10-17 12:50:02 - 17874 variants (in 9208 genes) are predicted to be disease-causal; 19002 variants are predicted to be non-disease-causal according to the Logistic regression prediction model trained by ExoVar dataset (http://pmglab.top/kggseq/download/ExoVar.xls)
INFO 2022-10-17 12:50:02 - ------------------------------------------------------------
INFO 2022-10-17 12:50:14 - Finally, 36876 variants are saved in /cbio/projects/003/melissa/black_als_analysis/RUNNER/sa_als_runner_analysis_coding_v8.flt.xlsx with Excel format.

In other cases I don't get the output because the number of remaining genes is too small (here, unlike in the first example, I do get the message):

INFO 2022-10-17 12:48:04 - 22956 variant(s) are retained after filtering by gene features.
INFO 2022-10-17 12:48:04 - 19384 variant(s) exist in gnomad.exome.
INFO 2022-10-17 12:48:04 - 19377 variant(s) exist in gnomad.genome.
INFO 2022-10-17 12:48:04 - 22825 variant(s) with minor allele frequency [0, 0.01) in the reference datasets above are retained!
INFO 2022-10-17 12:48:04 - 104 coding nonsynonymous variants are assigned functional prediction scores.
INFO 2022-10-17 12:48:04 - 52 variants (in 49 genes) are predicted to be disease-causal; 22773 variants are predicted to be non-disease-causal according to the Logistic regression prediction model trained by ExoVar dataset (http://pmglab.top/kggseq/download/ExoVar.xls)
INFO 2022-10-17 12:48:04 - ------------------------------------------------------------
INFO 2022-10-17 12:48:14 - Finally, 22825 variants are saved in /cbio/projects/003/melissa/black_als_analysis/RUNNER/sa_als_runner_analysis_coding_v7.flt.xlsx with Excel format.

INFO 2022-10-17 12:48:14 - The non-zero mutation counts of genes is 13 at truncation point 0. The analysis of RUNNER is stopped.

Any suggestions on what the issue might be?

Thanks, Melissa

mangoJH commented 2 years ago

I noticed the line "The non-zero mutation counts of genes is 13 at truncation point 0. The analysis of RUNNER is stopped." in your log file, even though 22825 variants were retained after your QC filtering. Did you add other strict filtering criteria, such as --filter-case-maf-oe 0.1 and --min-case-control-freq-ratio 3.0? Variants that fail these filters are still recorded in the *.flt.xlsx file, but they are not used in the RUNNER analysis. Please check this.
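One way to test whether those two filters are responsible is to rerun the same command without them and see whether the gene count in the log rises. The following is only a sketch of stripping options from an argument list before rerunning: `drop_flags` is a hypothetical helper, it assumes each dropped option takes exactly one value, and the shortened argument list is illustrative rather than a complete command.

```python
def drop_flags(args, flags_to_drop):
    """Return a copy of args without the given flags and their single values."""
    kept = []
    skip_next = False
    for arg in args:
        if skip_next:
            # This is the value belonging to a dropped flag.
            skip_next = False
            continue
        if arg in flags_to_drop:
            skip_next = True
            continue
        kept.append(arg)
    return kept


if __name__ == "__main__":
    # Illustrative subset of the options used earlier in this thread.
    original = [
        "--runner-gene-coding",
        "--filter-case-maf-oe", "0.1",
        "--rare-allele-freq", "0.01",
        "--min-case-control-freq-ratio", "3.0",
        "--qqplot",
    ]
    strict = {"--filter-case-maf-oe", "--min-case-control-freq-ratio"}
    print(" ".join(drop_flags(original, strict)))
```

Removing filters changes which variants enter the analysis, so a rerun like this is a diagnostic step, not a recommended final configuration.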