ikalatskaya / ISOWN

Error about dbsnp #22

Open solivehong opened 5 years ago

solivehong commented 5 years ago

I have data from 13 lung cancer samples, some of which lack a matched normal sample. I hope to use this software to find somatic mutations. When I ran the program, the following error occurred. I did not understand from the README instructions whether I need to re-download dbSNP 142 or use your dbSNP download link.

perl ${ISOWN_HOME}/bin/database_annotation.pl PM2018122708.sofstric.watson.vcf PM2018122708.sofstric.watson.vcf.test.vcf

annotating input file with ANNOVAR ...
NOTICE: Output files were written to PM2018122708.sofstric.watson.vcf.test.vcf.temp.annovar.vcf.temp.convert2annovar.variant_function, PM2018122708.sofstric.watson.vcf.test.vcf.temp.annovar.vcf.temp.convert2annovar.exonic_variant_function
NOTICE: Reading gene annotation from /gpfs/home//software/ISOWN/bin/../external_tools/annovar_2012-03-08/humandb/hg19_refGene.txt ... Done with 52068 transcripts (including 11837 without coding sequence annotation) for 26464 unique genes
NOTICE: Processing next batch with 2995 unique variants in 2995 input lines
NOTICE: Reading FASTA sequences from /gpfs/home/zhaohongqiang/software/ISOWN/bin/../external_tools/annovar_2012-03-08/humandb/hg19_refGeneMrna.fa ... Done with 1579 sequences
WARNING: A total of 356 sequences will be ignored due to lack of correct ORF annotation

The dbSNP 142 file is not found. Please correct the path in /gpfs/home/software/ISOWN//bin/database_annotation.pl and try again - see path below:

    /gpfs/home/software/ISOWN//bin/../external_databases/dbSNP142_All_20141124.vcf.gz.modified.vcf.gz

solivehong commented 5 years ago

I annotated the VCF directly with ANNOVAR and then ran:

perl ${ISOWN_HOME}/bin/run_isown.pl test/RD2018080114.sofstric.watson.vcf.hg19_multianno.vcf test.output.txt " -trainingSet ${ISOWN_HOME}/training_data/BRCA_100_TrainSet.arff -sanityCheck false -classifier nbc"

Reformat files in '/gpfs/home/zhaohongqiang/software/ISOWN/test/RD2018080114.sofstric.watson.vcf.hg19_multianno.vcf' to emaf ...

Exception in thread "main" java.lang.NullPointerException
    at com.Processing.processVcf(Processing.java:39)
    at com.runReformating.main(runReformating.java:39)

Running prediction using file 'test.output.txt.emaf' ...

... Your working directory is /gpfs/home/zhaohongqiang/software/ISOWN
... This file was chosen for classifier training: /gpfs/home/zhaohongqiang/software/ISOWN//training_data/BRCA_100_TrainSet.arff
... Exception in thread "main" java.lang.NullPointerException
    at helper.Headers.<init>(Headers.java:41)
    at main.Prediction.getVariant2samples(Prediction.java:347)
    at main.Prediction.loadVariants(Prediction.java:28)
    at main.runISOWN.main(runISOWN.java:85)

Done

solivehong commented 5 years ago

This is my ANNOVAR result:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR

chr1 12783 . G A . PASS WfG_variant_origin=somatic;ANNOVAR_DATE=2016-02-01;Func.refGene=ncRNA_intronic;Gene.refGene=DDX11L1;GeneDetail.refGene=.;ExonicFunc.refGene=.;AAChange.refGene=.;cosmic70=.;avsnp147=rs62635284;esp6500siv2_all=.;ExAC_ALL=.;ExAC_AFR=.;ExAC_AMR=.;ExAC_EAS=.;ExAC_FIN=.;ExAC_NFE=.;ExAC_OTH=.;ExAC_SAS=.;1000g2015aug_all=.;1000g2015aug_eas=.;CLINSIG=.;CLNDBN=.;CLNACC=.;CLNDSDB=.;CLNDSDBID=.;SIFT_score=.;SIFT_pred=.;Polyphen2_HDIV_score=.;Polyphen2_HDIV_pred=.;Polyphen2_HVAR_score=.;Polyphen2_HVAR_pred=.;LRT_score=.;LRT_pred=.;MutationTaster_score=.;MutationTaster_pred=.;MutationAssessor_score=.;MutationAssessor_pred=.;FATHMM_score=.;FATHMM_pred=.;PROVEAN_score=.;PROVEAN_pred=.;VEST3_score=.;CADD_raw=.;CADD_phred=.;DANN_score=.;fathmm-MKL_coding_score=.;fathmm-MKL_coding_pred=.;MetaSVM_score=.;MetaSVM_pred=.;MetaLR_score=.;MetaLR_pred=.;integrated_fitCons_score=.;integrated_confidence_value=.;GERP++_RS=.;phyloP7way_vertebrate=.;phyloP20way_mammalian=.;phastCons7way_vertebrate=.;phastCons20way_mammalian=.;SiPhy_29way_logOdds=.;ALLELE_END GT:DP:AF:VD:ALD 0/0:59:0.2712:16:7,9 0/1:59:0.2712:16:7,9

ikalatskaya commented 5 years ago

Hello,

"did not understand the readme instructions need to re-download dbsno142 or use your dbsnp download link"

You have to download the dbSNP file and reformat it.

In the INSTALLATION INSTRUCTIONS:

Download dbSNP from NCBI:

    wget ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz --no-passive-ftp

Reformat and index dbSNP using the following script:

    perl ${ISOWN_HOME}/bin/ncbi_dbSNP_format_index.pl 00-All.vcf.gz 00-All.modified.vcf
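
As a quick check after the reformatting step, the sketch below tests whether the database file that database_annotation.pl asks for is present and non-empty. The path is copied from the error message earlier in this thread; the function name `check_dbsnp` is hypothetical and not part of ISOWN, and the sketch does not check whether the file is correctly indexed.

```shell
# Hypothetical helper (not part of ISOWN): report whether the dbSNP file
# named in the error message exists and is non-empty.
# $1: ISOWN installation directory (ISOWN_HOME in this thread)
check_dbsnp() {
    db="$1/external_databases/dbSNP142_All_20141124.vcf.gz.modified.vcf.gz"
    if [ -s "$db" ]; then
        echo "found: $db"
    else
        echo "missing: $db"
        return 1
    fi
}

# Usage: check_dbsnp "${ISOWN_HOME}"
```

If the file is reported as missing, the path in database_annotation.pl and the name of the reformatted file do not match, which is what the error message asks you to correct.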

The VCF reformatting is most likely failing because the annotation did not complete.

Let me know if you have other issues. Irina

gprashant17 commented 5 years ago

I tried reformatting the dbSNP file, but the following error occurred. What should I do?

perl bin/ncbi_dbSNP_format_index.pl 00-All.vcf.gz dbSNP142_All_20141124.vcf.gz.modified.vcf.gz

Reformat 00-All.vcf.gz ... gzip: 00-All.vcf.gz: invalid compressed data--crc error

gzip: 00-All.vcf.gz: invalid compressed data--length error

Compressing dbSNP142_All_20141124.vcf.gz.modified.vcf.gz ... [ti_index_core] the file out of order at line 10276162

Done
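
The "file out of order" message appears to come from the indexing step, which expects positions to be sorted within each chromosome. A minimal sketch for locating the first offending line in an uncompressed VCF follows; the function name `first_unsorted_line` is hypothetical, and the check only covers decreasing positions within a run of the same chromosome, not whether chromosomes are contiguous.

```shell
# Hypothetical helper: print the line number of the first record whose
# position is smaller than the previous record on the same chromosome.
# Prints nothing if no such record is found.
# $1: uncompressed, tab-separated VCF file
first_unsorted_line() {
    awk -F '\t' '
        /^#/ { next }                                   # skip header lines
        $1 == chrom && $2 + 0 < pos { print NR; exit }  # position went backwards
        { chrom = $1; pos = $2 + 0 }                    # remember the last record
    ' "$1"
}

# Usage: first_unsorted_line 00-All.vcf
```

In this thread the gzip errors come first, so the ordering problem may simply be a side effect of a truncated or corrupted input rather than a genuinely unsorted file.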

ikalatskaya commented 5 years ago

Hello, we debug a lot of requests from end users, but I haven't seen anything like that. Have you tried downloading the dbSNP file again with wget? Irina
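
The "crc error" and "length error" messages from gzip usually point to a truncated or corrupted archive, so it may help to test the download before reformatting it. The sketch below wraps `gzip -t`, which checks the integrity of a compressed file without extracting it; the function name `verify_gzip` is hypothetical.

```shell
# Hypothetical helper: test a gzip archive without extracting it.
# $1: path to the .gz file
verify_gzip() {
    if gzip -t "$1" 2>/dev/null; then
        echo "ok: $1"
    else
        echo "corrupt or incomplete: $1"
        return 1
    fi
}

# Usage: verify_gzip 00-All.vcf.gz
```

If the archive fails this test, downloading it again is the likely fix; running the reformatting script on it will probably reproduce the same gzip errors.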

peishimei commented 5 years ago

Hi, try gunzipping your dbSNP file before reformatting.
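
A minimal sketch of that suggestion, keeping the original archive in place. The function name `decompress_keep` is hypothetical, and whether ncbi_dbSNP_format_index.pl accepts an uncompressed input file is an assumption based on the suggestion above, not something confirmed in this thread.

```shell
# Hypothetical helper: decompress a gzip archive to a separate file,
# leaving the archive untouched. Removes the partial output on failure.
# $1: gzip archive, $2: destination for the uncompressed copy
decompress_keep() {
    if gunzip -c "$1" > "$2"; then
        echo "decompressed: $2"
    else
        rm -f "$2"
        echo "decompression failed: $1"
        return 1
    fi
}

# Usage (file names taken from this thread):
# decompress_keep 00-All.vcf.gz 00-All.vcf &&
#     perl bin/ncbi_dbSNP_format_index.pl 00-All.vcf dbSNP142_All_20141124.vcf.gz.modified.vcf.gz
```

Note that gunzip verifies the checksum as well, so an archive that gives a crc error will fail here too and needs to be downloaded again.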