LuoPangpang opened 5 years ago
Can you send the first few lines of your annotated VCF file, excluding headers? I was getting an error after the database annotation process ("ArrayIndexOutOfBound 3"), and I just want to find where the error is. Thanks, Amit
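An `ArrayIndexOutOfBoundsException` with index 3 means some Java code tried to read element 3 of an array that has three or fewer elements. That is consistent with, though it does not prove, a data row that split into fewer fields than expected. Here is a minimal sketch for finding such rows. It assumes the annotated file is tab-delimited and that every data row should have as many fields as the `#TAB_DELIMITED_HEADER=` line lists. The script name in the usage comment is illustrative, not part of ISOWN.

```python
import sys


def check_field_counts(lines):
    """Return (line_number, expected, found) for every data row whose number of
    tab-separated fields differs from the #TAB_DELIMITED_HEADER column count."""
    expected = None
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("#TAB_DELIMITED_HEADER="):
            # Column names follow the '=' and are assumed to be tab-separated.
            expected = len(line.split("=", 1)[1].split("\t"))
            continue
        if not line or line.startswith("#"):
            continue  # other meta/header lines and blank lines
        found = len(line.split("\t"))
        if expected is not None and found != expected:
            problems.append((lineno, expected, found))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_fields.py annotated.vcf   (script name is illustrative)
    with open(sys.argv[1]) as vcf:
        for lineno, expected, found in check_field_counts(vcf):
            print(f"line {lineno}: expected {expected} fields, found {found}")
```

Any line it reports is a good candidate to paste into the thread, because it shows the exact row where the column layout breaks.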
Hi Amit,
This is the annotated VCF output that I ran through without errors, using test_data/3d2edf87-6ec5-4c9f-9212-e8a751cc33e8.dkfz-snvCalling_1-0-132-1.20160126.vcf as input. Hope it helps.
Pang
Hi, Pang
Could you send several actual variants from your annotated VCF? ANNOVAR can also annotate PolyPhen, Mutation Assessor, etc. I want to convert the ANNOVAR-annotated VCF into an ISOWN-style annotated VCF and then test it. Thanks.
Hi, Pang
I have completed the test following the documented process and did not encounter your problem. However, I ran into a different one: "Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface weka.core.Instance, but class was expected", which only went away after I replaced weka.jar with the original file. Thanks to your issue, I realized that weka.jar was already in the bin/ directory. Thank you all the same.
Guo.
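`IncompatibleClassChangeError: Found interface weka.core.Instance, but class was expected` means the compiled ISOWN code expected `weka.core.Instance` to be a class. The Weka jar loaded at runtime instead defines it as an interface, as newer Weka releases (3.7 onward) do. So a different Weka jar from the one bundled in bin/ was probably on the classpath, which fits with restoring the original weka.jar fixing it.

The hedged sketch below looks for competing copies. It lists every jar under a directory that bundles `weka/core/Instance.class`, together with its version string. It assumes Weka keeps that string in `weka/core/version.txt` inside the jar, and it reports "unknown" otherwise. The directory you scan and the script name are illustrative.

```python
import sys
import zipfile
from pathlib import Path

INSTANCE_CLASS = "weka/core/Instance.class"
# Assumption: Weka jars store their version string in this resource.
VERSION_FILE = "weka/core/version.txt"


def scan_jars(directory):
    """Map every .jar under `directory` that bundles weka.core.Instance to the
    Weka version string it carries ('unknown' if the version file is absent)."""
    hits = {}
    for jar in sorted(Path(directory).rglob("*.jar")):
        if not jar.is_file():
            continue  # skip directories whose name happens to end in .jar
        try:
            with zipfile.ZipFile(jar) as zf:
                names = set(zf.namelist())
                if INSTANCE_CLASS not in names:
                    continue
                version = "unknown"
                if VERSION_FILE in names:
                    version = zf.read(VERSION_FILE).decode("utf-8", "replace").strip()
                hits[str(jar)] = version
        except zipfile.BadZipFile:
            print(f"skipping unreadable jar: {jar}", file=sys.stderr)
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_weka.py /workplace/Software/ISOWN   (path is an example)
    found = scan_jars(sys.argv[1])
    for jar, version in found.items():
        print(f"{jar}\tWeka {version}")
    if len(found) > 1:
        print("Several jars provide weka.core.Instance; check which one is on the classpath.")
```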
Dear Author,
I got the message below when trying to run the final step, run_isown.pl:
perl /workplace/Software/ISOWN/bin/run_isown.pl 181023001/ 181023001/181023001.isown.txt "-trainingSet /workplace/Software/ISOWN/training_data/UCEC_100_TrainSet.arff -sanityCheck false -classifier nbc"
Reformat files in '181023001' to emaf ...
WARNING: 18 variants with unknown annotation were removed
Total number of variants after filtering 3770
Running prediction using file '181023001/181023001.isown.txt.emaf' ...
...
Your working directory is 181023001
...
This file was chosen for classifier training: /workplace/Software/ISOWN/training_data/UCEC_100_TrainSet.arff
...
Total number of samples in your set is 1
...
Number of loaded nonsilent coding variants in test set is 808
...
Naive Bayes Classifier:
Option: supervised discretization (SD) is true
10-fold cross-validation
F1-measure: 98.12%.
Recall: 97.817%.
Precision: 98.425%.
False positive rate: 1.565%.
AUC: 99.77%.
Can't run classifier.
java.io.IOException: nominal value not declared in header, read Token[null], line 19
at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:240)
at weka.core.converters.ArffLoader$ArffReader.getInstanceFull(ArffLoader.java:578)
at weka.core.converters.ArffLoader$ArffReader.getInstance(ArffLoader.java:423)
at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:391)
at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:376)
at weka.core.converters.ArffLoader$ArffReader.<init>(ArffLoader.java:138)
at weka.core.Instances.<init>(Instances.java:126)
at main.Prediction.runClassifier(Prediction.java:233)
at main.runISOWN.main(runISOWN.java:90)
...
Total number of predicted somatic mutations 0
Final results are saved here: 181023001/181023001.isown.txt
...
Done
Interestingly, I got no errors running both database_annotation.pl and run_isown.pl on the two VCF files provided in the test_data/ directory ...
I googled "nominal value not declared in header", and some people said it has something to do with Weka, so I checked:
java -jar /workplace/Software/ISOWN/bin/weka.jar
Exception in thread "main" java.lang.ExceptionInInitializerError
Caused by: java.awt.HeadlessException: No X11 DISPLAY variable was set, but this program performed an operation which requires it.
at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:204)
at java.awt.Window.<init>(Window.java:536)
at java.awt.Frame.<init>(Frame.java:420)
at javax.swing.JFrame.<init>(JFrame.java:233)
at weka.gui.LogWindow.<init>(LogWindow.java:252)
at weka.gui.GUIChooser.<init>(GUIChooser.java:215)
So did I miss anything? By the way, weka.jar was already in the bin/ directory when I installed ISOWN, and check_dependencies.pl reported that everything was installed, so I did not replace weka.jar.
Thank you very much!
Pang
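Weka raises "nominal value not declared in header" when a data row holds a value for a nominal attribute that is not listed in that attribute's `{...}` declaration. The line number in the message presumably refers to the ARFF being parsed, not to the input VCF. Separately, the HeadlessException above only shows that running weka.jar directly starts Weka's GUI chooser, which needs an X11 display. On its own it does not indicate a broken jar.

This is a minimal checker for such values. It assumes a dense (non-sparse) ARFF file and treats `?` as Weka's missing-value marker. The thread does not show which ARFF file ISOWN writes for the test set or where it is written, so the file name in the usage comment is illustrative.

```python
import csv
import re
import sys

# Matches "@attribute <name> <type>"; quoted names may contain spaces.
ATTR_RE = re.compile(r"@attribute\s+('[^']*'|\"[^\"]*\"|\S+)\s+(.*)", re.IGNORECASE)


def _split_values(text):
    """Split a comma-separated ARFF list, honouring single-quoted values."""
    row = next(csv.reader([text], skipinitialspace=True, quotechar="'"), [])
    return [v.strip() for v in row]


def find_undeclared_nominals(lines):
    """Return (line_number, attribute, value) for each data value that is not in
    its attribute's declared {...} set. Dense ARFF only; '?' counts as missing."""
    attributes = []  # (name, allowed values) pairs; None means "not nominal"
    in_data = False
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # blank line or ARFF comment
        if not in_data:
            match = ATTR_RE.match(line)
            if match:
                name, spec = match.group(1).strip("'\""), match.group(2).strip()
                if spec.startswith("{") and spec.endswith("}"):
                    attributes.append((name, set(_split_values(spec[1:-1]))))
                else:
                    attributes.append((name, None))  # numeric/string/date: not checked
            elif line.lower().startswith("@data"):
                in_data = True
            continue
        row = _split_values(line)
        if len(row) != len(attributes):
            problems.append((lineno, "<row length>", f"{len(row)} values, {len(attributes)} attributes"))
        for (name, allowed), value in zip(attributes, row):
            if allowed is not None and value != "?" and value not in allowed:
                problems.append((lineno, name, value))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_arff.py test_set.arff   (file name is illustrative)
    with open(sys.argv[1]) as arff:
        for lineno, attribute, value in find_undeclared_nominals(arff):
            print(f"line {lineno}: {attribute} = {value!r}")
```

An empty value (reported as `''`) is worth looking for in particular, because it is never a declared nominal value.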
Hi, Pang
Have you solved the problem? I get the same error, "Can't run classifier".
Thanks, yueyang
@Guofengyu, I got this error from the first command of the classification step, the reformatting to emaf:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
at com.Processing.processVcf(Processing.java:119)
at com.runReformating.main(runReformating.java:39)
My input VCF is the annotated version of the sample VCF. Its beginning looks like this:
##fileformat=VCFv4.1
##fileDate=20160129
##pancancerversion=1.0
##reference=<ID=hs37d5,Source=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz>;
##center="DKFZ"
##workflowName=DKFZ_SNV_workflow
##workflowVersion=1.0.0
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">
##INFO=<ID=GERMLINE,Number=0,Type=Flag,Description="Indicates if record is a germline mutation">
##INFO=<ID=UNCLEAR,Number=0,Type=Flag,Description="Indicates if the somatic status of a mutation is unclear">
##INFO=<ID=VT,Number=1,Type=String,Description="Variant type, can be SNP, INS or DEL">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency in primary data, for each ALT allele, in the same order as listed">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="RMS Mapping Quality">
##INFO=<ID=1000G,Number=0,Type=Flag,Description="Indicates membership in 1000Genomes">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth at this position in the sample">
##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases">
##FILTER=<ID=RE,Description="variant in UCSC_27Sept2013_RepeatMasker.bed.gz region and/or SimpleTandemRepeats_chr.bed.gz region, downloaded from UCSC genome browser and/or variant in segmental duplication region, annotated by annovar">
##FILTER=<ID=BL,Description="variant in DAC-Blacklist from ENCODE or in DUKE_EXCLUDED list, both downloaded from UCSC genome browser">
##FILTER=<ID=DP,Description="<= 5 reads total at position in tumor">
##FILTER=<ID=SB,Description="Strand bias of reads with mutant allele = zero reads on one strand">
##FILTER=<ID=TAC,Description="less than 6 reads in Tumor at position">
##FILTER=<ID=dbSNP,Description="variant in dbSNP135">
##FILTER=<ID=DB,Description="variant in 1000Genomes, ALL.wgs.phase1_integrated_calls.20101123.snps_chr.vcf.gz or dbSNP">
##FILTER=<ID=HSDEPTH,Description="variant in HiSeqDepthTop10Pct_chr.bed.gz region, downloaded from UCSC genome browser">
##FILTER=<ID=MAP,Description="variant overlaps a region from wgEncodeCrgMapabilityAlign100mer.bedGraph.gz:::--breakPointMode --aEndOffset=1 with a value below 0.5, punishment increases with a decreasing mapability">
##FILTER=<ID=SBAF,Description="Strand bias of reads with mutant allele = zero reads on one strand and variant allele frequency below 0.1">
##FILTER=<ID=FRQ,Description="variant allele frequency below 0.05">
##FILTER=<ID=TAR,Description="Only one alternative read in Tumor at position">
##FILTER=<ID=UNCLEAR,Description="Classification is unclear">
##FILTER=<ID=DPHIGH,Description="Too many reads mapped in control at this region">
##FILTER=<ID=DPLOWC,Description="Only 5 or less reads in control">
##FILTER=<ID=1PS,Description="Only two alternative reads, one on each strand">
##FILTER=<ID=ALTC,Description="Alternative reads in control">
##FILTER=<ID=ALTCFR,Description="Alternative reads in control and tumor allele frequency below 0.3">
##FILTER=<ID=FRC,Description="Variant allele frequency below 0.3 in germline call">
##FILTER=<ID=YALT,Description="Variant on Y chromosome with low allele frequency">
##FILTER=<ID=VAF,Description="Variant allele frequency in tumor < 5 times allele frequency in control">
##FILTER=<ID=BI,Description="Bias towards a PCR strand or sequencing strand">
##SAMPLE=<ID=CONTROL,SampleName=control_NA,Individual=NA,Description="Control">
##SAMPLE=<ID=TUMOR,SampleName=tumor_NA,Individual=NA,Description="Tumor">
##TARGET_FILE:SureSelectHumanAllExonV4=file:///oicr/data/genomes/homo_sapiens_mc/Agilent/SureSelectHumanAllExonV4/S03723314_Regions.merged.sorted.bed.gz
##VCF_FILE:dbSNP152_All_20180423=file:///oasis/tscc/scratch/z8jiang/ISOWN/bin/../external_databases/dbSNP152_All_20180423.vcf.gz.modified.vcf.gz
##VCF_FILE:COSMIC_94=file:///oasis/tscc/scratch/z8jiang/ISOWN/bin/../external_databases/COSMIC_v94.vcf.gz
##VCF_FILE:ExAC.r0.3.1=file:///oasis/tscc/scratch/z8jiang/ISOWN/bin/../external_databases/ExAC.r0.3.1.sites.vep.vcf.gz
##VCF_FILE:2021_07_23_MA=file:///oasis/tscc/scratch/z8jiang/ISOWN/bin/../external_databases/2021_07_23_MA.vcf.gz
#TAB_DELIMITED_HEADER=sample_name chr pos reference alternative genotype totalReadDepth %readDepthAlt in.dbSNP.or.not in.dbSNP.COMMON.or.not in.COSMIC.or.not MA_functional_impact MA_score is.SOMATIC in.ExAC ExAC_NCC FLANKING_STR POLYPHEN VARIANT_CLASS LENGTH CHROM POS ID REF ALT QUAL FILTER INFO FORMAT TUMOR
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT TUMOR
TUMOR chr1 876499 A G GENOTYPE_BB 48 100 IN.dbSNP not.in.dbSNP.COMMON IN.COSMIC_CNT=0 . 0 0 1 NCC= V1=[.;.;.;.;.];X=[chr1;876499;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0 NO_POLYPHEN_DATA VARIANT_CLASS_SINGLE_NUCLEOTIDE_VARIANT 1 chr1 876499 rs4372192_876499 A G . PASS GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,876429,876641]];ANNOVAR=intronic,SAMD11;ANNOTATIONS={dbSNP152_All_20180423=0;0;.;.;.},{COSMIC_94=0;0;.;.;.},{ExAC.r0.3.1=0;0;.;.;.},{2021_07_23_MA=0;0;.;.;.};POLYPHEN=[polyphenWHESS_20150403=0,NO_POLYPHEN_DATA];SEQUENCE_CONTEXT=GAT;OICR_FLANKING={V1=[.;.;.;.;.];X=[chr1;876499;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0} AD:GT:DP:DP4 0,48:1/1:48:0,0,32,16
TUMOR chr1 877715 C G GENOTYPE_BB 34 100 IN.dbSNP not.in.dbSNP.COMMON IN.COSMIC_CNT=0 . 0 0 1 NCC= V1=[.;.;.;.;.];X=[chr1;877715;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0 NO_POLYPHEN_DATA VARIANT_CLASS_SINGLE_NUCLEOTIDE_VARIANT 1 chr1 877715 rs6605066_877715 C G . PASS GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,877537,878481]];ANNOVAR=intronic,SAMD11;ANNOTATIONS={dbSNP152_All_20180423=0;0;.;.;.},{COSMIC_94=0;0;.;.;.},{ExAC.r0.3.1=0;0;.;.;.},{2021_07_23_MA=0;0;.;.;.};POLYPHEN=[polyphenWHESS_20150403=0,NO_POLYPHEN_DATA];SEQUENCE_CONTEXT=CCG;OICR_FLANKING={V1=[.;.;.;.;.];X=[chr1;877715;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0} AD:GT:DP:DP4 0,34:1/1:34:0,0,13,21
TUMOR chr1 877831 T C GENOTYPE_BB 33 100 IN.dbSNP not.in.dbSNP.COMMON IN.COSMIC_CNT=0 . 0 0 1 NCC= V1=[.;.;.;.;.];X=[chr1;877831;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0 transcript:uc001abw.1,uc001abx.1;hdiv_prediction:benign,benign;hdiv_class:neutral,neutral;hvar_prediction:benign,benign;hvar_class:neutral,neutral VARIANT_CLASS_SINGLE_NUCLEOTIDE_VARIANT 1 chr1 877831 rs6672356_877831 T C . PASS GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,877537,878481]];ANNOVAR=exonic,SAMD11;ANNOVAR_EXONIC=nonsynonymous SNV,SAMD11:NM_152486:exon10:c.T1027C:p.W343R,;ANNOTATIONS={dbSNP152_All_20180423=0;0;.;.;.},{COSMIC_94=0;0;.;.;.},{ExAC.r0.3.1=0;0;.;.;.},{2021_07_23_MA=1;1;VARIANT_MATCHED;.;[chr1|877831|.|T|C|.|.|RefGenome variant=W>R;Gene=SAMD11;Uniprot=SAM11_HUMAN;Info=;Uniprot variant=W343R;Func. Impact=neutral;FI score=-2.1]};POLYPHEN=[polyphenWHESS_20150403=1,transcript:uc001abw.1,uc001abx.1;hdiv_prediction:benign,benign;hdiv_class:neutral,neutral;hvar_prediction:benign,benign;hvar_class:neutral,neutral];SEQUENCE_CONTEXT=CTG;OICR_FLANKING={V1=[.;.;.;.;.];X=[chr1;877831;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0} AD:GT:DP:DP4 0,33:1/1:33:0,0,15,18
TUMOR chr1 880238 A G GENOTYPE_BB 73 100 IN.dbSNP not.in.dbSNP.COMMON IN.COSMIC_CNT=0 . 0 0 1 NCC= V1=[.;.;.;.;.];X=[chr1;880238;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0 NO_POLYPHEN_DATA VARIANT_CLASS_SINGLE_NUCLEOTIDE_VARIANT 1 chr1 880238 rs3748592_880238 A G . PASS GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,880160,880280]];ANNOVAR=intronic,NOC2L;ANNOTATIONS={dbSNP152_All_20180423=0;0;.;.;.},{COSMIC_94=0;0;.;.;.},{ExAC.r0.3.1=0;0;.;.;.},{2021_07_23_MA=0;0;.;.;.};POLYPHEN=[polyphenWHESS_20150403=0,NO_POLYPHEN_DATA];SEQUENCE_CONTEXT=TAG;OICR_FLANKING={V1=[.;.;.;.;.];X=[chr1;880238;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0} AD:GT:DP:DP4 0,73:1/1:73:0,0,36,37
TUMOR chr1 880466 T C GENOTYPE_AB 65 35.38 IN.dbSNP not.in.dbSNP.COMMON IN.COSMIC_CNT=0 . 0 0 1 NCC= V1=[.;.;.;.;.];X=[chr1;880466;25.5987;35.3846;45.1706];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0 NO_POLYPHEN_DATA VARIANT_CLASS_SINGLE_NUCLEOTIDE_VARIANT 1 chr1 880466 rs138652036_880466 T C . PASS GERMLINE;SNP;AF=0.51,0.35;MQ=60;DB;[SureSelectHumanAllExonV4=1,1,[chr1,880449,880637]];ANNOVAR=exonic,NOC2L;ANNOVAR_EXONIC=nonsynonymous SNV,NOC2L:NM_015658:exon18:c.A2114G:p.E705G,;ANNOTATIONS={dbSNP152_All_20180423=0;0;.;.;.},{COSMIC_94=0;0;.;.;.},{ExAC.r0.3.1=0;0;.;.;.},{2021_07_23_MA=1;1;VARIANT_MATCHED;.;[chr1|880466|.|T|C|.|.|RefGenome variant=E>G;Gene=NOC2L;Uniprot=NOC2L_HUMAN;Info=;Uniprot variant=E705G;Func. Impact=neutral;FI score=0.77]};POLYPHEN=[polyphenWHESS_20150403=0,NO_POLYPHEN_DATA];SEQUENCE_CONTEXT=CTC;OICR_FLANKING={V1=[.;.;.;.;.];X=[chr1;880466;25.5987;35.3846;45.1706];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0} AD:GT:DP:DP4 42,23:0/1:65:19,23,10,13
TUMOR chr1 881627 G A GENOTYPE_BB 44 100 IN.dbSNP not.in.dbSNP.COMMON IN.COSMIC_CNT=0 . 0 0 1 NCC= V1=[.;.;.;.;.];X=[chr1;881627;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0 NO_POLYPHEN_DATA VARIANT_CLASS_SINGLE_NUCLEOTIDE_VARIANT 1 chr1 881627 rs2272757_881627 G A . PASS GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,881618,881803]];ANNOVAR=exonic,NOC2L;ANNOVAR_EXONIC=synonymous SNV,NOC2L:NM_015658:exon16:c.C1843T:p.L615L,;ANNOTATIONS={dbSNP152_All_20180423=0;0;.;.;.},{COSMIC_94=0;0;.;.;.},{ExAC.r0.3.1=0;0;.;.;.},{2021_07_23_MA=1;1;VARIANT_MATCHED;.;[chr1|881627|.|G|A|.|.|RefGenome variant=L>L;Gene=NOC2L;Uniprot=NOC2L_HUMAN;Info=synonymous in Uniprot;Uniprot variant=L615L;Func. Impact=;FI score=]};POLYPHEN=[polyphenWHESS_20150403=0,NO_POLYPHEN_DATA];SEQUENCE_CONTEXT=AGG;OICR_FLANKING={V1=[.;.;.;.;.];X=[chr1;881627;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0} AD:GT:DP:DP4 0,44:1/1:44:0,0,30,14
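An `ArrayIndexOutOfBoundsException: -1` is typical of Java code that takes a position from a lookup such as `List.indexOf` or `String.indexOf`, which return -1 when nothing is found, and then uses it as an array index. The thread does not show whether `Processing.processVcf` (Processing.java:119) does exactly this. One cheap way to test the "missing column" hypothesis is to look up the expected names in the `#TAB_DELIMITED_HEADER=` line and fail with the names that are absent.

The sketch below does that under stated assumptions: the header line is tab-separated, and the `REQUIRED` list is hypothetical, since the columns the reformatting step actually reads are not known here.

```python
import sys


def tab_delimited_header(lines):
    """Return the column names from the #TAB_DELIMITED_HEADER= line
    (assumed to be tab-separated)."""
    for raw in lines:
        if raw.startswith("#TAB_DELIMITED_HEADER="):
            return raw.rstrip("\r\n").split("=", 1)[1].split("\t")
    raise ValueError("no #TAB_DELIMITED_HEADER line found")


def column_indices(header_names, required):
    """Map each required column name to its index. A missing name raises a
    KeyError naming it, instead of surfacing later as an index of -1.
    If a name repeats in the header, its last position wins."""
    positions = {name: i for i, name in enumerate(header_names)}
    missing = [name for name in required if name not in positions]
    if missing:
        raise KeyError("columns not found in header: " + ", ".join(missing))
    return {name: positions[name] for name in required}


# Hypothetical list: which columns the reformatting step really reads is not
# shown in this thread.
REQUIRED = ["sample_name", "chr", "pos", "reference", "alternative", "genotype"]

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as vcf:
        print(column_indices(tab_delimited_header(vcf), REQUIRED))
```

A useful comparison is to run it on this file and on the annotated test_data VCF that works. A difference between the two header lines is a lead worth following.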
Can someone help me with this "Can't run classifier...nominal value not declared in header" error?
...
Your working directory is /oasis/tscc/scratch/z8jiang/ISOWN/run_isown_trial6
...
This file was chosen for classifier training: /oasis/tscc/scratch/z8jiang/ISOWN/training_data/COAD_100_TrainSet.arff
...
Total number of samples in your set is 2
...
Number of loaded nonsilent coding variants in test set is 6330
...
*************
Naive Bayes Classifier:
Option: supervised discretization (SD) is true
10-fold cross-validation
*************
F1-measure: 96.163%.
Recall: 95.235%.
Precision: 97.11%.
False positive rate: 2.834%.
AUC: 99.39%.
*************
Can't run classifier.
java.io.IOException: nominal value not declared in header, read Token[null], line 59
at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:240)
at weka.core.converters.ArffLoader$ArffReader.getInstanceFull(ArffLoader.java:578)
at weka.core.converters.ArffLoader$ArffReader.getInstance(ArffLoader.java:423)
at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:391)
at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:376)
at weka.core.converters.ArffLoader$ArffReader.<init>(ArffLoader.java:138)
at weka.core.Instances.<init>(Instances.java:126)
at main.Prediction.runClassifier(Prediction.java:233)
at main.runISOWN.main(runISOWN.java:90)
...
Total number of predicted somatic mutations 0
Final results are saved here: test.txt
...
The first few lines of my .emaf file look like the following:
Variant chr pos reference alternative sample_name type subtype gene_name amino_acid_change MA_functional_impact MA_score isFlanking is_in_COSMIC CNT is_in_dbSNP is_in_dbSNP_common readDepthAlt totalReadDepth SEQUENCING_CONTEXT POLYPHEN_hdiv POLYPHEN_hvar is_in_ExAct isSOMATIC
chr1,942665C>A chr1 942665 C A SP10_filtered exonic nonsynonymous SAMD11 L554M . . NA T 0 T F 20.00 5 GCT . . T false
chr1,942668C>G chr1 942668 C G SP10_filtered exonic nonsynonymous SAMD11 Q555E . . NA T 0 T F 20.00 5 GCA . . T false
chr1,942681CC>GG chr1 942681 CC GG SP10_filtered exonic nonframeshift substitution SAMD11 . . . NA T 0 T F 20.00 5 . . T false
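The emaf rows above contain placeholders such as ".", "NA" and the two-word subtype "nonframeshift substitution". In the paste, the third row also shows 23 fields against a 24-column header, although that may be an artifact of copying. To compare what the emaf actually contains with what the training set declares, the hedged sketch below tallies the distinct values in each column of a tab-delimited emaf.

It assumes the first line is the header. The thread does not show which emaf columns become nominal ARFF attributes, so the comparison against the `@attribute` lines of the training file (COAD_100_TrainSet.arff here) is left to be done by eye. The script name is illustrative.

```python
import csv
import sys
from collections import Counter


def column_value_counts(lines):
    """Return {column name: Counter of values} for a tab-delimited emaf whose
    first line is the header. Empty strings are counted, and cells missing from
    short rows are counted as '<absent>', so both stand out."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    counts = {name: Counter() for name in header}
    for row in reader:
        if not row:
            continue  # skip blank lines
        for name, value in zip(header, row):
            counts[name][value] += 1
        for name in header[len(row):]:
            counts[name]["<absent>"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python emaf_values.py test.txt.emaf   (file name is illustrative)
    with open(sys.argv[1], newline="") as emaf:
        for name, counter in column_value_counts(emaf).items():
            if len(counter) <= 20:  # few distinct values: likely a categorical column
                print(name, dict(counter.most_common()))
```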
Hi, zjiang-lji
Have you solved the problem? I get the same error, "Can't run classifier".
Thanks, Hengqi Liu