ikalatskaya / ISOWN

Apache License 2.0
Can't run classifier. #26

Open LuoPangpang opened 5 years ago

LuoPangpang commented 5 years ago

Dear Author,

I got the message as shown below when trying to run the final step run_isown.pl:

perl /workplace/Software/ISOWN/bin/run_isown.pl 181023001/ 181023001/181023001.isown.txt "-trainingSet /workplace/Software/ISOWN/training_data/UCEC_100_TrainSet.arff -sanityCheck false -classifier nbc"


Reformat files in '181023001' to emaf ...

WARNING: 18 variants with unknown annotation were removed
Total number of variants after filtering 3770

Running prediction using file '181023001/181023001.isown.txt.emaf' ...

...
Your working directory is 181023001
...
This file was chosen for classifier training: /workplace/Software/ISOWN/training_data/UCEC_100_TrainSet.arff
...
Total number of samples in your set is 1
...
Number of loaded nonsilent coding variants in test set is 808
...

Naive Bayes Classifier:
Option: supervised discretization (SD) is true
10-fold cross-validation

F1-measure: 98.12%.
Recall: 97.817%.
Precision: 98.425%.
False positive rate: 1.565%.
AUC: 99.77%.

Can't run classifier.
java.io.IOException: nominal value not declared in header, read Token[null], line 19
    at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:240)
    at weka.core.converters.ArffLoader$ArffReader.getInstanceFull(ArffLoader.java:578)
    at weka.core.converters.ArffLoader$ArffReader.getInstance(ArffLoader.java:423)
    at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:391)
    at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:376)
    at weka.core.converters.ArffLoader$ArffReader.<init>(ArffLoader.java:138)
    at weka.core.Instances.<init>(Instances.java:126)
    at main.Prediction.runClassifier(Prediction.java:233)
    at main.runISOWN.main(runISOWN.java:90)

...
Total number of predicted somatic mutations 0
Final results are saved here: 181023001/181023001.isown.txt
...

Done

Interestingly, I got no errors running both database_annotation.pl and run_isown.pl on the two VCF files provided in the test_data/ directory ...

I searched for "nominal value not declared in header" and some posts suggested it has something to do with Weka, so I checked:

java -jar /workplace/Software/ISOWN/bin/weka.jar

Exception in thread "main" java.lang.ExceptionInInitializerError
Caused by: java.awt.HeadlessException: No X11 DISPLAY variable was set, but this program performed an operation which requires it.
    at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:204)
    at java.awt.Window.<init>(Window.java:536)
    at java.awt.Frame.<init>(Frame.java:420)
    at javax.swing.JFrame.<init>(JFrame.java:233)
    at weka.gui.LogWindow.<init>(LogWindow.java:252)
    at weka.gui.GUIChooser.<init>(GUIChooser.java:215)

Did I miss anything? By the way, weka.jar was already in the bin/ directory when I installed ISOWN, so I did not replace it, since check_dependencies.pl reported that everything was installed.

Thank you very much!

Pang
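
A note on the two errors above. The HeadlessException only shows that "java -jar weka.jar" tried to open Weka's GUI chooser (weka.gui.GUIChooser in the trace) on a machine without an X11 display, so it probably says nothing about the ARFF problem. For the "nominal value not declared in header" error itself, one way to narrow things down is to scan the ARFF file that fails to load and list every value in a nominal column that the header does not declare, including empty values. The Python sketch below is a minimal, hypothetical checker and not part of ISOWN or Weka. It assumes a simple ARFF layout (unquoted attribute names, no commas inside quoted values), and the thread does not say which ARFF file ISOWN hands to Weka at this step.

```python
import re


def parse_arff_header(lines):
    """Return (attributes, data_start).

    attributes is a list of (name, allowed_values); allowed_values is a set
    for nominal attributes and None for every other attribute type.
    data_start is the 0-based index of the first line after @data.
    """
    attributes = []
    for index, line in enumerate(lines):
        stripped = line.strip()
        lower = stripped.lower()
        if lower.startswith("@attribute"):
            match = re.match(r"@attribute\s+(\S+)\s+(.*)", stripped, re.IGNORECASE)
            if match is None:
                continue  # malformed declaration; ignored in this sketch
            name, spec = match.group(1), match.group(2).strip()
            if spec.startswith("{") and spec.endswith("}"):
                allowed = {v.strip().strip("'\"") for v in spec[1:-1].split(",")}
            else:
                allowed = None
            attributes.append((name, allowed))
        elif lower.startswith("@data"):
            return attributes, index + 1
    raise ValueError("no @data section found")


def find_undeclared_values(lines):
    """List (line_number, attribute_name, value) for undeclared nominal values.

    Line numbers are 1-based positions in the given list of lines.
    '?' is the ARFF marker for a missing value and is accepted everywhere.
    """
    attributes, data_start = parse_arff_header(lines)
    problems = []
    for line_number, line in enumerate(lines[data_start:], start=data_start + 1):
        row = line.strip()
        if not row or row.startswith("%"):
            continue  # skip blank lines and ARFF comments
        values = [v.strip().strip("'\"") for v in row.split(",")]
        for (name, allowed), value in zip(attributes, values):
            if allowed is not None and value != "?" and value not in allowed:
                problems.append((line_number, name, value))
    return problems
```

To use it, read the file with open(path).read().splitlines() and pass the resulting list to find_undeclared_values. An empty string in the value position would be consistent with the Token[null] in the message, though that link is an assumption and not something the thread confirms.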

amit21AIT commented 5 years ago

Can you send the first few lines of your annotated VCF file, excluding the headers? I got an error after the database annotation step ("ArrayIndexOutOfBound 3") and just want to find where it comes from. Thanks, Amit

LuoPangpang commented 5 years ago

Hi Amit,

This is the annotated VCF output that I ran through without errors, using "test_data/3d2edf87-6ec5-4c9f-9212-e8a751cc33e8.dkfz-snvCalling_1-0-132-1.20160126.vcf" as input. Hope it helps.

Pang

##fileformat=VCFv4.1
##fileDate=20160129
##pancancerversion=1.0
##reference=;
##center="DKFZ"
##workflowName=DKFZ_SNV_workflow
##workflowVersion=1.0.0
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##FORMAT=
##FORMAT=
##FORMAT=
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##FILTER=
##SAMPLE=
##SAMPLE=
##TARGET_FILE:SureSelectHumanAllExonV4=file:///oicr/data/genomes/homo_sapiens_mc/Agilent/SureSelectHumanAllExonV4/S03723314_Regions.merged.sorted.bed.gz
##VCF_FILE:dbSNP151=file:///workplace/Software/ISOWN/bin/../external_databases/dbsnp_151.hg19.All.modified.vcf.gz
##VCF_FILE:COSMIC_77=file:///workplace/Software/ISOWN/bin/../external_databases/Cosmic-All-Muts.vcf.gz
##VCF_FILE:ExAC.r0.3_20150421=file:///workplace/Software/ISOWN/bin/../external_databases/ExAC.r0.3.1.database.vcf.gz
##VCF_FILE:2015_12_31_MA=file:///workplace/Software/ISOWN/bin/../external_databases/2015_12_31_MA_r3.vcf.gz
#TAB_DELIMITED_HEADER=sample_name chr pos reference alternative genotype totalReadDepth %readDepthAlt in.dbSNP.or.not in.dbSNP.COMMON.or.not in.COSMIC.or.not MA_functional_impact MA_score is.SOMATIC in.ExAC ExAC_NCC FLANKING_STR POLYPHEN VARIANT_CLASS LENGTH CHROM POS ID REF ALT QUAL FILTER INFO FORMAT TUMOR

Guofengyu commented 5 years ago

Hi, Pang

Could you send several actual variants from your annotated VCF? ANNOVAR can also annotate PolyPhen, Mutation Assessor, etc. I want to convert an ANNOVAR-annotated VCF into the ISOWN annotated format and then test it. Thanks.

Guofengyu commented 5 years ago

Hi, Pang

I have completed the test following the documented process and did not encounter your problem. I did hit a different problem, "Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface weka.core.Instance, but class was expected", until I replaced weka.jar with the original file. Your issue made me realize that weka.jar was already in the bin/ directory. Thank you all the same.

Guo.
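
An IncompatibleClassChangeError like this usually means the calling code was compiled against one release of a library and is running against another. For Weka, weka.core.Instance is understood to have been a class in the 3.6 series and an interface from 3.7 onward, which would fit a newer weka.jar having been dropped into bin/. A quick way to see which release a given jar carries is to read the version file packed inside it. The sketch below is hypothetical and not part of ISOWN; the member name weka/core/version.txt is an assumption about how Weka jars are laid out, so the function returns None when the file is absent.

```python
import zipfile


def read_weka_version(jar_path, member="weka/core/version.txt"):
    """Return the version string stored inside a Weka jar, or None if absent.

    A jar is a zip archive, so the standard zipfile module can open it.
    The default member name is an assumption about Weka's jar layout.
    """
    with zipfile.ZipFile(jar_path) as jar:
        if member not in jar.namelist():
            return None
        return jar.read(member).decode("utf-8").strip()
```

Calling read_weka_version on the bundled bin/weka.jar and on any replacement jar gives two strings to compare before swapping files.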

yueyangtime commented 4 years ago

Hi Pang, have you solved the problem? I get the same "Can't run classifier" error.

Thanks, yueyang

zjiang-lji commented 3 years ago

@Guofengyu, I got this error from the first command of the classification step (reformatting to emaf):

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
        at com.Processing.processVcf(Processing.java:119)
        at com.runReformating.main(runReformating.java:39)

The beginning of my input VCF is the annotated version of the sample VCF. It looks like this:

##fileformat=VCFv4.1                                                                                                                                                
##fileDate=20160129                                                                                                                                             
##pancancerversion=1.0                                                                                                                                              
##reference=<ID=hs37d5,Source=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz>;                                                                                                                                                
##center="DKFZ"                                                                                                                                             
##workflowName=DKFZ_SNV_workflow                                                                                                                                                
##workflowVersion=1.0.0                                                                                                                                             
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">                                                                                                                                              
##INFO=<ID=GERMLINE,Number=0,Type=Flag,Description="Indicates if record is a germline mutation">                                                                                                                                                
##INFO=<ID=UNCLEAR,Number=0,Type=Flag,Description="Indicates if the somatic status of a mutation is unclear">                                                                                                                                               
##INFO=<ID=VT,Number=1,Type=String,Description="Variant type, can be SNP, INS or DEL">                                                                                                                                              
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency in primary data, for each ALT allele, in the same order as listed">                                                                                                                                             
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership">                                                                                                                                                
##INFO=<ID=MQ,Number=1,Type=Integer,Description="RMS Mapping Quality">                                                                                                                                              
##INFO=<ID=1000G,Number=0,Type=Flag,Description="Indicates membership in 1000Genomes">                                                                                                                                              
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">                                                                                                                                                
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth at this position in the sample">                                                                                                                                              
##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases">                                                                                                                                                
##FILTER=<ID=RE,Description="variant in UCSC_27Sept2013_RepeatMasker.bed.gz region and/or SimpleTandemRepeats_chr.bed.gz region, downloaded from UCSC genome browser and/or variant in segmental duplication region, annotated by annovar">                                                                                                                                             
##FILTER=<ID=BL,Description="variant in DAC-Blacklist from ENCODE or in DUKE_EXCLUDED list, both downloaded from UCSC genome browser">                                                                                                                                              
##FILTER=<ID=DP,Description="<= 5 reads total at position in tumor">                                                                                                                                                
##FILTER=<ID=SB,Description="Strand bias of reads with mutant allele = zero reads on one strand">                                                                                                                                               
##FILTER=<ID=TAC,Description="less than 6 reads in Tumor at position">                                                                                                                                              
##FILTER=<ID=dbSNP,Description="variant in dbSNP135">                                                                                                                                               
##FILTER=<ID=DB,Description="variant in 1000Genomes, ALL.wgs.phase1_integrated_calls.20101123.snps_chr.vcf.gz or dbSNP">                                                                                                                                                
##FILTER=<ID=HSDEPTH,Description="variant in HiSeqDepthTop10Pct_chr.bed.gz region, downloaded from UCSC genome browser">                                                                                                                                                
##FILTER=<ID=MAP,Description="variant overlaps a region from wgEncodeCrgMapabilityAlign100mer.bedGraph.gz:::--breakPointMode --aEndOffset=1 with a value below 0.5, punishment increases with a decreasing mapability">                                                                                                                                             
##FILTER=<ID=SBAF,Description="Strand bias of reads with mutant allele = zero reads on one strand and variant allele frequency below 0.1">                                                                                                                                              
##FILTER=<ID=FRQ,Description="variant allele frequency below 0.05">                                                                                                                                             
##FILTER=<ID=TAR,Description="Only one alternative read in Tumor at position">                                                                                                                                              
##FILTER=<ID=UNCLEAR,Description="Classification is unclear">                                                                                                                                               
##FILTER=<ID=DPHIGH,Description="Too many reads mapped in control at this region">                                                                                                                                              
##FILTER=<ID=DPLOWC,Description="Only 5 or less reads in control">                                                                                                                                              
##FILTER=<ID=1PS,Description="Only two alternative reads, one on each strand">                                                                                                                                              
##FILTER=<ID=ALTC,Description="Alternative reads in control">                                                                                                                                               
##FILTER=<ID=ALTCFR,Description="Alternative reads in control and tumor allele frequency below 0.3">                                                                                                                                                
##FILTER=<ID=FRC,Description="Variant allele frequency below 0.3 in germline call">                                                                                                                                             
##FILTER=<ID=YALT,Description="Variant on Y chromosome with low allele frequency">                                                                                                                                              
##FILTER=<ID=VAF,Description="Variant allele frequency in tumor < 5 times allele frequency in control">                                                                                                                                             
##FILTER=<ID=BI,Description="Bias towards a PCR strand or sequencing strand">                                                                                                                                               
##SAMPLE=<ID=CONTROL,SampleName=control_NA,Individual=NA,Description="Control">                                                                                                                                             
##SAMPLE=<ID=TUMOR,SampleName=tumor_NA,Individual=NA,Description="Tumor">                                                                                                                                               
##TARGET_FILE:SureSelectHumanAllExonV4=file:///oicr/data/genomes/homo_sapiens_mc/Agilent/SureSelectHumanAllExonV4/S03723314_Regions.merged.sorted.bed.gz                                                                                                                                                
##VCF_FILE:dbSNP152_All_20180423=file:///oasis/tscc/scratch/z8jiang/ISOWN/bin/../external_databases/dbSNP152_All_20180423.vcf.gz.modified.vcf.gz                                                                                                                                                
##VCF_FILE:COSMIC_94=file:///oasis/tscc/scratch/z8jiang/ISOWN/bin/../external_databases/COSMIC_v94.vcf.gz                                                                                                                                               
##VCF_FILE:ExAC.r0.3.1=file:///oasis/tscc/scratch/z8jiang/ISOWN/bin/../external_databases/ExAC.r0.3.1.sites.vep.vcf.gz                                                                                                                                              
##VCF_FILE:2021_07_23_MA=file:///oasis/tscc/scratch/z8jiang/ISOWN/bin/../external_databases/2021_07_23_MA.vcf.gz                                                                                                                                                
#TAB_DELIMITED_HEADER=sample_name       chr     pos     reference       alternative     genotype        totalReadDepth  %readDepthAlt   in.dbSNP.or.not in.dbSNP.COMMON.or.not      in.COSMIC.or.not        MA_functional_impact    MA_score        is.SOMATIC      in.ExAC ExAC_NCC        FLANKING_STR    POLYPHEN   VARIANT_CLASS    LENGTH  CHROM   POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  TUMOR                                                                                                                                               
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  TUMOR                                                                                                           
TUMOR   chr1    876499  A   G   GENOTYPE_BB 48  100 IN.dbSNP    not.in.dbSNP.COMMON IN.COSMIC_CNT=0 .   0   0   1   NCC=    V1=[.;.;.;.;.];X=[chr1;876499;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0   NO_POLYPHEN_DATA    VARIANT_CLASS_SINGLE_NUCLEOTIDE_VARIANT 1   chr1    876499  rs4372192_876499    A   G   .   PASS    GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,876429,876641]];ANNOVAR=intronic,SAMD11;ANNOTATIONS={dbSNP152_All_20180423=0;0;.;.;.},{COSMIC_94=0;0;.;.;.},{ExAC.r0.3.1=0;0;.;.;.},{2021_07_23_MA=0;0;.;.;.};POLYPHEN=[polyphenWHESS_20150403=0,NO_POLYPHEN_DATA];SEQUENCE_CONTEXT=GAT;OICR_FLANKING={V1=[.;.;.;.;.];X=[chr1;876499;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0}  AD:GT:DP:DP4    0,48:1/1:48:0,0,32,16                           
TUMOR   chr1    877715  C   G   GENOTYPE_BB 34  100 IN.dbSNP    not.in.dbSNP.COMMON IN.COSMIC_CNT=0 .   0   0   1   NCC=    V1=[.;.;.;.;.];X=[chr1;877715;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0   NO_POLYPHEN_DATA    VARIANT_CLASS_SINGLE_NUCLEOTIDE_VARIANT 1   chr1    877715  rs6605066_877715    C   G   .   PASS    GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,877537,878481]];ANNOVAR=intronic,SAMD11;ANNOTATIONS={dbSNP152_All_20180423=0;0;.;.;.},{COSMIC_94=0;0;.;.;.},{ExAC.r0.3.1=0;0;.;.;.},{2021_07_23_MA=0;0;.;.;.};POLYPHEN=[polyphenWHESS_20150403=0,NO_POLYPHEN_DATA];SEQUENCE_CONTEXT=CCG;OICR_FLANKING={V1=[.;.;.;.;.];X=[chr1;877715;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0}  AD:GT:DP:DP4    0,34:1/1:34:0,0,13,21                           
TUMOR   chr1    877831  T   C   GENOTYPE_BB 33  100 IN.dbSNP    not.in.dbSNP.COMMON IN.COSMIC_CNT=0 .   0   0   1   NCC=    V1=[.;.;.;.;.];X=[chr1;877831;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0   transcript:uc001abw.1,uc001abx.1;hdiv_prediction:benign,benign;hdiv_class:neutral,neutral;hvar_prediction:benign,benign;hvar_class:neutral,neutral  VARIANT_CLASS_SINGLE_NUCLEOTIDE_VARIANT 1   chr1    877831  rs6672356_877831    T   C   .   PASS    GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,877537,878481]];ANNOVAR=exonic,SAMD11;ANNOVAR_EXONIC=nonsynonymous SNV,SAMD11:NM_152486:exon10:c.T1027C:p.W343R,;ANNOTATIONS={dbSNP152_All_20180423=0;0;.;.;.},{COSMIC_94=0;0;.;.;.},{ExAC.r0.3.1=0;0;.;.;.},{2021_07_23_MA=1;1;VARIANT_MATCHED;.;[chr1|877831|.|T|C|.|.|RefGenome variant=W>R;Gene=SAMD11;Uniprot=SAM11_HUMAN;Info=;Uniprot   variant=W343R;Func. Impact=neutral;FI   score=-2.1]};POLYPHEN=[polyphenWHESS_20150403=1,transcript:uc001abw.1,uc001abx.1;hdiv_prediction:benign,benign;hdiv_class:neutral,neutral;hvar_prediction:benign,benign;hvar_class:neutral,neutral];SEQUENCE_CONTEXT=CTG;OICR_FLANKING={V1=[.;.;.;.;.];X=[chr1;877831;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0}  AD:GT:DP:DP4    0,33:1/1:33:0,0,15,18       
TUMOR   chr1    880238  A   G   GENOTYPE_BB 73  100 IN.dbSNP    not.in.dbSNP.COMMON IN.COSMIC_CNT=0 .   0   0   1   NCC=    V1=[.;.;.;.;.];X=[chr1;880238;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0   NO_POLYPHEN_DATA    VARIANT_CLASS_SINGLE_NUCLEOTIDE_VARIANT 1   chr1    880238  rs3748592_880238    A   G   .   PASS    GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,880160,880280]];ANNOVAR=intronic,NOC2L;ANNOTATIONS={dbSNP152_All_20180423=0;0;.;.;.},{COSMIC_94=0;0;.;.;.},{ExAC.r0.3.1=0;0;.;.;.},{2021_07_23_MA=0;0;.;.;.};POLYPHEN=[polyphenWHESS_20150403=0,NO_POLYPHEN_DATA];SEQUENCE_CONTEXT=TAG;OICR_FLANKING={V1=[.;.;.;.;.];X=[chr1;880238;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0}   AD:GT:DP:DP4    0,73:1/1:73:0,0,36,37                           
TUMOR   chr1    880466  T   C   GENOTYPE_AB 65  35.38   IN.dbSNP    not.in.dbSNP.COMMON IN.COSMIC_CNT=0 .   0   0   1   NCC=    V1=[.;.;.;.;.];X=[chr1;880466;25.5987;35.3846;45.1706];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0    NO_POLYPHEN_DATA    VARIANT_CLASS_SINGLE_NUCLEOTIDE_VARIANT 1   chr1    880466  rs138652036_880466  T   C   .   PASS    GERMLINE;SNP;AF=0.51,0.35;MQ=60;DB;[SureSelectHumanAllExonV4=1,1,[chr1,880449,880637]];ANNOVAR=exonic,NOC2L;ANNOVAR_EXONIC=nonsynonymous    SNV,NOC2L:NM_015658:exon18:c.A2114G:p.E705G,;ANNOTATIONS={dbSNP152_All_20180423=0;0;.;.;.},{COSMIC_94=0;0;.;.;.},{ExAC.r0.3.1=0;0;.;.;.},{2021_07_23_MA=1;1;VARIANT_MATCHED;.;[chr1|880466|.|T|C|.|.|RefGenome  variant=E>G;Gene=NOC2L;Uniprot=NOC2L_HUMAN;Info=;Uniprot    variant=E705G;Func. Impact=neutral;FI   score=0.77]};POLYPHEN=[polyphenWHESS_20150403=0,NO_POLYPHEN_DATA];SEQUENCE_CONTEXT=CTC;OICR_FLANKING={V1=[.;.;.;.;.];X=[chr1;880466;25.5987;35.3846;45.1706];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0} AD:GT:DP:DP4    42,23:0/1:65:19,23,10,13        
TUMOR   chr1    881627  G   A   GENOTYPE_BB 44  100 IN.dbSNP    not.in.dbSNP.COMMON IN.COSMIC_CNT=0 .   0   0   1   NCC=    V1=[.;.;.;.;.];X=[chr1;881627;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0   NO_POLYPHEN_DATA    VARIANT_CLASS_SINGLE_NUCLEOTIDE_VARIANT 1   chr1    881627  rs2272757_881627    G   A   .   PASS    GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,881618,881803]];ANNOVAR=exonic,NOC2L;ANNOVAR_EXONIC=synonymous SNV,NOC2L:NM_015658:exon16:c.C1843T:p.L615L,;ANNOTATIONS={dbSNP152_All_20180423=0;0;.;.;.},{COSMIC_94=0;0;.;.;.},{ExAC.r0.3.1=0;0;.;.;.},{2021_07_23_MA=1;1;VARIANT_MATCHED;.;[chr1|881627|.|G|A|.|.|RefGenome  variant=L>L;Gene=NOC2L;Uniprot=NOC2L_HUMAN;Info=synonymous  in  Uniprot;Uniprot variant=L615L;Func. Impact=;FI  score=]};POLYPHEN=[polyphenWHESS_20150403=0,NO_POLYPHEN_DATA];SEQUENCE_CONTEXT=AGG;OICR_FLANKING={V1=[.;.;.;.;.];X=[chr1;881627;0.0000;0.0000;0.0000];V2=[.;.;.;.;.];Distance_Between_V1_and_X=-1;Distance_Between_V2_and_X=-1;V1overlappedX=0;V2overlappedX=0;V1overlappedV2=0}    AD:GT:DP:DP4    0,44:1/1:44:0,0,30,14
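
The source of Processing.processVcf is not shown in this thread, so the cause of the ArrayIndexOutOfBoundsException: -1 is unknown. One inexpensive check is whether every data row in the annotated VCF has as many tab-separated fields as the #TAB_DELIMITED_HEADER line declares (30 names in the excerpt above). The Python sketch below is a hypothetical diagnostic, not ISOWN code; it assumes the real file is tab-delimited, which the pasted excerpt cannot confirm because its whitespace may have been altered.

```python
def find_column_mismatches(lines, header_prefix="#TAB_DELIMITED_HEADER="):
    """Compare each data row's field count with the tab-delimited header.

    Returns (expected, mismatches), where expected is the number of column
    names on the header line and mismatches is a list of
    (line_number, field_count) for rows whose count differs.
    Line numbers are 1-based positions in the given list of lines.
    """
    expected = None
    mismatches = []
    for line_number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if text.startswith(header_prefix):
            expected = len(text[len(header_prefix):].split("\t"))
            continue
        if text.startswith("#") or not text.strip():
            continue  # other header lines and blank lines carry no data
        if expected is None:
            raise ValueError("data line seen before the tab-delimited header")
        count = len(text.split("\t"))
        if count != expected:
            mismatches.append((line_number, count))
    return expected, mismatches
```

Rows reported here would be the first ones to inspect by hand, for example for tab characters embedded inside an annotation field.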

zjiang-lji commented 3 years ago

Can someone help me with this "Can't run classifier...nominal value not declared in header" error?

...
Your working directory is /oasis/tscc/scratch/z8jiang/ISOWN/run_isown_trial6
...
This file was chosen for classifier training: /oasis/tscc/scratch/z8jiang/ISOWN/training_data/COAD_100_TrainSet.arff
...
Total number of samples in your set is 2
...
Number of loaded nonsilent coding variants in test set is 6330
...
*************
Naive Bayes Classifier: 
Option: supervised discretization (SD) is true
10-fold cross-validation
*************
F1-measure: 96.163%.
Recall: 95.235%.
Precision: 97.11%.
False positive rate: 2.834%.
AUC: 99.39%.
*************
Can't run classifier.
java.io.IOException: nominal value not declared in header, read Token[null], line 59
    at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:240)
    at weka.core.converters.ArffLoader$ArffReader.getInstanceFull(ArffLoader.java:578)
    at weka.core.converters.ArffLoader$ArffReader.getInstance(ArffLoader.java:423)
    at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:391)
    at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:376)
    at weka.core.converters.ArffLoader$ArffReader.<init>(ArffLoader.java:138)
    at weka.core.Instances.<init>(Instances.java:126)
    at main.Prediction.runClassifier(Prediction.java:233)
    at main.runISOWN.main(runISOWN.java:90)

...
Total number of predicted somatic mutations 0
Final results are saved here: test.txt
...

The first few lines of my .emaf file look like the following:

Variant chr pos reference   alternative sample_name type    subtype gene_name   amino_acid_change   MA_functional_impact    MA_score    isFlanking  is_in_COSMIC    CNT is_in_dbSNP is_in_dbSNP_common  readDepthAlt    totalReadDepth  SEQUENCING_CONTEXT  POLYPHEN_hdiv   POLYPHEN_hvar   is_in_ExAct isSOMATIC
chr1,942665C>A  chr1    942665  C   A   SP10_filtered   exonic  nonsynonymous   SAMD11  L554M   .   .   NA  T   0   T   F   20.00   5   GCT .   .   T   false
chr1,942668C>G  chr1    942668  C   G   SP10_filtered   exonic  nonsynonymous   SAMD11  Q555E   .   .   NA  T   0   T   F   20.00   5   GCA .   .   T   false
chr1,942681CC>GG    chr1    942681  CC  GG  SP10_filtered   exonic  nonframeshift substitution  SAMD11  .   .   .   NA  T   0   T   F   20.00   5       .   .   T   false
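
In the excerpt above, the third row (chr1,942681CC) appears to have nothing in the SEQUENCING_CONTEXT column: it shows 23 visible values against 24 header names, with a wider gap after the totalReadDepth value 5. An empty field reaching the classifier input might be what Weka reports as Token[null], but that is a guess based on the pasted text only. The Python sketch below is a hypothetical helper, not part of ISOWN, that lists every empty field in an emaf file; it assumes the file is tab-delimited with a single header line.

```python
def find_empty_fields(lines, delimiter="\t"):
    """Return (line_number, column_name) for every empty field.

    The first line is treated as the header; line numbers are 1-based
    positions in the given list of lines. Blank lines are skipped.
    """
    if not lines:
        return []
    header = lines[0].rstrip("\r\n").split(delimiter)
    empties = []
    for line_number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue
        fields = line.rstrip("\r\n").split(delimiter)
        for name, value in zip(header, fields):
            if value.strip() == "":
                empties.append((line_number, name))
    return empties
```

If rows such as multi-nucleotide substitutions turn out to be the only ones flagged, filtering them out of the input and rerunning would be one way to test the guess.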

HengqiLiu commented 2 years ago

nominal value not declared in header

Hi zjiang-lji,

Have you solved the problem? I get the same "Can't run classifier" error.

Thanks, Hengqi Liu