seppinho / haplogrep-cmd

HaploGrep - mtDNA haplogroup classification. Supporting rCRS and RSRS.
https://haplogrep.i-med.ac.at/
MIT License

Start Classification... java.lang.NumberFormatException: For input string: "15940*" #9

Closed: snashraf closed this issue 6 years ago.

snashraf commented 6 years ago

Hi All,

Thanks for updating this awesome software!

I was running HaploGrep with the following command:

java -jar haplogrep-cmd/haplogrep-2.1.13.jar --in /gpfs/QGP_MT_merge_final.vcf.gz --format vcf --phylotree 17 --out mtHaplotype_2.1.13.txt --metric 1 --lineage

This gave the following error:

Start Classification...
java.lang.NumberFormatException: For input string: "15940*"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.valueOf(Integer.java:766)
    at core.Polymorphism.parse(Polymorphism.java:414)
    at core.Polymorphism.<init>(Polymorphism.java:74)
    at core.Sample.parseSample(Sample.java:243)
    at core.Sample.<init>(Sample.java:27)
    at core.TestSample.parse(TestSample.java:93)
    at core.SampleFile.<init>(SampleFile.java:74)
    at genepi.haplogrep.main.Haplogrep.run(Haplogrep.java:145)
    at genepi.base.Tool.start(Tool.java:193)
    at genepi.haplogrep.main.Haplogrep.main(Haplogrep.java:182)
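The top frames show core.Polymorphism.parse handing the string "15940*" to Integer.valueOf, which rejects any non-digit character. The sketch below reproduces that failure in isolation and shows one defensive way to separate the numeric position from a trailing "*". The class PositionParseDemo and the helper parsePosition are hypothetical names for illustration only; this is not HaploGrep's code and not necessarily how version 2.1.14 handles the case.

```java
// Hypothetical illustration, not HaploGrep's code: shows why "15940*" breaks
// Integer.valueOf and one defensive way to split off a trailing "*".
public class PositionParseDemo {

    /** Position plus a flag telling whether the token ended in "*". */
    static final class ParsedPosition {
        final int position;
        final boolean starred;

        ParsedPosition(int position, boolean starred) {
            this.position = position;
            this.starred = starred;
        }
    }

    /**
     * Hypothetical helper: accepts "15940" or "15940*". Any other non-digit
     * content still throws NumberFormatException, as Integer.parseInt does.
     */
    static ParsedPosition parsePosition(String token) {
        boolean starred = token.endsWith("*");
        String digits = starred ? token.substring(0, token.length() - 1) : token;
        return new ParsedPosition(Integer.parseInt(digits), starred);
    }

    public static void main(String[] args) {
        // The same kind of call that fails in the stack trace above.
        try {
            Integer.valueOf("15940*");
        } catch (NumberFormatException e) {
            System.out.println("Integer.valueOf failed: " + e.getMessage());
        }

        ParsedPosition p = parsePosition("15940*");
        System.out.println("position=" + p.position + ", starred=" + p.starred);
    }
}
```

Keeping the "*" as a separate flag, rather than silently dropping it, leaves the decision about what a starred position means (here apparently tied to a deletion) to the caller.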

I tried haplogrep-2.1.11.jar, haplogrep-2.1.12.jar and haplogrep-2.1.13.jar and got the same error with all three. However, I was able to process the same file successfully with haplogrep-2.1.3.jar using the command below:

java -jar haplogrep-cmd/haplogrep-2.1.3.jar --in /gpfs/ngsdata/QGP_MT_merge_final.vcf.gz --export 1 --format vcf --phylotree 17 --out mtHaplotype.txt --metric 1

This command generates the haplotype files correctly. I used Java 1.8 for all runs, including haplogrep-2.1.11.jar, haplogrep-2.1.12.jar and haplogrep-2.1.13.jar.

The error seems to be caused by lines like the following in the GATK output:

MT 15940 . T C,* 7125536.33 . AC=5,92;AF=8.048e-04,0.015;AN=6213;BaseQRankSum=1.44;ClippingRankSum=-2.740e-01;DP=11230701;FS=0.000;MLEAC=5,92;MLEAF=8.048e-04,0.015;MQ=60.00;MQRankSum=0.302;QD=33.20;ReadPosRankSum=0.275;SOR=0.244 GT:AD:DP:GQ:PL 0:1853,0,0:1853:99:0,1800,1800 0:1486,0,0:1486:99:0,1800,1800 0:1489,0,0:1489:99:0,1800,1800 0:1492,0,0:1492:99:0,1800,1800 0:1481,0,0:1481:99:0,1800,1800 0:1480,0,0:1480:99:0,1800,1800 0:1747,0,0:1747:99:0,1800,1800 0:1759,0,0:1759:99:0,1800,1800 0:1562,0,0:1562:99:0,1800,1800 0:1819,0,0:1819:99:0,1800,1800 2:18,0,2448:2466:99:77750,77989,0

Could you please advise how to handle such cases?

Thanks,
Najeeb

haansi commented 6 years ago

Hi @snashraf !

Thanks for pointing this out. Can you please check the genotype at position 15940: what is the genotype in the VCF file? Could you provide the information for this line?

snashraf commented 6 years ago

MT 15940 . T C,* 7125536.33 . AC=5,92;AF=8.048e-04,0.015;AN=6213;BaseQRankSum=1.44;ClippingRankSum=-2.740e-01;DP=11230701;FS=0.000;MLEAC=5,92;MLEAF=8.048e-04,0.015;MQ=60.00;MQRankSum=0.302;QD=33.20;ReadPosRankSum=0.275;SOR=0.244 GT:AD:DP:GQ:PL 0:1853,0,0:1853:99:0,1800,1800 0:1486,0,0:1486:99:0,1800,1800 0:1489,0,0:1489:99:0,1800,1800 0:1492,0,0:1492:99:0,1800,1800 0:1481,0,0:1481:99:0,1800,1800 0:1480,0,0:1480:99:0,1800,1800 0:1747,0,0:1747:99:0,1800,1800 0:1759,0,0:1759:99:0,1800,1800 0:1562,0,0:1562:99:0,1800,1800 0:1819,0,0:1819:99:0,1800,1800 2:18,0,2448:2466:99:77750,77989,0

Just to let you know: GATK generates * alleles from version 3.3 onwards!

haansi commented 6 years ago

Thanks for this information. The issue here is the asterisk "*". Did you use GATK? The 15940 entry seems to be the T15944del mutation. I have to recheck the VCF importer.

haansi commented 6 years ago

the asterisk *

snashraf commented 6 years ago

Yes! I have now added all of this information to the original question. I had no error when I used haplogrep-2.1.3.jar; only the later versions give this error.

Thanks,
Najeeb

snashraf commented 6 years ago

Hi Team,

When do you think this problem will be fixed? Alternatively, what do you think about removing all "*" variants and then running the tool?

Thanks,
Najeeb

haansi commented 6 years ago

Dear @snashraf, I will try to fix it by next week. Removing the "*" is a workaround, which should not have a big effect on the classification.
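The "remove the *" workaround can be sketched as a small stdin-to-stdout Java filter. This is a minimal sketch under stated assumptions, not part of HaploGrep: it assumes an uncompressed, tab-separated VCF whose genotypes are in GT, like the GATK record quoted above. For each record whose ALT column contains "*", the hypothetical method dropStarAllele removes that allele, sets genotypes that pointed to it to missing ("."), renumbers higher allele indices, replaces INFO with "." and keeps only GT in FORMAT, because per-allele fields such as AC, AF, AD and PL would no longer match the remaining alleles. Records whose only ALT allele is "*" are dropped. Treating a "*" call as missing is itself an assumption, based on the comment that removing "*" should not have a big effect on the classification.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical workaround filter, not part of HaploGrep: removes the "*" ALT allele.
public class StarAlleleFilter {

    /** Returns the rewritten record, the line unchanged, or null to drop the record. */
    static String dropStarAllele(String line) {
        String[] cols = line.split("\t", -1);
        // Header lines and incomplete lines pass through unchanged.
        if (line.startsWith("#") || cols.length < 8) {
            return line;
        }
        List<String> keptAlts = new ArrayList<>();
        int starIndex = -1; // allele index of "*" (REF is 0, first ALT is 1)
        String[] alts = cols[4].split(",");
        for (int i = 0; i < alts.length; i++) {
            if (alts[i].equals("*")) {
                starIndex = i + 1;
            } else {
                keptAlts.add(alts[i]);
            }
        }
        if (starIndex < 0) {
            return line; // no "*" allele in this record
        }
        if (keptAlts.isEmpty()) {
            return null; // "*" was the only ALT allele
        }
        StringBuilder out = new StringBuilder();
        out.append(String.join("\t", Arrays.copyOfRange(cols, 0, 4))); // CHROM POS ID REF
        out.append('\t').append(String.join(",", keptAlts));
        out.append('\t').append(cols[5]); // QUAL
        out.append('\t').append(cols[6]); // FILTER
        out.append("\t.");                // INFO cleared: per-allele values no longer match
        if (cols.length > 8) {
            int gtIdx = Arrays.asList(cols[8].split(":")).indexOf("GT");
            out.append("\tGT");
            for (int s = 9; s < cols.length; s++) {
                String[] fields = cols[s].split(":");
                String gt = (gtIdx >= 0 && gtIdx < fields.length) ? fields[gtIdx] : ".";
                out.append('\t').append(remapGenotype(gt, starIndex));
            }
        }
        return out.toString();
    }

    /** Rewrites each allele of a GT value such as "2", "0/2" or "1|3". */
    static String remapGenotype(String gt, int starIndex) {
        StringBuilder out = new StringBuilder();
        StringBuilder allele = new StringBuilder();
        for (int i = 0; i <= gt.length(); i++) {
            // A virtual separator at the end flushes the last allele.
            char ch = i < gt.length() ? gt.charAt(i) : '/';
            if (ch == '/' || ch == '|') {
                out.append(remapAllele(allele.toString(), starIndex));
                if (i < gt.length()) {
                    out.append(ch);
                }
                allele.setLength(0);
            } else {
                allele.append(ch);
            }
        }
        return out.toString();
    }

    /** The "*" allele becomes missing; indices above it shift down by one. */
    static String remapAllele(String allele, int starIndex) {
        if (allele.isEmpty() || allele.equals(".")) {
            return allele;
        }
        int idx = Integer.parseInt(allele);
        if (idx == starIndex) {
            return ".";
        }
        return String.valueOf(idx > starIndex ? idx - 1 : idx);
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        PrintWriter out = new PrintWriter(System.out);
        String line;
        while ((line = in.readLine()) != null) {
            String rewritten = dropStarAllele(line);
            if (rewritten != null) {
                out.print(rewritten);
                out.print('\n');
            }
        }
        out.flush();
    }
}
```

With a Java 8 JDK this could be compiled with javac StarAlleleFilter.java and run along the lines of gzip -dc QGP_MT_merge_final.vcf.gz | java StarAlleleFilter > QGP_MT_merge_final.nostar.vcf, then the new file passed to --in. The output file name is illustrative, and this assumes HaploGrep reads only GT from an uncompressed VCF; check the results against a run on the original file where possible.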

haansi commented 6 years ago

Support for * has now been added in version 2.1.14.