seppinho / haplogrep-cmd

HaploGrep - mtDNA haplogroup classification. Supporting rCRS and RSRS.
https://haplogrep.i-med.ac.at/
MIT License
Error when using --lineage option #10

Closed nuin closed 5 years ago

nuin commented 6 years ago

Hi

I am getting an error with version 2.1.13 when using the --lineage option, on a file that runs fine when the option is not used:

java -jar ../bin/haplogrep-2.1.13.jar --in test.hsd --format hsd --out t1 --lineage

outputs

Welcome to HaploGrep v2.1.13
(c) Division of Genetic Epidemiology, Medical University of Innsbruck
Hansi Weissensteiner, Lukas Forer, Dominic Pacher and Sebastian Schönherr

phylotree/phylotree17.xml
Parameters:
Input Format: hsd
Phylotree Version: 17
Reference: rCRS
Extended Report: false
Used Metric: kulczynski
Chip array data: false
Lineage: true

Start Classification...
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(String.java:1967)
    at genepi.haplogrep.util.ExportTools.calcLineage(ExportTools.java:215)
    at genepi.haplogrep.main.Haplogrep.run(Haplogrep.java:154)
    at genepi.base.Tool.start(Tool.java:193)
    at genepi.haplogrep.main.Haplogrep.main(Haplogrep.java:182)

This is from a JAR downloaded from here.

Thanks

stephenturner commented 5 years ago

I'm also getting errors when using the --lineage option with 2.1.16, but I think mine are related to graphviz rather than the VCF issues noted above. It works fine without the --lineage option. With it, I get this error:

Start Classification...
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
    at guru.nidi.graphviz.engine.AbstractGraphvizEngine.<clinit>(AbstractGraphvizEngine.java:24)
    at guru.nidi.graphviz.engine.Graphviz.useDefaultEngines(Graphviz.java:52)
    at guru.nidi.graphviz.engine.Graphviz.getEngine(Graphviz.java:93)
    at guru.nidi.graphviz.engine.Graphviz.execute(Graphviz.java:186)
    at guru.nidi.graphviz.engine.Renderer.toString(Renderer.java:46)
    at guru.nidi.graphviz.engine.Renderer.toFile(Renderer.java:58)
    at genepi.haplogrep.util.ExportTools.calcLineage(ExportTools.java:222)
    at genepi.haplogrep.main.Haplogrep.run(Haplogrep.java:169)
    at genepi.base.Tool.start(Tool.java:193)
    at genepi.haplogrep.main.Haplogrep.main(Haplogrep.java:197)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 10 more

I've tried using graphviz 2.38.0 from conda and 2.40.1 from homebrew. Neither option works.

$ java -jar ../../bin/haplogrep-2.1.16.jar -h
Welcome to HaploGrep v2.1.16

$ dot -V
dot - graphviz version 2.40.1 (20161225.0304)

seppinho commented 5 years ago

Hi Stephen, thanks. This is related to the new graphical representation. It looks like I forgot to add some graphviz dependencies. I will get back to you asap.

stephenturner commented 5 years ago

Thanks. I'm wondering if the error is related to slf4j instead of graphviz. Not sure from the error.

seppinho commented 5 years ago

It looks like this was a dependency of graphviz. FYI, we removed the png/svg file creation for now because it needed too many dependencies. The dot file is still provided and can be used. I have updated 2.1.16: https://github.com/seppinho/haplogrep-cmd/releases/download/v2.1.16/haplogrep-2.1.16.jar
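
Since png/svg creation was removed, the dot file can be rendered with a local Graphviz install instead. The following is only a minimal sketch, assuming `dot` is on the PATH. The file name `t1.dot` and the helper names `build_dot_command` and `render` are hypothetical and not part of haplogrep; `-T` and `-o` are standard `dot` options.

```python
import subprocess
from pathlib import Path


def build_dot_command(dot_file, fmt="svg"):
    """Return the Graphviz command that renders dot_file to the given format."""
    src = Path(dot_file)
    # Output goes next to the input, with the extension swapped (hypothetical naming).
    out = src.with_suffix("." + fmt)
    return ["dot", "-T" + fmt, str(src), "-o", str(out)]


def render(dot_file, fmt="svg"):
    """Run Graphviz on dot_file; raises CalledProcessError if dot fails."""
    subprocess.run(build_dot_command(dot_file, fmt), check=True)
```

Calling `render("t1.dot", "png")` would then run `dot -Tpng t1.dot -o t1.png`.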

stephenturner commented 5 years ago

Thanks. Works now, but is it possible to export a separate dot file for each sample in the multisample VCF?

stephenturner commented 5 years ago

Ah, never mind the last comment.

Something I've noticed when running haplogrep with the --lineage option on 1000 Genomes data: there's an M4"67" haplogroup, and its presence corrupts the dot file. I'm trying to go through and replace this character with a sed command. I didn't know if there's a better way.

stephenturner commented 5 years ago

gsed -i "s/M4\"67/M4'67/g" resultsfile.dot fixes it.
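
For anyone without GNU sed, the same substitution can be sketched in Python. This only mirrors the sed command above (it swaps the double quote inside `M4"67` for a single quote) and is not a general fix for quoting in dot files. The function names are hypothetical.

```python
from pathlib import Path


def fix_quoted_haplogroup(dot_text):
    """Replace the double quote in M4"67 with a single quote, like the sed command."""
    return dot_text.replace('M4"67', "M4'67")


def fix_dot_file(path):
    """Rewrite the dot file in place (same effect as sed -i)."""
    dot_path = Path(path)
    dot_path.write_text(fix_quoted_haplogroup(dot_path.read_text()))
```

Usage would be `fix_dot_file("resultsfile.dot")`.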

seppinho commented 5 years ago

thanks. any chance you can provide this sample in VCF format?

stephenturner commented 5 years ago

Sure, it's just the 1000 Genomes variant calls, which I've decomposed and normalized. (As an aside: how does haplogrep handle complex/multinucleotide polymorphisms that haven't been decomposed/normalized into SNPs? I figured I'd do this just to be safe.)

ALL.chrMT.phase3_callmom-v0_4.20130502.genotypes.decomposed.normalized.omni.vcf.gz

stephenturner commented 5 years ago

The un-normalized variants are available via the 1000g FTP:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chrMT.phase3_callmom-v0_4.20130502.genotypes.vcf.gz

haansi commented 5 years ago

Hi Stephen, haplogrep can basically handle multinucleotide polymorphisms, e.g. 228 . G A,T, but we need to recheck more complex alignment issues (e.g. 55 . TATTTT T,CATTTT,AATTTT,TTT,TTTTT), as those are not denoted according to the nomenclature and could therefore end up impacting the haplogrep score.
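
To illustrate what a record like `228 . G A,T` looks like once decomposed, here is a deliberately simplified sketch. It only splits the comma-separated ALT field into one entry per allele. It does not trim or left-align alleles and does not rewrite genotype columns, so it is not a substitute for a real decomposition/normalization tool, and it is not how haplogrep itself processes such records. The function name is hypothetical.

```python
def split_multiallelic(pos, ref, alt_field):
    """Split a comma-separated ALT field into one (pos, ref, alt) tuple per allele."""
    return [(pos, ref, alt) for alt in alt_field.split(",")]


# The simple multiallelic example from the comment above.
for record in split_multiallelic(228, "G", "A,T"):
    print(record)
```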

stephenturner commented 5 years ago

Thanks @seppinho!

seppinho commented 5 years ago

@stephenturner, I just had a look at your normalized VCF from above. It looks like this VCF is missing some positions compared to the original file (see sample HG03478 attached): 1KP3_NORM_HG03478.vcf.gz 1KP3_HG03478.vcf.gz
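
A comparison like this can be sketched by collecting the POS column from both files and taking the set difference. This is only an illustration under stated assumptions: the helper names are hypothetical, and it compares positions alone. Normalization can legitimately shift an indel's position through left alignment, so a reported difference is a hint to inspect the record, not proof that a variant was lost.

```python
import gzip


def vcf_positions(lines):
    """Collect the POS column (second field) of every non-header VCF line."""
    positions = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        positions.add(int(line.split("\t")[1]))
    return positions


def missing_positions(original_lines, normalized_lines):
    """Positions present in the original but absent from the normalized file."""
    return sorted(vcf_positions(original_lines) - vcf_positions(normalized_lines))


def read_vcf_gz(path):
    """Read a gzipped VCF as a list of text lines."""
    with gzip.open(path, "rt") as handle:
        return handle.readlines()
```

With the two attachments downloaded, `missing_positions(read_vcf_gz("1KP3_HG03478.vcf.gz"), read_vcf_gz("1KP3_NORM_HG03478.vcf.gz"))` would list the positions found only in the original file.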