lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

VcfGO #20

Closed (kellermac closed this issue 9 years ago)

kellermac commented 9 years ago

I'm running this on Ubuntu 14.04 and I'm not sure what the problem is. I was hoping someone would take a look at the output and point me toward a fix.

hart@hart-ubuntu:~/jvarkit$ java -jar dist/vcfgo.jar I=/home/hart/BigData/VCF/Ef1_7_29_2014Eff.vcf GO_INPUT=http://geneontology.org/gene-associations/gene_association.fb.gz GOA_INPUT=ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/FLY/gene_association.goa_fly.gz OUT=/home/hart/BigData/VCF/EffGO.vcf
[Thu Feb 12 12:07:53 CST 2015] com.github.lindenb.jvarkit.tools.vcfgo.VcfGeneOntology GOA=ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/FLY/gene_association.goa_fly.gz GO=http://geneontology.org/gene-associations/gene_association.fb.gz IN=/home/hart/BigData/VCF/Ef1_7_29_2014Eff.vcf OUT=/home/hart/BigData/VCF/EffGO.vcf VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Thu Feb 12 12:07:53 CST 2015] Executing as hart@hart-ubuntu on Linux 3.13.0-45-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_75-b13; Picard version: null JdkDeflater
INFO 2015-02-12 12:07:53 AbstractVCFFilter reading from /home/hart/BigData/VCF/Ef1_7_29_2014Eff.vcf
INFO 2015-02-12 12:07:53 AbstractVCFFilter writing to /home/hart/BigData/VCF/EffGO.vcf
INFO 2015-02-12 12:07:53 AbstractVcfGeneOntology read GO http://geneontology.org/gene-associations/gene_association.fb.gz
java.io.IOException: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
    at com.github.lindenb.jvarkit.tools.vcfgo.AbstractVcfGeneOntology.readGO(AbstractVcfGeneOntology.java:60)
    at com.github.lindenb.jvarkit.tools.vcfgo.VcfGeneOntology.doWork(VcfGeneOntology.java:35)
    at com.github.lindenb.jvarkit.util.vcf.AbstractVCFFilter.doWork(AbstractVCFFilter.java:73)
    at com.github.lindenb.jvarkit.util.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:179)
    at com.github.lindenb.jvarkit.util.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:120)
    at com.github.lindenb.jvarkit.tools.vcfgo.VcfGeneOntology.main(VcfGeneOntology.java:89)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:598)
    at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(XMLEventReaderImpl.java:83)
    at com.github.lindenb.jvarkit.util.go.GoTree.parse(GoTree.java:286)
    at com.github.lindenb.jvarkit.util.go.GoTree.parse(GoTree.java:311)
    at com.github.lindenb.jvarkit.tools.vcfgo.AbstractVcfGeneOntology.readGO(AbstractVcfGeneOntology.java:55)
    ... 5 more
ERROR 2015-02-12 12:07:54 AbstractVCFFilter
java.io.IOException: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
    at com.github.lindenb.jvarkit.tools.vcfgo.AbstractVcfGeneOntology.readGO(AbstractVcfGeneOntology.java:60)
    at com.github.lindenb.jvarkit.tools.vcfgo.VcfGeneOntology.doWork(VcfGeneOntology.java:35)
    at com.github.lindenb.jvarkit.util.vcf.AbstractVCFFilter.doWork(AbstractVCFFilter.java:73)
    at com.github.lindenb.jvarkit.util.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:179)
    at com.github.lindenb.jvarkit.util.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:120)
    at com.github.lindenb.jvarkit.tools.vcfgo.VcfGeneOntology.main(VcfGeneOntology.java:89)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:598)
    at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(XMLEventReaderImpl.java:83)
    at com.github.lindenb.jvarkit.util.go.GoTree.parse(GoTree.java:286)
    at com.github.lindenb.jvarkit.util.go.GoTree.parse(GoTree.java:311)
    at com.github.lindenb.jvarkit.tools.vcfgo.AbstractVcfGeneOntology.readGO(AbstractVcfGeneOntology.java:55)
    ... 5 more
ERROR 2015-02-12 12:07:54 AbstractVCFFilter Commandline was : com.github.lindenb.jvarkit.tools.vcfgo.VcfGeneOntology GOA=ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/FLY/gene_association.goa_fly.gz GO=http://geneontology.org/gene-associations/gene_association.fb.gz IN=/home/hart/BigData/VCF/Ef1_7_29_2014Eff.vcf OUT=/home/hart/BigData/VCF/EffGO.vcf VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Thu Feb 12 12:07:54 CST 2015] com.github.lindenb.jvarkit.tools.vcfgo.VcfGeneOntology done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=501743616

I hope it's just a dependency issue. May the universe shine unending blessings on you and your progeny. -Keller
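The "Content is not allowed in prolog" message is raised by the JDK's built-in StAX (javax.xml.stream) parser when the first characters of its input cannot start an XML document. The standalone sketch below is not jvarkit code; the class `PrologErrorDemo`, its method `parseError`, and the GAF-like sample text are hypothetical. It assumes the downloaded file begins with a GAF header line such as `!gaf-version: 2.0` (a still-gzipped stream would fail at the same position, since gzip data does not start with `<` either).

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;

public class PrologErrorDemo {

    /**
     * Pulls every StAX event from the given text.
     * Returns the parser's error message if the text is not well-formed XML, or null if it parsed.
     */
    static String parseError(String text) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(text));
            while (reader.hasNext()) {
                reader.nextEvent(); // the same kind of call that failed inside GoTree.parse
            }
            reader.close();
            return null;
        } catch (XMLStreamException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical start of a tab-delimited GAF annotation file: not XML at all.
        String gafLike = "!gaf-version: 2.0\nFB\tEXAMPLE_ID\texampleGene\n";
        // A trivially well-formed XML document for comparison.
        String xml = "<?xml version=\"1.0\"?><rdf/>";

        System.out.println("GAF-like input -> " + parseError(gafLike));
        System.out.println("XML input      -> " + parseError(xml));
    }
}
```

On a stock OpenJDK the first call should return a parse error at [1,1] (the exact wording can depend on the JVM locale), while the second returns null. In other words, the failure points at the content of the GO_INPUT file rather than at a missing library.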

lindenb commented 9 years ago

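The problem is

GO_INPUT=http://geneontology.org/gene-associations/gene_association.fb.gz

That is not a GO URL but a GOA URL: gene_association.fb.gz is a tab-delimited gene-annotation (GAF) file, not XML, which is why the XML parser fails at [row,col]:[1,1] with "Content is not allowed in prolog". GO_INPUT expects a gzipped RDF/XML URL such as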

http://archive.geneontology.org/latest-termdb/go_daily-termdb.rdf-xml.gz
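To catch this GO/GOA mix-up before launching the tool, one could run a quick pre-flight check on the file. This is a minimal sketch, not part of jvarkit: `GoInputCheck`, `maybeGunzip`, `looksLikeXml` and the in-memory sample contents are hypothetical. It transparently gunzips the stream when it sees the gzip magic bytes, then treats the input as XML only if its first non-whitespace byte is `<`. It is a heuristic: it ignores byte-order marks and UTF-16, and it does not validate RDF structure.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GoInputCheck {

    /** Wraps the stream in a GZIPInputStream if it starts with the gzip magic bytes 0x1f 0x8b. */
    static InputStream maybeGunzip(InputStream in) throws IOException {
        BufferedInputStream buf = new BufferedInputStream(in);
        buf.mark(2);
        int b1 = buf.read();
        int b2 = buf.read();
        buf.reset(); // rewind so the returned stream starts at the first byte
        return (b1 == 0x1f && b2 == 0x8b) ? new GZIPInputStream(buf) : buf;
    }

    /**
     * Heuristic: true if the first byte after XML whitespace is '<'.
     * A GAF file starts with '!' header lines or tab-delimited records instead.
     * Closes the given stream.
     */
    static boolean looksLikeXml(InputStream in) throws IOException {
        try (InputStream data = maybeGunzip(in)) {
            int c;
            while ((c = data.read()) != -1) {
                boolean xmlSpace = c == ' ' || c == '\t' || c == '\n' || c == '\r';
                if (!xmlSpace) {
                    return c == '<';
                }
            }
            return false; // empty input is not an XML document
        }
    }

    /** Gzips a string in memory (used only by the offline demo). */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Check a real location, e.g. the URL you intend to pass as GO_INPUT.
            try (InputStream in = URI.create(args[0]).toURL().openStream()) {
                System.out.println(args[0] + " looks like XML: " + looksLikeXml(in));
            }
            return;
        }
        // Offline demo with hypothetical in-memory content.
        byte[] gaf = gzip("!gaf-version: 2.0\nFB\tEXAMPLE_ID\texampleGene\n");
        byte[] rdf = gzip("<?xml version=\"1.0\"?>\n"
                + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"/>\n");
        System.out.println("GAF-like: " + looksLikeXml(new ByteArrayInputStream(gaf))); // false
        System.out.println("RDF/XML:  " + looksLikeXml(new ByteArrayInputStream(rdf))); // true
    }
}
```

Run it without arguments for the offline demo, or pass a URL to test a real file before handing it to vcfgo. Checking only the first byte keeps the check cheap and is enough to flag a GAF file passed as GO_INPUT, but a `true` result does not prove the file is a valid GO term database.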

Anyway, I changed the code of this tool today (https://github.com/lindenb/jvarkit/wiki/VCFGeneOntology); you should use the new version instead of the old one.

P.