lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

biostar160470 Exited with failure (-1) parser error : Premature end of data in tag Iteration_hits line 1105 #134

Closed cosmiccapybara closed 5 years ago

cosmiccapybara commented 5 years ago

Hello,

I am trying to run biostar160470.jar to retrieve the DNA sequences of the tblastn hits. I need to run tblastn with some protein sequences against several genomes. The problem is that, for some of the genomes, I get an error while parsing the tblastn results.

One of the commands that generates this error is the following:

cat A_thaliana_all_prot_uniq_join.fa | tblastn -num_threads 20 -db GCA_000695325.1_P.bachei_draft_genome_v.1.1_genomic_Pleurobrachia_bachei.fna -outfmt 5 | java -jar /space31/PEG/pcornejo/jvarkit/dist/biostar160470.jar -d GCA_000695325.1_P.bachei_draft_genome_v.1.1_genomic_Pleurobrachia_bachei.fna | xmllint --format - > A_thaliana_all_prot_Pleurobrachia_bachei_tblastn_format5.txt

Genome from: https://www.ncbi.nlm.nih.gov/assembly/GCA_000695325.1

I have already run plain tblastn and I get results, so the problem arises when parsing the BLAST results.

The error I am getting is the following:

[SEVERE][Biostar160470]Proc failed: [blastdbcmd, -db, GCA_000695325.1_P.bachei_draft_genome_v.1.1_genomic_Pleurobrachia_bachei.fna, -entry, gnl|BL_ORD_ID|0, -outfmt, %s, -range, 207323-207664]
java.lang.RuntimeException: Proc failed: [blastdbcmd, -db, GCA_000695325.1_P.bachei_draft_genome_v.1.1_genomic_Pleurobrachia_bachei.fna, -entry, gnl|BL_ORD_ID|0, -outfmt, %s, -range, 207323-207664]
    at com.github.lindenb.jvarkit.tools.biostar.Biostar160470.parseHit(Biostar160470.java:270)
    at com.github.lindenb.jvarkit.tools.biostar.Biostar160470.parseBlast(Biostar160470.java:188)
    at com.github.lindenb.jvarkit.tools.biostar.Biostar160470.doWork(Biostar160470.java:336)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:756)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:919)
    at com.github.lindenb.jvarkit.tools.biostar.Biostar160470.main(Biostar160470.java:358)
[INFO][Launcher]biostar160470 Exited with failure (-1)
-:1106: parser error : Premature end of data in tag Iteration_hits line 1105
^
-:1106: parser error : Premature end of data in tag Iteration line 1100
^
-:1106: parser error : Premature end of data in tag BlastOutput_iterations line 18
^
-:1106: parser error : Premature end of data in tag BlastOutput_program line 2
^
-:1106: parser error : Premature end of data in tag BlastOutput line 1
^
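The xmllint messages at the end of this log say the XML stream stopped in the middle of the document: biostar160470 exited with failure right after the blastdbcmd call, so the closing tags (Iteration_hits, Iteration, BlastOutput_iterations, BlastOutput) were never written. To tell a truncated output file from a complete one without rerunning everything, a minimal Python sketch like the following could be used. `blast_xml_is_complete` is a hypothetical helper (not part of jvarkit or BLAST); it only checks that the file is well-formed XML whose root element is `BlastOutput`, the root named in the log.

```python
import sys
import xml.etree.ElementTree as ET


def blast_xml_is_complete(path):
    """Hypothetical helper: return (True, None) if `path` is well-formed XML
    whose root element is BlastOutput, otherwise (False, reason)."""
    root_tag = None
    try:
        # Stream through the file so large BLAST outputs are not held in memory.
        for event, elem in ET.iterparse(path, events=("start", "end")):
            if event == "start" and root_tag is None:
                root_tag = elem.tag
            elif event == "end":
                elem.clear()
    except ET.ParseError as err:
        # An empty file or a stream cut off mid-document (unclosed tags) ends up here.
        return False, str(err)
    if root_tag != "BlastOutput":
        return False, f"unexpected root element <{root_tag}>"
    return True, None


if __name__ == "__main__":
    for p in sys.argv[1:]:
        ok, reason = blast_xml_is_complete(p)
        print(f"{p}: complete" if ok else f"{p}: INCOMPLETE ({reason})")
```

Usage (script name is only an example): `python check_complete.py A_thaliana_all_prot_Pleurobrachia_bachei_tblastn_format5.txt`. An empty file, which is what is left when xmllint rejects its whole input, is also reported as incomplete.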

I am not familiar with Java; I hope you have some idea of what is going on.

Thanks!

Paola

lindenb commented 5 years ago

Try to set the option --bindir (path to the BLAST binaries).

cosmiccapybara commented 5 years ago

Dear Pierre,

Thank you for your answer. I am working on a server, and unfortunately adding the path to the binaries is not working there. It produced this error:

cat A_thaliana_all_prot_uniq_join.fa | usr/local/bioconda/bin/tblastn -num_threads 20 -db GCA_000695325.1_P.bachei_draft_genome_v.1.1_genomic_Pleurobrachia_bachei.fna -outfmt 5 | java -jar /space31/PEG/pcornejo/jvarkit/dist/biostar160470.jar -p usr/local/bioconda/bin -d GCA_000695325.1_P.bachei_draft_genome_v.1.1_genomic_Pleurobrachia_bachei.fna | xmllint --format - > A_thaliana_all_prot_Pleurobrachia_bachei_tblastn_format5_test.txt

-bash: usr/local/bioconda/bin/tblastn: No such file or directory

Message: Premature end of file.
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
Message: Premature end of file.
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:604)
    at com.sun.xml.internal.stream.XMLEventReaderImpl.peek(XMLEventReaderImpl.java:276)
    at com.github.lindenb.jvarkit.tools.biostar.Biostar160470.parseBlast(Biostar160470.java:173)
    at com.github.lindenb.jvarkit.tools.biostar.Biostar160470.doWork(Biostar160470.java:336)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:756)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:919)
    at com.github.lindenb.jvarkit.tools.biostar.Biostar160470.main(Biostar160470.java:358)
[INFO][Launcher]biostar160470 Exited with failure (-1)
-:1: parser error : Document is empty
^

I guess it is related to permissions or to the link to the binary from my working directory (I have already contacted the server administrator to solve it).

Java version on the server:

openjdk version "1.8.0_144"
OpenJDK Runtime Environment (Zulu 8.23.0.3-linux64) (build 1.8.0_144-b01)
OpenJDK 64-Bit Server VM (Zulu 8.23.0.3-linux64) (build 25.144-b01, mixed mode)

I doubt the problem is the OpenJDK version since I can run the program for some of the genomes without errors.

I also tried running it on my own computer, and it does not produce any error (even with an old JDK version), but there is a problem: it does not retrieve all the hits it should. I know this because I have already run plain tblastn without any cutoff value, so I know how many hits I should get; I only need to add the DNA sequence of each hit.
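One way to compare the two runs is to count hits per query in each BLAST XML (-outfmt 5) file. The sketch below is a minimal, hypothetical helper (`count_hits` is not part of jvarkit or BLAST). It assumes the file follows BLAST's XML layout (`Iteration` > `Iteration_hits` > `Hit` > `Hit_hsps` > `Hsp`) and reports both hits and HSPs, since it is not stated which of the two the expected count refers to. For comparison, each line of tabular output (-outfmt 6) is one HSP.

```python
import sys
import xml.etree.ElementTree as ET


def count_hits(path):
    """Hypothetical helper: return a list of (query_def, n_hits, n_hsps),
    one entry per query (Iteration) in a BLAST XML file, in file order."""
    rows = []
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag != "Iteration":
            continue
        query = elem.findtext("Iteration_query-def", default="?")
        hits = elem.findall("Iteration_hits/Hit")
        n_hsps = sum(len(hit.findall("Hit_hsps/Hsp")) for hit in hits)
        rows.append((query, len(hits), n_hsps))
        elem.clear()  # drop the finished query's subtree to keep memory low
    return rows


if __name__ == "__main__":
    # Prints: file, query, hits, HSPs (tab-separated), then a TOTAL line per file.
    for p in sys.argv[1:]:
        rows = count_hits(p)
        for query, n_hits, n_hsps in rows:
            print(f"{p}\t{query}\t{n_hits}\t{n_hsps}")
        total_hits = sum(r[1] for r in rows)
        total_hsps = sum(r[2] for r in rows)
        print(f"{p}\tTOTAL\t{total_hits}\t{total_hsps}")
```

Running it on the server output and on the local output for the same genome would show which queries lose hits on one machine but not the other.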

On my computer:

java -version
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)

dpkg --list | grep -i jdk

ii openjdk-11-jdk:amd64 11.0.4+11-1~14.04 amd64 OpenJDK Development Kit (JDK)
ii openjdk-11-jdk-headless:amd64 11.0.4+11-1~14.04 amd64 OpenJDK Development Kit (JDK) (headless)
ii openjdk-11-jre:amd64 11.0.4+11-1~14.04 amd64 OpenJDK Java runtime, using Hotspot JIT
ii openjdk-11-jre-headless:amd64 11.0.4+11-1~14.04 amd64 OpenJDK Java runtime, using Hotspot JIT (headless)
ii openjdk-7-jre:amd64 7u181-2.6.14-0ubuntu0.3 amd64 OpenJDK Java runtime, using Hotspot JIT
ii openjdk-7-jre-headless:amd64 7u181-2.6.14-0ubuntu0.3 amd64 OpenJDK Java runtime, using Hotspot JIT (headless)
ii oracle-java8-installer 8u201-1~webupd8~1 all Oracle Java(TM) Development Kit (JDK) 8
ii oracle-java8-set-default 8u201-1~webupd8~1 all Set Oracle JDK 8 as default Java

Again, on the server I can run the program for some of the genomes and those results are complete, but for other genomes it does not run. I even ran the program on the server with some of the queries that I had identified as not getting results on my computer: it ran properly on the server and gave results, but it does not run with the whole FASTA file containing all the queries. And I have no idea what is happening when I run it on my computer.

Paola

lindenb commented 5 years ago

usr/local/bioconda/bin ?

shouldn't it be

/usr/local/bioconda/bin

cosmiccapybara commented 5 years ago

Yes, I corrected that and now tblastn works, but I am still getting the same error:

cat A_thaliana_all_prot_uniq_join.fa | /usr/local/bioconda/bin/tblastn -num_threads 20 -db GCA_000695325.1_P.bachei_draft_genome_v.1.1_genomic_Pleurobrachia_bachei.fna -outfmt 5 | java -jar /space31/PEG/pcornejo/jvarkit/dist/biostar160470.jar -p /usr/local/bioconda/bin -d GCA_000695325.1_P.bachei_draft_genome_v.1.1_genomic_Pleurobrachia_bachei.fna | xmllint --format - > A_thaliana_all_prot_Pleurobrachia_bachei_tblastn_format5_test.txt

[SEVERE][Biostar160470]Proc failed: [/usr/local/bioconda/bin/blastdbcmd, -db, GCA_000695325.1_P.bachei_draft_genome_v.1.1_genomic_Pleurobrachia_bachei.fna, -entry, gnl|BL_ORD_ID|0, -outfmt, %s, -range, 207323-207664]
java.lang.RuntimeException: Proc failed: [/usr/local/bioconda/bin/blastdbcmd, -db, GCA_000695325.1_P.bachei_draft_genome_v.1.1_genomic_Pleurobrachia_bachei.fna, -entry, gnl|BL_ORD_ID|0, -outfmt, %s, -range, 207323-207664]
    at com.github.lindenb.jvarkit.tools.biostar.Biostar160470.parseHit(Biostar160470.java:270)
    at com.github.lindenb.jvarkit.tools.biostar.Biostar160470.parseBlast(Biostar160470.java:188)
    at com.github.lindenb.jvarkit.tools.biostar.Biostar160470.doWork(Biostar160470.java:336)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:756)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:919)
    at com.github.lindenb.jvarkit.tools.biostar.Biostar160470.main(Biostar160470.java:358)
[INFO][Launcher]biostar160470 Exited with failure (-1)
-:1106: parser error : Premature end of data in tag Iteration_hits line 1105
^
-:1106: parser error : Premature end of data in tag Iteration line 1100
^
-:1106: parser error : Premature end of data in tag BlastOutput_iterations line 18
^
-:1106: parser error : Premature end of data in tag BlastOutput_program line 2
^
-:1106: parser error : Premature end of data in tag BlastOutput line 1
^

lindenb commented 5 years ago

What is the output of the following command?

/usr/local/bioconda/bin/blastdbcmd -db "GCA_000695325.1_P.bachei_draft_genome_v.1.1_genomic_Pleurobrachia_bachei.fna" -entry "gnl|BL_ORD_ID|0" -outfmt "%s" -range "207323-207664"
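Since the failing call is a plain blastdbcmd invocation, the same kind of call can be replayed outside Java for every hit to find which ones break. This is a minimal sketch that assumes the `-entry` value comes from each `Hit_id` and the `-range` from each HSP's `Hsp_hit-from`/`Hsp_hit-to`; that matches the shape of the call in the log, but it is an assumption about how biostar160470 builds the call, not its actual code. `blastdbcmd_calls` is a hypothetical name.

```python
import shlex
import subprocess
import sys
import xml.etree.ElementTree as ET


def blastdbcmd_calls(xml_path, db, blastdbcmd="blastdbcmd"):
    """Hypothetical helper: yield one blastdbcmd argument list per HSP, shaped
    like the call in the log: -db DB -entry HIT_ID -outfmt %s -range START-END.
    Assumes entry = Hit_id and range = Hsp_hit-from/Hsp_hit-to."""
    for _event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag != "Hit":
            continue
        hit_id = elem.findtext("Hit_id")
        for hsp in elem.findall("Hit_hsps/Hsp"):
            a = int(hsp.findtext("Hsp_hit-from"))
            b = int(hsp.findtext("Hsp_hit-to"))
            # Put the smaller coordinate first in the start-stop range.
            start, end = min(a, b), max(a, b)
            yield [blastdbcmd, "-db", db, "-entry", hit_id,
                   "-outfmt", "%s", "-range", f"{start}-{end}"]
        elem.clear()


if __name__ == "__main__" and len(sys.argv) >= 3:
    # usage: python replay_blastdbcmd.py RESULTS.xml DB [path/to/blastdbcmd]
    exe = sys.argv[3] if len(sys.argv) > 3 else "blastdbcmd"
    for args in blastdbcmd_calls(sys.argv[1], sys.argv[2], exe):
        proc = subprocess.run(args, capture_output=True, text=True)
        if proc.returncode != 0:
            # Report the first failing call together with blastdbcmd's own message.
            print("FAILED:", shlex.join(args), file=sys.stderr)
            print(proc.stderr.strip(), file=sys.stderr)
            sys.exit(1)
    print("all blastdbcmd calls succeeded")
```

It needs the raw tblastn -outfmt 5 output saved to a file first (instead of piping it straight into the jar); the script and file names above are only examples.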

cosmiccapybara commented 5 years ago

This:

Error: [blastdbcmd] CObject_id::GetId(): Invalid choice selection: NCBI-General::Object-id.str

cosmiccapybara commented 5 years ago

It is solved. I read this post: https://blastedbio.blogspot.com/2012/10/my-ids-not-good-enough-for-ncbi-blast.html

Then I ran the makeblastdb command again, adding "-parse_seqids":

makeblastdb -in GCA_000695325.1_P.bachei_draft_genome_v.1.1_genomic_Pleurobrachia_bachei.fna -dbtype nucl -out blastdb_bachei -parse_seqids
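Identifiers of the form `gnl|BL_ORD_ID|N` (like the `-entry gnl|BL_ORD_ID|0` in the failing call) are the ordinal IDs BLAST falls back to when a database was built without `-parse_seqids`, which is the situation the linked post describes. Before re-running a long pipeline against another genome, a minimal sketch like this one can scan an existing BLAST XML result for such IDs. `ordinal_hit_ids` is a hypothetical helper, not part of BLAST or jvarkit.

```python
import sys
import xml.etree.ElementTree as ET

ORDINAL_PREFIX = "gnl|BL_ORD_ID|"


def ordinal_hit_ids(xml_path, limit=5):
    """Hypothetical helper: return up to `limit` Hit_id values that start
    with gnl|BL_ORD_ID| in a BLAST XML file."""
    found = []
    for _event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag == "Hit_id" and (elem.text or "").startswith(ORDINAL_PREFIX):
            found.append(elem.text)
            if len(found) >= limit:
                break
        elif elem.tag == "Iteration":
            elem.clear()  # free each finished query's subtree
    return found


if __name__ == "__main__":
    for p in sys.argv[1:]:
        ids = ordinal_hit_ids(p)
        if ids:
            print(f"{p}: ordinal IDs such as {ids[0]}; "
                  "database was probably built without -parse_seqids")
        else:
            print(f"{p}: no gnl|BL_ORD_ID| hit IDs found")
```

An empty result does not prove every blastdbcmd lookup will succeed; it only rules out this particular cause.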

Then:

cat A_thaliana_all_prot_uniq_join.fa | /usr/local/bioconda/bin/tblastn -num_threads 20 -db blastdb_bachei -outfmt 5 | java -jar /space31/PEG/pcornejo/jvarkit/dist/biostar160470.jar -p /usr/local/bioconda/bin -d blastdb_bachei | xmllint --format - > A_thaliana_all_prot_Pleurobrachia_bachei_tblastn_format5_test.txt

It ran without errors and the file seems to be complete. I hope it works for all the genomes! Thank you so much!!

Paola