peterjc / galaxy_blast

Galaxy wrappers for NCBI BLAST+ and related BLAST tools.

NCBI BLAST+ blastn overflow error with NCBI NT 2023-09-01 Nucleotide BLAST database #156

Closed: kysrpex closed this issue 1 year ago

kysrpex commented 1 year ago

The latest version of NCBI BLAST+ blastn available in this repository seems to be incompatible with the NCBI NT database from September 1, 2023. Below are the outputs of a job I ran on UseGalaxy.eu to reproduce the issue.

Command Line

blastn  -query '/data/dnb09/galaxy_db/files/0/5/1/dataset_051aeaa5-7cb3-4776-a46b-9ab01c6d3f8e.dat'   -db '"/data/db/databases/blast/nt/2023-09-01/nt"'  -task 'blastn' -evalue '0.001' -out '/data/jwd05e/main/062/440/62440929/outputs/dataset_ba855f06-d5b2-4810-9b63-71b033951036.dat' -outfmt '6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen salltitles'  -num_threads "${GALAXY_SLOTS:-8}"

Tool Standard Error

Error: NCBI C++ Exception:
    T0 "/opt/conda/conda-bld/blast_1595737360567/work/blast/c++/src/serial/objistrasnb.cpp", line 499: Error: (CSerialException::eOverflow) byte 83: overflow error ( at [].[].gi)
    T0 "/opt/conda/conda-bld/blast_1595737360567/work/blast/c++/src/serial/member.cpp", line 768: Error: (CSerialException::eOverflow) ncbi::CMemberInfoFunctions::ReadWithSetFlagMember() - error while reading seqid ( at Blast-def-line-set.[].[].seqid.[].[].gi)

Tool Exit Code

255
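To triage other failed jobs for the same symptom, it may help to match the two exception lines above rather than the exit code alone. The following is a minimal sketch, not part of the Galaxy wrappers: it assumes the job's standard error has been saved to a file, and the function name `is_gi_overflow_error` is hypothetical. It looks for the `CSerialException::eOverflow` class together with the `.gi)` field path shown in the message.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if a saved stderr file shows the
# gi overflow reported in this issue, fail otherwise.
is_gi_overflow_error() {
    local stderr_file="$1"
    # Both fixed strings come verbatim from the error output above.
    grep -qF 'CSerialException::eOverflow' "$stderr_file" &&
        grep -qF '.gi)' "$stderr_file"
}
```

For example, `for f in stderr_*.txt; do is_gi_overflow_error "$f" && echo "$f"; done` would list the matching files (the file pattern is illustrative).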

The bug can be reproduced on UseGalaxy.eu using the following input [1],

ATGAAAAAGATAAAAATTGTTCCACTTATTTTAATAGTTGTAGTTGTCGGGTTTGGTATATATTTTTATGCTTCCAAAGATAAAGAAATTAATAATACTATTGATGCAATTGAAGATAAAAATTTCAAACAAGTTTATAAAGATAGCAGTTATATTTCTAAAAGCGATAATGGTGAAGTAGAAATGACTGAACGTCCGATAAAAATATATAATAGTTTAGGCGTTAAAGATATAAACATTCAGGATCGTAAAATAAAAAAAGTATCTAAAAATAAAAAACGAGTAGATGCTCAATATAAAATTAAAACAAACTACGGTAACATTGATCGCAACGTTCAATTTAATTTTGTTAAAGAAGATGGTATGTGGAAGTTAGATTGGGATCATAGCGTCATTATTCCAGGAATGCAGAAAGACCAAAGCATACATATTGAAAATTTAAAATCAGAACGTGGTAAAATTTTAGACCGAAACAATGTGGAATTGGCCAATACAGGAACAGCATATGAGATAGGCATCGTTCCAAAGAATGTATCTAAAAAAGATTATAAAGCAATCGCTAAAGAACTAAGTATTTCTGAAGACTATATCAAACAACAAATGGATCAAAATTGGGTACAAGATGATACCTTCGTTCCACTTAAAACCGTTAAAAAAATGGATGAATATTTAAGTGATTTCGCAAAAAAATTTCATCTTACAACTAATGAAACAAAAAGTCGTAACTATCCTCTAGGAAAAGCGACTTCACATCTATTAGGTTATGTTGGTCCCATTAACTCTGAAGAATTAAAACAAAAAGAATATAAAGGCTATAAAGATGATGCAGTTATTGGTAAAAAGGGACTCGAAAAACTTTACGATAAAAAGCTCCAACATGAAGATGGCTATCGTGTCACAATCGTTGACGATAATAGCAATACAATCGCACATACATTAATAGAGAAAAAGAAAAAAGATGGCAAAGATATTCAACTAACTATTGATGCTAAAGTTCAAAAGAGTATTTATAACAACATGAAAAATGATTATGGCTCAGGTACTGCTATCCACCCTCAAACAGGTGAATTATTAGCACTTGTAAGCACACCTTCATATGACGTCTATCCATTTATGTATGGCATGAGTAACGAAGAATATAATAAATTAACCGAAGATAAAAAAGAACCTCTGCTCAACAAGTTCCAGATTACAACTTCACCAGGTTCAACTCAAAAAATATTAACAGCAATGATTGGGTTAAATAACAAAACATTAGACGATAAAACAAGTTATAAAATCGATGGTAAAGGTTGGCAAAAAGATAAATCTTGGGGTGGTTACAACGTTACAAGAAATAAAGTGGTAAATGGTAATATCGACTTAAAACAAGCAATAGAATCATCAGATAACATTTTCTTTGCTAGAGTAGCACTCGAATTAGGCAGTAAGAAATTTGAAAAAGGCATGAAAAAACTAGGTGTTGGTGAAGATATACCAAGTGATTATCCATTTTATAATGCTCAAATTTCAAACAAAAATTTAGATAATGAAATATTATTAGCTGATTCAGGTTACGGACAAGGTGAAATACTGATTAACCCAGTACAGATCCTTTCAATCTATAGCGCATTAGAAAATAATGGCAATATTAACGCACCTCACTTATTAAAAGACACGAAAAACAAAGTTTGGAAGAAAAATATTATTTCCAAAGAAAATATCAATCTATTAACTGATGGTATGCAACAAGTCGTAAATAAAACACATAAAGAAGATATTTATAGATCTTATGCAAACTTAATTGGCAAATCCGGTACTGCAGAACTCAAAATGAAACAAGGAGAAACTGGCAGACAAATTGGGTGGTTTATATCATATGATAAAGATAATCCAAACATGATGATGGCTATTAATGTTAAAGATGTACAAGATAAAGGAATGGCTAGCTACAATGCCAAAATCTCAGGTAAAGTGTATGATGAGCTATATGAGAACGGTAATAAAAAATACGATATAGA
TGAATAA

and choosing "blastn" as "Type of BLAST".

You may import NCBI-BLAST-blastn-overflow-error-with-NCBI-NT-2023-09-01-Nucleotide-BLAST-database.rocrate.zip to save yourself the hassle of setting up the job.

According to a Stack Overflow post mentioning the same issue [1], the solution may be to update NCBI BLAST+ blastn.

[1] - https://stackoverflow.com/questions/70370949/local-blast-ncbi-c-exception

kysrpex commented 1 year ago

@bgruening This is the BLAST issue I mentioned this morning.

kysrpex commented 1 year ago

I assume #146 is related.

peterjc commented 1 year ago

Currently the wrapper specifies BLAST+ version 2.10.1 here https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/ncbi_macros.xml

If we have reason to believe that this version of BLAST+ can't cope with the latest NCBI databases, then updating ought to solve this, and touch wood it ought not to be too complicated (assuming no changes to the command line, etc.), i.e. issue #146.
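Until the pinned version is bumped, an admin script could check the installed binary before running against a new NCBI database. This is a hypothetical sketch (`version_at_least` and `parse_blast_version` are not part of galaxy_blast). It assumes the first line of `blastn -version` looks like `blastn: 2.14.1+` and relies on GNU `sort -V` for version ordering. The threshold 2.14.1 is used only because it is the version confirmed to work later in this thread, not a verified minimum.

```shell
#!/usr/bin/env bash
# Succeed if ACTUAL >= REQUIRED: after a version sort, REQUIRED must come first.
version_at_least() {
    local required="$1" actual="$2"
    [ "$(printf '%s\n%s\n' "$required" "$actual" | sort -V | head -n 1)" = "$required" ]
}

# Read `blastn -version` output on stdin and print just the number,
# e.g. "blastn: 2.14.1+" -> "2.14.1" (assumed output format).
parse_blast_version() {
    head -n 1 | sed -e 's/^blastn: *//' -e 's/+$//'
}
```

Usage: `version_at_least 2.14.1 "$(blastn -version | parse_blast_version)" || echo "BLAST+ too old for NT 2023-09-01"`.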

Has anyone tried to reproduce this at the command line outside of Galaxy? I can probably do that locally with a recent copy of NCBI NT from August/September 2023...

peterjc commented 1 year ago

Confirmed with our local copy of NT on Linux: BLAST+ 2.14.1 (currently the latest on bioconda) worked fine with the above command, giving 500 hits (the default limit), but after downgrading to BLAST+ 2.10.1 it crashes:

Error: NCBI C++ Exception:
    T0 "/opt/conda/conda-bld/blast_1607337341665/work/blast/c++/src/serial/objistrasnb.cpp", line 499: Error: (CSerialException::eOverflow) byte 83: overflow error ( at [].[].gi)
    T0 "/opt/conda/conda-bld/blast_1607337341665/work/blast/c++/src/serial/member.cpp", line 768: Error: (CSerialException::eOverflow) ncbi::CMemberInfoFunctions::ReadWithSetFlagMember() - error while reading seqid ( at Blast-def-line-set.[].[].seqid.[].[].gi)
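The "500 hits (default limit)" above presumably refers to BLAST+'s default `-max_target_seqs` of 500, which caps subject sequences rather than output rows, since one subject can contribute several HSP rows. A minimal sketch for checking that on the tabular output, assuming the `6 std ...` format used here (where field 2 is the subject ID); `count_distinct_subjects` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Count unique subject IDs (field 2) in BLAST tabular output.
# A subject with several HSPs appears on several rows but is counted once.
count_distinct_subjects() {
    awk -F '\t' '!seen[$2]++ { n++ } END { print n + 0 }' "$1"
}
```

For example, `count_distinct_subjects query_shared.tsv` on the output of the command below.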

Either version takes nearly an hour with 8 cores and 100GB allocated on our cluster:

time blastn -db $BLASTDB/nt -query query.fasta -task 'blastn' -evalue '0.001' -outfmt '6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen salltitles' -out query_shared.tsv -num_threads 8

This is a strong reason to push a BLAST update for the wrappers.
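The side-by-side test described above (same query, same NT copy, two BLAST+ versions) can be scripted. This is a sketch under stated assumptions: each version is installed in its own conda environment named `blast-<version>` (a hypothetical naming scheme), `conda run` is used to invoke it, and the output format is trimmed to `-outfmt 6` for brevity. Setting `CONDA=echo` gives a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Create the environments first, e.g.:
#   conda create -y -n blast-2.10.1 -c conda-forge -c bioconda blast=2.10.1
#   conda create -y -n blast-2.14.1 -c conda-forge -c bioconda blast=2.14.1
CONDA="${CONDA:-conda}"

# compare_blast_versions QUERY DB VERSION...
# Runs the same blastn search under each version and reports its exit status.
compare_blast_versions() {
    local query="$1" db="$2"; shift 2
    local version status
    for version in "$@"; do
        "$CONDA" run -n "blast-${version}" blastn \
            -db "$db" -query "$query" -task blastn -evalue 0.001 \
            -outfmt 6 -out "hits-${version}.tsv" -num_threads 8
        status=$?
        echo "BLAST+ ${version}: exit status ${status}"
    done
}
```

Usage: `compare_blast_versions query.fasta "$BLASTDB/nt" 2.10.1 2.14.1`. Each version writes to its own `hits-<version>.tsv`, so a run that completes can be compared against the other afterwards.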

peterjc commented 1 year ago

Updated wrappers released via #157, this should be resolved now - closing issue.

kysrpex commented 1 year ago

Thanks!