sanger-tol / blobtoolkit

Nextflow pipeline for BlobToolKit for Sanger ToL production suite
https://pipelines.tol.sanger.ac.uk/blobtoolkit
MIT License

blastn 2.15 support #105

Open muffato opened 2 months ago

muffato commented 2 months ago

When switching to blastn 2.15 (tried both 2.15.0--pl5321h6f7f691_0 and _1), there is this warning on stderr:

The -taxids command line option requires additional data files. Please see the section 'Taxonomic filtering for BLAST databases' in https://www.ncbi.nlm.nih.gov/books/NBK569839/ for details.

which suggests we could be doing something wrong, but I can't find any information on https://www.ncbi.nlm.nih.gov/books/NBK569839/#_usrman_BLAST_feat_Taxonomic_filtering_fo_ about how to make the option work.

In the source code (c++/src/algo/blast/blastinput/blast_args.cpp), I can see that the warning is printed whenever an exception is raised:

        try{ 
            tb.reset(new CTaxonomy4BlastSQLite());
        }    
        catch(CException &){
            LOG_POST(Warning << "The -taxids command line option requires additional data files. Please see the section 'Taxonomic filtering for BLAST databases' in https://www.ncbi.nlm.nih.gov/books/NBK569839/ for details.");
        }    

so it seems there really is something wrong, but I don't know what. It'd be great to get more information about that exception: what it means, where exactly it's raised, etc.

I also tried downloading a recent NT database and faced the same problem. Maybe we're missing some files?
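
One way to test the "missing files" hypothesis is to check whether the database directory contains the SQLite file that `CTaxonomy4BlastSQLite` presumably tries to open. This is a minimal sketch: the file name `taxonomy4blast.sqlite3` is an assumption based on the class name (it is not confirmed anywhere in this thread), and `has_taxonomy_sqlite` is a hypothetical helper, not part of BLAST+.

```shell
# Hypothetical helper: report whether a BLAST database directory contains
# the SQLite taxonomy file that blastn 2.15 is ASSUMED to look for.
# The file name is an assumption, not something stated in this thread.
TAXONOMY_FILE="taxonomy4blast.sqlite3"

has_taxonomy_sqlite() {
    # $1: directory holding the BLAST database volumes (e.g. the nt.* files)
    if [ -f "$1/$TAXONOMY_FILE" ]; then
        echo "present"
    else
        echo "missing"
    fi
}
```

For the runs in this issue, it would be called on the directory that holds the `nt` files, e.g. `has_taxonomy_sqlite ./current`. If it reports "missing", that would be consistent with the exception being raised when the taxonomy object is constructed, but it would not prove it.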

tkchafin commented 5 days ago

Just posting these as I think they cover embedding the taxids in custom blastdbs. I can look in more detail when I get a chance later: https://www.ncbi.nlm.nih.gov/books/NBK569841/ https://www.ncbi.nlm.nih.gov/books/NBK569839/#usrman_BLAST_feat.Taxonomic_filtering_fo

muffato commented 4 days ago

We download the NR database from the NCBI and it should already contain the taxon IDs. The problem is that, as far as I could tell, the error only shows up with 2.15 (same database and same command line otherwise). Let me find an example.

muffato commented 3 days ago

Yes, here is an example:

$ cd ~mm49/link/scratch123/blastn/214
$ singularity exec -B /lustre,/nfs /nfs/treeoflife-01/teams/shared/nextflow/cache/nxf_singularity/depot.galaxyproject.org-singularity-blast-2.14.1--pl5321h6f7f691_0.img /bin/bash $PWD/command.sh
Using ./current/nt

$ singularity exec -B /lustre,/nfs /nfs/treeoflife-01/teams/shared/nextflow/cache/nxf_singularity/quay.io-singularity-blast-2.15.0--pl5321h6f7f691_1.img /bin/bash $PWD/command.sh
Using ./current/nt
The -taxids command line option requires additional data files. Please see the section 'Taxonomic filtering for BLAST databases' in https://www.ncbi.nlm.nih.gov/books/NBK569839/ for details.

If I remember correctly, the same problem occurs with a fresh download of the NT database at ~mm49/link/scratch123/blastn/db