mhahsler / rBLAST

Interface for the Basic Local Alignment Search Tool (BLAST) - R-Package
GNU General Public License v3.0

-taxids needs additional data files #36

Status: Open. tangwei1129 opened this issue 3 days ago.

tangwei1129 commented 3 days ago

Hi rBLAST, thank you for the package. I recently ran into a problem filtering blastp results with -taxids 9606:

bl <- blast(db = "./swissprot/swissprot", type = "blastp")
predict(bl, peptides_xstring_set, BLAST_args = c("-matrix PAM30", "-taxids 9606"))

It returned unfiltered results along with this warning: "The -taxids command line option requires additional data files. Please see the section 'Taxonomic filtering for BLAST databases' in https://www.ncbi.nlm.nih.gov/books/NBK569839/ for details."

Which files am I missing? Below is the file list from the database folder: swissprot.pdb swissprot.pin swissprot.pog swissprot.pot swissprot.ppi swissprot.ptf swissprot.tar.gz taxdb.bti taxonomy4blast.sqlite3 swissprot.phr swissprot.pjs swissprot.pos swissprot.ppd swissprot.psq swissprot.pto taxdb.btd taxdb.tar.gz

mhahsler commented 3 days ago

Hi,

The package passes your parameters on to blastp. Try rerunning the prediction with verbose = TRUE and keep_tmp = TRUE:

predict(bl, peptides_xstring_set, BLAST_args = c("-matrix PAM30", "-taxids 9606"), verbose = TRUE, keep_tmp = TRUE)

It should show you exactly how it calls blastp and you can inspect the files in the tmp folder.

Please let me know what you find out.

Regards, -MFH

tangwei1129 commented 3 days ago

Thank you. It still gives me the same warning (see below). The temp folder contains one empty folder, one FASTA file, and one BLAST results file, which is still unfiltered. The output does not say which files I am missing.

Starting BLAST

mhahsler commented 3 days ago

I guess it means what it says. This is from the link above:

Starting with BLAST+ 2.15.0, the BLAST+ command line applications support a new feature: they accept non-leaf taxIDs (i.e., those above an organism level, such as the one for primates). This improvement obviates the need to invoke separate tools or have network connectivity to limit non-leaf taxIDs. To support this feature, the NCBI distributes a standalone, file-based database called taxonomy4blast.sqlite3. This additional database allows efficient taxonomic filtering for BLAST databases. For convenience, this database file is distributed alongside all BLAST databases distributed by the NCBI.

If you are using your own BLAST database(s) and would like to take advantage of this feature, you must set the taxonomy IDs in your database(s) and can get the taxonomy4blast.sqlite3 database by downloading https://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz, decompressing it and installing it alongside your other BLAST database(s).

Maybe that is what you are missing.
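For reference, the steps in the quoted documentation (download taxdb.tar.gz, decompress it, and place the contents next to the database) could be scripted in R roughly as below. This is a minimal sketch and not part of rBLAST: the helper names missing_taxonomy_files() and install_taxdb() are hypothetical, and the expected file names (taxonomy4blast.sqlite3, taxdb.btd, taxdb.bti) are taken from the quoted documentation and the directory listing in the first comment. It only checks that the files exist in a folder; it cannot tell whether blastp actually looks for them there.

```r
# Hypothetical helpers, not part of rBLAST.
# File names assumed from the NCBI text quoted above and the issue's file list.
taxonomy_files <- c("taxonomy4blast.sqlite3", "taxdb.btd", "taxdb.bti")

# Return the taxonomy files that are NOT present in db_dir
# (character(0) means all of them were found).
missing_taxonomy_files <- function(db_dir) {
  present <- file.exists(file.path(db_dir, taxonomy_files))
  taxonomy_files[!present]
}

# Download taxdb.tar.gz from the NCBI FTP site and unpack it into db_dir,
# i.e. "alongside" the database. Requires network access.
install_taxdb <- function(db_dir,
                          url = "https://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz") {
  tarball <- file.path(db_dir, "taxdb.tar.gz")
  download.file(url, destfile = tarball, mode = "wb")
  untar(tarball, exdir = db_dir)
  # Report anything still missing after extraction.
  invisible(missing_taxonomy_files(db_dir))
}
```

For example, missing_taxonomy_files("./swissprot") returning character(0) means all three files sit in that folder; if the warning persists after that, the files are present but apparently not where blastp is searching.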

tangwei1129 commented 2 days ago

I am using the swissprot database, which I downloaded from BLAST, and I have added taxonomy4blast.sqlite3 and unpacked taxdb.tar.gz into the same folder as swissprot, as you can see in the directory listing in my first comment. But I still get the warning, so I was confused and asked for your suggestion.

mhahsler commented 2 days ago

It seems like blastp does not find it there. The documentation is not quite clear about where to put it; it just says, "install alongside your database." Can you research this and let me know what you find out?

It looks like blastp finds the database. Do you have the latest version of BLAST installed?

tangwei1129 commented 2 days ago

Yes, the latest version, 2.16.