steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

Confusion on how to apply taxonomy filter to search results #125

Open sthehristov opened 1 year ago

sthehristov commented 1 year ago

Tried the following commands:

foldseek easy-search --taxon-list "Bacteria" 1rqf.cif pdb foldseek_testrun/bacterial_test tmp
foldseek easy-search --taxon-list Bacteria 1rqf.cif pdb foldseek_testrun/bacterial_test tmp
foldseek easy-search --taxon-list "txid2" 1rqf.cif pdb foldseek_testrun/bacterial_test tmp
foldseek easy-search --taxon-list txid2 1rqf.cif pdb foldseek_testrun/bacterial_test tmp

None of them gave me a filtered output with only bacterial proteins. Am I missing something in the command? I could not find any documentation on how to apply taxonomic filters on the command line. I can see that this is an option in the web version of Foldseek, and I want to use it with my own search settings.

milot-mirdita commented 1 year ago

You need to specify the numeric NCBI taxon identifier for this to work. You can look up the numeric values on the NCBI taxonomy browser: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=2&lvl=3&srchmode=1&keep=1&unlock
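For illustration, here is a minimal sketch of the original command with a numeric ID in place of the name. Bacteria is NCBI taxon ID 2 (the `id=2` in the URL above). The query, database, and output paths are copied from the question. The variable names and the `command -v` guard are only for this sketch, and it assumes the `pdb` target database carries taxonomy information, which I have not verified for this setup.

```shell
# NCBI taxon ID for Bacteria (numeric, no "txid" prefix and no name).
TAXON_ID=2

# Paths taken from the commands in the question above.
QUERY=1rqf.cif
TARGET_DB=pdb
OUT=foldseek_testrun/bacterial_test
TMP_DIR=tmp

# Run the search only if foldseek is on PATH; otherwise print the command.
if command -v foldseek >/dev/null 2>&1; then
    foldseek easy-search --taxon-list "$TAXON_ID" "$QUERY" "$TARGET_DB" "$OUT" "$TMP_DIR"
else
    echo "foldseek not found; would run: foldseek easy-search --taxon-list $TAXON_ID $QUERY $TARGET_DB $OUT $TMP_DIR"
fi
```

If the output is still unfiltered with a numeric ID, the target database is the next thing to check, since the filter can only act on targets that have a taxon assigned.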