soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
GNU General Public License v3.0

Filter a certain number of species #865

Closed feixiang1209 closed 2 months ago

feixiang1209 commented 2 months ago

I have created an MMseqs2 database from NCBI nt and added taxonomy information. To shorten the search time, I am trying to create a smaller database that contains only a certain set of species. I already have their NCBI taxIDs; my question is how to filter the nt database with those taxIDs.

My current idea is to use the filterdb module with --filter-regex and --filter-column. The problem is that I don't know which column holds the taxID in the nt database. Could anyone show me how to inspect the structure of the database and find which column is the taxID? Or is there another way to filter the nt database using the taxID list?

Thanks

milot-mirdita commented 2 months ago

There is the mmseqs filtertaxseqdb module. The examples in its help text should show you how to use it.
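As a rough illustration of the suggestion above, here is a minimal Python sketch that turns a list of taxIDs into a filtertaxseqdb command line. It assumes the module takes an input database, an output database, and a comma-separated `--taxon-list`; please confirm this against `mmseqs filtertaxseqdb -h` for your version. The database names `ntDB` and `ntDB_subset` and the helper `build_filter_command` are hypothetical, chosen only for the example.

```python
import shlex


def build_filter_command(tax_ids, in_db, out_db):
    """Build an `mmseqs filtertaxseqdb` argument list (hypothetical helper).

    Assumes the module accepts: <inputDB> <outputDB> --taxon-list id1,id2,...
    """
    cleaned = []
    for tax_id in tax_ids:
        tax_id = str(tax_id).strip()
        if not tax_id:
            continue  # skip blank lines from a taxID file
        if not tax_id.isdigit():
            raise ValueError(f"not a numeric taxID: {tax_id!r}")
        if tax_id not in cleaned:
            cleaned.append(tax_id)  # drop duplicates, keep order
    if not cleaned:
        raise ValueError("no taxIDs given")
    return [
        "mmseqs", "filtertaxseqdb", in_db, out_db,
        "--taxon-list", ",".join(cleaned),
    ]


# Example taxIDs: 9606 (human) and 10090 (mouse); the duplicate is removed.
cmd = build_filter_command(["9606", "10090", "9606"], "ntDB", "ntDB_subset")
print(shlex.join(cmd))
```

To actually run the command, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where `mmseqs` is installed and the input database carries taxonomy information.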

feixiang1209 commented 2 months ago

@milot-mirdita Thank you so much! I have been working on this all day but couldn't find a clue. I tried filterdb and filtertaxdb, and it turns out there is yet another module, "filtertaxseqdb"!

milot-mirdita commented 2 months ago

Sorry, we have built a lot of modules over the years 😅

feixiang1209 commented 2 months ago

No worries at all!