soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
MIT License

linclust on fasta file or DB file? #892

Open MarliesJFrancine opened 2 months ago

MarliesJFrancine commented 2 months ago

Hi,

I want to cluster a large dataset of DNA sequences. Do I first need to convert my FASTA file into a DB-format file, as described here: https://github.com/soedinglab/MMseqs2/wiki#linclust, or can I use my FASTA file directly, as described on the GitHub page?

What is the best approach here?

Kind regards, Marlies

Prangejet commented 3 weeks ago

Both work. You can run mmseqs easy-linclust directly on the FASTA file, or you can first convert the FASTA file into a DB with mmseqs createdb and then cluster that DB with mmseqs linclust.
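
As a rough sketch of the two routes, something like the following should work. The file names (sequences.fasta, clusterRes, seqDB, cluDB, clusters.tsv, tmp) and the `run` helper are placeholders I made up for illustration, not anything from the MMseqs2 docs. The helper prints each command and only executes it when mmseqs is on the PATH and the input file exists, so the script can be read as a dry run first.

```shell
#!/bin/sh
# Hypothetical file names; replace them with your own.
INPUT="${INPUT:-sequences.fasta}"
TMP="${TMP:-tmp}"

# Illustrative helper (not part of MMseqs2): print the command, and run it
# only if mmseqs is installed and the input FASTA file is present.
run() {
    echo "+ $*"
    if command -v mmseqs >/dev/null 2>&1 && [ -f "$INPUT" ]; then
        "$@"
    fi
}

# Option 1: one-step workflow directly on the FASTA file.
run mmseqs easy-linclust "$INPUT" clusterRes "$TMP"

# Option 2: build a sequence DB first, then cluster the DB.
run mmseqs createdb "$INPUT" seqDB
run mmseqs linclust seqDB cluDB "$TMP"
# Convert the cluster DB into a readable TSV (representative, member).
run mmseqs createtsv seqDB seqDB cluDB clusters.tsv
```

As far as I know, easy-linclust is a convenience wrapper that does the DB conversion internally and writes FASTA/TSV results for you, so the second route is mainly worth it if you want to reuse the DB for other modules or rerun the clustering with different settings.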