soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
MIT License

All vs All alignment of nucleotides #854

Open luisas opened 4 months ago

luisas commented 4 months ago

I have a set of nucleotide sequences and I need the pairwise sequence similarity of all vs all.

I understand that one should create a fake prefilter result with fake_pref and use it to run mmseqs align. However, the documentation says that the fake_pref() function cannot be used for nucleotides. Is there any way I can use mmseqs align to do an all-vs-all alignment for nucleotides?

Thanks a lot!

Luisa

milot-mirdita commented 4 months ago

mmseqs easy-search dna.fas dna.fas res tmp --prefilter-mode 1 --search-type 3 --max-seqs 1000000

Prefilter Mode 1 should be the closest you can currently get with MMseqs2. This will run an exhaustive search with an ungapped prefiltering algorithm and then run SW/ksw2 on the accepted hits from the ungapped alignment.

It's not quite exhaustive SW, but it should be very close.
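To see how close to exhaustive the result actually is, one option is to request explicit output columns and count how many of the possible pairs were reported. The following is only a sketch: `res.m8` and `tmp` are placeholder names, the column list passed to `--format-output` is one possible choice, and `count_pairs` is a hypothetical helper, not part of MMseqs2.

```shell
#!/bin/sh
# Sketch: all-vs-all nucleotide search with explicit output columns.
# dna.fas, res.m8 and tmp are placeholder names.
INPUT=dna.fas
RESULT=res.m8

# Hypothetical helper: count distinct unordered pairs (self hits excluded)
# in a tab-separated result file whose first two columns are query and target.
count_pairs() {
    awk -F'\t' '$1 != $2 {
        key = ($1 < $2) ? ($1 FS $2) : ($2 FS $1)
        if (!(key in seen)) { seen[key] = 1; n++ }
    } END { print n + 0 }' "$1"
}

# Only run the search when mmseqs and the input file are actually present.
if command -v mmseqs >/dev/null 2>&1 && [ -f "$INPUT" ]; then
    mmseqs easy-search "$INPUT" "$INPUT" "$RESULT" tmp \
        --prefilter-mode 1 --search-type 3 --max-seqs 1000000 \
        --format-output "query,target,fident,alnlen,evalue,bits"
    n=$(grep -c '^>' "$INPUT")
    echo "reported pairs: $(count_pairs "$RESULT") of $(( n * (n - 1) / 2 )) possible"
fi
```

The `fident` column holds the fraction of identical positions per reported pair. Pairs missing from the count are the ones that were dropped somewhere in the pipeline, for example by the ungapped prefilter.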

luisas commented 4 months ago

Hi,

thanks a lot for the fast reply.

I am also interested in comparisons between sequences with low sequence similarity. Unfortunately, with the above command these still get filtered out. I tried playing around with other command-line options, but as I understand it this is not currently possible. Is this correct?

Thanks a ton!

milot-mirdita commented 4 months ago

Nucleotide sequence signal just isn’t as conserved as protein signal, so I don’t think you’ll be able to go much lower in sequence identity than this procedure allows anyway.

You can also lower the min diag score further to let more hits pass through the ungapped prefilter.
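Lowering that cutoff might look like the sketch below. It assumes that "min diag score" corresponds to the `--min-ungapped-score` option (worth confirming with `mmseqs prefilter -h` on your version); the value 5 is an arbitrary illustration, not a recommendation, and `search_cmd` is a hypothetical helper used only to assemble the command line.

```shell
#!/bin/sh
# Sketch: same search as above with a lower ungapped ("diag") score cutoff.
# ASSUMPTION: "min diag score" maps to --min-ungapped-score.
# The value 5 is arbitrary and only for illustration.
MIN_UNGAPPED=5

# Hypothetical helper: print the command line for a given cutoff.
search_cmd() {
    echo mmseqs easy-search dna.fas dna.fas res tmp \
        --prefilter-mode 1 --search-type 3 --max-seqs 1000000 \
        --min-ungapped-score "$1"
}

CMD=$(search_cmd "$MIN_UNGAPPED")
echo "$CMD"

# Run it only when mmseqs and the placeholder input are present.
if command -v mmseqs >/dev/null 2>&1 && [ -f dna.fas ]; then
    sh -c "$CMD"
fi
```

A lower cutoff lets more candidate pairs reach the gapped alignment stage, at the cost of longer runtime.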

The better approach would be to do some profile alignment, but MMseqs2 doesn’t support this for nucleotides yet, so nhmmer might be the way to go currently.

luisas commented 4 months ago

Perfect! This helps a lot, thanks :)