kensho-technologies / sequence_align

Efficient implementations of Needleman-Wunsch and other sequence alignment algorithms written in Rust with Python bindings via PyO3.
Apache License 2.0
64 stars, 3 forks

✨ [Feature Request] Multiple sequence alignment #16

Closed: galenseilis closed this issue 2 months ago

galenseilis commented 4 months ago

Multiple sequence alignment would definitely be a huge asset. When I worked with metagenomic sequencing data, I was aligning 100,000-200,000 contigs (each roughly 300-600 base pairs long).

(Btw, one thing I like about the current state of this package is that I can align arbitrary sequences of strings, not just macromolecular sequences. That can be handy for certain natural language processing tasks. Please keep that. 🙏)
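To illustrate what "aligning arbitrary sequences of strings" means in practice, here is a minimal pure-Python sketch of pairwise global (Needleman-Wunsch) alignment over word tokens. It is not the package's Rust implementation or its Python API: the function name `nw_align_tokens`, its parameters, the scoring defaults (match +1, mismatch -1, linear gap -1), and the use of `None` as the gap marker are all assumptions made for this example.

```python
from typing import List, Optional, Sequence, Tuple


def nw_align_tokens(
    seq_a: Sequence[str],
    seq_b: Sequence[str],
    match: float = 1.0,
    mismatch: float = -1.0,
    gap: float = -1.0,
) -> Tuple[List[Optional[str]], List[Optional[str]], float]:
    """Hypothetical helper: global alignment of two token sequences.

    Gaps are represented by None so they can never collide with a real token.
    """
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Boundaries are built cumulatively so the traceback's equality checks
    # reproduce exactly the same float values.
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap

    def sub(x: str, y: str) -> float:
        return match if x == y else mismatch

    # Fill the table: diagonal = (mis)match, up = gap in seq_b, left = gap in seq_a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(seq_a[i - 1], seq_b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment
    aligned_a: List[Optional[str]] = []
    aligned_b: List[Optional[str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(seq_a[i - 1], seq_b[j - 1]):
            aligned_a.append(seq_a[i - 1])
            aligned_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(seq_a[i - 1])
            aligned_b.append(None)
            i -= 1
        else:
            aligned_a.append(None)
            aligned_b.append(seq_b[j - 1])
            j -= 1
    aligned_a.reverse()
    aligned_b.reverse()
    return aligned_a, aligned_b, score[n][m]


if __name__ == "__main__":
    a, b, s = nw_align_tokens("the quick brown fox".split(), "the brown fox jumps".split())
    print(" ".join(t or "-" for t in a))  # the quick brown fox -
    print(" ".join(t or "-" for t in b))  # the - brown fox jumps
    print(s)  # 1.0
```

Because the comparison is plain string equality, the same routine works on nucleotides, words, or any other token type. Extending it to many sequences (as requested in this issue) is a different problem, typically tackled with progressive or heuristic methods rather than a single dynamic-programming table.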

andrew-titus commented 2 months ago

Hi @galenseilis, thanks for filing the feature request! At this time, we only support pairwise sequence alignment, since, as you pointed out, we primarily use this package for NLP tasks. If you or others would like to open a pull request adding multiple sequence alignment support, we would be happy to review it!

galenseilis commented 2 months ago

Hey @andrew-titus, thanks for considering this feature request. I don't think I can commit time right now to developing and testing multiple sequence alignment support for this project, but I'll keep the opportunity in mind. Cheers.