modernatx / seqlike

Unified biological sequence manipulation in Python
https://modernatx.github.io/seqlike
Apache License 2.0
207 stars 21 forks source link

Adjusting alignment parameters? #24

Open ericmjl opened 2 years ago

ericmjl commented 2 years ago

I have a need to adjust alignment parameters; for example, I have encountered something akin to this issue, and the proposed solution from the author of MAFFT is to adjust one of the MAFFT parameters.

Adjusting alignment parameters via the .seq.align() API might be helpful. A few designs for the user-facing API that I can think of include:

# default aligner is MAFFT, so we can pass through the command line options via kwargs.
sequences.seq.align(ep=1.59, op=0.0)
# want to use MUSCLE instead of MAFFT
from seqlike.AlignCommandLine import MuscleCommandLine as muscle
sequences.seq.align(aligner=muscle, muscle_arg1=something, muscle_arg2=something)
ndousis commented 2 years ago

The first example (MAFFT with kwargs) already works as written.

The second example (using MUSCLE) requires some extra wrapper code to align letter annotations

from seqlike.alignment_commands import _generic_aligner_commandline_stdout, _generic_alignment
from seqlike.AlignCommandLine import MuscleCommandLine

def muscle_alignment(seqrecs, preserve_order=True, **kwargs):
    """Align sequences using Muscle 3.8.

    :param seqrecs: a list or dict of SeqRecord that will be aligned to ref
    :param preserve_order: if True, reorder aligned seqrecs to match input order.
    :param **kwargs: additional arguments for alignment command
    :returns: a MultipleSeqAlignment object with aligned sequences
    """

    def commandline(file_obj, **kwargs):
        cline = MuscleCommandline(input=file_obj.name, **kwargs)
        return _generic_aligner_commandline_stdout(cline)

    # Muscle reorders alignment by default, but don't overwrite 'group' if already set
    if "group" not in kwargs:
        kwargs["group"] = not preserve_order
    return _generic_alignment(commandline, seqrecs, preserve_order=preserve_order, **kwargs)

which then allows:

sequences.seq.align(aligner=muscle_alignment, muscle_arg1=something, muscle_arg2=something)

Note that this only works for Muscle 3.8; the latest version of Muscle (5.1) has a new interface that is incompatible with MuscleCommandline :(