Closed jihoonk1495 closed 5 years ago
Good question. In the C library, there is a function parasail_traceback_generic_extra
that takes a FILE*
stream. There's no good way to wrap that in Python. But there is another function parasail_traceback_generic
that prints to stdout and can handle different representations. Would that be useful? I'm not sure how that would interact with the output of your program, mixing Python output and C library stdout output.
If you want to see the types of output formats supported by these functions, look at the output of the C library's parasail_aligner
application. It offers a few different ways of representing the alignment to the screen.
If I were to attempt a new feature, unfortunately your question is slightly too generic. What should the returned string look like? GTT-AC\n||| ||\nGTTGAC
? Is this a useful representation? Or would you want seq1, the alignment characters, and seq2 output as separate strings? There are so many variations, that's why I'm pressing for more detail.
I am using parasail-python to identify potential motifs in a set of sequences; ideally, I wanted to run local alignment on multiple sequences simultaneously, but as far as I am aware, parasail performs only pairwise alignment. As an alternative, I am running multiple pairwise alignments in a set of sequences, which I believe may share a motif. So I was looking to check local alignment string by eye. I only wrote the example output string GTT-AC\n||| ||\nGTTGAC from experience with Biopython (Bio import pairwise2, Bio.pairwise2 import format_alignment), though it also includes, the un-aligned characters, which I do not want. For my purposes, I believe: GTT-AC\n GTTGAC should be sufficient.
As I am not as familiar with C, which might you recommend to get representation, as described above?
You're right that parasail is only pairwise. What you're asking for does not yet exist in the C library or in the python wrappers. I'll mark this as a feature request.
Thank you!
@jihoonk1495 Ok, I'll need you to try out the new functionality to make sure it's what you wanted. I created a traceback function that returns a struct of the three strings representing the alignment. In python, it looks like this:
>>> import parasail
>>> r = parasail.sw_trace("asdf", "asdc", 10, 1, parasail.blosum62)
>>> r.traceback.ref
'asd'
>>> r.traceback.comp
'|||'
>>> r.traceback.query
'asd'
>>> r = parasail.sw_trace("asdf", "asxdf", 10, 1, parasail.blosum62)
>>> r.traceback.ref
'sxdf'
>>> r.traceback.comp
':.||'
>>> r.traceback.query
'asdf'
>>> t = r.get_traceback('=', '+', '-')
>>> t.ref
'sxdf'
>>> t.comp
'+-=='
>>> t.query
'asdf'
The default parameters to get_traceback(mch='|', sim=':', neg='.')
can be changed if you call the method. Otherwise, the traceback
property will return the default traceback. The returned query and ref strings will contain dashes to indicate indels.
Let me know if that's what you wanted. If so, I will tag a new C library release as well as a new python release. If you don't want to test it, but you otherwise approve of the interface, I'll cut the new releases as-is.
To test the new functionality, you would need to build the C library yourself from the develop branch. These python wrappers are already part of the master branch but the corresponding C functionality that these wrappers depend on are only part of the C library's develop branch.
@jeffdaily
Before I try out the new functionality, I was wondering "neg" signifies. Does it indicate the particular character to be part of the optimal substring? In the above example between sxdf and asdf, what differentiates s - x (position 2) , from s - a (position 1)?
"neg" signifies a substitution where the substitution matrix indicated a negative score. You are allowed to have different characters for indicating an exact match, a positive-valued substitution, and a negative-valued substitution; as opposed to indels.
Releasing 1.1.13 today with these features.
Hello, I was trying to replicate the local alignment explained on the Wikipedia page for Smith–Waterman algorithm:
custMatrix = parasail.matrix_create("ACGT", 3, -3)
seq1 = "TGTTACGG" seq2 = "GGTTGACTA" locAl = parasail.sw_trace(seq1, seq2, 2, 2, custMatrix) locCigar = locAl.cigar
How can I use locAl or locCigar to return just this alignment as a string:
Is there an string output function I am missing?