jeffdaily / parasail-python

Python bindings for the parasail C library.

Adding example for python #3

Closed jeffdaily closed 6 years ago

jeffdaily commented 8 years ago

Originally submitted by @ksahlin on parasail project. Now moved to this parasail-python project.

Hi,

This library looks very interesting. I just installed it (together with the Python binding). However, could you add some more information about using the Python wrapper? For example, walk through a more informative example, or point to where the documentation can be found. For reference, here is what a new user might initially try (i.e. what I did):

>>> result = parasail.sw_scan_16("AAAAACGGGAGGGAGGAGAG", "AAAAAGGCGGGAGGGAGGGAGGAGA", -11, -1, parasail.blosum62)
>>> result
<parasail.Result object at 0x106c0ceb8>
>>> dir(result)
['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__int__', '__long__', '__new__', '__pyx_vtable__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'length', 'length_col', 'length_row', 'length_table', 'matches', 'matches_col', 'matches_row', 'matches_table', 'saturated', 'score', 'score_col', 'score_row', 'score_table', 'similar', 'similar_col', 'similar_row', 'similar_table']
>>> result.length_table
>>> result.matches
0
>>> result.length
0
>>> result.score
1356
>>> result.matches
0

This still leaves me clueless.

Also, I noticed that I had to run python setup.py install (after build) to be able to import parasail, so that would be good to add to the installation instructions in the python section.

Furthermore, I might add that I'm mainly interested in retrieving the CIGAR of the alignments. I noticed that you had opened an issue about it. That feature would be great.

Best, Kristoffer

daveuu commented 8 years ago

Having an option to return each sequence as aligned (with '-' inserted where appropriate) would give the python bindings appeal to a broader audience, although it seems the parasail library itself might be targeted at a lower level (just the scores+stats to be used as required?). Reconstructing the sequences as aligned might be quicker on the C side than Python: benefits could be seen if a lot of alignment calls are made.

daveuu commented 8 years ago

If the parasail-python bindings module were to cater more to the casual Python user while maintaining performance, the CPython array class might be an option worth pursuing for returning aligned string representations of the input sequences: https://docs.python.org/3/library/array.html.

Since these arrays support the buffer interface, my limited understanding of this sort of thing makes me think they would allow C code generating the alignment strings to put them into Python-world in a performant way, without the overhead of full-blown Python lists or strings.
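A minimal sketch of the buffer idea, in pure Python. The bytes literal stands in for what a C routine would write, and the sequence shown is made up for illustration. It only demonstrates that an array exposes its memory through a memoryview without copying; it is not how parasail-python is implemented.

```python
from array import array

# Stand-in for bytes a C routine would produce: an aligned sequence with gaps.
# (Hypothetical data, for illustration only.)
aligned = array("B", b"AAAAA--CGGGAGG")

# memoryview exposes the array's underlying buffer without copying it.
view = memoryview(aligned)

# Writing through the view changes the array itself, which shows that both
# names refer to the same memory.
view[5] = ord("G")

# Convert to a Python string only when a human-readable form is needed.
text = aligned.tobytes().decode("ascii")
print(text)
```

The point of the design is that the conversion to a Python string happens once, at the end, rather than per character.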

ksahlin commented 6 years ago

@jeffdaily I think you have done a great job with the documentation, and in my opinion the original issue could be closed. @daveuu has some further good ideas. Regarding https://github.com/jeffdaily/parasail-python/issues/3#issuecomment-226823053 , since the library returns the CIGAR, the alignment can be reconstructed, so I believe this is "solved" up to some level of satisfaction (up to a speed enhancement from doing the string conversion in C).
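For anyone wanting to do that reconstruction, here is a rough pure-Python sketch. It assumes you already have the CIGAR as a plain string using the usual operations (=, X and M consume both sequences, I consumes only the query, D consumes only the reference) and that you know where the alignment starts in each sequence. The function name aligned_strings and the example sequences are made up for illustration; check which sequence your CIGAR treats as the query before relying on it.

```python
import re

def aligned_strings(query, ref, cigar, query_start=0, ref_start=0):
    """Rebuild gapped query/reference strings from a CIGAR string.

    Hypothetical helper: assumes =, X, M consume both sequences,
    I consumes only the query, and D consumes only the reference.
    """
    q_out, r_out = [], []
    qi, ri = query_start, ref_start
    for count, op in re.findall(r"(\d+)([MIDX=])", cigar):
        n = int(count)
        if op in "M=X":
            # Aligned columns: copy n characters from both sequences.
            q_out.append(query[qi:qi + n])
            r_out.append(ref[ri:ri + n])
            qi += n
            ri += n
        elif op == "I":
            # Extra characters in the query: gap in the reference.
            q_out.append(query[qi:qi + n])
            r_out.append("-" * n)
            qi += n
        else:
            # op == "D": extra characters in the reference: gap in the query.
            q_out.append("-" * n)
            r_out.append(ref[ri:ri + n])
            ri += n
    return "".join(q_out), "".join(r_out)

q, r = aligned_strings("AAAAACGGGAGG", "AAAAAGGCGGGAGG", "5=2D7=")
print(q)
print(r)
```

For a local alignment the start offsets matter, since the CIGAR only covers the aligned region.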

rsharris commented 3 years ago

I'm posting this so the next newbie to come along won't do what I did.

FYI, in Kristoffer's example, it looks like the gap penalties have the wrong sign (at least relative to the current state of the repo). I was using his example to try to figure out what the interface to the package is. I'd align 1K bp nt strings and end up with alignments consisting of long indels with only a couple of 1bp matches, even when I gave it two identical strings.

Eventually I figured out that -11,-1 should be 11,1. And now I get more reasonable alignments.
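To illustrate why the sign matters, here is a small pure-Python sketch of an affine-gap local alignment score in which the gap values are subtracted, which is the convention the fix above implies. The function name local_score and the match/mismatch scores are made up for illustration; this is not parasail's code.

```python
def local_score(a, b, gap_open, gap_extend, match=2, mismatch=-1):
    """Affine-gap local alignment score (Smith-Waterman with Gotoh recurrences).

    Hypothetical helper: gap_open and gap_extend are SUBTRACTED from the
    score, so they are expected to be positive numbers.
    """
    neg_inf = float("-inf")
    cols = len(b) + 1
    h_prev = [0] * cols       # best scores for the previous row
    f = [neg_inf] * cols      # best score ending in a vertical gap, per column
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * cols
        e = neg_inf           # best score ending in a horizontal gap
        for j in range(1, cols):
            # Either open a new gap or extend an existing one.
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f[j] = max(h_prev[j] - gap_open, f[j] - gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            h_cur[j] = max(0, h_prev[j - 1] + sub, e, f[j])
            best = max(best, h_cur[j])
        h_prev = h_cur
    return best

seq = "ACGTACGT"
print(local_score(seq, seq, 11, 1))    # positive penalties: gaps cost score
print(local_score(seq, seq, -11, -1))  # negative "penalties": gaps are rewarded
```

With negative values every gap adds to the score, so the best-scoring path is mostly indels even for identical inputs, which matches the symptom described above.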

I have to wonder, though, where I can go to understand what the functional interface is. For example, I inferred that -11,-1 must be gap penalties. But is there some documentation somewhere for that? I've tried looking at bindings_v2.py but haven't been able to glean how the Python params correspond to the underlying C code (the binding code is naturally very abstract; I was only looking there because I haven't found what I was looking for elsewhere).