jeffdaily / parasail-python

Python bindings for the parasail C library.

Inconsistencies and possible errors in nw results #5

Closed daveuu closed 7 years ago

daveuu commented 8 years ago

Testing parasail.nw_striped_16() gives a different result.score_table each time:

$ ./venv/bin/python
Python 2.7.11 (default, Mar 31 2016, 06:18:34) 
[GCC 5.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import parasail
>>> A = 'GGGG'
>>> B = 'GGGG'
>>> open_pen = 11
>>> extend_pen = 1
>>> result = parasail.nw_striped_16(A, B, open_pen, extend_pen, parasail.pam100)
>>> print(result.score_table)
[[          0           0           0           0]
 [          0           0           0           0]
 [          0           0           0           0]
 [          0           0 -1418667952       32697]]
$ ./venv/bin/python
Python 2.7.11 (default, Mar 31 2016, 06:18:34)
[GCC 5.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import parasail
>>> A = 'GGGG'
>>> B = 'GGGG'
>>> open_pen = 11
>>> extend_pen = 1
>>> result = parasail.nw_striped_16(A, B, open_pen, extend_pen, parasail.pam100)
>>> print(result.score_table)
[[         0          0          0          0]
 [         0          0          0          0]
 [         0          0          0          0]
 [         0          0 1254623312      32688]]

Within IPython, the connection between the Python reference and the actual result seems to be lost: the same attribute shows different contents on repeated access (unless I'm very confused):

$ ./venv/bin/ipython
Python 2.7.11 (default, Mar 31 2016, 06:18:34) 
Type "copyright", "credits" or "license" for more information.

IPython 4.2.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import parasail

In [2]: A = 'GGGG'

In [3]: B = 'GGGG'

In [4]: open_pen = 11

In [5]: extend_pen = 1

In [6]: result = parasail.nw_striped_16(A, B, open_pen, extend_pen, parasail.pam100)

In [7]: print(result.score_table)
[[         0          0          0          0]
 [1417427152      32642 1417466776      32642]
 [1686849440      32642 1417467056      32642]
 [         0          0          0          0]]

In [8]: print(result.score_table)
[[         0          0          0          0]
 [1417427152      32642 1417466776      32642]
 [1686849440      32642 1417467056      32642]
 [         0          0          0          0]]

In [9]: result.score_table
Out[9]: 
array([[         0,          0,          0,          0],
       [1417427152,      32642, 1417466776,      32642],
       [1686849440,      32642, 1417467056,      32642],
       [         0,          0,          0,          0]], dtype=int32)

In [10]: result.score_table
Out[10]: 
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int32)

daveuu commented 8 years ago

OK! I just tried:

parasail.nw_stats_table_striped_16(A, B, open_pen, extend_pen, parasail.pam100)

And the output is making more sense now:

In [14]: result.score_table
Out[14]: 
array([[ 5, -6, -7, -8],
       [-6, 10, -1, -2],
       [-7, -1, 11,  0],
       [-8, -2,  0, 12],
       [-9, -3,  3,  5]], dtype=int32)

I guess the non-table, non-stats version of the function returns uninitialized/empty tables regardless?

jeffdaily commented 8 years ago

Looks like you figured it out faster than I could reply to your first message. Yes, the non-table versions of the functions do not populate that table memory, so you're getting garbage. I'm surprised it isn't causing a segmentation fault. Two possible solutions: 1) better documentation, and/or 2) return None if indeed there shouldn't be a score table.

I'm glad you figured it out! Also glad that's finally working in both py2 and py3. This was my first attempt at a project available via pip, and my first venture into a py3 library.

I hope this library will be useful to you.
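
The second solution above (return None when no table was requested) could look roughly like the following. This is only a sketch of the idea, not the actual parasail-python code: `FakeResult`, `is_table_variant`, and `score_table_or_none` are hypothetical names, and the rule that table-producing functions carry a `table` component in their name is an assumption based on `nw_stats_table_striped_16` as used in this thread. The stand-in result object is there so the example runs without the library installed.

```python
from collections import namedtuple

# Hypothetical stand-in for a parasail result object, used only so this
# sketch runs without the library installed.
FakeResult = namedtuple("FakeResult", ["score_table"])


def is_table_variant(function_name):
    """Guess from the name whether a function fills in the score table.

    Assumption: table-producing functions have "table" as one of the
    underscore-separated parts of their name.
    """
    return "table" in function_name.split("_")


def score_table_or_none(function_name, result):
    """Return result.score_table for table variants, otherwise None."""
    if is_table_variant(function_name):
        return result.score_table
    return None


# Illustrative values only: garbage like the output reported above,
# and a small populated table.
garbage = FakeResult(score_table=[[0, 0], [1254623312, 32688]])
filled = FakeResult(score_table=[[5, -6], [-6, 10]])

print(score_table_or_none("nw_striped_16", garbage))
print(score_table_or_none("nw_stats_table_striped_16", filled))
```

With a guard like this, code that called a non-table function by mistake would see None rather than whatever happened to be in memory.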

daveuu commented 8 years ago

The documentation for the main library is perfectly clear with respect to the function names and whether to expect tables etc. But ideally both of those solutions would be implemented (the first really just being a matter of adapting the existing documentation from the main parasail library). Cheers!

(ultimately I'm looking to generate the aligned sequences for moving window calculations but that is addressed in the comment to the other issue)