jeffdaily / parasail

Pairwise Sequence Alignment Library

Attempt to align an empty string appears to cause a Floating point exception #84

Open rsharris opened 3 years ago

rsharris commented 3 years ago

I'm posting this so that others who encounter a "Floating point exception" will understand that it may mean their input sequences are empty. I wouldn't say that it needs to be fixed, just that users will want to be aware.

I am using sg_trace_striped_16, and in some cases I was passing a pair of empty strings to it. My program halted with "Floating point exception".

Specifically, in an interactive python3 shell (Python 3.7.3, parasail v2.4.3, parasail-bindings v1.2.3):

import parasail
subsMatrix = parasail.matrix_create("ACGT",2,-1)
parasail.sg_trace_striped_16("","",11,1,subsMatrix)

results in the message "Floating point exception" and kicks me out of python back to bash.

import parasail
subsMatrix = parasail.matrix_create("ACGT",2,-1)
parasail.sg_trace_striped_16("","AAAA",11,1,subsMatrix)

had the same failure.

But

import parasail
subsMatrix = parasail.matrix_create("ACGT",2,-1)
parasail.sg_trace_striped_16("AAAA","",11,1,subsMatrix)

gave me this error instead:

python3: src/memory.c:245: parasail_result_new_trace: Assertion `b > 0' failed.
Aborted

Obviously the simplest workaround is for me to make sure I don't try to align empty sequences. Here's what I'm trying to do: a different process (using winnowing) identifies probable homologous segment pairs, each segment having an anchor point that I want to include in the final alignment (similar to figure 7 here: http://www.bx.psu.edu/~rsharris/lastz/README.lastz-1.04.03.html#stage_gapped). I am using parasail to align in both directions from the anchor point. The problem arises when my anchor is at either end of the interval.
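A minimal sketch of that workaround, assuming the left half is reversed so both directions can be extended "outward" with the same call, and that the anchor position goes with the right half (an arbitrary convention here). `split_at_anchor`, `align_if_nonempty` and `extend_both_ways` are hypothetical helpers, not part of parasail; the aligner is passed in as a plain function so the sketch runs without parasail installed.

```python
from typing import Callable, Optional, Tuple, TypeVar

R = TypeVar("R")


def split_at_anchor(seq: str, anchor: int) -> Tuple[str, str]:
    """Split seq at anchor into (left, right), both read away from the anchor.

    The left half is reversed so one routine can extend in both directions.
    The anchor position itself goes with the right half (a sketch convention).
    """
    if not 0 <= anchor <= len(seq):
        raise ValueError(f"anchor {anchor} outside 0..{len(seq)}")
    return seq[:anchor][::-1], seq[anchor:]


def align_if_nonempty(align: Callable[[str, str], R], s1: str, s2: str) -> Optional[R]:
    """Call align(s1, s2) only if both sequences are non-empty, else return None."""
    if not s1 or not s2:
        return None  # nothing to extend on this side; never reaches the C library
    return align(s1, s2)


def extend_both_ways(align: Callable[[str, str], R],
                     seq1: str, anchor1: int,
                     seq2: str, anchor2: int) -> Tuple[Optional[R], Optional[R]]:
    """Align leftward and rightward from a pair of anchors, skipping empty sides."""
    left1, right1 = split_at_anchor(seq1, anchor1)
    left2, right2 = split_at_anchor(seq2, anchor2)
    return (align_if_nonempty(align, left1, left2),
            align_if_nonempty(align, right1, right2))


if __name__ == "__main__":
    # A stand-in aligner that just echoes its inputs, so this runs without parasail.
    echo = lambda a, b: (a, b)
    print(extend_both_ways(echo, "ACGT", 0, "AAGT", 0))  # (None, ('ACGT', 'AAGT'))
```

With parasail, `align` could be something like `lambda a, b: parasail.sg_trace_striped_16(a, b, 11, 1, subsMatrix)`, reusing the call from the report. Because ("AAAA", "") also failed above, the guard skips a side when either sequence is empty, not only when both are.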

jeffdaily commented 3 years ago

Thanks for the report. No spoilers, but I'm finding some time to work on a different user request, and a general lack of input validation was brought to my attention. Not only that, but using assert to check function inputs is bad practice and, as you've observed, causes python to abort. In addition to replacing assert with friendlier alternatives, I'm adding more input checking to the C library. It sounds like I should do the same for the python bindings.

Curious why you're using the v1 python bindings. Are you finding a reason to use an older C library with your python bindings?

rsharris commented 3 years ago

I wasn't really offended (for lack of a better word) by it exiting python. I presume the pythonic solution would be to raise an exception so the caller could trap/handle that if they choose. I was a little baffled by it being a "floating point" exception, as I expected striped_16 would be using 16-bit ints.
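A minimal sketch of that "pythonic" behavior, done on the caller's side: validate in Python and raise an ordinary exception before anything reaches the C library. `EmptySequenceError` and `checked_align` are hypothetical names, not part of parasail's API; the aligner is passed in as a function.

```python
from typing import Callable, TypeVar

R = TypeVar("R")


class EmptySequenceError(ValueError):
    """Hypothetical exception type: raised before an empty sequence reaches C code."""


def checked_align(align: Callable[[str, str], R], s1: str, s2: str) -> R:
    """Validate inputs in Python, then call align(s1, s2).

    Raising here gives the caller an exception it can catch, instead of a
    process-killing signal or a failed C assert.
    """
    for name, seq in (("s1", s1), ("s2", s2)):
        if len(seq) == 0:
            raise EmptySequenceError(f"{name} is empty")
    return align(s1, s2)


if __name__ == "__main__":
    concat = lambda a, b: a + b  # stand-in for a real alignment call
    try:
        checked_align(concat, "", "AAAA")
    except EmptySequenceError as err:
        print("skipped:", err)  # skipped: s1 is empty
```

Subclassing `ValueError` means callers that already catch `ValueError` for bad inputs handle this case too.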

As to why I'm using "the v1 python bindings", it's simply ignorance: I wasn't aware of which I was using or what else might exist. I cloned parasail v2.4.3 since that's the latest tagged release in the repo, and built it using "autoreconf -fi" and so on. I installed parasail-python using "python3 -m pip install --user parasail", and its log shows that it installed v1.2.3, which is the latest tagged release in that repo. I think I'm using the alignment calls that are described in the readme.

Should I have installed something else?

jeffdaily commented 3 years ago

I think the floating point error might have been a generic divide-by-zero error. Not 100% sure.
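For context: on x86 Linux, an integer division by zero is delivered as the signal SIGFPE, which the shell reports as "Floating point exception" even when no floating-point math is involved, so a 16-bit striped routine can still produce it. While probing which inputs crash, one way to keep the interactive session alive is to run the risky call in a child interpreter. A minimal sketch, assuming a POSIX system; `run_isolated` is a hypothetical helper, and the demo simulates the crash by having the child send itself SIGFPE rather than calling parasail.

```python
import signal
import subprocess
import sys


def run_isolated(code: str) -> str:
    """Run a Python snippet in a child interpreter and report how it ended.

    Returns "ok", "exit N", or the name of the signal that killed the child
    (POSIX only: subprocess reports death-by-signal as a negative return code).
    """
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True)
    if proc.returncode < 0:
        return signal.Signals(-proc.returncode).name
    return "ok" if proc.returncode == 0 else f"exit {proc.returncode}"


if __name__ == "__main__":
    # Simulate a C-level SIGFPE by having the child signal itself.
    print(run_isolated("import os, signal; os.kill(os.getpid(), signal.SIGFPE)"))
    print(run_isolated("pass"))
```

To probe the reported cases, the snippet could be the three lines from the report. Based on the report, the child would be expected to die with SIGFPE for the empty-query cases and with SIGABRT (the failed assert) for ("AAAA", ""), though that was not run here. Spawning a process per call is slow, so this suits debugging rather than production use.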

I haven't done a two-step build in quite some time, building the C library first and then building the python bindings. Embarrassingly, I can't remember what the python project will do -- find the system-installed libparasail.so? Or re-download and build the C lib master?

The v1 python bindings are for version 1.x of the C library. Pretty much obsolete at this point, but I keep it around for backwards-compatibility reasons. If I were to add any input validation, it would be in the v2 bindings only. But I would assume most everyone these days is using that by default anyhow. The version check and bindings import happen at runtime in parasail/__init__.py.

The versioning between the two projects is out of sync, intentionally, but it is confusing. This allows the two projects to develop independently. I try to follow semantic versioning in both cases, and for the C library, appropriate libtool versioning.

rsharris commented 3 years ago

This has to be ignorance on my part. How would I install this as a one-step build and get the v2 bindings?

Basically I stopped reading the parasail-python installation instructions once I got to "Using pip" and had success installing it that way. I didn't read "building from source" and skipped ahead to "quick example."

jeffdaily commented 3 years ago

I failed to ask which OS you're using. Windows, Mac, or Linux? I generally have the most experience on Linux.

For Mac and Linux, pip install parasail should work to get the latest python bindings and the latest underlying C library.

An easy way to confirm whether you're using v1 or v2 bindings: print the version of the C library from python. If it is (2,x,x), you're using v2 bindings.

~/parasail-python$ python -c 'import parasail; print(parasail.version())'
(2, 4, 3)

Don't confuse this with the version of the python bindings. That would instead be parasail.__version__.
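The two checks above can be combined into one summary line. A minimal sketch: `report` is a hypothetical helper, and the major-version-to-bindings mapping is taken from this thread (1 means v1, 2 means v2); anything else is reported as unknown rather than guessed.

```python
from typing import Tuple

# Per the thread: the C library's major version tells you which bindings load.
_GENERATION = {1: "v1", 2: "v2"}


def report(c_version: Tuple[int, ...], py_version: str) -> str:
    """Summarize the C library version and the separately versioned bindings."""
    lib = ".".join(str(part) for part in c_version)
    gen = _GENERATION.get(c_version[0], "unknown")
    return f"C library {lib} ({gen} bindings), parasail-python {py_version}"


if __name__ == "__main__":
    try:
        import parasail
    except ImportError:
        print("parasail is not installed in this environment")
    else:
        # version() is the C library; __version__ is the python package.
        print(report(parasail.version(), parasail.__version__))
```

With the values reported below, (2, 4, 3) and '1.2.3', this would print "C library 2.4.3 (v2 bindings), parasail-python 1.2.3", which makes the intentionally out-of-sync numbering easier to read at a glance.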

On Linux, a more advanced use case is when someone has already installed the C library to a system location like /usr/local. If you set the environment variable PARASAIL_SKIP_BUILD, the install will skip (re)building C parasail and install only the python bindings; at run time, parasail-python will then search your system locations for libparasail.so.
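Before relying on PARASAIL_SKIP_BUILD, it may help to check whether a system copy of the library is visible at all. A minimal sketch using the standard library's `ctypes.util.find_library`; this is an assumption-laden preflight check, not necessarily how parasail-python performs its own run-time search, and `system_libparasail` is a hypothetical helper.

```python
import ctypes.util
from typing import Optional


def system_libparasail() -> Optional[str]:
    """Ask the standard ctypes lookup whether a system libparasail is visible.

    On Linux, find_library consults the ldconfig cache (plus a few fallbacks)
    and returns a library name, e.g. 'libc.so.6' for find_library('c'),
    or None if nothing is found.
    """
    return ctypes.util.find_library("parasail")


if __name__ == "__main__":
    found = system_libparasail()
    if found is None:
        print("no system libparasail found; a normal pip install (which builds it) may be simpler")
    else:
        print(f"system libparasail found: {found}")
```

A library installed under /usr/local/lib may not appear until the loader cache is refreshed (for example with `ldconfig`), so a None result is a hint to investigate rather than proof that the library is absent.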

rsharris commented 3 years ago

I should have mentioned the OS. It's some brand of linux, I'm not sure which at the moment.

I actually tried to install on 4 different machines, and failed on 3. The first two were desktop Macs here at home (the autoconf suite has been hit or miss for me on my Macs, probably because I don't have it installed correctly). The third was a server in our lab on campus, running Linux of some sort. The fourth is a different server in our lab.

Looks like I am using v2 bindings:

python3
Python 3.7.3 (default, Jul 25 2020, 13:03:44) 
[GCC 8.3.0] on linux
>>> import parasail
>>> parasail.version()
(2, 4, 3)
>>> parasail.__version__
'1.2.3'

(At some point I will get back to trying an install on the mac under conda. Currently I've got what (I guess) is a working install on one machine, and it's more important to me to get up to speed using that than to get it working on my desktop.)