mengyao / Complete-Striped-Smith-Waterman-Library

Python package #29

Closed vishnubob closed 8 years ago

vishnubob commented 8 years ago

This patch adds code to make it easy to install the ssw library and wrapper as a Python package. setup.py handles compiling and installing libssw.so along with the wrapper. I also added a function to src/ssw_wrap.py that searches for libssw.so, which means you won't need to adjust $LD_LIBRARY_PATH to use your library from Python.
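For readers wondering what such a search might look like, here is a minimal sketch. It is not the function added in this patch: the name `find_shared_library` and its `search_dirs` parameter are hypothetical, and the choice to look in the current directory and the entries of `sys.path` is an assumption made for illustration.

```python
import os
import sys


def find_shared_library(name="libssw.so", search_dirs=None):
    """Return the first existing path to `name`, or None if it is not found.

    Hypothetical helper: by default it checks the current working directory
    and every entry of sys.path, which is an assumption for this sketch.
    """
    if search_dirs is None:
        search_dirs = [os.getcwd()] + list(sys.path)
    for directory in search_dirs:
        # An empty sys.path entry stands for the current directory.
        candidate = os.path.join(directory or os.getcwd(), name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(find_shared_library())
```

A path found this way could be handed to `ctypes.CDLL`, which is why no change to $LD_LIBRARY_PATH would be needed.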

vishnubob commented 8 years ago

I wanted to let you know I've created my own Python API/module based on your C library. I've added some nice features to my Python API, such as automatic revcomp searches, BLAST-style alignment reports, and unit tests. It's still a work in progress, but I'm hopeful it will make it easy for Python programmers to get up and running with your library.

mengyao commented 8 years ago

Thank you for your improvements. They are very helpful.

ksahlin commented 8 years ago

Thanks a lot for the repo and for this enhancement. I needed the specific feature of accessing the "BLAST-like" matching string after SW alignment. In most cases this improvement seems to work great, but I think I've found a bug in the CIGAR conversion for some inputs. An example is below:

        ref_seq = "CCC" + "AGCT"*10
        query_seq = "AGGT"*10
        ssw = Aligner(ref_seq, match=1, mismatch=1, gap_open=1, gap_extend=1, report_secondary=False, report_cigar=True)
        res = ssw.align(query_seq, min_score=10, min_len=20)

        print(res.alignment[0][:])
        print(res.alignment[1][:])
        print(res.alignment[2][:])

gives

CCCAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGC
**************************************
AGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGTAG

If I do print(str(res)) I get:

Score            20
Reference begin  3
Reference end    40
Query begin      0
Query end        37
Cigar_string     38M2S
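As a quick cross-check of the report above, the sketch below totals what a CIGAR string consumes on each sequence and compares it with the printed coordinates. It assumes the coordinates are 0-based and inclusive, which the thread does not confirm, and `cigar_spans` is a hypothetical helper rather than part of the wrapper.

```python
import re


def cigar_spans(cigar):
    """Return (ref_len, query_aligned_len, query_clipped_len) for a CIGAR string."""
    ref_len = query_len = clipped = 0
    for count, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(count)
        if op in "M=X":      # aligned columns consume both sequences
            ref_len += n
            query_len += n
        elif op == "I":      # insertion consumes query only
            query_len += n
        elif op in "DN":     # deletion / skip consumes reference only
            ref_len += n
        elif op == "S":      # soft clip: query bases left out of the alignment
            clipped += n
    return ref_len, query_len, clipped


ref_len, query_len, clipped = cigar_spans("38M2S")
# Assumption: begin/end values are 0-based and inclusive.
ref_consistent = ref_len == 40 - 3 + 1
query_consistent = query_len == 37 - 0 + 1
covers_whole_query = query_len + clipped == len("AGGT" * 10)
print(ref_consistent, query_consistent, covers_whole_query)
```

Under that assumption the coordinates and the CIGAR string describe the same 38-column alignment with the last two query bases soft-clipped.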

My observations: the score is correct, since 3 matches and 1 mismatch per 4-tuple, repeated ten times, gives (3-1)*10 = 20 with my penalties. Reference begin also seems correct (it skips the 3 C's). However, shouldn't reference end be 43, and query begin and end be 0 and 40 respectively? Furthermore, the CIGAR string looks very suspicious. What I would expect is something like:

CCCAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCT
   ||*|||*|||*|||*|||*|||*|||*|||*|||*|||*|
---AGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGT
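The expected match string and the score arithmetic can be reproduced with a small sketch. It only scores ungapped alignments that start at reference offset 3; it does not run Smith-Waterman or call the library, and `ungapped_alignment` is a hypothetical helper. It also scores the 38-column alignment implied by the reported coordinates, for comparison.

```python
def ungapped_alignment(ref, query, ref_start, length, match=1, mismatch=1):
    """Score an ungapped alignment of query[:length] against ref[ref_start:]."""
    ref_part = ref[ref_start:ref_start + length]
    query_part = query[:length]
    # '|' marks a match and '*' a mismatch, as in the expected output above.
    marks = "".join("|" if r == q else "*" for r, q in zip(ref_part, query_part))
    score = match * marks.count("|") - mismatch * marks.count("*")
    return marks, score


ref_seq = "CCC" + "AGCT" * 10
query_seq = "AGGT" * 10

marks_40, score_40 = ungapped_alignment(ref_seq, query_seq, 3, 40)
marks_38, score_38 = ungapped_alignment(ref_seq, query_seq, 3, 38)

print(ref_seq)
print(" " * 3 + marks_40)
print("-" * 3 + query_seq)
print("40 columns:", score_40, "| 38 columns:", score_38)
```

With match=1 and mismatch=1 the last two columns contribute one mismatch and one match, so both lengths reach the same score. That could explain the reported end coordinates, though it would not explain a match string consisting only of `*`.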

Also a minor note: print(str(res)) raises a couple of easily fixed errors, namely UnboundLocalError: local variable 'msg' referenced before assignment and AttributeError: 'PyAlignRes' object has no attribute 'score2'. Both come from the __str__() method, and I patched them locally to be able to print the CIGAR.

Thanks for a great library, Kristoffer

vishnubob commented 8 years ago

@ksahlin: do me a favor and try my version of the Python package, and see if you get the same errors: https://github.com/vishnubob/ssw

ksahlin commented 8 years ago

Hi,

Yes, it returns the same alignment. Example below:

client-104-39-79-38:workspace kxs624$ python
Python 2.7.9 (default, Dec  1 2015, 18:18:28) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssw
>>> ref_seq = "CCC" + "AGCT"*10
>>> query_seq = "AGGT"*10
>>> aligner = ssw.Aligner(ref_seq, gap_open=1, gap_extend=1)
>>> alignment = aligner.align(query_seq)
>>> print(alignment.alignment_report)
Score = 40, Matches = 0, Mismatches = 38, Insertions = 0, Deletions = 0

ref   4   CCCAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGC
          **************************************
query 1   AGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGTAG

I see that you have removed the match and mismatch options. What are these penalties set to, as the score is now 40?
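One way to narrow down the question is to count matches and mismatches in the reported 38 columns and search for small integer penalties that reproduce a score of 40. This is an inference from arithmetic only, not a statement about the package's documented defaults. It assumes the report's positions are 1-based (so the alignment starts at 0-based reference offset 3) and that no gaps are involved.

```python
ref_seq = "CCC" + "AGCT" * 10
query_seq = "AGGT" * 10

# Assumption: the report's "ref 4 / query 1" are 1-based start positions.
ref_part = ref_seq[3:41]
query_part = query_seq[:38]

matches = sum(r == q for r, q in zip(ref_part, query_part))
mismatches = len(query_part) - matches

# Try every (match score, mismatch penalty) pair in 1..10 that yields 40.
candidates = [
    (m, x)
    for m in range(1, 11)
    for x in range(1, 11)
    if m * matches - x * mismatches == 40
]
print(matches, mismatches, candidates)
```

Within that range the only pair that fits is a match score of 2 with a mismatch penalty of 2.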

vishnubob commented 8 years ago

@ksahlin OK, thanks for this. Please do me a favor and open an issue on my repository with the same information, and I will follow up there.

mengyao commented 8 years ago

Dear Giles,

If you fix this problem, would you mind making a pull request to this repository?

I hope people don't run into this problem here either.

Thank you very much in advance for your help.

Yours,

Mengyao

vishnubob commented 8 years ago

@mengyao At this point, the Python package I've developed and the Python code provided by your library have diverged enough that this would be more work than I have time for. I consider your repository the "library" and my package the Python wrapper for your library. This is not an unusual way to architect these kinds of packages, and it means I can adjust my package without having to go through the extra step of generating a pull request.