Closed vishnubob closed 8 years ago
I wanted to let you know I've created my own Python API/module based on your C library. I've added some nice features to my Python API, like automatic revcomp searches, BLAST-style alignment reports, and unit tests. It's still a work in progress, but I'm hopeful it will make it easy for Python programmers to get up and running with your library.
Thank you for your improvements. They are very helpful.
Thanks a lot for the repo and for this enhancement. I needed the specific feature of accessing the "BLAST-like" matching string after SW alignment. In most instances this improvement seems to work great, but I think I've found a bug in the CIGAR conversion in some cases. I'm providing an example below:
ref_seq = "CCC" + "AGCT"*10
query_seq = "AGGT"*10
ssw = Aligner(ref_seq, match=1, mismatch=1, gap_open=1, gap_extend=1, report_secondary=False, report_cigar=True)
res = ssw.align(query_seq, min_score=10, min_len=20)
print(res.alignment[0][:])
print(res.alignment[1][:])
print(res.alignment[2][:])
gives
CCCAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGC
**************************************
AGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGTAG
If I do print(str(res)), I get:
Score 20
Reference begin 3
Reference end 40
Query begin 0
Query end 37
Cigar_string 38M2S
My observation from this is that the score is correct: 3 matches and 1 mismatch in every 4-tuple, ten times, gives (3-1)*10=20 (with my penalties). Reference begin also seems correct (skipping the 3 C's). However, shouldn't ref end be 43, and query start and end be 0 and 40 respectively? Furthermore, the CIGAR string looks very suspicious. What I would expect is something like:
CCCAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCT
   ||*|||*|||*|||*|||*|||*|||*|||*|||*|||*|
---AGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGT
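The expected match line and score above can be checked by hand with a small script. This is a plain-Python sketch that does not use the ssw wrapper at all; the helper names (`match_line`, `ungapped_score`, `aligned_ends`) are made up for illustration, and the coordinates are assumed to be 0-based and inclusive, as in the printed result.

```python
import re

def match_line(ref_aln, query_aln):
    """Return '|' for a match, '*' for a mismatch, ' ' where either side has a gap."""
    out = []
    for r, q in zip(ref_aln, query_aln):
        if r == "-" or q == "-":
            out.append(" ")
        elif r == q:
            out.append("|")
        else:
            out.append("*")
    return "".join(out)

def ungapped_score(ref_aln, query_aln, match=1, mismatch=1):
    """Score a gap-free alignment: +match per match, -mismatch per mismatch."""
    return sum(match if r == q else -mismatch for r, q in zip(ref_aln, query_aln))

def aligned_ends(cigar, ref_begin, query_begin):
    """End coordinates (0-based, inclusive) implied by a CIGAR with M/I/D/S ops."""
    ref_len = query_len = 0
    for count, op in re.findall(r"(\d+)([MIDS])", cigar):
        n = int(count)
        if op == "M":      # consumes reference and query
            ref_len += n
            query_len += n
        elif op == "I":    # consumes query only
            query_len += n
        elif op == "D":    # consumes reference only
            ref_len += n
        # 'S' (soft clip) lies outside the aligned span, so it is skipped
    return ref_begin + ref_len - 1, query_begin + query_len - 1

ref_seq = "CCC" + "AGCT" * 10
query_seq = "AGGT" * 10

print(match_line(ref_seq[3:], query_seq))              # all 40 columns
print(ungapped_score(ref_seq[3:], query_seq))          # all 40 columns
print(ungapped_score(ref_seq[3:41], query_seq[:38]))   # the 38 reported columns
print(aligned_ends("38M2S", ref_begin=3, query_begin=0))
```

With match=1 and mismatch=1, the 38-column and the 40-column alignments both come out at 20, and 38M starting at reference position 3 ends at 40, so the reported coordinates agree with the reported CIGAR. The all-`*` match line is the part that does not fit.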
Also a minor note: print(str(res)) raises some easily fixed errors, such as:
UnboundLocalError: local variable 'msg' referenced before assignment
and
AttributeError: 'PyAlignRes' object has no attribute 'score2'
Both are easy to find in the __str__() method. I fixed them locally to be able to print the CIGAR.
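For illustration, here is one way a __str__() method can avoid both kinds of error. This is only a sketch on a hypothetical `AlignResult` class, not the wrapper's actual `PyAlignRes` code; the field names and output labels are assumptions modelled on the printout above.

```python
class AlignResult(object):
    """Hypothetical stand-in for the wrapper's result object (not the real PyAlignRes)."""

    def __init__(self, score, ref_begin, ref_end, query_begin, query_end,
                 cigar_string=None):
        self.score = score
        self.ref_begin = ref_begin
        self.ref_end = ref_end
        self.query_begin = query_begin
        self.query_end = query_end
        self.cigar_string = cigar_string

    def __str__(self):
        # Build the message unconditionally so no branch can leave it undefined
        # (the usual cause of "referenced before assignment").
        lines = [
            "Score %d" % self.score,
            "Reference begin %d" % self.ref_begin,
            "Reference end %d" % self.ref_end,
            "Query begin %d" % self.query_begin,
            "Query end %d" % self.query_end,
        ]
        # Optional fields: look them up defensively instead of assuming they exist.
        score2 = getattr(self, "score2", None)
        if score2 is not None:
            lines.append("Suboptimal score %d" % score2)
        if self.cigar_string is not None:
            lines.append("Cigar_string %s" % self.cigar_string)
        return "\n".join(lines)


print(AlignResult(20, 3, 40, 0, 37, "38M2S"))
```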
Thanks for a great library, Kristoffer
ksahlin: do me a favor and try my version of the Python package, and see if you get the same errors: https://github.com/vishnubob/ssw
Hi,
Yes, it returns the same alignment. Example below:
client-104-39-79-38:workspace kxs624$ python
Python 2.7.9 (default, Dec 1 2015, 18:18:28)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssw
>>> ref_seq = "CCC" + "AGCT"*10
>>> query_seq = "AGGT"*10
>>> aligner = ssw.Aligner(ref_seq, gap_open=1, gap_extend=1)
>>> alignment = aligner.align(query_seq)
>>> print(alignment.alignment_report)
Score = 40, Matches = 0, Mismatches = 38, Insertions = 0, Deletions = 0
ref 4 CCCAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGC
**************************************
query 1 AGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGTAG
I see that you have removed the match and mismatch options. What are these penalties set to, given that the score is now 40?
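One way to back out the scoring from the report is to count matches and mismatches in the 38 aligned columns and look for penalty pairs that give 40. A rough sketch, assuming the aligned region is reference positions 4 to 41 (1-based) against the first 38 query bases, and that scoring is a flat +match / -mismatch with no gaps involved:

```python
ref_seq = "CCC" + "AGCT" * 10
query_seq = "AGGT" * 10

# Assumed aligned region: 38 columns starting at reference position 4 (1-based).
ref_aln = ref_seq[3:41]
query_aln = query_seq[:38]

matches = sum(1 for r, q in zip(ref_aln, query_aln) if r == q)
mismatches = len(ref_aln) - matches
print(matches, mismatches)

# Try small integer (match, mismatch) pairs and keep those that reproduce 40.
reported_score = 40
candidates = [
    (m, x)
    for m in range(1, 6)
    for x in range(0, 6)
    if m * matches - x * mismatches == reported_score
]
print(candidates)
```

In this small search range only match=2, mismatch=2 fits. Whether those are the package's actual defaults is not confirmed anywhere in this thread.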
@ksahlin OK, thanks for this. Please do me a favor and open an issue on my repository with the same information, and I will follow up there.
Dear Giles,
If you fix this problem, would you mind making a pull request to this repository?
I hope people won't run into this problem here either.
Thank you very much in advance for your help.
Yours,
Mengyao
@mengyao at this point, the Python package I've developed and the Python code provided by your library have diverged enough to make this more work than I have time for. I consider your repository the "library" and my package the Python wrapper for your library. This is not an unusual way to architect these kinds of packages, and it means I can adjust my package without having to go through the extra step of generating a pull request.
This patch adds code to make it easy to install the ssw library and wrapper as a Python package. setup.py handles the work of compiling and installing libssw.so along with the wrapper. I also added a function to src/ssw_wrap.py to search for libssw.so, which means you won't need to adjust $LD_LIBRARY_PATH to use your library from Python.
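As a rough idea of what such a search can look like, here is a minimal sketch. It is not the function from this patch: the names `candidate_dirs`, `find_library`, and `load_library` are hypothetical, and the search order (caller-supplied directories, then sys.path, then $LD_LIBRARY_PATH) is an assumption.

```python
import ctypes
import os
import sys

def candidate_dirs(extra_dirs=()):
    """Directories to search, in order (the order is an assumption of this sketch)."""
    dirs = list(extra_dirs)
    dirs.extend(p for p in sys.path if p)
    env = os.environ.get("LD_LIBRARY_PATH", "")
    dirs.extend(p for p in env.split(os.pathsep) if p)
    return dirs

def find_library(name="libssw.so", extra_dirs=()):
    """Return the first existing path to `name`, or None if it is not found."""
    for directory in candidate_dirs(extra_dirs):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None

def load_library(name="libssw.so", extra_dirs=()):
    """Load the shared library with ctypes, raising OSError if it cannot be found."""
    path = find_library(name, extra_dirs)
    if path is None:
        raise OSError("could not find %s in any search directory" % name)
    return ctypes.CDLL(path)
```

Searching sys.path means a libssw.so installed next to the Python module is picked up without touching the environment.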