jeffdaily / parasail-python

Python bindings for the parasail C library.
Other
87 stars 17 forks source link

get alignment positions #39

Open FlorianBrnrd opened 5 years ago

FlorianBrnrd commented 5 years ago

Hi, I'm performing semi-global alignments and I was wondering if there's any way to directly get the position of the beginning and end of the alignment on the reference ? Or do I have to use the cigar string to get the information ? Thanks a lot for your help. Florian.

jeffdaily commented 5 years ago

The end positions of the alignment for both query and reference is readily available using the Result properties end_query and end_ref because those are calculated during the alignment step. You would need to use the Cigar object to get the beginning positions beg_query and beg_ref because they are calculated during the traceback.

FlorianBrnrd commented 5 years ago

Thank you for the very quick answer ! I managed to get what I wanted thanks to your help, however I seem to have a encountered a strange behavior while looking at the start positions of both the query and reference.

I used this bit of code:

aln = pair_align(SEQ, ref)
cigar = aln.cigar
print(aln.traceback.ref)
print(aln.traceback.comp)
print(aln.traceback.query)
print('\n')
print('query end = {}'.format(aln.end_query))
print('ref end = {}'.format(aln.end_ref))
print('query start = {}'.format(cigar.beg_query))
print('ref start = {}'.format(cigar.beg_ref)) 

And here's what it returns:

Capture d’écran 2019-06-06 à 19 36 57

I wonder why it it returns 0 for the beginning of the reference alignment when I can clearly see the alignment starts around 10 ? Or did I misunderstood something ?

Thank you.

mustafa0x commented 1 year ago

I have this same question — cigar.beg_query and cigar.beg_ref are 0.

cavelandiah commented 1 week ago

Still when using cigar.beg_ref is 0. Here is a partial solution:

import re
import parasail

# Perform the sequence Align
result = parasail.sw_trace_striped_64(read, ref, 3, 2, parasail.dnafull)
# Decode CIGAR using UTF-8
new_cigar = result.cigar.decode.decode('utf-8')
# 'D' since is a deletion when compared to reference
no_align_pattern = r'(\d+)D'
# Search pattern if exists
match = re.search(no_align_pattern, new_cigar)
# If a match is found, extract the number of 'deleted' positions and convert to integer
if match:
    no_align_positions = match.group(1)
    new_start = int(no_align_positions)
else:
   new_start = 0

I would like to have some updates in that regard. Many thanks.