pinellolab / CRISPResso2

Analysis of deep sequencing data for rapid and intuitive interpretation of genome editing experiments
Other

match dsODN sequences #201

Open YichaoOU opened 2 years ago

YichaoOU commented 2 years ago

Is your feature request related to a problem? Please describe.

The current dsODN sequence matching requires an exact match and doesn't allow mismatches: `df_alleles["Aligned_Sequence"].str.find(args.dsODN) > 0`

Describe the solution you'd like

We could just use `StripedSmithWaterman` (`from skbio.alignment import StripedSmithWaterman`) to match the dsODN.

Yichao
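To make the request concrete, here is a minimal sketch of score-based dsODN matching. It is not the CRISPResso2 implementation and it does not call scikit-bio: it is a plain-Python Smith-Waterman stand-in so the idea can be tried without extra dependencies. The function names, the linear gap penalty of -5, and the 34 bp example sequence are all assumptions made for illustration; the +2/-3 scoring and the cutoff of 30 are the values discussed in this thread.

```python
def local_alignment_score(query, target, match=2, mismatch=-3, gap=-5):
    """Best Smith-Waterman local alignment score of query against target.

    Uses a simple linear gap penalty (an assumption for this sketch).
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def contains_dsodn(read, dsodn, cutoff=30):
    """True if the read aligns to the dsODN with a score of at least cutoff."""
    return local_alignment_score(dsodn, read) >= cutoff


# Hypothetical 34 bp sequence, used only for illustration.
DSODN = "GATTACAGGCTTAACCGTAGCTAGGTCCATGCAA"

exact_read = "TTT" + DSODN + "CCC"
# Same read with a single substitution (G -> C) inside the dsODN.
mutated_read = "TTT" + DSODN[:16] + "C" + DSODN[17:] + "CCC"

exact_hit = DSODN in mutated_read                   # exact matching misses it
fuzzy_hit = contains_dsodn(mutated_read, DSODN)     # score-based matching finds it
```

With these inputs the exact substring test fails on the mutated read while the score-based test still reports a match, which is the behaviour the feature request is asking for.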

kclem commented 2 years ago

Would you also allow gaps? And how many mismatches?

YichaoOU commented 2 years ago

For the 34 bp dsODN sequence, we use a score cutoff of 30, which allows up to 7 mismatches or a short gap.

For `StripedSmithWaterman`, a match is +2 and a mismatch is -3.


#Match | #Mismatch | Score
-- | -- | --
34 | 0 | 68
33 | 1 | 63
32 | 2 | 58
31 | 3 | 53
30 | 4 | 48
29 | 5 | 43
28 | 6 | 38
27 | 7 | 33
26 | 8 | 28
25 | 9 | 23
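
The scores above follow directly from the +2/-3 scheme, and the arithmetic can be checked with a short sketch. The constant and function names are hypothetical, and the calculation assumes a full-length, ungapped alignment of the 34 bp dsODN. Because the alignment is local, mismatches close to either end may be clipped rather than penalized, so real scores can be somewhat higher than this.

```python
MATCH_SCORE = 2       # score per matching base, as stated in the thread
MISMATCH_SCORE = -3   # score per mismatching base, as stated in the thread
DSODN_LENGTH = 34     # length of the dsODN in bp
SCORE_CUTOFF = 30     # proposed acceptance threshold


def ungapped_score(n_mismatches, length=DSODN_LENGTH):
    """Score of a full-length ungapped alignment with n_mismatches mismatches."""
    n_matches = length - n_mismatches
    return n_matches * MATCH_SCORE + n_mismatches * MISMATCH_SCORE


def max_mismatches(cutoff=SCORE_CUTOFF, length=DSODN_LENGTH):
    """Largest mismatch count whose ungapped score still reaches the cutoff."""
    return max(m for m in range(length + 1) if ungapped_score(m, length) >= cutoff)


# Rows of (matches, mismatches, score) for 0 to 9 mismatches.
rows = [(DSODN_LENGTH - m, m, ungapped_score(m)) for m in range(10)]
```

Each extra mismatch lowers the score by 5 (one lost match plus one mismatch penalty), so 7 mismatches give 33 and 8 give 28, which puts the cutoff of 30 between the two.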