pritykinlab / guidescan-cli

A gRNA database generation tool.
http://www.guidescan.com

specificity values > 1 for certain kmers #22

Open vineetbansal opened 1 year ago

vineetbansal commented 1 year ago

When using the following kmer file:

id,sequence,pam,chromosome,position,sense
AAGACTGTGCGCTAATCTCT_1,AAGACTGTGCGCTAATCTCT,NGG,unknown,0,+

with guidescan enumerate (v2.1.6) against hg38_noalt.index (3 mismatches, alt-PAM NAG), we get the following SAM line:

AAGACTGTGCGCTAATCTCT_1  0   unknown 0   100 23M *   0   0   AAGACTGTGCGCTAATCTCTNGG *   k0:i:1  k1:i:0  k2:i:0  k3:i:3  of:H:c53bf70d00000000000000000000000092ef3a47ffffffff010000000000000092ef3a47ffffffff020000000000000092ef3a47ffffffff9ba7545c000000009c2c699e000000002afe3c2000000000030000000000000092ef3a47ffffffff   sp:f:2.391802

or the following CSV lines (succinct mode):

id,sequence,match_chrm,match_position,match_strand,match_distance,specificity
AAGACTGTGCGCTAATCTCT_1,AAGACTGTGCGCTAATCTCTNGG,chr1,234306480,+,0,2.391802
AAGACTGTGCGCTAATCTCT_1,AAGACTGTGCGCTAATCTCTNGG,chr9,12562870,+,3,2.391802
AAGACTGTGCGCTAATCTCT_1,AAGACTGTGCGCTAATCTCTNGG,chr19,3281519,+,3,2.391802
AAGACTGTGCGCTAATCTCT_1,AAGACTGTGCGCTAATCTCTNGG,chr3,49718166,+,3,2.391802

Something is clearly wrong here: the specificity is reported as 2.391802, and specificity should never exceed 1.
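To find other affected kmers, the succinct-mode output can be scanned for rows whose specificity exceeds 1. The following is a minimal sketch, not part of guidescan itself: the helper name `rows_above_one` is hypothetical, and the column names are taken from the CSV header shown above.

```python
import csv
import io

# Succinct-mode output copied verbatim from the report above.
SUCCINCT_CSV = """\
id,sequence,match_chrm,match_position,match_strand,match_distance,specificity
AAGACTGTGCGCTAATCTCT_1,AAGACTGTGCGCTAATCTCTNGG,chr1,234306480,+,0,2.391802
AAGACTGTGCGCTAATCTCT_1,AAGACTGTGCGCTAATCTCTNGG,chr9,12562870,+,3,2.391802
AAGACTGTGCGCTAATCTCT_1,AAGACTGTGCGCTAATCTCTNGG,chr19,3281519,+,3,2.391802
AAGACTGTGCGCTAATCTCT_1,AAGACTGTGCGCTAATCTCTNGG,chr3,49718166,+,3,2.391802
"""


def rows_above_one(text):
    """Hypothetical helper: return the CSV rows whose specificity is > 1."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if float(row["specificity"]) > 1.0]


bad_rows = rows_above_one(SUCCINCT_CSV)
# Collapse to unique kmer ids, since each match of a kmer repeats the same value.
flagged_ids = sorted({row["id"] for row in bad_rows})
print(flagged_ids)
```

For a real output file, the same function can be fed the file contents read with `open(path).read()`.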

vineetbansal commented 1 year ago

The actual sequence found in the fna using grep is AAGACTGTGCGCTAATCTCTTAG, i.e. it carries the alt-PAM (NAG) rather than NGG. This indicates that the match sequence reported in both the CSV and SAM output is incorrect: the NGG was appended automatically. All cases of specificity > 1 detected so far seem to involve matches that have the NAG PAM.
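The PAM mismatch can be checked directly. The sketch below is illustrative only and says nothing about how guidescan matches PAMs internally: `pam_matches` is a hypothetical helper, and the pattern alphabet is limited to A, C, G, T and N, which is all the two PAMs in this report need.

```python
# Minimal IUPAC subset: enough for the NGG and NAG patterns discussed here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}


def pam_matches(site, pam):
    """Hypothetical helper: does the 3' end of `site` match the PAM pattern?"""
    tail = site[-len(pam):]
    return all(base in IUPAC[code] for base, code in zip(tail, pam))


# Sequence found in the fna with grep, as quoted above.
genomic_site = "AAGACTGTGCGCTAATCTCTTAG"

matches_ngg = pam_matches(genomic_site, "NGG")
matches_nag = pam_matches(genomic_site, "NAG")
print(matches_ngg, matches_nag)  # False True
```

The genomic site ends in TAG, so it satisfies only the alternative PAM, which is consistent with the observation that the affected matches are NAG matches.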