htgt / CRISPR-Analyser

C++ package for analysing CRISPR off targets
MIT License

how to use pam_right? #3

(Open) coronin opened this issue 10 years ago

coronin commented 10 years ago

I noticed pam_right in the psql database, as well as in some of the .cpp files. How should we use it?

I assume pam_right = 2 is not used?

Do the pam_right values 0/1 mean NGG and NAG?

Thanks!

ah19 commented 10 years ago

We use pam_right to identify the orientation of the CRISPR (we only use it for NGG PAMs, but it could apply to both). The code that finds CRISPRs searches the positive strand for both CCN and NGG: if we find NGG on the top strand, we label that CRISPR pam_right = 1, because the PAM is at the right-hand end of the sequence. If we find a CRISPR starting with CCN, we label it pam_right = 0. pam_right only ever takes the values 0/1 (false/true), so if you want to distinguish between NAG and NGG I'd recommend adding a new boolean field to do so.
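A minimal C++ sketch of the labelling rule described above, assuming a 23-mer CRISPR (20 nt guide plus 3 nt PAM) and an uppercase top-strand sequence. The names `CrisprHit` and `find_crisprs` are hypothetical and this is not the CRISPR-Analyser implementation; it only checks for CC at the start and GG at the end of each window, so NAG is not modelled at all.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type for illustration; not the package's schema.
struct CrisprHit {
    std::size_t start;  // 0-based offset of the 23-mer on the top strand
    std::string seq;    // the 23-mer exactly as read from the top strand
    bool pam_right;     // true: ends in NGG; false: starts with CCN
};

constexpr std::size_t kCrisprLen = 23;  // 20 nt guide + 3 nt PAM (assumed)

// Scan the top strand only. A window ending in GG (NGG PAM on the right)
// is recorded with pam_right = true; a window starting with CC (CCN, i.e.
// an NGG PAM on the reverse strand) is recorded with pam_right = false.
// A window matching both patterns is recorded twice, once per orientation.
std::vector<CrisprHit> find_crisprs(const std::string& top) {
    std::vector<CrisprHit> hits;
    if (top.size() < kCrisprLen) return hits;
    for (std::size_t i = 0; i + kCrisprLen <= top.size(); ++i) {
        const std::string window = top.substr(i, kCrisprLen);
        if (window[0] == 'C' && window[1] == 'C') {
            hits.push_back({i, window, false});
        }
        if (window[21] == 'G' && window[22] == 'G') {
            hits.push_back({i, window, true});
        }
    }
    return hits;
}
```

Scanning only one strand works because an NGG site on the negative strand always appears as CCN on the positive strand, so both orientations are found in a single pass.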

One other thing: if we find a CRISPR sequence that begins with CCN and ends with NGG, like this one:

CCGCCAGCTTTCTGGTGTAAAGG

we add it to the database twice, once with pam_right = 0 and once with pam_right = 1, because there are two guide RNAs within that sequence:

PAM site   Guide RNA              pam_right
CCG        CCAGCTTTCTGGTGTAAAGG   0
AGG        CCGCCAGCTTTCTGGTGTAA   1
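The table can be reproduced with a small helper that splits a stored 23-mer into its PAM and guide according to pam_right. This is a sketch assuming the 20 + 3 nt layout shown; `split_crispr` is a hypothetical name, not a function from the package.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: split a stored 23-mer into (PAM, guide) according to
// pam_right. Both pieces are read from the top strand as stored; for
// pam_right = 0 the guide has NOT been reverse complemented yet.
std::pair<std::string, std::string> split_crispr(const std::string& seq, bool pam_right) {
    assert(seq.size() == 23);  // 20 nt guide + 3 nt PAM (assumed layout)
    if (pam_right) {
        // NGG on the right: guide = first 20 nt, PAM = last 3 nt.
        return {seq.substr(20, 3), seq.substr(0, 20)};
    }
    // CCN on the left: PAM = first 3 nt, guide = last 20 nt.
    return {seq.substr(0, 3), seq.substr(3, 20)};
}
```

Calling `split_crispr("CCGCCAGCTTTCTGGTGTAAAGG", false)` yields the pair ("CCG", "CCAGCTTTCTGGTGTAAAGG"), and passing `true` yields ("AGG", "CCGCCAGCTTTCTGGTGTAA"), matching the two rows of the table.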

We need this because CCN isn't actually a valid PAM site: a pam_right = 0 entry tells us that the actual NGG CRISPR sequence is on the global negative strand, so to get the guide RNA we first need to reverse complement the sequence when ordering. We chose to store them this way, rather than storing them reverse complemented, because it makes finding pairs easier.
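For a pam_right = 0 entry, the guide to order can be recovered by reverse complementing the whole stored 23-mer, which turns the leading CCN into a trailing NGG, and then taking its first 20 nt. This is a sketch under the same 23-mer assumption; `reverse_complement` and `guide_to_order` are hypothetical helpers, and treating N as its own complement is also an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Complement a single uppercase base; N maps to N (assumption).
char complement_base(char b) {
    switch (b) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        case 'N': return 'N';
    }
    assert(false && "unexpected base");
    return 'N';
}

// Reverse the sequence, then complement every base.
std::string reverse_complement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    std::transform(rc.begin(), rc.end(), rc.begin(), complement_base);
    return rc;
}

// Hypothetical helper: the 20 nt guide (PAM excluded) to order for a stored 23-mer.
std::string guide_to_order(const std::string& seq, bool pam_right) {
    assert(seq.size() == 23);  // 20 nt guide + 3 nt PAM (assumed layout)
    if (pam_right) return seq.substr(0, 20);  // already in NGG orientation
    // CCN... on the top strand is ...NGG on the negative strand, so the
    // reverse complement ends in the real PAM and starts with the guide.
    return reverse_complement(seq).substr(0, 20);
}
```

For the example sequence, `guide_to_order("CCGCCAGCTTTCTGGTGTAAAGG", false)` returns CCTTTACACCAGAAAGCTGG, the reverse complement of the pam_right = 0 guide in the table. Keeping both entries in top-strand coordinates, as described above, means this conversion only happens at ordering time.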