Position and Cluster_Position

dsantesmasses commented 1 year ago

Hi!

Thanks for the quick reply on the other issues! I managed to run it successfully on chr22.

I am inspecting output.bestMerge.txt. What does Position correspond to, and how does it differ from Cluster_Position? How can I get the start and end coordinates of the sequence in the DNA?

Thanks!

ManuelTgn commented 1 year ago

Hi @dsantesmasses,

The position field corresponds to the position of the first nucleotide (nt) of each sequence reported as a target. Below are two brief examples of how to interpret the values in position:

Let us assume that the PAM sequence occurs downstream of the guide (e.g. Cas9). If the reported target was found on the + strand, position represents the target's first nt. If the reported target was found on the - strand, position represents the target's first nt, but reverse complemented.
Let us assume that the PAM sequence occurs upstream of the guide (e.g. Cas12). position still represents the target's first nt, but the reverse complement is computed for targets found on the + strand.

Note that position accounts for bulges.

Cluster_position represents the position of the PAM's first nt minus the guide length, without accounting for bulges. In other words, it identifies all the genome sequences sharing the same exact PAM sequence (without accounting for bulges).

However, we suggest using position to get the real target position in the genome sequence.

CRISPRme does not report the targets' end coordinates, but they can be computed by adding the length of the complete sequence (guide + PAM) to the corresponding value in position.
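The end-coordinate arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not part of CRISPRme: the function name `target_interval` and its parameters are hypothetical, the sketch assumes a target without bulges (so the genomic span equals guide length + PAM length), and whether the resulting end is inclusive or exclusive depends on the coordinate convention of the report, which is not stated in this thread.

```python
from typing import Tuple


def target_interval(position: int, guide_len: int, pam_len: int) -> Tuple[int, int]:
    """Return (start, end) for a reported target.

    Hypothetical helper: end is computed as position + guide_len + pam_len,
    following the rule given in the reply above. Bulges are ignored.
    """
    if position < 0:
        raise ValueError("position must be non-negative")
    if guide_len <= 0 or pam_len < 0:
        raise ValueError("guide_len must be positive and pam_len non-negative")
    return position, position + guide_len + pam_len


# Example: a 20-nt guide with a 3-nt PAM (e.g. NGG) reported at position 1000.
start, end = target_interval(1000, 20, 3)
print(start, end)  # 1000 1023
```

For targets that contain bulges, the number of genomic nucleotides covered may differ from guide length + PAM length, so the result should be treated as an approximation in that case.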

Let us know if you have any further questions.

Manuel

dsantesmasses commented 1 year ago

Hi @ManuelTgn , thanks very much for your reply!

Just to make sure I got it right: in the case of Cas9, position corresponds to the genomic coordinate aligned to the first nt of the guide (see below). Therefore, if the alignment is on the top strand, position points to the start coordinate (lowest number), whereas if the alignment is on the reverse strand, position is the end coordinate (highest number). Is that correct?

position in + strand
NNNNNNNNNNNPAM
^

position in - strand
PAMNNNNNNNNNNN
             ^

Thanks!

samuelecancellieri commented 1 year ago

Hello @dsantesmasses

Position always corresponds to the position of the first nucleotide at the 5' end. So if you use a downstream PAM, as with SpCas9, you will have something like:

NNNNNNNPAM
P

This holds for both the - and + strands, since the software applies the reverse complement to negative-stranded targets.

If you use an upstream PAM, as with Cas12, you will have something like this:

PAMNNNNNNN
P

This is because, in this case, the software applies the reverse complement to positive-stranded targets. But the position is always the first nucleotide of the 5'-3' sequence.
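To make the reverse-complement step concrete, here is a minimal Python sketch. It is not CRISPRme's own code: the function name `reverse_complement` is hypothetical, and the short sequence is made up purely for illustration. It shows how a stretch read on the + strand that starts with CCA turns into a sequence ending in TGG (an NGG-style downstream PAM) once it is reverse complemented.

```python
# Translation table for DNA bases; N (any base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (hypothetical helper)."""
    return seq.translate(_COMPLEMENT)[::-1]


# A made-up + strand stretch: after reverse complementing, the PAM-like
# TGG ends up at the 3' end, as in the downstream-PAM layout above.
plus_strand_text = "CCAACGT"
print(reverse_complement(plus_strand_text))  # ACGTTGG
```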

Hope this helps, and if you have any other questions, don't hesitate to ask.

Best, Samuele

pinellolab / CRISPRme

Position and Cluster_Position #30