pinellolab / CRISPResso2

Analysis of deep sequencing data for rapid and intuitive interpretation of genome editing experiments

Alignment strand question #65

Open RebeccaFine opened 3 years ago

RebeccaFine commented 3 years ago

Thanks again for this great tool (creating a separate issue for a very different question)!

I have a question about alignment behavior in CRISPResso. My understanding was that the software was agnostic to the strand of the reference amplicon. Thus, for guides on the positive strand, I provided the guide on the positive strand and the amplicon on the positive strand, and for guides on the negative strand, I provided the guide on the negative strand and the amplicon also on the positive strand.

However, I was recently doing some testing and discovered that for guides on the negative strand, providing the amplicon on the negative strand produced slightly different results, especially in the count of substitutions. It looks to me like this is probably just a result of semi-ambiguous alignments aligning slightly differently depending on the amplicon orientation. But it does raise the question -- which is the more "correct" way to perform the alignment? Do you have any insight on this (and/or am I misunderstanding anything)?
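To illustrate how an alignment can be ambiguous in a way that depends on orientation, here is a toy Python sketch. It is not CRISPResso2's aligner, and the sequences and the helper `equivalent_deletion_starts` are made up for illustration. It lists every position at which a 1-bp deletion in a homopolymer run explains the same read, then shows that the leftmost placement on the reverse-complemented reference corresponds to the rightmost placement on the forward reference.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def equivalent_deletion_starts(ref, read, del_len):
    """Every start index at which deleting del_len bases from ref yields read."""
    return [
        i
        for i in range(len(ref) - del_len + 1)
        if ref[:i] + ref[i + del_len:] == read
    ]


# Hypothetical toy sequences: the read lacks one T from the TTTT run.
ref = "CCGATTTTGA"
read = "CCGATTTGA"

fwd = equivalent_deletion_starts(ref, read, 1)
rev = equivalent_deletion_starts(revcomp(ref), revcomp(read), 1)

# Convert the leftmost placement on the reverse strand to forward coordinates.
leftmost_rev_in_fwd = len(ref) - 1 - min(rev)

print("forward placements:", fwd)
print("reverse placements:", rev)
print("leftmost-on-reverse, in forward coordinates:", leftmost_rev_in_fwd)
```

An aligner that breaks such ties by always choosing, say, the leftmost placement would therefore report the gap at different forward coordinates depending on the orientation of the reference it was given. Whether this is what happens inside CRISPResso2 is only a hypothesis here.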

Thanks so much!

kclem commented 3 years ago

Thanks for using CRISPResso2.

The guide should always be given as the sgRNA sequence, 5' to 3'. For example, with Cas9 the PAM should be on the right (but don't include the PAM). CRISPResso2 uses this orientation of the guide to set the quantification window (e.g. for Cas9, at the -3 position, on the right side).

If you give the guide in the opposite direction, the quantification window would be set at the +3 position (left side of the guide) which would quantify different mutation events.

You are correct that you can provide the amplicon sequence in either the forward or reverse-complement direction, but the guide can only be provided in one direction.
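The relationship between guide orientation and cut-site position can be sketched in a few lines of Python. This is a minimal illustration, assuming the cut site is simply placed 3 bp in from the 3' end of the guide; `locate_cut` and the toy amplicon are hypothetical and are not taken from CRISPResso2's code.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def locate_cut(amplicon, guide, offset=-3):
    """Return (strand, cut_index) for a guide given 5'->3' without its PAM.

    cut_index is the number of amplicon bases to the left of the cut.
    offset=-3 places the cut 3 bp in from the guide's 3' end (assumed Cas9-like).
    """
    pos = amplicon.find(guide)
    if pos != -1:
        # Guide matches the amplicon as written: its 3' end is on the right.
        return "+", pos + len(guide) + offset
    pos = amplicon.find(revcomp(guide))
    if pos != -1:
        # Guide matches the other strand: its 3' end is on the left.
        return "-", pos - offset
    raise ValueError("guide not found on either strand of the amplicon")


# Toy amplicon (hypothetical): 4 bp flank + guide + TGG PAM + 4 bp flank.
guide = "TGAACCAGACCACGGCCCGT"
amplicon = "AAAA" + guide + "TGG" + "CCCC"

same_strand = locate_cut(amplicon, guide)
flipped_amplicon = locate_cut(revcomp(amplicon), guide)
flipped_guide = locate_cut(amplicon, revcomp(guide))

print(same_strand, flipped_amplicon, flipped_guide)
```

In this sketch, reverse-complementing the amplicon moves the cut index to the mirror-image position (the two indices sum to the amplicon length), so the same physical site is marked. Reverse-complementing the guide instead marks a different site near the other end of the guide.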

If you're still getting differences in quantification depending on the direction of the amplicon, let me know and I can look into it some more.

RebeccaFine commented 3 years ago

Thanks for the reply!

That is in fact how I've been running my samples, and I'm seeing this slight difference. I was able to replicate this with your example data, nhej.r1.fastq.gz/nhej.r2.fastq.gz. I've put the commands I used below -- you can see that the only difference is that I reverse complemented the amplicon (but the guide is given in the same direction, 5' to 3', in both cases). In the lego plots (which I've screenshotted the top of below), you can see that the counts of the reads don't quite match up (and actually that the alignment itself is slightly different in some cases). Please let me know if I'm misunderstanding anything here, or if you can't replicate this! Thanks!

Command 1:

    CRISPResso \
      --fastq_r1 nhej.r1.fastq.gz \
      --fastq_r2 nhej.r2.fastq.gz \
      --amplicon_seq AATGTCCCCCAATGGGAAGTTCATCTGGCACTGCCCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTT \
      -n nhej \
      -g TGAACCAGACCACGGCCCGT \
      --output_folder pos_strand/

Command 2:

    CRISPResso \
      --fastq_r1 nhej.r1.fastq.gz \
      --fastq_r2 nhej.r2.fastq.gz \
      --amplicon_seq AACCACAGCCGAGCCTCTTGAAGCCATTCTTACAGATGATGAACCAGACCACGGCCCGTTGGGAGCTCCAGAAGGGGATCATGACCTCCTCACCTGTGGGCAGTGCCAGATGAACTTCCCATTGGGGGACATT \
      -n nhej \
      -g TGAACCAGACCACGGCCCGT \
      --output_folder neg_strand/
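As a sanity check on the two commands, the following Python sketch confirms that the two amplicon sequences are exact reverse complements of each other and reports which one contains the guide as written. The variable names are illustrative; the sequences are copied from the commands.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


amplicon_cmd1 = (
    "AATGTCCCCCAATGGGAAGTTCATCTGGCACTGCCCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGC"
    "TCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTT"
)
amplicon_cmd2 = (
    "AACCACAGCCGAGCCTCTTGAAGCCATTCTTACAGATGATGAACCAGACCACGGCCCGTTGGGAGCTCC"
    "AGAAGGGGATCATGACCTCCTCACCTGTGGGCAGTGCCAGATGAACTTCCCATTGGGGGACATT"
)
guide = "TGAACCAGACCACGGCCCGT"

# The only intended difference between the runs is the amplicon orientation.
is_exact_revcomp = revcomp(amplicon_cmd1) == amplicon_cmd2

# In which amplicon does the guide appear as written (5'->3')?
guide_in_cmd1 = guide in amplicon_cmd1
guide_in_cmd2 = guide in amplicon_cmd2

# The three bases immediately 3' of the guide in the matching amplicon.
start = amplicon_cmd2.find(guide)
pam = amplicon_cmd2[start + len(guide): start + len(guide) + 3]

print("exact reverse complements:", is_exact_revcomp)
print("guide as written in command 1 amplicon:", guide_in_cmd1)
print("guide as written in command 2 amplicon:", guide_in_cmd2)
print("bases 3' of guide in command 2 amplicon:", pam)
```

The guide matches the Command 2 amplicon directly and the Command 1 amplicon only as its reverse complement, so the two runs differ in amplicon orientation alone, as described.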

[Two screenshots: the top of the lego plots from the two runs]

kclem commented 3 years ago

Reopening this so I don't forget about it. This is somewhat mysterious.