pinellolab / CRISPResso2

Analysis of deep sequencing data for rapid and intuitive interpretation of genome editing experiments

Editing quantification with multiple sgRNAs #499

Open jsromanowski opened 1 day ago

jsromanowski commented 1 day ago

Hello CRISPResso Team -

I am currently using CRISPRessoWGS to quantify editing in CRISPR-treated samples with 2 sgRNAs. The cut sites are close together - 4 bases apart, to be precise. I noticed that when I input both sgRNAs (sgRNA1 and sgRNA2), CRISPRessoWGS quantifies editing at both sgRNA sites in an individual run where I'd like editing quantified at only one sgRNA's site (i.e. the 'CRISPResso_on_sgRNA1' analysis includes alleles that have deletions from sgRNA2's cut site, as shown in the allele output table). My guess is that the insertions/deletions from sgRNA2's editing extend into the quantification window of sgRNA1, causing CRISPRessoWGS to count them, but I don't think I can shrink this window since the default window of analysis is 1 base on either side of the sgRNA cut site (correct me if I'm wrong). This is an issue because the insertions and deletions from sgRNA2's edits do not overlap the sgRNA1 cut site, so even when sgRNA1's cut site is specified, sgRNA2's cut site is also analyzed, which over-counts sgRNA1's modified reads.

This brings me to my question - is there a way to ensure only one sgRNA cut site's editing is quantified at a time, instead of both sgRNAs' cut sites? Or better yet, is there a way CRISPRessoWGS can quantify editing at both sgRNA1's and sgRNA2's cut sites without double-counting editing events (i.e. sgRNA1's edited alleles are not counted again in sgRNA2's editing analysis)? Perhaps extending the quantification window to encompass both cut sites might work? I hope this makes sense.

Great work on this package, by the way! Any help would be appreciated.

Best,

Joe

kclem commented 1 day ago

Hi @jsromanowski,

Thanks for using CRISPResso, and I hope I can clear up some confusion here.

  1. How CRISPResso Works (or at least the parts that are relevant to your question).

The quantification window (the set of bases in which an edit causes a read to be considered 'modified') is set early in the pipeline, based on the sgRNA positions and the user-specified parameters for quantification window size and offset.

After the quantification window is set, reads are aligned to the reference amplicon, and reads with edits in any of the quantification window bases are marked as 'modified'. That is, the edits in a read aren't assigned to a specific guide; instead, the presence of an edit anywhere in the quantification window is noted. The benefit of this is that a read is never double-counted, whether it was edited at 1 or 2 sgRNA target sites. The program simply reports the number of reads edited at any base in the quantification window.
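To make the counting rule above concrete, here is a minimal sketch. It is not CRISPResso's actual code: the `Edit` class, the helper names, the 0-based coordinates, and the convention that a cut at index `c` falls between base `c` and base `c + 1` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """Hypothetical edit record; not a CRISPResso data structure."""
    kind: str   # "insertion", "deletion", or "substitution"
    start: int  # first affected reference index (0-based)
    end: int    # one past the last affected reference index
                # (for an insertion, cover the two flanking reference bases)


def quantification_window(cut_indices, half_width=1):
    """Union of reference indices within half_width bases of each cut.

    Assumption: a cut index c means the cut falls between base c and c + 1.
    """
    window = set()
    for cut in cut_indices:
        window.update(range(cut - half_width + 1, cut + half_width + 1))
    return window


def is_modified(edits, window):
    """A read is 'modified' if any edit touches any base in the window."""
    return any(pos in window for e in edits for pos in range(e.start, e.end))


# Two cut sites 4 bases apart, 1 base of window on either side of each.
window = quantification_window([50, 54], half_width=1)

reads = {
    "deletion_near_sgRNA2": [Edit("deletion", 54, 57)],
    "deletion_spanning_both": [Edit("deletion", 49, 56)],
    "substitution_far_away": [Edit("substitution", 20, 21)],
    "unedited": [],
}

# Each read is classified once, regardless of which guide caused the edit.
status = {name: is_modified(edits, window) for name, edits in reads.items()}
n_modified = sum(status.values())
```

In this toy example the deletion spanning both cut sites contributes exactly one modified read, and nothing in the classification says which guide produced it.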

At the end of the analysis, plots and allele tables are produced, some for the whole amplicon and others that are zoomed in on each cut site. However, all reads aligned to the amplicon contribute to each plot, so the reads (and corresponding mutations) that appear in the sgRNA1 plot will also appear in the sgRNA2 plot if they are in the same window. There may be slight variation between the sgRNA1 and sgRNA2 plots because the plots show different bases in their plotting windows, and alleles with the same sequence within the plotting window are collapsed. Note that the sgRNA1 plot doesn't contain only reads that were edited by sgRNA1, and the sgRNA2 plot doesn't contain only reads edited by sgRNA2.

  2. The Difficulties Associated With Doing What You Want To Do (at least as I understand it).

Especially if your sgRNAs are close together, it's hard to tell which modifications arise from each guide, particularly long deletions. For a single sgRNA, we have seen that long deletions are not really 'centered' at the sgRNA cut site, meaning that they could extend left or right from the predicted cut site in a pretty unpredictable way. Because of this, if we see a deletion that spans the cut sites of both sgRNA1 and sgRNA2, it is impossible to tell which sgRNA to 'assign' it to. Does that make sense? If you have some better way to assign mutations to specific guides, I'm happy to talk about it.

  3. Some workarounds

    • You may consider looking at the CRISPResso output file 'Modification_count_vectors.txt' at your cut sites. This will show you the number of insertions, deletions, and substitutions that overlap with each site. Note that this may include some deletions that could have arisen from editing by the neighboring guide, so be careful of double counting.
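As a rough sketch of that workaround, the snippet below reads per-position counts from a file shaped like 'Modification_count_vectors.txt'. The layout is an assumption: tab-separated, one row per modification type, with the row label in the first column and one value per amplicon position after it. The row labels, helper names, and example numbers are also made up for illustration, so check them against your actual output before relying on this.

```python
import csv
import io


def read_count_vectors(handle):
    """Return {row_label: [values, ...]} from a tab-separated file.

    Assumption: the first column of each row is its label.
    """
    rows = {}
    for fields in csv.reader(handle, delimiter="\t"):
        if fields:
            rows[fields[0]] = fields[1:]
    return rows


def counts_at(rows, index,
              count_rows=("Insertions", "Deletions", "Substitutions")):
    """Per-type modification counts at one 0-based amplicon index."""
    return {label: int(float(rows[label][index])) for label in count_rows}


# Tiny in-memory stand-in for the real file (illustrative numbers only).
example = (
    "Sequence\tA\tC\tG\tT\tA\n"
    "Insertions\t0\t1\t0\t0\t0\n"
    "Deletions\t0\t5\t5\t7\t2\n"
    "Substitutions\t1\t0\t0\t0\t0\n"
)
rows = read_count_vectors(io.StringIO(example))

at_cut1 = counts_at(rows, 1)  # hypothetical index of the sgRNA1 cut site
at_cut2 = counts_at(rows, 3)  # hypothetical index of the sgRNA2 cut site
```

For a real run, replace the `io.StringIO` stand-in with `open(path, newline="")` on your file. A deletion covering both indices is counted at both positions, so adding the two sites together can count the same read twice.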

Do any of those seem like they would help you with this problem? If not, feel free to reach out to me at k.clement@utah.edu if you'd like to talk about this more or come up with better methods to assign edits to single guides.

Good luck!