pinellolab / CRISPResso2

Analysis of deep sequencing data for rapid and intuitive interpretation of genome editing experiments

[Curiosity] primer dimer removal #305

Closed yinshiyi closed 1 year ago

yinshiyi commented 1 year ago

Hi experts,

I recently found out, in retrospect, that my amplicon libraries contain a lot of primer dimers (10-20% of reads). Yet I have never had a problem with CRISPResso (awesome): my WT sample always shows <1% editing. The program seems to have an intrinsic primer-dimer removal step, whether intentional or not.

In the case of primer dimers, a not-so-awesome program would have treated them as deep deletions of 300bp. I am curious whether anyone else has made similar observations, or has a theory about which part of CRISPResso removes primer dimers.

I did not see primer dimers mentioned in the readme/manual. I figure this is probably an unintended side effect of the design, but I want to double-check in case there is actually a primer-dimer feature flag.

Thank you!

kclem commented 1 year ago

Hi @yinshiyi,

Thanks for using CRISPResso, and I'm glad it's working out for you.

CRISPResso features a filtering step that removes reads with less than 60% homology to the reference sequence. This means that reads must match at least 60% of bases in the reference sequence exactly.

This was implemented to reduce noisy reads from 1) primer dimers and 2) mispriming events. If your reference amplicon is 100bp and your primers are 20bp each, a primer dimer would match only about 40 of 100 bases (40%) and would be filtered out. However, small deletions, and even large CRISPR-induced deletions up to 40bp long, would pass the filter (60 of 100 bases matched = 60%) and be included in the quantification.
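The arithmetic above can be sketched in a few lines of Python. This is only an approximation under the assumption that the score is simply the share of reference bases a read matches; CRISPResso computes the real score from an alignment, and the function names here are hypothetical.

```python
import math


def approx_match_percent(amplicon_len: int, matched_bases: int) -> float:
    """Approximate score: percent of reference bases the read matches."""
    return 100.0 * matched_bases / amplicon_len


def max_deletion_passing(amplicon_len: int, min_score_percent: float = 60) -> int:
    """Largest deletion (bp) that still leaves min_score_percent of bases matched."""
    return math.floor(amplicon_len * (100 - min_score_percent) / 100)


# 100bp amplicon, two 20bp primers: a primer dimer matches only the primers.
dimer_score = approx_match_percent(100, 2 * 20)
print(dimer_score, dimer_score >= 60)   # primer dimer falls below the 60% cutoff
print(max_deletion_passing(100))        # largest deletion kept at the default cutoff
```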

The 60% default parameter can be changed by setting the parameter --default_min_aln_score for all amplicons, or it can be set on a per-amplicon basis using the parameter --amplicon_min_alignment_score. For example, if you want to include all reads you can set --default_min_aln_score 0, which may be useful if you have done some sort of biochemical or computational filtering of misprimed or primer-dimer reads before CRISPResso analysis.
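As a rough illustration, the two invocations (default filter versus no filter) could be assembled like this. The helper function, the FASTQ file name, and the amplicon sequence are hypothetical placeholders; `--fastq_r1` and `--amplicon_seq` are the usual CRISPResso input options, and `--default_min_aln_score` is the parameter described above.

```python
import shlex


def build_crispresso_command(fastq_r1, amplicon_seq, default_min_aln_score=None):
    """Return a CRISPResso argument list; None keeps the built-in 60% cutoff."""
    cmd = ["CRISPResso", "--fastq_r1", fastq_r1, "--amplicon_seq", amplicon_seq]
    if default_min_aln_score is not None:
        cmd += ["--default_min_aln_score", str(default_min_aln_score)]
    return cmd


READS = "sample_R1.fastq.gz"   # hypothetical input file
AMPLICON = "ACGTACGTACGT"      # hypothetical placeholder sequence

default_cmd = build_crispresso_command(READS, AMPLICON)
unfiltered_cmd = build_crispresso_command(READS, AMPLICON, default_min_aln_score=0)

# Print shell-ready command lines; pass the lists to subprocess.run to execute.
print(shlex.join(default_cmd))
print(shlex.join(unfiltered_cmd))
```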

Reads that do not pass the minimum alignment score will show up in plot 1a as a drop between the 'Reads after preprocessing' step and the 'Reads aligned' step.

Let me know if that makes sense or if you have any other questions!

Happy analyzing!

yinshiyi commented 1 year ago

@kclem, could you help me clarify whether this is a percentage or a base count? You wrote:

CRISPResso features a filtering step that removes reads with less than 60% homology to the reference sequence. This means that reads must match at least 60 bases in the reference sequence exactly.

From the context, it seems to be 60%, not 60bp. Please let me know if I have misinterpreted your intention.

https://github.com/pinellolab/CRISPResso2/blob/83c8ab8f462e7d8c1d04c08c1a398b874f517251/CRISPResso2/CRISPRessoCORE.py#L203

In my case, the reference is 350bp, so a primer dimer would look like a 310bp deep deletion: 40/350 = 11% < 60%. Great. Does that mean the maximum Cas9-induced deletion I can capture is 140bp (350 * 0.4)? In some cases I use a multiple-guide design that drops out a large region within the amplicon. In that case, would it be wise to lower --amplicon_min_alignment_score to avoid false negatives?

kclem commented 1 year ago

Right - 60%. I've updated the comment above to clear that up.

kclem commented 1 year ago

And right: with a 350bp amplicon you'll be able to capture deletions up to 140bp. However, keep in mind that the shorter fragments produced by such deletions will be preferentially amplified, which may skew quantification.
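For a multiple-guide design, the same reasoning can be inverted to estimate how low the cutoff would need to go. This is a back-of-the-envelope sketch that again treats the score as the percent of reference bases retained, which only approximates CRISPResso's alignment-based score; the function name and the 200bp dropout size are hypothetical.

```python
def highest_score_retaining(amplicon_len: int, deletion_len: int) -> int:
    """Highest integer cutoff (%) that a read with this deletion still meets."""
    return (100 * (amplicon_len - deletion_len)) // amplicon_len


AMPLICON_LEN = 350

print(highest_score_retaining(AMPLICON_LEN, 140))  # 60: the default cutoff just keeps it
print(highest_score_retaining(AMPLICON_LEN, 200))  # 42: hypothetical dual-guide dropout
print(highest_score_retaining(AMPLICON_LEN, 310))  # 11: primer dimer (40bp of primers left)
```

Under these assumptions, a cutoff anywhere from 12 to 42 would keep the hypothetical 200bp dropout while still excluding primer dimers, which may be a gentler alternative to setting the score to 0.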

If you would like to look for larger deletions, you could run it with --default_min_aln_score set to 0, then compare the results against the results with the default parameters.