pinellolab / CRISPResso2

Analysis of deep sequencing data for rapid and intuitive interpretation of genome editing experiments

Using flexiguides with CRISPResso2 pooled #74

Open MyriamShafie opened 3 years ago

MyriamShafie commented 3 years ago

Hello,

We are using CRISPResso2 pooled to map reads to multiple amplicons and it's working very nicely as long as the guide matches the amplicon perfectly. However, we also have cases where a guide is an imperfect match to an amplicon. There is an option in the regular CRISPResso2 to specify flexible guide sequences: can this option be used with CRISPResso2 pooled? If it is already possible, I couldn't figure out how. If not, it would be a really useful feature.

Thanks a lot!

kclem commented 3 years ago

Hi @MyriamShafie

Yes, we've thought about this, but it turns out to be a somewhat hard problem: how many mismatches do you allow between the guide and the off-target, and what do you do if there are multiple best matches?

The pooled experiments we run include on- and off-targets, and the off-targets are computationally predicted. In the computational prediction part, the guide sequence at the off-target is produced, and this information can be directly fed into CRISPRessoPooled.

Is your use case one in which the guide sequences at off-targets are not known? Or is it an issue of converting the output of some computational prediction to CRISPRessoPooled input?

MyriamShafie commented 3 years ago

Hello @kclem,

Thanks for your answer. I'm not sure I completely understand your post but I'll try my best to explain the situation. We are working on wheat, which has three subgenomes. So, when we use a guide to target all three subgenomes at once, it often happens that a guide perfectly matches one or two of the subgenomes and has a couple of mismatches to the third. So the guide sequence is perfectly known, the three amplicon sequences (representing each subgenome) are also known and no cuts can really be called "off-target": it's just that we have multiple targets that we can't hit perfectly all at once. Hence our interest in using CRISPRessoPooled with guides that don't perfectly match an amplicon.

matandro commented 1 month ago

I would like to join this post. A few hardcoded things about the flexiguide rules are an issue here:

  1. The limit of 2 gaps in the alignment. see here. I think that should be user-defined, leaving users to handle the consequences of larger values.
  2. Distance is calculated from mismatches, and internally gaps are marked as mismatches. For example, a 10-base spacer whose 10 bases all match but with 2 internal gaps is scored as 10/12 homology. I understand this can be a contentious issue, though.
  3. The alignment rules are the same as for general alignment. Because guide alignment is a different problem, I would change the matrix and gap rules to fit the limits of guide matching. Using the metric described in point 2 would let us better describe the correct way to identify the guides.
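To make the arithmetic in point 2 concrete, here is a minimal Python sketch of a homology score in which gap columns count as mismatches. The function name `gapped_homology` and the example sequences are hypothetical and chosen only for illustration; this is not CRISPResso2's internal code.

```python
def gapped_homology(aligned_guide: str, aligned_target: str) -> float:
    """Fraction of alignment columns that are exact base matches.

    Both inputs are rows of one pairwise alignment, with '-' marking gaps.
    A column containing a gap never counts as a match, so gaps lower the
    score exactly like mismatches do.
    """
    if len(aligned_guide) != len(aligned_target):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_guide:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for g, t in zip(aligned_guide, aligned_target)
        if g == t and g != "-"
    )
    return matches / len(aligned_guide)


# Hypothetical 10-base spacer: all 10 bases match, but the alignment
# contains 2 internal gap columns, giving 10 matches over 12 columns.
guide_row = "ACGTA--CGTAC"
target_row = "ACGTATTCGTAC"
print(gapped_homology(guide_row, target_row))  # 10/12, about 0.833
```

Dividing by the spacer length instead of the alignment length would give 10/10 for the same alignment, which is the contentious part.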

We also had a problematic example where a guide with 3 mismatches (MM) was not called because the initial alignment found a "better" one. The "better" alignment had 1 MM but an open gap extended over 54 positions, with the costs being the CRISPRessoPooled defaults (default matrix, -20 open, -2 extend).

Again, an approach that takes a user-supplied gap limit and enforces it in the alignment algorithm would not even consider that alignment. Since this piece of code runs at most N * M times, where N is the number of flexiguides and M is the number of possible amplicons, I think applying a different alignment method, even a less efficient one, would be preferable.
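As an illustration of the simplest such method, here is a minimal Python sketch of the zero-gap case: a brute-force sliding-window search over both strands with a user-supplied mismatch limit, run once per guide/amplicon pair. All names (`best_ungapped_hit`, `scan_all`) and the sequences are hypothetical, and this is not how CRISPRessoPooled currently aligns flexiguides. Ties between equally good placements are resolved by keeping the first one found, which sidesteps rather than solves the "multiple best matches" question raised earlier in the thread.

```python
from typing import Dict, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def best_ungapped_hit(
    guide: str, amplicon: str, max_mismatches: int
) -> Optional[Tuple[int, int, str]]:
    """Best gap-free placement of guide in amplicon, or None.

    Returns (start, mismatches, strand). For strand '-', start is a
    coordinate on the reverse-complemented amplicon.
    """
    guide = guide.upper()
    best = None
    for strand, seq in (("+", amplicon.upper()), ("-", reverse_complement(amplicon))):
        for start in range(len(seq) - len(guide) + 1):
            window = seq[start:start + len(guide)]
            mismatches = sum(1 for a, b in zip(guide, window) if a != b)
            if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
                best = (start, mismatches, strand)
    return best


def scan_all(
    guides: Dict[str, str], amplicons: Dict[str, str], max_mismatches: int
) -> Dict[Tuple[str, str], Tuple[int, int, str]]:
    """Run the search for every guide/amplicon pair (at most N * M calls)."""
    hits = {}
    for guide_name, guide_seq in guides.items():
        for amp_name, amp_seq in amplicons.items():
            hit = best_ungapped_hit(guide_seq, amp_seq, max_mismatches)
            if hit is not None:
                hits[(guide_name, amp_name)] = hit
    return hits


# Made-up sequences: one guide, three homoeologous amplicons.
guides = {"g1": "GATTACAGGC"}
amplicons = {
    "subgenome_A": "CCTTGATTACAGGCTTCC",  # perfect site
    "subgenome_B": "CCTTGATTACAGGCTTCC",  # perfect site
    "subgenome_D": "CCTTGATAACTGGATTCC",  # site with 3 mismatches
}
for pair, hit in scan_all(guides, amplicons, max_mismatches=3).items():
    print(pair, hit)
```

Because no gaps are allowed, an alignment with one mismatch and a 54-position gap cannot outcompete the 3-mismatch site. Extending this to a small user-defined number of gaps would need a bounded dynamic-programming alignment instead of the plain window comparison.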

matandro commented 1 month ago

Another solution would be the ability to specify, in the input file, the "spacer" (on/off-target) that applies to each amplicon, which would allow us to control the calculations.
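A minimal Python sketch of what preparing such per-amplicon input could look like follows. It assumes a tab-delimited amplicons file whose first three columns are amplicon name, amplicon sequence, and guide sequence; that layout should be checked against the CRISPRessoPooled documentation for the version in use. The helper names and sequences are hypothetical. Each amplicon is paired with the spacer as it actually occurs at that site, so the guide column always matches its own amplicon exactly.

```python
import csv
import io
from typing import Dict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def occurs_on_either_strand(spacer: str, amplicon: str) -> bool:
    """True if the spacer, or its reverse complement, is a substring of the amplicon."""
    spacer, amplicon = spacer.upper(), amplicon.upper()
    rc = spacer.translate(_COMPLEMENT)[::-1]
    return spacer in amplicon or rc in amplicon


def build_amplicon_table(amplicons: Dict[str, str], spacers: Dict[str, str]) -> str:
    """Return tab-delimited rows: name, amplicon sequence, per-amplicon spacer.

    The three-column layout is an assumption, not taken from this thread.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for name, seq in amplicons.items():
        spacer = spacers[name]
        if not occurs_on_either_strand(spacer, seq):
            raise ValueError(f"spacer for {name} does not occur in its amplicon")
        writer.writerow([name, seq, spacer])
    return buf.getvalue()


# Made-up example: the same designed guide, written as it occurs at each site.
amplicons = {
    "subgenome_A": "CCTTGATTACAGGCTTCC",
    "subgenome_D": "CCTTGATAACTGGATTCC",
}
spacers = {
    "subgenome_A": "GATTACAGGC",  # perfect match to the designed guide
    "subgenome_D": "GATAACTGGA",  # site sequence carrying 3 mismatches
}
print(build_amplicon_table(amplicons, spacers), end="")
```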