pinellolab / CRISPResso2

Analysis of deep sequencing data for rapid and intuitive interpretation of genome editing experiments

High % of "AMBIGUOUS" reads – what are they? #446

Closed: francoiskroll closed this issue 2 weeks ago

francoiskroll commented 2 weeks ago

I am analysing prime-edited samples. I often get a high % of "AMBIGUOUS" reads in my samples, up to 30%. Looking at the _Alleles_frequencytable, the alignments often seem OK to me.

How can I find out why CRISPResso2 labelled a specific read as ambiguous? Am I right to think it labels the read as ambiguous if a deletion has removed the prime-editing site? If yes, this seems dangerous to me... I originally assumed they were dodgy alignments so excluded them, while it is in fact crucial that I count them as reads with an unwanted deletion!

For example, this alignment is labelled AMBIGUOUS:

GTACAGTCTGGTGTGGCTCATAAGCCCCATTTTGGGTTTTATCCTACAGCCCGTCATCGGCTCGGCGAGCGACTACTGTAGGTCGTCATAAGGCCGAAGGAGACCGTACATACTCTTACTGGGGATTCTGATGTTAGTGGGCATGACTTTATTTCTAAATGGAGATGCAGTCACAACAGGTGGGTGA
GTACAGTCTGGTGTGGCTCATAAGCCCCATTTTGGGTTTTATCCTACAGCCCGTCATCGGCTCGGCGAGCGACTACTGTAGGTCG-----------AAGGAGACCGTACATACTCTTACTGGGGATTCTGATGTTAGTGGGCATGACTTTATTTCTAAATGGAGATGCAGTCACAACAGGTGGGTGA
                                                                                         ^^

^^ are the two prime-edited nucleotides.

kclem commented 2 weeks ago

Hi @francoiskroll,

Yes, ambiguous reads are reads that align to more than one amplicon sequence with the same score. In this case, because the prime-edited bases have been deleted, the read aligns to the prime-edited sequence and to the unedited (reference) sequence with the same score, so CRISPResso can't assign it uniquely to a single amplicon.
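To make the tie concrete, here is a minimal Python sketch. It is a toy and not CRISPResso2's aligner: the scoring values, the short sequences, and the function names `global_alignment_score` and `classify` are all made up for illustration. It only shows that once a deletion spans the bases that distinguish the two amplicons, a read scores identically against both.

```python
# Toy illustration only: NOT CRISPResso2's aligner or its scoring scheme.
# Sequences, scores, and function names are assumptions made for this sketch.

def global_alignment_score(read, amplicon, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(amplicon) + 1)]
    for i in range(1, len(read) + 1):
        curr = [i * gap]
        for j in range(1, len(amplicon) + 1):
            same = read[i - 1] == amplicon[j - 1]
            diagonal = prev[j - 1] + (match if same else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def classify(read, amplicons):
    """Return the best-scoring amplicon name, or 'AMBIGUOUS' on a tie."""
    scores = {name: global_alignment_score(read, seq)
              for name, seq in amplicons.items()}
    best = max(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else "AMBIGUOUS"


# Hypothetical amplicons that differ only at two bases in the middle.
PREFIX = "ACGTTGCAAGCT"
SUFFIX = "GGATCCGTACGA"
AMPLICONS = {
    "reference": PREFIX + "TGACCA" + SUFFIX,
    "prime_edited": PREFIX + "TGGTCA" + SUFFIX,
}

# A read whose deletion removes the whole region containing the edit.
deleted_read = PREFIX + SUFFIX

print(classify(AMPLICONS["prime_edited"], AMPLICONS))  # prime_edited
print(classify(deleted_read, AMPLICONS))               # AMBIGUOUS
```

The deleted read carries no information about the two edited bases, so no scoring scheme that treats both amplicons alike can separate them. The tie is about which amplicon to assign the read to, not about whether the edit succeeded.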

Yes, these reads should be considered, especially if you're getting 30% ambiguous reads. Other groups have used the --discard_indel_reads flag and counted anything with an indel as failed prime editing. However, I'd suggest using the --assign_ambiguous_alignments_to_first_reference flag. This will assign ambiguous reads to the reference amplicon and they will appear as 'modified' there, reflecting a failed prime editing event.
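As a rough sketch of how the suggested flag could be added to a run, the Python snippet below assembles a CRISPResso command line. The file name, run name, and sequence placeholders are hypothetical, and `build_crispresso_command` is a helper invented for this example; the other options shown are an assumption about a typical prime-editing invocation, so check them against `CRISPResso --help` for your installed version.

```python
import shlex


def build_crispresso_command(fastq_r1, amplicon_seq, spacer_seq, extension_seq, name):
    """Assemble a CRISPResso argument list for a prime-editing run.

    Hypothetical helper for illustration; it only builds the command.
    """
    return [
        "CRISPResso",
        "--fastq_r1", fastq_r1,
        "--amplicon_seq", amplicon_seq,
        "--prime_editing_pegRNA_spacer_seq", spacer_seq,
        "--prime_editing_pegRNA_extension_seq", extension_seq,
        # Tied reads go to the first (reference) amplicon instead of AMBIGUOUS.
        "--assign_ambiguous_alignments_to_first_reference",
        "--name", name,
    ]


# Placeholder values: substitute your own file and sequences.
command = build_crispresso_command(
    fastq_r1="sample.fastq.gz",
    amplicon_seq="AMPLICON_SEQUENCE_HERE",
    spacer_seq="PEGRNA_SPACER_HERE",
    extension_seq="PEGRNA_EXTENSION_HERE",
    name="sample_ambiguous_to_reference",
)

# Print the command for inspection; pass `command` to subprocess.run to execute it.
print(shlex.join(command))
```

Comparing the "modified" fraction on the reference amplicon with and without the flag is one way to check that the previously ambiguous reads are now counted as failed edits.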

Here are a few things I do to analyze ambiguous reads: https://github.com/pinellolab/CRISPResso2/discussions/267

If you think there's a more intuitive way to analyze ambiguous alignments, I'm happy to discuss it. Feel free to comment here or reach out to me at k.clement@utah.edu.

francoiskroll commented 1 week ago

Hi – Thanks a lot for the explanation. Your solution makes sense. If you want my opinion, I would modestly suggest that something like --assign_ambiguous_alignments_to_first_reference be the default when a deletion has removed the site (although, FYI, I have not tested it). If a deletion has removed the prime-editing site, it is not "ambiguous" whether prime editing worked or not: it has failed for sure. I understand that "ambiguous" refers here to which reference the read should align to, but I doubt that is intuitive (it was not for me).

kclem commented 1 week ago

Thanks - this is good feedback.

Let me know when you try the --assign_ambiguous_alignments_to_first_reference parameter and whether it produces the expected output.