pinellolab / CRISPResso2

Analysis of deep sequencing data for rapid and intuitive interpretation of genome editing experiments

Join mutation types to Alleles frequency table, adjacent sgRNAs workflow #39

Closed: G-Thomson closed this issue 4 years ago

G-Thomson commented 4 years ago

Hi

I have three questions about features I cannot find in the documentation, although they may exist:

1) When a coding sequence is provided, Frameshift_analysis.txt summarizes the number of modified reads that carry frameshift, in-frame, or noncoding mutations. Is it possible to get allele-specific mutation calls from this analysis and join them with the table in Alleles_frequency_table_around_sgRNA_NNNNN.txt?

2) What is your recommended workflow when there are two adjacent sgRNAs on a single amplicon? I realize that one could increase the quantification_window_size to encompass both sgRNAs, but it would be nice to compare the editing at one site versus the other, and the level of dual edits. I guess one could analyze each sgRNA separately and compare, but I imagine one would need to retain read IDs. Is this possible?

3) Intuitively, I feel that an analysis without an sgRNA should give the same results as one where I provide an sgRNA and set the quantification_window_size to 0. However, when I do this I get very slight differences, which I think are due to very small differences in the number of aligned reads (24478 vs 24474 using the example NHEJ test datasets). I've run each analysis twice and get the same result. Why does providing an sgRNA slightly alter the read alignment?

Many thanks for a great tool!

amarcog commented 4 years ago

Hello @G-Thomson,

Regarding your first question, maybe this repository could help you:

https://github.com/amarcog/CRISPResso2parser.py

Specifically, "CRISPResso2parser_clonal.py" joins data from the files "CRISPResso_mapping_statistics.txt", "Frameshift_analysis.txt" and "Alleles_frequency_table_around_sgRNA_NNNNN.txt". Furthermore, this script screens and genotypes gene-edited clones based on both mutation rates and allele frequencies.
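As a rough, independent illustration of question 1 (this is not the logic of CRISPResso2parser_clonal.py, which this sketch does not reproduce), each row of the allele frequency table could be labeled in-frame or frameshift from its net indel length. The sketch assumes the table is tab-separated with `n_deleted`, `n_inserted` and `#Reads` columns, and it ignores where each indel falls relative to the coding sequence, so its labels can disagree with Frameshift_analysis.txt. The function names are hypothetical.

```python
import csv
import io
from collections import Counter


def classify_allele(n_deleted: int, n_inserted: int) -> str:
    """Label an allele by net indel length (assumes every indel lies in the CDS)."""
    if n_deleted == 0 and n_inserted == 0:
        # Unmodified or substitution-only alleles
        return "no_indel"
    net = n_inserted - n_deleted
    # A net length change that is a multiple of 3 keeps the reading frame
    return "in-frame" if net % 3 == 0 else "frameshift"


def annotate_alleles(handle) -> list[dict]:
    """Read a tab-separated allele table and add an 'indel_class' field to each row."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        # int(float(...)) tolerates counts written as "1" or "1.0"
        row["indel_class"] = classify_allele(
            int(float(row["n_deleted"])), int(float(row["n_inserted"]))
        )
        rows.append(row)
    return rows


def reads_per_class(rows) -> Counter:
    """Sum the '#Reads' column per indel class."""
    totals = Counter()
    for row in rows:
        totals[row["indel_class"]] += int(row["#Reads"])
    return totals


# Made-up rows for illustration only (not real CRISPResso2 output)
example = io.StringIO(
    "n_deleted\tn_inserted\t#Reads\n"
    "0\t0\t100\n"
    "1\t0\t40\n"
    "3\t0\t10\n"
    "2\t1\t5\n"
)
totals = reads_per_class(annotate_alleles(example))
```

With a real file, `example` would be replaced by an open handle on the allele table; the annotated rows keep every original column, so they can be written back out as an allele table with an extra mutation-type column.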

I hope you find it useful!

kclem commented 4 years ago

For 2: I've run into this before, but it's hard to handle this without a lot of customization for the specific experimental design. In the last commit, 4e1d6b2b3424e725010e3e1a13522a7386228853, I added a --write_detailed_allele_table flag that outputs all the information for each allele/read, including the indices of insertions and deletions (columns "insertion_positions" and "deletion_positions"). Your aims could be accomplished by parsing this file and checking whether the insertion and deletion indices overlap one or both sgRNA cut sites.
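A minimal sketch of the parsing approach described above, under several assumptions: the rows come from the detailed allele table as dicts, the `insertion_positions` and `deletion_positions` columns hold list-like strings of integer indices, a `#Reads` column may or may not be present, and cut sites are given in the same 0-based amplicon coordinates as those indices. The window definition and all function names are hypothetical, and substitutions are ignored.

```python
import re
from collections import Counter


def parse_positions(field) -> list[int]:
    """Extract integers from a list-like string such as "[45, 46]" or "[45 46]".

    The exact serialization of these columns is an assumption; the regex
    simply pulls out every integer it finds.
    """
    return [int(x) for x in re.findall(r"-?\d+", str(field or ""))]


def window(cut: int, half_width: int = 1) -> range:
    """Hypothetical window: bases cut-half_width+1 .. cut+half_width.

    With half_width=1 this is the two bases flanking a cut that falls
    between positions cut and cut+1.
    """
    return range(cut - half_width + 1, cut + half_width + 1)


def classify_row(row: dict, cut1: int, cut2: int, half_width: int = 1) -> str:
    """Report which cut site(s) the indels of one allele/read overlap."""
    positions = parse_positions(row.get("insertion_positions")) + parse_positions(
        row.get("deletion_positions")
    )
    w1, w2 = window(cut1, half_width), window(cut2, half_width)
    at1 = any(p in w1 for p in positions)
    at2 = any(p in w2 for p in positions)
    if at1 and at2:
        return "both"
    if at1:
        return "site1_only"
    if at2:
        return "site2_only"
    return "neither"


def tally(rows, cut1: int, cut2: int, half_width: int = 1) -> Counter:
    """Count reads per category, weighting each row by '#Reads' if present, else 1."""
    counts = Counter()
    for row in rows:
        counts[classify_row(row, cut1, cut2, half_width)] += int(row.get("#Reads", 1))
    return counts


# Made-up rows for illustration only (not real CRISPResso2 output)
rows = [
    {"insertion_positions": "[]", "deletion_positions": "[20, 21, 22]", "#Reads": "30"},
    {"insertion_positions": "[50 51]", "deletion_positions": "[]", "#Reads": "12"},
    {"insertion_positions": "[]", "deletion_positions": "[21, 50]", "#Reads": "5"},
    {"insertion_positions": "[]", "deletion_positions": "[]", "#Reads": "100"},
    {"insertion_positions": "[]", "deletion_positions": "[35, 36]", "#Reads": "3"},
]
counts = tally(rows, cut1=20, cut2=50)
```

For a real tab-separated table, `csv.DictReader(fh, delimiter="\t")` on an open file handle `fh` could supply the rows. Keeping the window narrow around each cut site mirrors the advice below to leave the default 1 bp quantification window rather than stretching one window across both guides.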

I would leave the quantification window at its 1 bp default rather than trying to extend it between the two guides.

For 3: CRISPResso prefers to align indels around predicted cut sites, so when you provide an sgRNA, it places them at that sgRNA's predicted cut site (rather than, for example, at the end of the read outside the quantification window, as it might when no sgRNA is given). If you don't provide a guide, the quantification window is set to the whole amplicon (minus the bases excluded by --exclude_bp_from_left and --exclude_bp_from_right), so it will also quantify any sequencing errors. You can check your quantification window in plot 2a to see how adding the sgRNA affects the quantification window settings.
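A toy illustration of the tie-breaking described above; it does not reproduce CRISPResso2's actual aligner, which scores full alignments. When a read lacks one base from a homopolymer run, the deletion can be placed at several positions with an identical score. The hypothetical `place_deletion` below picks the leftmost placement when no cut site is given (an arbitrary fixed rule) and the placement nearest the cut site otherwise. The cut-site convention (cut between 0-based positions `cut` and `cut + 1`) is an assumption of this sketch.

```python
def deletion_placements(ref: str, read: str) -> list[int]:
    """Every start i where deleting ref[i:i+k] (k = len(ref) - len(read)) yields read."""
    k = len(ref) - len(read)
    if k <= 0:
        return []
    return [i for i in range(len(ref) - k + 1) if ref[:i] + ref[i + k:] == read]


def distance_to_cut(start: int, k: int, cut: int) -> int:
    """Distance from deleted bases start..start+k-1 to the bases flanking the cut."""
    end = start + k - 1
    if end < cut:
        return cut - end
    if start > cut + 1:
        return start - (cut + 1)
    return 0  # the deletion touches a base flanking the cut


def place_deletion(ref: str, read: str, cut=None):
    """Choose one placement among equally good ones (None if no single deletion fits)."""
    candidates = deletion_placements(ref, read)
    if not candidates:
        return None
    if cut is None:
        # No guide: arbitrary fixed rule, take the leftmost placement
        return candidates[0]
    k = len(ref) - len(read)
    # Guide given: closest to the cut, leftmost on ties
    return min(candidates, key=lambda i: (distance_to_cut(i, k, cut), i))


# Made-up reference with a TTTT run; the read is missing one T
ref, read = "ACGTTTTACG", "ACGTTTACG"
without_guide = place_deletion(ref, read)
with_guide = place_deletion(ref, read, cut=5)
```

Here the deletion could start at index 3, 4, 5 or 6; without a cut site the sketch reports 3, and with `cut=5` it reports 5. If only part of such a run lies inside the quantification window, this choice decides whether the read is counted as modified, which is one way results can shift slightly when a guide is added.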