pinellolab / CRISPResso2

Analysis of deep sequencing data for rapid and intuitive interpretation of genome editing experiments

Annotations for alleles not appearing in allele frequency quilt plots #384

Closed JNWorkman closed 6 months ago

JNWorkman commented 6 months ago

Describe the bug Annotations are missing from some plots, most noticeably the allele frequency quilts. Only the color for each nucleotide is visible, not the letter. Sometimes a few nucleotides have labels, and sometimes none do. I've attached an example plot with missing annotations, as well as an example plot that does include annotations, for comparison.

2b.Nucleotide_percentage_quilt_around_sgRNA_ATGCTCCATGAACAACCAAA.pdf

9.Alleles_frequency_table_around_sgRNA_ATGCTCCATGAACAACCAAA.pdf

Expected behavior I would expect each nucleotide to be labelled with a letter.

To reproduce Batchfile used is attached. Command is copied below:

CRISPRessoBatch --batch_settings W-Pm1_batchfile.txt

W-Pm1_batchfile.txt

Debug output Paste the entire output when you run CRISPResso with the flag --debug.

kclem commented 6 months ago

Hi @JNWorkman,

We are aware of this issue, but having a hard time tracking it down. It appears to be a bug introduced in the latest version of matplotlib. Could you try downgrading to version 3.7.3 and see if it is resolved?

conda install matplotlib=3.7.3

JNWorkman commented 6 months ago

Hi,

Ah, I see. I downgraded matplotlib using the command you suggested, then reran a CRISPResso batch analysis. The results are the same: no labels in the allele frequency plot. I've attached the output after including --debug in the command.

Thanks, Noah


J. N. Workman

PhD Student, Newby Lab (https://newbylab.sites.jhmi.edu/)

Human Genetics and Genomics Program

McKusick-Nathans Department of Genetic Medicine

Johns Hopkins University School of Medicine



~/crispresso_data/2024.2.28_NW-ST_TET293ts/data/TET/test$ CRISPRessoBatch --batch_settings W-Pm1_batchfile.txt --debug

                                              ~~~CRISPRessoBatch~~~
                        -Analysis of CRISPR/Cas9 outcomes from batch deep sequencing data-

                      _                                                                     _
                     '  )                                                                  '  )
                     .-'                        _________________                          .-'
                    (____                      | __    ___ __    |                        (____
                 C)|     \                     ||__) /\ | /  |__||                     C)|     \
                   \     /                     ||__)/--\| \__|  ||                       \     /
                    \___/                      |_________________|                        \___/

                                           [CRISPResso version 2.2.14]

[Note that starting in version 2.3.0 FLASh and Trimmomatic will be replaced by fastp for read merging and trimming. Accordingly, the --flash_command and --trimmomatic_command parameters will be replaced with --fastp_command. Also, --trimmomatic_options_string will be replaced with --fastp_options_string.

Also in version 2.3.0, when running CRISPRessoPooled in mixed-mode (amplicon file and genome are provided) the default behavior will be as if the --demultiplex_only_at_amplicons parameter is provided. This change means that reads and amplicons do not need to align to the exact locations.] [For support contact @. or @.

INFO @ Mon, 04 Mar 2024 17:05:26: Creating Folder /home/noah/crispresso_data/2024.2.28_NW-ST_TET293ts/data/TET/test/CRISPRessoBatch_on_W-Pm1_batchfile

WARNING @ Mon, 04 Mar 2024 17:05:26: Folder /home/noah/crispresso_data/2024.2.28_NW-ST_TET293ts/data/TET/test/CRISPRessoBatch_on_W-Pm1_batchfile already exists.

INFO @ Mon, 04 Mar 2024 17:05:26: Running CRISPResso with 1 processes

                                                ~~~CRISPResso 2~~~
                         -Analysis of genome editing outcomes from deep sequencing data-

                                                         _
                                                        '  )
                                                        .-'
                                                       (____
                                                    C)|     \
                                                      \     /
                                                       \___/

                                           [CRISPResso version 2.2.14]

[Note that starting in version 2.3.0 FLASh and Trimmomatic will be replaced by fastp for read merging and trimming. Accordingly, the --flash_command and --trimmomatic_command parameters will be replaced with --fastp_command. Also, --trimmomatic_options_string will be replaced with --fastp_options_string.

Also in version 2.3.0, when running CRISPRessoPooled in mixed-mode (amplicon file and genome are provided) the default behavior will be as if the --demultiplex_only_at_amplicons parameter is provided. This change means that reads and amplicons do not need to align to the exact locations.] [For support contact @. or @.

INFO @ Mon, 04 Mar 2024 17:05:26: Creating Folder /home/noah/crispresso_data/2024.2.28_NW-ST_TET293ts/data/TET/test/CRISPRessoBatch_on_W-Pm1_batchfile/CRISPResso_on_test

INFO @ Mon, 04 Mar 2024 17:05:26: Computing quantification windows

INFO @ Mon, 04 Mar 2024 17:05:26: Filtering reads with average bp quality < 30 and single bp quality < 0 and replacing bases with quality < 0 with N ...

Completed in 5 seconds

INFO @ Mon, 04 Mar 2024 17:05:32: Aligning sequences...

INFO @ Mon, 04 Mar 2024 17:05:32: Processing reads; N_TOT_READS: 0 N_COMPUTED_ALN: 0 N_CACHED_ALN: 0 N_COMPUTED_NOTALN: 0 N_CACHED_NOTALN: 0

INFO @ Mon, 04 Mar 2024 17:05:35: Processing reads; N_TOT_READS: 10000 N_COMPUTED_ALN: 7613 N_CACHED_ALN: 2334 N_COMPUTED_NOTALN: 49 N_CACHED_NOTALN: 4

INFO @ Mon, 04 Mar 2024 17:05:37: Processing reads; N_TOT_READS: 20000 N_COMPUTED_ALN: 11390 N_CACHED_ALN: 8513 N_COMPUTED_NOTALN: 84 N_CACHED_NOTALN: 13

INFO @ Mon, 04 Mar 2024 17:05:38: Processing reads; N_TOT_READS: 30000 N_COMPUTED_ALN: 13771 N_CACHED_ALN: 16079 N_COMPUTED_NOTALN: 116 N_CACHED_NOTALN: 34

INFO @ Mon, 04 Mar 2024 17:05:38: Finished reads; N_TOT_READS: 31970 N_COMPUTED_ALN: 14229 N_CACHED_ALN: 17582 N_COMPUTED_NOTALN: 119 N_CACHED_NOTALN: 40

INFO @ Mon, 04 Mar 2024 17:05:38: Done!

INFO @ Mon, 04 Mar 2024 17:05:38: Quantifying indels/substitutions...

INFO @ Mon, 04 Mar 2024 17:05:39: Done!

INFO @ Mon, 04 Mar 2024 17:05:39: Calculating allele frequencies...

INFO @ Mon, 04 Mar 2024 17:05:39: Done!

INFO @ Mon, 04 Mar 2024 17:05:39: Saving processed data...

INFO @ Mon, 04 Mar 2024 17:05:40: Making Plots...

DEBUG @ Mon, 04 Mar 2024 17:05:40: Plotting read bar plot

DEBUG @ Mon, 04 Mar 2024 17:05:40: Plotting read class pie chart and bar plot

INFO @ Mon, 04 Mar 2024 17:05:40: Begin processing plots for amplicon Reference

DEBUG @ Mon, 04 Mar 2024 17:05:40: Plotting nucleotide quilt across amplicon

/home/noah/anaconda3/envs/crispresso2_env/lib/python3.10/site-packages/CRISPResso2/CRISPRessoPlot.py:188: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use ser.iloc[pos]

  ins_pct = float(mod_pct_df_indexed.loc[sampleName,'Insertions_Left'][pos_ind-2])

DEBUG @ Mon, 04 Mar 2024 17:05:41: Plotting nucleotide distribuition around sgRNA ATGCTCCATGAACAACCAAA for Reference

/home/noah/anaconda3/envs/crispresso2_env/lib/python3.10/site-packages/CRISPResso2/CRISPRessoPlot.py:188: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use ser.iloc[pos]

  ins_pct = float(mod_pct_df_indexed.loc[sampleName,'Insertions_Left'][pos_ind-2])

DEBUG @ Mon, 04 Mar 2024 17:05:42: Plotting indel size distribution for Reference

DEBUG @ Mon, 04 Mar 2024 17:05:42: Plotting frequency deletions/insertions for Reference

DEBUG @ Mon, 04 Mar 2024 17:05:43: Plotting amplication modifications for Reference

DEBUG @ Mon, 04 Mar 2024 17:05:43: Plotting modification frequency for Reference

DEBUG @ Mon, 04 Mar 2024 17:05:43: Plotting quantification window locations for Reference

DEBUG @ Mon, 04 Mar 2024 17:05:44: Plotting position dependent indel for Reference

DEBUG @ Mon, 04 Mar 2024 17:05:44: Plotting allele distribution around cut for Reference

INFO @ Mon, 04 Mar 2024 17:05:45: Done!

INFO @ Mon, 04 Mar 2024 17:05:45: Done!

INFO @ Mon, 04 Mar 2024 17:05:45: Removing Intermediate files...

INFO @ Mon, 04 Mar 2024 17:05:45: Analysis Complete!

                                    _
                                   '  )
                                   .-'
                                  (____
                               C)|     \
                                 \     /
                                  \___/

INFO @ Mon, 04 Mar 2024 17:05:45: Completed 1/1 runs

INFO @ Mon, 04 Mar 2024 17:05:45: Finished all batches

INFO @ Mon, 04 Mar 2024 17:05:45: Reporting summary for amplicon: "Reference"

INFO @ Mon, 04 Mar 2024 17:05:45: All guides are equal. Performing comparison of batches for amplicon 'Reference'

DEBUG @ Mon, 04 Mar 2024 17:05:45: Plotting nucleotide percentage quilt for amplicon Reference, sgRNA ATGCTCCATGAACAACCAAA

/home/noah/anaconda3/envs/crispresso2_env/lib/python3.10/site-packages/CRISPResso2/CRISPRessoPlot.py:188: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use ser.iloc[pos]

  ins_pct = float(mod_pct_df_indexed.loc[sampleName,'Insertions_Left'][pos_ind-2])

DEBUG @ Mon, 04 Mar 2024 17:05:45: Plotting nucleotide quilt for Reference

/home/noah/anaconda3/envs/crispresso2_env/lib/python3.10/site-packages/CRISPResso2/CRISPRessoPlot.py:188: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use ser.iloc[pos]

  ins_pct = float(mod_pct_df_indexed.loc[sampleName,'Insertions_Left'][pos_ind-2])

DEBUG @ Mon, 04 Mar 2024 17:05:47: Plotting allele modification heatmap for Reference

DEBUG @ Mon, 04 Mar 2024 17:05:47: Plotting allele modification line plot for Reference

DEBUG @ Mon, 04 Mar 2024 17:05:47: Plotting allele modification heatmap for Reference

DEBUG @ Mon, 04 Mar 2024 17:05:47: Plotting allele modification line plot for Reference

DEBUG @ Mon, 04 Mar 2024 17:05:47: Plotting allele modification heatmap for Reference

DEBUG @ Mon, 04 Mar 2024 17:05:47: Plotting allele modification line plot for Reference

INFO @ Mon, 04 Mar 2024 17:05:47: Analysis Complete!

                                    _
                                   '  )
                                   .-'
                                  (____
                               C)|     \
                                 \     /
                                  \___/
kclem commented 6 months ago

Sorry, I don't see the output. Downgrading matplotlib usually fixes the issue. Was the output not overwritten? Could you try running it in a new location to see whether the output is fixed?
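One way to check whether the plots were actually regenerated is to compare their modification times against the time the rerun started. The following is only a minimal sketch: `files_older_than` is a hypothetical helper, not part of CRISPResso2, and it assumes the plots are the `.pdf` files under the batch output folder.

```python
import os
import time


def files_older_than(folder, cutoff, suffix=".pdf"):
    """Return sorted paths under `folder` ending in `suffix` whose
    last-modified time is earlier than `cutoff` (seconds since the epoch).

    Hypothetical helper for illustration; not a CRISPResso2 function.
    """
    stale = []
    for root, _dirs, names in os.walk(folder):
        for name in names:
            path = os.path.join(root, name)
            if name.endswith(suffix) and os.path.getmtime(path) < cutoff:
                stale.append(path)
    return sorted(stale)
```

For example, record `start = time.time()` just before rerunning CRISPRessoBatch, then call `files_older_than("CRISPRessoBatch_on_W-Pm1_batchfile", start)` afterwards. Any path it returns was not rewritten by the rerun.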

JNWorkman commented 6 months ago

Sorry about that. I realized that I downgraded matplotlib in the wrong conda environment. Downgrading in the crispresso environment did the trick. Thanks for the help!

For anyone who hits the same problem in the future: conda install matplotlib=3.7.3 did not work for me because conda couldn't solve the environment. Instead, pip install matplotlib==3.7.3 worked.
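Since the downgrade in this thread first went into the wrong conda environment, it can help to confirm which interpreter and which matplotlib version are active before rerunning. This is a minimal sketch using only the Python standard library. The function names are hypothetical, and the 3.7.3 threshold is simply the version suggested above, not an officially documented cutoff.

```python
import sys
from importlib import metadata

KNOWN_GOOD = (3, 7, 3)  # version suggested in this thread (assumption, not an official cutoff)


def parse_version(text):
    """Turn '3.7.3' into (3, 7, 3); trailing suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def installed_version(package="matplotlib"):
    """Return the version string installed for this interpreter, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report():
    """Print the active interpreter, its environment prefix, and the matplotlib status."""
    version = installed_version()
    print("interpreter:", sys.executable)
    print("environment prefix:", sys.prefix)
    if version is None:
        print("matplotlib is not installed in this environment")
    elif parse_version(version) > KNOWN_GOOD:
        print(f"matplotlib {version} is newer than 3.7.3")
    else:
        print(f"matplotlib {version} is at or below 3.7.3")


if __name__ == "__main__":
    report()
```

Run it with the environment that CRISPResso uses activated (here, `crispresso2_env`, as shown in the paths of the debug output). If the printed prefix points to a different environment, the downgrade was applied elsewhere.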

Colelyman commented 6 months ago

Glad to hear that it is working @JNWorkman! Just an update for you (and #345), we were able to figure out a fix for the newer matplotlib versions, but are just working on making sure the fix works for both versions.

We will keep the issue updated when the fix is merged in!