pinellolab / CRISPResso2

Analysis of deep sequencing data for rapid and intuitive interpretation of genome editing experiments

Alleles frequency table plot not generated #405

Closed JNWorkman closed 2 months ago

JNWorkman commented 3 months ago

Describe the bug
There is no PNG or PDF file generated for the allele frequency table. I do get the .zip archive containing the allele frequency table in .txt format, but no graphic. Usually CRISPResso generates a file called "9.Alleles_frequency_table_aroundsgRNA[sequence].pdf" that contains a quilt visualization of the allele frequency table; this is missing.

I have not updated or changed anything since last running CRISPResso, except downgrading matplotlib to address the missing annotations in this plot.

Expected behavior
Usually CRISPResso generates a file called "9.Alleles_frequency_table_aroundsgRNA[sequence].pdf" that contains a quilt visualization of the allele frequency table.

To reproduce
CRISPRessoBatch --batch_settings W-ko_batchfile.txt

Example line from batchfile (header row, then one data row):
r1 g a n w wc q
TET-NG-W1291X-579-1_S13_L001_R1_001.fastq.gz ttctgtgtgtggttatgccacagcttaatacagagttagattagacttcttttcaaactcattttgcatatagacacctataatatcagctgcacagcctatataatgctatccatagcaatgaatttggtcttttgatttttcaggagaacttgcgcctgtcaggggctggatccagaaacctgtggtgcctccttctcttttggttgttcatggagcatgtactacaatggatgtaagtttgccagaagcaagatcccaaggaagtttaagctgcttggggatgac TET-NG-W1291X-579-1 20 -10 30

Debug output
Paste the entire output when you run CRISPResso with the flag --debug.

 ~~~CRISPRessoBatch~~~
             -Analysis of CRISPR/Cas9 outcomes from batch deep sequencing data-

                 _                                                         _
                '  )                                                      '  )
                .-'                  _________________                    .-'
               (____                | __    ___ __    |                  (____
            C)|     \               ||__) /\ | /  |__||               C)|     \
              \     /               ||__)/--\| \__|  ||                 \     /
               \___/                |_________________|                  \___/

                                [CRISPResso version 2.2.14]

[Note that starting in version 2.3.0 FLASh and Trimmomatic will be replaced by fastp for read merging and trimming. Accordingly, the --flash_command and --trimmomatic_command parameters will be replaced with --fastp_command. Also, --trimmomatic_options_string will be replaced with --fastp_options_string.

Also in version 2.3.0, when running CRISPRessoPooled in mixed-mode (amplicon file and genome are provided) the default behavior will be as if the --demultiplex_only_at_amplicons parameter is provided. This change means that reads and amplicons do not need to align to the exact locations.] [For support contact k.clement@utah.edu or support@edilytics.com]

INFO @ Mon, 01 Apr 2024 12:56:48: Creating Folder /home/noah/crispresso_data/2024.4.1_NW-ST_GNE-TET/test/CRISPRessoBatch_on_test_batchfile

/home/noah/anaconda3/envs/crispresso2_env/lib/python3.10/site-packages/CRISPResso2/CRISPRessoBatchCORE.py:190: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  batch_params[arg].fillna(value=getattr(args, arg), inplace=True)

INFO @ Mon, 01 Apr 2024 12:56:48: Running CRISPResso with 1 processes

                                     ~~~CRISPResso 2~~~
              -Analysis of genome editing outcomes from deep sequencing data-

                                              _
                                             '  )
                                             .-'
                                            (____
                                         C)|     \
                                           \     /
                                            \___/

                                [CRISPResso version 2.2.14]

[Note that starting in version 2.3.0 FLASh and Trimmomatic will be replaced by fastp for read merging and trimming. Accordingly, the --flash_command and --trimmomatic_command parameters will be replaced with --fastp_command. Also, --trimmomatic_options_string will be replaced with --fastp_options_string.

Also in version 2.3.0, when running CRISPRessoPooled in mixed-mode (amplicon file and genome are provided) the default behavior will be as if the --demultiplex_only_at_amplicons parameter is provided. This change means that reads and amplicons do not need to align to the exact locations.] [For support contact k.clement@utah.edu or support@edilytics.com]

INFO @ Mon, 01 Apr 2024 12:56:49: Creating Folder /home/noah/crispresso_data/2024.4.1_NW-ST_GNE-TET/test/CRISPRessoBatch_on_test_batchfile/CRISPResso_on_TET-NG-W1291X-579-1

INFO @ Mon, 01 Apr 2024 12:56:49: Computing quantification windows

INFO @ Mon, 01 Apr 2024 12:56:49: Filtering reads with average bp quality < 30 and single bp quality < 0 and replacing bases with quality < 0 with N ...

Completed in 5 seconds

INFO @ Mon, 01 Apr 2024 12:56:55: Aligning sequences...

INFO @ Mon, 01 Apr 2024 12:56:55: Processing reads; N_TOT_READS: 0 N_COMPUTED_ALN: 0 N_CACHED_ALN: 0 N_COMPUTED_NOTALN: 0 N_CACHED_NOTALN: 0

INFO @ Mon, 01 Apr 2024 12:56:56: Processing reads; N_TOT_READS: 10000 N_COMPUTED_ALN: 3386 N_CACHED_ALN: 6575 N_COMPUTED_NOTALN: 38 N_CACHED_NOTALN: 1

INFO @ Mon, 01 Apr 2024 12:56:57: Processing reads; N_TOT_READS: 20000 N_COMPUTED_ALN: 6074 N_CACHED_ALN: 13838 N_COMPUTED_NOTALN: 87 N_CACHED_NOTALN: 1

INFO @ Mon, 01 Apr 2024 12:56:58: Processing reads; N_TOT_READS: 30000 N_COMPUTED_ALN: 8102 N_CACHED_ALN: 21775 N_COMPUTED_NOTALN: 122 N_CACHED_NOTALN: 1

INFO @ Mon, 01 Apr 2024 12:56:58: Finished reads; N_TOT_READS: 30640 N_COMPUTED_ALN: 8234 N_CACHED_ALN: 22282 N_COMPUTED_NOTALN: 123 N_CACHED_NOTALN: 1

INFO @ Mon, 01 Apr 2024 12:56:58: Done!

INFO @ Mon, 01 Apr 2024 12:56:58: Quantifying indels/substitutions...

INFO @ Mon, 01 Apr 2024 12:56:59: Done!

INFO @ Mon, 01 Apr 2024 12:56:59: Calculating allele frequencies...

INFO @ Mon, 01 Apr 2024 12:56:59: Done!

INFO @ Mon, 01 Apr 2024 12:56:59: Saving processed data...

INFO @ Mon, 01 Apr 2024 12:56:59: Making Plots...

DEBUG @ Mon, 01 Apr 2024 12:56:59: Plotting read bar plot

DEBUG @ Mon, 01 Apr 2024 12:56:59: Plotting read class pie chart and bar plot

INFO @ Mon, 01 Apr 2024 12:57:00: Begin processing plots for amplicon Reference

DEBUG @ Mon, 01 Apr 2024 12:57:00: Plotting nucleotide quilt across amplicon

/home/noah/anaconda3/envs/crispresso2_env/lib/python3.10/site-packages/CRISPResso2/CRISPRessoPlot.py:188: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  ins_pct = float(mod_pct_df_indexed.loc[sampleName,'Insertions_Left'][pos_ind-2])

DEBUG @ Mon, 01 Apr 2024 12:57:01: Plotting indel size distribution for Reference

DEBUG @ Mon, 01 Apr 2024 12:57:01: Plotting frequency deletions/insertions for Reference

DEBUG @ Mon, 01 Apr 2024 12:57:02: Plotting amplication modifications for Reference

DEBUG @ Mon, 01 Apr 2024 12:57:02: Plotting modification frequency for Reference

DEBUG @ Mon, 01 Apr 2024 12:57:03: Plotting quantification window locations for Reference

DEBUG @ Mon, 01 Apr 2024 12:57:03: Plotting position dependent indel for Reference

INFO @ Mon, 01 Apr 2024 12:57:03: Done!

INFO @ Mon, 01 Apr 2024 12:57:03: Done!

INFO @ Mon, 01 Apr 2024 12:57:03: Removing Intermediate files...

INFO @ Mon, 01 Apr 2024 12:57:04: Analysis Complete!

                                    _
                                   '  )
                                   .-'
                                  (____
                               C)|     \
                                 \     /
                                  \___/

INFO @ Mon, 01 Apr 2024 12:57:04: Completed 1/1 runs

INFO @ Mon, 01 Apr 2024 12:57:04: Finished all batches

INFO @ Mon, 01 Apr 2024 12:57:04: Reporting summary for amplicon: "Reference"

DEBUG @ Mon, 01 Apr 2024 12:57:04: Plotting nucleotide quilt for Reference

/home/noah/anaconda3/envs/crispresso2_env/lib/python3.10/site-packages/CRISPResso2/CRISPRessoPlot.py:188: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  ins_pct = float(mod_pct_df_indexed.loc[sampleName,'Insertions_Left'][pos_ind-2])

DEBUG @ Mon, 01 Apr 2024 12:57:05: Plotting allele modification heatmap for Reference

DEBUG @ Mon, 01 Apr 2024 12:57:05: Plotting allele modification line plot for Reference

DEBUG @ Mon, 01 Apr 2024 12:57:05: Plotting allele modification heatmap for Reference

DEBUG @ Mon, 01 Apr 2024 12:57:05: Plotting allele modification line plot for Reference

DEBUG @ Mon, 01 Apr 2024 12:57:05: Plotting allele modification heatmap for Reference

DEBUG @ Mon, 01 Apr 2024 12:57:05: Plotting allele modification line plot for Reference

INFO @ Mon, 01 Apr 2024 12:57:05: Analysis Complete!

                                    _
                                   '  )
                                   .-'
                                  (____
                               C)|     \
                                 \     /
                                  \___/

kclem commented 3 months ago

Hi @JNWorkman, nothing jumps out at me from the logs.

Are there reads assigned to the amplicon/guide? If there are no reads, there will be no plot 9.

If you run the single CRISPResso command (not CRISPRessoBatch) does it produce the file?
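One quick way to answer the "are there reads?" question is to read the counters off the "Finished reads" line in the debug log above. The following is a minimal sketch, not part of CRISPResso: the helper name `alignment_summary` is hypothetical, and treating `N_COMPUTED_ALN + N_CACHED_ALN` as the aligned-read count is an assumption based on the counter names.

```python
import re

# Matches counters such as "N_TOT_READS: 30640" in a CRISPResso log line.
COUNTER_PATTERN = re.compile(r"(N_[A-Z_]+): (\d+)")


def alignment_summary(log_line: str) -> dict:
    """Parse the N_* counters from one log line and total the aligned reads.

    Assumption: computed + cached alignments together are the aligned reads.
    """
    counts = {name: int(value) for name, value in COUNTER_PATTERN.findall(log_line)}
    aligned = counts["N_COMPUTED_ALN"] + counts["N_CACHED_ALN"]
    not_aligned = counts["N_COMPUTED_NOTALN"] + counts["N_CACHED_NOTALN"]
    return {
        "total": counts["N_TOT_READS"],
        "aligned": aligned,
        "not_aligned": not_aligned,
    }


# The "Finished reads" line copied from the log in this issue.
line = (
    "INFO @ Mon, 01 Apr 2024 12:56:58: Finished reads; N_TOT_READS: 30640 "
    "N_COMPUTED_ALN: 8234 N_CACHED_ALN: 22282 "
    "N_COMPUTED_NOTALN: 123 N_CACHED_NOTALN: 1"
)
summary = alignment_summary(line)
print(summary)
```

Under that assumption, the log in this issue shows 30516 of 30640 reads aligned, so an empty alignment is unlikely to explain the missing plot here.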

JNWorkman commented 3 months ago

Hi @kclem

Thanks for looking into this.

There are reads assigned to the amplicon, yes. Nothing else about the analysis is unusual except the missing plot. Running the single command yields the same results.

I did figure out the problem, however. My batchfile did not include a sequence for the guide, only the amplicon. Including the guide sequence fixed the issue, and the plot is now generated correctly. Apologies for the hassle over a simple mistake on my part.
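For anyone hitting the same thing, a pre-flight check on the batch file can catch an empty guide column before running. This is a rough sketch only: `rows_missing_guide` is a hypothetical helper, not part of CRISPResso, and it assumes a tab-separated batch file whose guide column is headed `g` (as in the batchfile above) or `guide_seq`.

```python
import csv
import io


def rows_missing_guide(batch_text: str, guide_columns=("g", "guide_seq")) -> list:
    """Return 1-based data-row numbers that have no guide sequence.

    Assumptions: the batch file is tab-separated with a header row, and the
    guide column is named "g" or "guide_seq".
    """
    reader = csv.DictReader(io.StringIO(batch_text), delimiter="\t")
    missing = []
    for row_number, row in enumerate(reader, start=1):
        # Take the first non-empty guide value, if any column provides one.
        guide = next((row.get(col) for col in guide_columns if row.get(col)), None)
        if not guide or not guide.strip():
            missing.append(row_number)
    return missing


# Illustrative data (made-up sequences): row 1 has an empty guide cell.
example = (
    "r1\tg\ta\tn\n"
    "sample1.fastq.gz\t\tACGTACGTACGT\tsample1\n"
    "sample2.fastq.gz\tACGTAC\tACGTACGTACGT\tsample2\n"
)
print(rows_missing_guide(example))
```

A batch file with no guide column at all is reported the same way, since every row then lacks a guide value.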

A related question: is there a way to change which positions the zoomed-in plots, such as the allele frequency table, cover? For example, say I used a pegRNA at position 100 in the amplicon, but the edited nucleotide I want to assess is at position 200. Can I tell CRISPResso which window to use for the plots instead of the default "around sgRNA"?

Thanks!

Colelyman commented 3 months ago

Hi @JNWorkman,

Glad to hear that you got it figured out!

The easiest way to do this for the allele frequency table is the plotCustomAllelePlot.py script: https://github.com/pinellolab/CRISPResso2/blob/master/scripts/plotCustomAllelePlot.py. You can use the --plot_center argument to set where the plot is centered.
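As a rough illustration, the call could be assembled like this for the position-200 example from the question. Only `--plot_center` is taken from this thread. The helper name and the `extra_args` parameter are hypothetical, the script will likely need further arguments (for instance, to point it at a finished CRISPResso run), and whether positions are 0- or 1-based is not stated here, so check the script's `--help` output before relying on it.

```python
import shlex


def build_custom_allele_plot_command(script_path, plot_center, extra_args=()):
    """Assemble an argument list for plotCustomAllelePlot.py.

    Only --plot_center comes from this thread; pass any other arguments the
    script requires through extra_args after checking its --help output.
    """
    return ["python", script_path, "--plot_center", str(plot_center), *extra_args]


# Center the plot on amplicon position 200 rather than on the sgRNA.
cmd = build_custom_allele_plot_command("scripts/plotCustomAllelePlot.py", 200)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the remaining required arguments have been added.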

Let us know if you have any trouble.

Thanks, Cole