pinellolab / CRISPResso2

Analysis of deep sequencing data for rapid and intuitive interpretation of genome editing experiments

Problem with Cpf1 editing analysis #143

Closed kumar2501 closed 3 years ago

kumar2501 commented 3 years ago

Describe the bug I can see the indels in the allele frequency table in the .html file, but I don't see any values in the text files (allele frequency table around...) in the output folder. The run also prints a warning on macOS Big Sur, MacBook Pro (13-inch, M1, 2020): WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested

Expected behavior I should see an editing frequency of around 50%. The RGEN online tool reports this, but I don't observe it in the CRISPResso output text file. Maybe it is not picking up some deletions.

To reproduce CRISPResso command to reproduce the behavior.

Debug output The command runs normally but may not be picking up some deletions.

Command docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPResso --fastq_r1 2_R1_001.fastq.gz --fastq_r2 2_R2_001.fastq.gz --amplicon_seq NNNNNNN --guide_seq N(20) --cleavage_offset 1 --min_average_read_quality 30 --amplicon_min_alignment_score 60 --ignore_substitutions --plot_window_size 60 --min_frequency_alleles_around_cut_to_plot 0.05 --max_rows_alleles_around_cut_to_plot 100 --output_folder 2_RNP

Am I missing something? Let me know.

Thanks in advance.

kclem commented 3 years ago

Hi @kumar2501

I'm surprised it ran to completion. It looks like you're trying to run an amd64 docker image on your arm64 system.

Try using pinellolab/crispresso2:v2.1.1_armv8 which was built for that system.
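As an illustration of the advice above, here is a minimal Python sketch that picks an image tag from the host architecture. The two image names are the ones mentioned in this thread; the helper name `pick_image` and the list of architecture strings are assumptions made for the example, not part of CRISPResso or Docker.

```python
import platform

# Image names as they appear in this thread.
AMD64_IMAGE = "pinellolab/crispresso2"
ARM64_IMAGE = "pinellolab/crispresso2:v2.1.1_armv8"


def pick_image(machine=None):
    """Return the image tag matching the host CPU architecture.

    `pick_image` is a hypothetical helper. `machine` defaults to
    platform.machine(), which typically reports "arm64" on Apple Silicon
    macOS and "aarch64" on ARM Linux.
    """
    machine = (machine or platform.machine()).lower()
    if machine in ("arm64", "aarch64"):
        return ARM64_IMAGE
    return AMD64_IMAGE


print(pick_image())
```

The returned tag can then be substituted for `pinellolab/crispresso2` in the `docker run` command.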

Let me know if you still see these problems.

kumar2501 commented 3 years ago

It worked, but when I try to run it for Cpf1 by adding --cleavage_offset 1, no html or any other output is generated. Can you let me know what's happening with the Cpf1 analysis? Am I missing anything?

Thanks for your time.

kclem commented 3 years ago

The --cleavage_offset seems to work on my machine. Can you run with --debug and share the output?

kumar2501 commented 3 years ago

The run does not generate an .html file. Here is the log of the run:

CRISPResso version 2.1.1 [Command used]: /opt/conda/bin/CRISPResso --fastq_r1 4_R1_001.fastq.gz --fastq_r2 4_R2_001.fastq.gz --amplicon_seq NNNNNNNNNN500 --guide_seq N20 --cleavage_offset 1 --min_average_read_quality 30 --amplicon_min_alignment_score 60 --ignore_substitutions --plot_window_size 60 --min_frequency_alleles_around_cut_to_plot 0.05 --max_rows_alleles_around_cut_to_plot 100 --output_folder 4_RNP_AsCas12a_CMC1a_genomic --debug

[Execution log]:
Filtering reads with average bp quality < 30 and single bp quality < 0 and replacing bases with quality < 0 with N ...
Estimating average read length...
Checking average read length from /DATA/4_RNP_AsCas12a__genomic/CRISPResso_on_4_R1_001_4_R2_001/4_R1_001_filtered.fastq.gz
Average read length is 245 from /DATA/4_RNP_AsCas12a_CMC1a_genomic/CRISPResso_on_4_R1_001_4_R2_001/4_R1_001_filtered.fastq.gz
Merging paired sequences with Flash...
Running FLASH command: flash "/DATA/4_RNP_AsCas12a_CMC1a_genomic/CRISPResso_on_4_R1_001_4_R2_001/4_R1_001_filtered.fastq.gz" "/DATA/4_RNP_AsCas12a_CMC1a_genomic/CRISPResso_on_4_R1_001_4_R2_001/4_R2_001_filtered.fastq.gz" --min-overlap 10 --max-overlap 100 --allow-outies -z -d /DATA/4_RNP_AsCas12a_CMC1a_genomic/CRISPResso_on_4_R1_001_4_R2_001 -o out >>/DATA/4_RNP_AsCas12a_CMC1a_genomic/CRISPResso_on_4_R1_001_4_R2_001/CRISPResso_RUNNING_LOG.txt 2>&1
[FLASH] Starting FLASH v1.2.11
[FLASH] Fast Length Adjustment of SHort reads
[FLASH]
[FLASH] Input files:
[FLASH] /DATA/4_RNP_AsCas12a_CMC1a_genomic/CRISPResso_on_4_R1_001_4_R2_001/4_R1_001_filtered.fastq.gz
[FLASH] /DATA/4_RNP_AsCas12a_CMC1a_genomic/CRISPResso_on_4_R1_001_4_R2_001/4_R2_001_filtered.fastq.gz
[FLASH]
[FLASH] Output files:
[FLASH] /DATA/4_RNP_AsCas12a_CMC1a_genomic/CRISPResso_on_4_R1_001_4_R2_001/out.extendedFrags.fastq.gz
[FLASH] /DATA/4_RNP_AsCas12a_CMC1a_genomic/CRISPResso_on_4_R1_001_4_R2_001/out.notCombined_1.fastq.gz
[FLASH] /DATA/4_RNP_AsCas12a_CMC1a_genomic/CRISPResso_on_4_R1_001_4_R2_001/out.notCombined_2.fastq.gz
[FLASH] /DATA/4_RNP_AsCas12a_CMC1a_genomic/CRISPResso_on_4_R1_001_4_R2_001/out.hist
[FLASH] /DATA/4_RNP_AsCas12a_CMC1a_genomic/CRISPResso_on_4_R1_001_4_R2_001/out.histogram
[FLASH]
[FLASH] Parameters:
[FLASH] Min overlap: 10
[FLASH] Max overlap: 100
[FLASH] Max mismatch density: 0.250000
[FLASH] Allow "outie" pairs: true
[FLASH] Cap mismatch quals: false
[FLASH] Combiner threads: 4
[FLASH] Input format: FASTQ, phred_offset=33
[FLASH] Output format: FASTQ, phred_offset=33, gzip
[FLASH]
[FLASH] Starting reader and writer threads
[FLASH] Starting 4 combiner threads
[FLASH] Processed 25000 read pairs
[FLASH] Processed 50000 read pairs
[FLASH] Processed 75000 read pairs
[FLASH] Processed 100000 read pairs
[FLASH] Processed 125000 read pairs
[FLASH] Processed 150000 read pairs
[FLASH] Processed 175000 read pairs
[FLASH] Processed 200000 read pairs
[FLASH] Processed 225000 read pairs
[FLASH] Processed 250000 read pairs
[FLASH] Processed 275000 read pairs
[FLASH] Processed 300000 read pairs
[FLASH] Processed 325000 read pairs
[FLASH] Processed 350000 read pairs
[FLASH] Processed 375000 read pairs
[FLASH] Processed 400000 read pairs
[FLASH] Processed 425000 read pairs
[FLASH] Processed 450000 read pairs
[FLASH] Processed 475000 read pairs
[FLASH] Processed 498177 read pairs
[FLASH]
[FLASH] Read combination statistics:
[FLASH] Total pairs: 498177
[FLASH] Combined pairs: 496845
[FLASH] Innie pairs: 496626 (99.96% of combined)
[FLASH] Outie pairs: 219 (0.04% of combined)
[FLASH] Uncombined pairs: 1332
[FLASH] Percent combined: 99.73%
[FLASH]
[FLASH] Writing histogram files.
[FLASH]
[FLASH] FLASH v1.2.11 complete!
[FLASH] 66.310 seconds elapsed
Done!
Aligning sequences...
Processing reads; N_TOT_READS: 0 N_COMPUTED_ALN: 0 N_CACHED_ALN: 0 N_COMPUTED_NOTALN: 0 N_CACHED_NOTALN: 0
Processing reads; N_TOT_READS: 10000 N_COMPUTED_ALN: 3454 N_CACHED_ALN: 5926 N_COMPUTED_NOTALN: 442 N_CACHED_NOTALN: 178
Processing reads; N_TOT_READS: 20000 N_COMPUTED_ALN: 6021 N_CACHED_ALN: 12775 N_COMPUTED_NOTALN: 765 N_CACHED_NOTALN: 439
Processing reads; N_TOT_READS: 30000 N_COMPUTED_ALN: 8468 N_CACHED_ALN: 19760 N_COMPUTED_NOTALN: 1053 N_CACHED_NOTALN: 719
Processing reads; N_TOT_READS: 40000 N_COMPUTED_ALN: 10762 N_CACHED_ALN: 26907 N_COMPUTED_NOTALN: 1327 N_CACHED_NOTALN: 1004
Processing reads; N_TOT_READS: 50000 N_COMPUTED_ALN: 12965 N_CACHED_ALN: 34144 N_COMPUTED_NOTALN: 1569 N_CACHED_NOTALN: 1322
Processing reads; N_TOT_READS: 60000 N_COMPUTED_ALN: 15043 N_CACHED_ALN: 41454 N_COMPUTED_NOTALN: 1827 N_CACHED_NOTALN: 1676
Processing reads; N_TOT_READS: 70000 N_COMPUTED_ALN: 17099 N_CACHED_ALN: 48867 N_COMPUTED_NOTALN: 2056 N_CACHED_NOTALN: 1978
Processing reads; N_TOT_READS: 80000 N_COMPUTED_ALN: 19155 N_CACHED_ALN: 56203 N_COMPUTED_NOTALN: 2310 N_CACHED_NOTALN: 2332
Processing reads; N_TOT_READS: 90000 N_COMPUTED_ALN: 21017 N_CACHED_ALN: 63794 N_COMPUTED_NOTALN: 2512 N_CACHED_NOTALN: 2677
Processing reads; N_TOT_READS: 100000 N_COMPUTED_ALN: 22747 N_CACHED_ALN: 71534 N_COMPUTED_NOTALN: 2729 N_CACHED_NOTALN: 2990
Processing reads; N_TOT_READS: 110000 N_COMPUTED_ALN: 24535 N_CACHED_ALN: 79175 N_COMPUTED_NOTALN: 2946 N_CACHED_NOTALN: 3344
Processing reads; N_TOT_READS: 120000 N_COMPUTED_ALN: 26268 N_CACHED_ALN: 86886 N_COMPUTED_NOTALN: 3153 N_CACHED_NOTALN: 3693
Processing reads; N_TOT_READS: 130000 N_COMPUTED_ALN: 27947 N_CACHED_ALN: 94658 N_COMPUTED_NOTALN: 3360 N_CACHED_NOTALN: 4035
Processing reads; N_TOT_READS: 140000 N_COMPUTED_ALN: 29619 N_CACHED_ALN: 102469 N_COMPUTED_NOTALN: 3547 N_CACHED_NOTALN: 4365
Processing reads; N_TOT_READS: 150000 N_COMPUTED_ALN: 31177 N_CACHED_ALN: 110357 N_COMPUTED_NOTALN: 3733 N_CACHED_NOTALN: 4733
Processing reads; N_TOT_READS: 160000 N_COMPUTED_ALN: 32772 N_CACHED_ALN: 118176 N_COMPUTED_NOTALN: 3931 N_CACHED_NOTALN: 5121
Processing reads; N_TOT_READS: 170000 N_COMPUTED_ALN: 34369 N_CACHED_ALN: 126037 N_COMPUTED_NOTALN: 4132 N_CACHED_NOTALN: 5462
Processing reads; N_TOT_READS: 180000 N_COMPUTED_ALN: 36062 N_CACHED_ALN: 133816 N_COMPUTED_NOTALN: 4314 N_CACHED_NOTALN: 5808
Processing reads; N_TOT_READS: 190000 N_COMPUTED_ALN: 37670 N_CACHED_ALN: 141708 N_COMPUTED_NOTALN: 4488 N_CACHED_NOTALN: 6134
Processing reads; N_TOT_READS: 200000 N_COMPUTED_ALN: 39243 N_CACHED_ALN: 149603 N_COMPUTED_NOTALN: 4671 N_CACHED_NOTALN: 6483
Processing reads; N_TOT_READS: 210000 N_COMPUTED_ALN: 40803 N_CACHED_ALN: 157500 N_COMPUTED_NOTALN: 4838 N_CACHED_NOTALN: 6859
Processing reads; N_TOT_READS: 220000 N_COMPUTED_ALN: 42434 N_CACHED_ALN: 165347 N_COMPUTED_NOTALN: 5016 N_CACHED_NOTALN: 7203
Processing reads; N_TOT_READS: 230000 N_COMPUTED_ALN: 44111 N_CACHED_ALN: 173128 N_COMPUTED_NOTALN: 5200 N_CACHED_NOTALN: 7561
Processing reads; N_TOT_READS: 240000 N_COMPUTED_ALN: 45566 N_CACHED_ALN: 181151 N_COMPUTED_NOTALN: 5377 N_CACHED_NOTALN: 7906
Processing reads; N_TOT_READS: 250000 N_COMPUTED_ALN: 47118 N_CACHED_ALN: 189059 N_COMPUTED_NOTALN: 5552 N_CACHED_NOTALN: 8271
Processing reads; N_TOT_READS: 260000 N_COMPUTED_ALN: 48883 N_CACHED_ALN: 196694 N_COMPUTED_NOTALN: 5752 N_CACHED_NOTALN: 8671
Processing reads; N_TOT_READS: 270000 N_COMPUTED_ALN: 50598 N_CACHED_ALN: 204354 N_COMPUTED_NOTALN: 5953 N_CACHED_NOTALN: 9095
Processing reads; N_TOT_READS: 280000 N_COMPUTED_ALN: 52338 N_CACHED_ALN: 212023 N_COMPUTED_NOTALN: 6151 N_CACHED_NOTALN: 9488
Processing reads; N_TOT_READS: 290000 N_COMPUTED_ALN: 54028 N_CACHED_ALN: 219702 N_COMPUTED_NOTALN: 6336 N_CACHED_NOTALN: 9934
Processing reads; N_TOT_READS: 300000 N_COMPUTED_ALN: 55654 N_CACHED_ALN: 227550 N_COMPUTED_NOTALN: 6500 N_CACHED_NOTALN: 10296
Processing reads; N_TOT_READS: 310000 N_COMPUTED_ALN: 57316 N_CACHED_ALN: 235364 N_COMPUTED_NOTALN: 6679 N_CACHED_NOTALN: 10641
Processing reads; N_TOT_READS: 320000 N_COMPUTED_ALN: 58965 N_CACHED_ALN: 243138 N_COMPUTED_NOTALN: 6850 N_CACHED_NOTALN: 11047
Processing reads; N_TOT_READS: 330000 N_COMPUTED_ALN: 60719 N_CACHED_ALN: 250812 N_COMPUTED_NOTALN: 7033 N_CACHED_NOTALN: 11436
Processing reads; N_TOT_READS: 340000 N_COMPUTED_ALN: 62323 N_CACHED_ALN: 258645 N_COMPUTED_NOTALN: 7207 N_CACHED_NOTALN: 11825
Processing reads; N_TOT_READS: 350000 N_COMPUTED_ALN: 63860 N_CACHED_ALN: 266528 N_COMPUTED_NOTALN: 7380 N_CACHED_NOTALN: 12232
Processing reads; N_TOT_READS: 360000 N_COMPUTED_ALN: 65304 N_CACHED_ALN: 274524 N_COMPUTED_NOTALN: 7548 N_CACHED_NOTALN: 12624
Processing reads; N_TOT_READS: 370000 N_COMPUTED_ALN: 66799 N_CACHED_ALN: 282468 N_COMPUTED_NOTALN: 7732 N_CACHED_NOTALN: 13001
Processing reads; N_TOT_READS: 380000 N_COMPUTED_ALN: 68444 N_CACHED_ALN: 290231 N_COMPUTED_NOTALN: 7902 N_CACHED_NOTALN: 13423
Processing reads; N_TOT_READS: 390000 N_COMPUTED_ALN: 70039 N_CACHED_ALN: 298062 N_COMPUTED_NOTALN: 8070 N_CACHED_NOTALN: 13829
Processing reads; N_TOT_READS: 400000 N_COMPUTED_ALN: 71577 N_CACHED_ALN: 306035 N_COMPUTED_NOTALN: 8224 N_CACHED_NOTALN: 14164
Processing reads; N_TOT_READS: 410000 N_COMPUTED_ALN: 73104 N_CACHED_ALN: 314030 N_COMPUTED_NOTALN: 8367 N_CACHED_NOTALN: 14499
Processing reads; N_TOT_READS: 420000 N_COMPUTED_ALN: 74516 N_CACHED_ALN: 322108 N_COMPUTED_NOTALN: 8515 N_CACHED_NOTALN: 14861
Processing reads; N_TOT_READS: 430000 N_COMPUTED_ALN: 75996 N_CACHED_ALN: 330111 N_COMPUTED_NOTALN: 8687 N_CACHED_NOTALN: 15206
Processing reads; N_TOT_READS: 440000 N_COMPUTED_ALN: 77599 N_CACHED_ALN: 337952 N_COMPUTED_NOTALN: 8841 N_CACHED_NOTALN: 15608
Processing reads; N_TOT_READS: 450000 N_COMPUTED_ALN: 79036 N_CACHED_ALN: 346043 N_COMPUTED_NOTALN: 8975 N_CACHED_NOTALN: 15946
Processing reads; N_TOT_READS: 460000 N_COMPUTED_ALN: 80554 N_CACHED_ALN: 354014 N_COMPUTED_NOTALN: 9130 N_CACHED_NOTALN: 16302
Processing reads; N_TOT_READS: 470000 N_COMPUTED_ALN: 82004 N_CACHED_ALN: 362047 N_COMPUTED_NOTALN: 9286 N_CACHED_NOTALN: 16663
Processing reads; N_TOT_READS: 480000 N_COMPUTED_ALN: 83723 N_CACHED_ALN: 369833 N_COMPUTED_NOTALN: 9442 N_CACHED_NOTALN: 17002
Processing reads; N_TOT_READS: 490000 N_COMPUTED_ALN: 85038 N_CACHED_ALN: 378037 N_COMPUTED_NOTALN: 9572 N_CACHED_NOTALN: 17353
Finished reads; N_TOT_READS: 496845 N_COMPUTED_ALN: 86015 N_CACHED_ALN: 383575 N_COMPUTED_NOTALN: 9660 N_CACHED_NOTALN: 17595
Done!
Quantifying indels/substitutions...
Done!
Calculating allele frequencies...
Done!
Saving processed data...
Making Plots...
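To read the alignment counters in the log above at a glance, the final "Finished reads" line can be parsed and summarized. This is a minimal sketch: the helper name `parse_counts` is hypothetical, and the reading of the counters (ALN = aligned, NOTALN = not aligned, CACHED = result reused for a read sequence seen before) is inferred from their names rather than confirmed in this thread.

```python
import re

# Final progress line copied from the log above.
LINE = (
    "Finished reads; N_TOT_READS: 496845 N_COMPUTED_ALN: 86015 "
    "N_CACHED_ALN: 383575 N_COMPUTED_NOTALN: 9660 N_CACHED_NOTALN: 17595"
)


def parse_counts(line):
    """Extract every 'N_XXX: <integer>' pair from a progress line."""
    return {name: int(value) for name, value in re.findall(r"(N_[A-Z_]+): (\d+)", line)}


counts = parse_counts(LINE)
total = counts["N_TOT_READS"]
aligned = counts["N_COMPUTED_ALN"] + counts["N_CACHED_ALN"]
not_aligned = counts["N_COMPUTED_NOTALN"] + counts["N_CACHED_NOTALN"]

print(f"aligned:     {aligned} / {total} ({aligned / total:.1%})")
print(f"not aligned: {not_aligned} / {total} ({not_aligned / total:.1%})")
```

Under that reading, 469590 of the 496845 merged reads (about 94.5%) aligned to the amplicon, so the alignment step itself does not look like the cause of the missing output.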

kclem commented 3 years ago

It appears that the run did not complete.

The debug should be printed to the screen when you run it, with a line showing where the error came from. Can you copy that output here?

kumar2501 commented 3 years ago

When I re-run with --debug, it returns the same output as I posted previously. Would you be comfortable with me sending you the files and command by email, if you can share an address?

kclem commented 3 years ago

Sure. You can send them to kclement@mgh.harvard.edu.

I've seen this type of problem when docker kills off the CRISPResso process -- have you tried to increase the memory that docker can use? https://forums.docker.com/t/how-to-increase-memory-size-that-is-available-for-a-docker-container/78483

kclem commented 3 years ago

For others who may have this issue, this ended up being a problem with the quantification windows settings:

The Allele_frequency_table_around_XXXX.txt file only contains the indels that fall within the quantification window. In this example, the quantification window is set to the end of your guide (appropriate for Cpf1), but the indels appear to be occurring in the middle of your guide.

Try setting the window center to -6 to capture these deletions, and they should appear in the allele tables.

--quantification_window_center -6
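To make the effect of the window center concrete, here is a simplified Python model of the idea. It is not CRISPResso's implementation: the function names, the 0-based coordinates, the forward-strand guide, the window size of 1 bp on each side of the cut, and the example deletion are all assumptions chosen for illustration.

```python
# Simplified model, NOT CRISPResso's code: positions are 0-based indices into
# the amplicon and the guide is assumed to lie on the forward strand.


def quantification_window(guide_start, guide_len, window_center, window_size=1):
    """Return (first, last) amplicon indices covered by the window.

    window_center is counted from the 3' end of the guide: -3 puts the cut
    3 bp inside the guide, 1 puts it just past the end of the guide.
    """
    guide_end = guide_start + guide_len - 1  # index of the guide's last base
    cut = guide_end + window_center          # base immediately 5' of the cut
    return cut - window_size + 1, cut + window_size


def overlaps(window, del_start, del_end):
    """True if a deletion spanning [del_start, del_end] touches the window."""
    first, last = window
    return del_start <= last and del_end >= first


# Hypothetical example: a 20-nt guide starting at index 100 and a 4-bp
# deletion in the middle of the guide.
guide_start, guide_len = 100, 20
deletion = (111, 114)

for center in (1, -6):
    window = quantification_window(guide_start, guide_len, center)
    print(center, window, overlaps(window, *deletion))
```

In this toy setup, a center of 1 gives the window (120, 121), which the mid-guide deletion does not touch, while a center of -6 gives (113, 114), which it does. That mirrors why the deletions were missing from the allele table until the window center was moved.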