Closed AndreaBarghetti closed 1 year ago
Hi @AndreaBarghetti,
Thanks for using CRISPResso and submitting this issue! The mixed-mode of CRISPRessoPooled is currently supported, and there is definitely something weird going on here. I am looking into it and will keep this thread updated with what I find.
Have you happened to try the mixed-mode on any different datasets? If so, do you see similar results?
Thanks, Cole
Thanks for working on it.
I tried on my own dataset but got the same issue.
I have only tried it using the docker image however
The issue with this particular case is that the reads don't exactly align to the start/end of the amplicons (see attached image of the aligned reads on top compared to the aligned amplicon on bottom). You can remedy this by adding the --demultiplex_only_at_amplicons
which resolves the issue by including reads that overlap with the amplicon instead of demultiplexing by only exact position alignment. In a future release we will have the --demultiplex_only_at_amplicons
be the default when in mixed-mode to mitigate this issue.
Note that in the genome mode or the amplicon mode the exact start/end check isn't performed, which is why the same data runs as expected in these other modes.
Please let us know if you still see unexpected results on your data when using the --demultiplex_only_at_amplicons
parameter!
Have you tested it on the test dataset ?
Doesn't seem to work for me.
I tried to run CRISPResso2 on the test dataset, with --demultiplex_only_at_amplicons, as you suggested:
docker run -d -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPRessoPooled -r1 Both.Cas9.fastq -x smallGenome/smallGenome -f Cas9.amplicons.txt -p 8 --keep_intermediate --name AMPLICONS_AND_GENOME_SRR1046762 --min_reads_to_use_region 100 --debug --demultiplex_only_at_amplicons -o ./output3
here's the CRISPRessoPooled_RUNNING_LOG:
CRISPResso version 2.2.11a [Command used]: /opt/conda/bin/CRISPRessoPooled -r1 Both.Cas9.fastq -x smallGenome/smallGenome -f Cas9.amplicons.txt -p 8 --keep_intermediate --name AMPLICONS_AND_GENOME_SRR1046762 --min_reads_to_use_region 100 --debug --demultiplex_only_at_amplicons -o ./output3
[Execution log]: Processing input Unable to find matches for header values. Using the default header values and order. Header variable names in order: ['amplicon_name', 'amplicon_seq', 'guide_seq', 'expected_hdr_amplicon_seq', 'coding_seq'] Mapping amplicons to the reference genome... The amplicon [FANC] was mapped to: chr11:1047-1270 The amplicon [HEK3] was mapped to: chr9:966-1198 The uncompressed reference fasta file for smallGenome/smallGenome is already present! Skipping generation. Aligning reads to the provided genome index... Aligning with command: bowtie2 -x smallGenome/smallGenome -p 8 --end-to-end -N 0 --np 0 --mp 3,2 --score-min L,-5,-1.2000000000000002 -U /DATA/output3/CRISPRessoPooled_on_AMPLICONS_AND_GENOME_SRR1046762/Both.Cas9.fastq 2>>/DATA/output3/CRISPRessoPooled_on_AMPLICONS_AND_GENOME_SRR1046762/CRISPRessoPooled_RUNNING_LOG.txt| samtools view -bS - | samtools sort -@ 8 - -o /DATA/output3/CRISPRessoPooled_on_AMPLICONS_AND_GENOME_SRR1046762/AMPLICONS_AND_GENOME_SRR1046762_GENOME_ALIGNED.bam 500 reads; of these: 500 (100.00%) were unpaired; of these: 2 (0.40%) aligned 0 times 480 (96.00%) aligned exactly 1 time 18 (3.60%) aligned >1 times 99.60% overall alignment rate Preparing to demultiplex reads aligned to positions overlapping amplicons in the genome... Wrote demultiplexing commands to /DATA/output3/CRISPRessoPooled_on_AMPLICONS_AND_GENOME_SRR1046762/DEMUX_COMMANDS.txt Demultiplexing reads by location (2 genomic regions)... Processing amplicon: FANC
The amplicon [FANC] has enough reads (248) mapped to it! Running CRISPResso!
Running CRISPResso:CRISPResso -r1 /DATA/output3/CRISPRessoPooled_on_AMPLICONS_AND_GENOME_SRR1046762/MAPPED_REGIONS/REGION_chr11_1047_1270.fastq.gz -o /DATA/output3/CRISPRessoPooled_on_AMPLICONS_AND_GENOME_SRR1046762 --name FANC --trimmomatic_command trimmomatic --prime_editing_pegRNA_scaffold_min_match_length 1 --n_processes 1 --conversion_nuc_from C --flexiguide_homology 80 --default_min_aln_score 60 --aln_seed_count 5 --quantification_window_size 1 --exclude_bp_from_left 15 --keep_intermediate --conversion_nuc_to T --aln_seed_min 2 --needleman_wunsch_gap_extend -2 --min_paired_end_reads_overlap 10 --exclude_bp_from_right 15 --bowtie2_index smallGenome/smallGenome --quantification_window_center -3 --max_paired_end_reads_overlap 100 --min_average_read_quality 0 --needleman_wunsch_gap_open -20 --min_bp_quality_or_N 0 --min_frequency_alleles_around_cut_to_plot 0.2 --prime_editing_pegRNA_extension_quantification_window_size 5 --max_rows_alleles_around_cut_to_plot 50 --flash_command flash --aln_seed_len 10 --min_single_bp_quality 0 --needleman_wunsch_aln_matrix_loc EDNAFULL --needleman_wunsch_gap_incentive 1 --debug --guide_seq GGAATCCCTTCTGCAGCACC --plot_window_size 20 Processing amplicon: HEK3
The amplicon [HEK3] has enough reads (250) mapped to it! Running CRISPResso!
Running CRISPResso:CRISPResso -r1 /DATA/output3/CRISPRessoPooled_on_AMPLICONS_AND_GENOME_SRR1046762/MAPPED_REGIONS/REGION_chr9_966_1198.fastq.gz -o /DATA/output3/CRISPRessoPooled_on_AMPLICONS_AND_GENOME_SRR1046762 --name HEK3 --trimmomatic_command trimmomatic --prime_editing_pegRNA_scaffold_min_match_length 1 --n_processes 1 --conversion_nuc_from C --flexiguide_homology 80 --default_min_aln_score 60 --aln_seed_count 5 --quantification_window_size 1 --exclude_bp_from_left 15 --keep_intermediate --conversion_nuc_to T --aln_seed_min 2 --needleman_wunsch_gap_extend -2 --min_paired_end_reads_overlap 10 --exclude_bp_from_right 15 --bowtie2_index smallGenome/smallGenome --quantification_window_center -3 --max_paired_end_reads_overlap 100 --min_average_read_quality 0 --needleman_wunsch_gap_open -20 --min_bp_quality_or_N 0 --min_frequency_alleles_around_cut_to_plot 0.2 --prime_editing_pegRNA_extension_quantification_window_size 5 --max_rows_alleles_around_cut_to_plot 50 --flash_command flash --aln_seed_len 10 --min_single_bp_quality 0 --needleman_wunsch_aln_matrix_loc EDNAFULL --needleman_wunsch_gap_incentive 1 --debug --guide_seq GGCCCAGACTGAGCACGTGA --plot_window_size 20
ERROR: CRISPResso amplicon #0 failed. For more information, try running the command: "CRISPResso -r1 /DATA/output3/CRISPRessoPooled_on_AMPLICONS_AND_GENOME_SRR1046762/MAPPED_REGIONS/REGION_chr11_1047_1270.fastq.gz -o /DATA/output3/CRISPRessoPooled_on_AMPLICONS_AND_GENOME_SRR1046762 --name FANC --trimmomatic_command trimmomatic --prime_editing_pegRNA_scaffold_min_match_length 1 --n_processes 1 --conversion_nuc_from C --flexiguide_homology 80 --default_min_aln_score 60 --aln_seed_count 5 --quantification_window_size 1 --exclude_bp_from_left 15 --keep_intermediate --conversion_nuc_to T --aln_seed_min 2 --needleman_wunsch_gap_extend -2 --min_paired_end_reads_overlap 10 --exclude_bp_from_right 15 --bowtie2_index smallGenome/smallGenome --quantification_window_center -3 --max_paired_end_reads_overlap 100 --min_average_read_quality 0 --needleman_wunsch_gap_open -20 --min_bp_quality_or_N 0 --min_frequency_alleles_around_cut_to_plot 0.2 --prime_editing_pegRNA_extension_quantification_window_size 5 --max_rows_alleles_around_cut_to_plot 50 --flash_command flash --aln_seed_len 10 --min_single_bp_quality 0 --needleman_wunsch_aln_matrix_loc EDNAFULL --needleman_wunsch_gap_incentive 1 --debug --guide_seq GGAATCCCTTCTGCAGCACC --plot_window_size 20"
My apologies @AndreaBarghetti I forgot to mention that this PR https://github.com/edilytics/CRISPResso2/pull/11 needs to be merged in first, which will be released soon.
The fixes were released last week, would you mind seeing if it works with your data now @AndreaBarghetti?
I'll try ASAP and let you know. Thanks !
The fixes were released last week, would you mind seeing if it works with your data now @AndreaBarghetti?
I tested it on my own dataset and it all worked fine!
Thanks a lot for solving the issue so quickly, Amazing job!
Great to hear! Thanks for using CRISPResso2 and happy analyzing!
Hi,
I have been trying, unsuccessfully, to run CRISPResso2 pooled with mixed-mode, as according to the documentation.
I noticed that there are tests for CRISPRessoPooled in "amplicon mode":
echo Running CRISPRessoPooled CRISPRessoPooled -r1 Both.Cas9.fastq -f Cas9.amplicons.txt -p 2 --keep_intermediate --min_reads_to_use_region 100 --debug &> CRISPRessoPooled_on_Both.Cas9.log
but I couldn't see any tests for "the genome mode" and "mixed-mode"I tried to run CRISPRessoPooled in mixed mode with the test datasets:
docker run -d -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPRessoPooled -r1 Both.Cas9.fastq -x smallGenome/smallGenome -f Cas9.amplicons.txt -p 2 --keep_intermediate --min_reads_to_use_region 100 --debug -o ./output
It runs without errors, but the output is not what one would expect:
CRISPRessoPooled_RUNNING_LOG.txt:
CRISPResso_amplicon_aligned_locations.txt
SAMPLES_QUANTIFICATION_SUMMARY.txt
Is the mixed-mode still supported?
Thanks!