tangxj98 opened this issue 3 months ago
Hi @tangxj98,
The amplicon sequence (CTGGCTCCTTCTGTTGTTTCTCTTGGCTCCAGGACCCCCGCAGCAAACACAAGTTTAAGATCCACACGTACTCCAGCCCCAC) is the sequence of the first region in your region file (focused.region.txt) as contained in your genome file (GCA_000001405.15_GRCh38_no_alt_short_headers_nonACTG_to_N.fa).
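The amplicon for a region is just the reference bases between that region's start and end coordinates. Below is a minimal Python sketch of the lookup. It assumes BED-style 0-based, half-open coordinates; I have not confirmed which convention CRISPRessoWGS uses internally, and this is not its actual code. `parse_fasta` and `region_sequence` are hypothetical helper names.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect FASTA records into {record_name: sequence}.

    Everything is held in memory, so this is only meant for small files;
    for a whole genome an indexed reader (e.g. samtools faidx) is the usual choice.
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Keep only the first word of the header: ">chr1 description" -> "chr1"
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def region_sequence(genome: Dict[str, str], chrom: str, start: int, end: int) -> str:
    """Return the reference bases for the 0-based, half-open interval [start, end)."""
    return genome[chrom][start:end].upper()


if __name__ == "__main__":
    # Toy genome standing in for the real FASTA file
    toy = parse_fasta([">chrT toy\n", "acgtACGT\n", "TTTT\n"])
    print(region_sequence(toy, "chrT", 2, 10))  # prints GTACGTTT
```

For a real file the parser can be fed an open handle, e.g. `with open(path) as fh: genome = parse_fasta(fh)`.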
What is the design of your targeted sequencing experiment? Do you have two primers outside of the guides that amplify the entire interior region? If you are expecting deletion of the interior gene, the short PCR products from deleted alleles will be artificially inflated compared to the longer non-deletion amplicons, because shorter products amplify more efficiently.
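A rough way to see the size bias: if each PCR cycle multiplies a template by (1 + E), where E is the per-cycle efficiency, two templates that start at equal copy numbers end up in the ratio below after n cycles. The efficiencies used here (0.95 for the short deletion product, 0.90 for the longer product) and the 30 cycles are made-up illustrative values, not measurements.

```latex
\frac{N_{\text{short}}(n)}{N_{\text{long}}(n)}
  = \left(\frac{1 + E_{\text{short}}}{1 + E_{\text{long}}}\right)^{n},
\qquad
\left(\frac{1.95}{1.90}\right)^{30} \approx 2.2
```

So even a small per-cycle difference can roughly double the apparent frequency of the short allele.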
I would suggest designing multiple primers to generate amplicons of approximately the same size like this:
--P1--> <--P2--                 --P3--> <--P4--
       |g1      {380Kbp gene}          |g2
This strategy would allow you to measure small indels (P1/P2 and P3/P4) as well as the long deletion (P1/P4).
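To make the primer layout concrete, here is a small Python sketch that computes the expected product length for each primer pair on an unedited and a deleted allele. All coordinates (cuts at 1,000 and 381,000, 300 bp products, 20 nt primers) are hypothetical placeholders, and `product_length` is an illustrative helper, not part of CRISPResso.

```python
from typing import Optional, Tuple


def product_length(
    fwd_start: int,
    rev_end: int,
    primer_len: int = 20,
    deletion: Optional[Tuple[int, int]] = None,
) -> Optional[int]:
    """Expected PCR product length between a forward primer starting at fwd_start
    and a reverse primer ending at rev_end (0-based, half-open coordinates).

    deletion=(left_cut, right_cut) removes the bases [left_cut, right_cut).
    Returns None when the deletion overlaps a primer binding site, i.e. that
    primer pair cannot amplify the deleted allele.
    """
    length = rev_end - fwd_start
    if deletion is None:
        return length
    left, right = deletion
    if right <= fwd_start or left >= rev_end:
        return length  # deletion lies entirely outside this amplicon
    if fwd_start + primer_len <= left and right <= rev_end - primer_len:
        return length - (right - left)  # deletion falls between the two primers
    return None  # a primer binding site was deleted


if __name__ == "__main__":
    cuts = (1_000, 381_000)  # hypothetical g1 and g2 cut sites (380 kb apart)
    pairs = {"P1/P2": (850, 1_150), "P3/P4": (380_850, 381_150), "P1/P4": (850, 381_150)}
    for name, (fwd, rev) in pairs.items():
        print(name, "unedited:", product_length(fwd, rev),
              "deleted:", product_length(fwd, rev, deletion=cuts))
```

With these numbers, P1/P2 and P3/P4 each give 300 bp on unedited alleles and nothing on deleted alleles, while P1/P4 gives a 300 bp product only when the interval is deleted; the unedited P1/P4 product (about 380 kb) is far beyond what standard PCR can amplify.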
Hi @kclem,
Thanks for the reply. The targeted sequencing design actually uses tiled primers that cover the whole gene. Below is what the custom targeted sequencing panel looks like in the genome browser.
The focused.region.txt I used was a BED file containing two 100bp regions spanning the double-strand break points of the two sgRNAs. Is this the right way to make focused.region.txt? Sorry, I am new to CRISPR analysis. I sincerely appreciate your suggestions.
By specifying two 100bp regions, you'll be able to estimate the indel rate at the two sites, but you won't be able to measure the rate at which the interior gene is deleted (because the reads from cells where the gene is deleted won't align to your genome).
Note that reads must cover the entire 100bp region to be included in the analysis, so I would also suggest shrinking the region to ~30bp depending on your read length.
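One way to size a ~30bp region is to locate each guide in the reference, predict its cut site, and take 15bp on either side. This is a minimal sketch assuming SpCas9 (NGG PAM, blunt cut 3bp upstream of the PAM); the code does not verify the PAM. Positions are offsets into the sequence you pass in, so add the region's start coordinate to get genome positions. `cut_sites` and `window` are hypothetical helpers.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def cut_sites(reference: str, guide: str) -> List[int]:
    """Predicted SpCas9 cut positions for `guide` (protospacer, 5'->3', no PAM).

    A returned value c means the cut falls between reference[c - 1] and reference[c].
    """
    ref, g = reference.upper(), guide.upper()
    sites = []
    # Guide on the + strand: PAM follows the protospacer, cut 3 bp before its 3' end
    start = ref.find(g)
    while start != -1:
        sites.append(start + len(g) - 3)
        start = ref.find(g, start + 1)
    # Guide on the - strand: its reverse complement is on the + strand with the
    # PAM (seen as CCN) upstream, so the cut is 3 bp after the match start
    rc = reverse_complement(g)
    start = ref.find(rc)
    while start != -1:
        sites.append(start + 3)
        start = ref.find(rc, start + 1)
    return sorted(sites)


def window(cut: int, half_width: int = 15) -> Tuple[int, int]:
    """Half-open interval of 2 * half_width bases centred on a cut site."""
    return max(0, cut - half_width), cut + half_width


if __name__ == "__main__":
    amplicon = ("CTGGCTCCTTCTGTTGTTTCTCTTGGCTCCAGGACCCCCGCAGCAAACACAAG"
                "TTTAAGATCCACACGTACTCCAGCCCCAC")
    for cut in cut_sites(amplicon, "TTAAACTTGTGTTTGCTGCG"):
        print(cut, window(cut))  # prints 41 (26, 56)
```

In this example the guide matches the reverse strand of the region sequence, which is why the minus-strand branch is needed.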
Again, the main problem with using WGS mode for this analysis is that reads supporting the large gene deletion aren't aligned to the reference genome and won't be considered for analysis.
If you want to quantify indels at each target site as well as the large deletion, I would suggest creating 3 reference sequences: 1) a 100bp reference around the left cut site, 2) a 100bp reference around the right cut site, and 3) a reference in which the interior gene is deleted (probably 50bp from each arm, so 100bp in total). You can use CRISPRessoPooled to align reads to these references and analyze cutting frequencies.
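The three references can be cut straight out of the chromosome sequence once the two cut positions are known. A minimal sketch, assuming a precise rejoin of the two arms for the deletion reference (no extra indels at the junction) and the same three tab-separated columns used in the AMPLICONS_FILE.txt later in this thread (name, amplicon sequence, guide or NA). The row names and helper functions are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[str, str, str]


def site_reference(chrom_seq: str, cut: int, flank: int = 50) -> str:
    """`flank` bases on each side of a cut site (2 * flank bases in total)."""
    if cut - flank < 0 or cut + flank > len(chrom_seq):
        raise ValueError("flank extends past the end of the sequence")
    return chrom_seq[cut - flank:cut + flank].upper()


def deletion_reference(chrom_seq: str, left_cut: int, right_cut: int, flank: int = 50) -> str:
    """Sequence expected if [left_cut, right_cut) is deleted and the arms rejoin precisely."""
    if left_cut > right_cut or left_cut - flank < 0 or right_cut + flank > len(chrom_seq):
        raise ValueError("invalid cut positions or flank")
    return (chrom_seq[left_cut - flank:left_cut] + chrom_seq[right_cut:right_cut + flank]).upper()


def amplicon_rows(chrom_seq: str, left_cut: int, right_cut: int,
                  left_guide: str, right_guide: str, flank: int = 50) -> List[Row]:
    return [
        ("left_site", site_reference(chrom_seq, left_cut, flank), left_guide),
        ("right_site", site_reference(chrom_seq, right_cut, flank), right_guide),
        ("deletion", deletion_reference(chrom_seq, left_cut, right_cut, flank), "NA"),
    ]


def to_tsv(rows: List[Row]) -> str:
    """One tab-separated line per reference, Unix line endings."""
    return "".join("\t".join(row) + "\n" for row in rows)


if __name__ == "__main__":
    toy = "A" * 10 + "C" * 10 + "G" * 10 + "T" * 10  # stand-in for a chromosome
    print(to_tsv(amplicon_rows(toy, 10, 30, "AAACC", "GGGTT", flank=5)), end="")
```

`flank` is a parameter so the references can be made longer than the reads, as discussed further down in this thread.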
I hope that helps!
Hi @kclem,
I used CRISPRessoPooled to run the job:
CRISPRessoPooled -r1 S106_Par.FCAFM53VM5_L1_R1_IGTTCGCCAGT-GATAAGTCGA.fastq.gz -r2 S106_Par.FCAFM53VM5_L1_R2_IGTTCGCCAGT-GATAAGTCGA.fastq.gz -f AMPLICONS_FILE.txt --name AMPLICONS_S106_Par --max_paired_end_reads_overlap 160 --keep_intermediate >& AMPLICONS_S106_Par.log &
However, strangely, the intermediate BAM has no aligned reads. From the log:
Alignment command: bowtie2 -x CRISPRessoPooled_on_AMPLICONS_S106_Par/CUSTOM_BOWTIE2_INDEX -p 1 --end-to-end -N 0 --np 0 --mp 3,2 --score-min L,-5,-1.2000000000000002 -U CRISPRessoPooled_on_AMPLICONS_S106_Par/out.extendedFrags.fastq.gz 2>>CRISPRessoPooled_on_AMPLICONS_S106_Par/CRISPRessoPooled_RUNNING_LOG.txt | samtools view -bS - > CRISPRessoPooled_on_AMPLICONS_S106_Par/CRISPResso_AMPLICONS_ALIGNED.bam
13616521 reads; of these:
  13616521 (100.00%) were unpaired; of these:
    13616521 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate
The amplicons file:
$ cat ../AMPLICONS_FILE.txt
aroundExon4_1	TGGAGTACGTGTGGATCTTAAACTTGTGTTTGCTGCGGGGGTCCTGGAGCCAAGAG	TTAAACTTGTGTTTGCTGCG
aroundExon5_9	GCCACCTACCGAGGACAATGAGGACGTCCCTGTCGATGTGGGCCTGGATGTAGATG	GAGGACGTCCCTGTCGATGT
deletion	CTACCGAGGACAATGAGGACGTCCCTGTCGATGCGGGGGTCCTGGAGCCAAGAGAA	NA
What makes it strange is that I can grep the whole sequence "TGGAGTACGTGTGGATCTTAAACTTGTGTTTGCTGCGGGGGTCCTGGAGCCAAGAG" in the intermediate out.extendedFrags.fastq.gz and find many exact matches. How come the bowtie2 alignment ended up with nothing mapped? Could you please provide some insight here? Or should I run it in mixed or genome mode?
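A plain grep only finds reads in the forward orientation. A slightly more complete sanity check is to count, among all merged reads, how many contain the amplicon and how many contain its reverse complement. This is only a string search, not an alignment; it just confirms the reads exist. The sketch assumes standard 4-line FASTQ records, and the input path is the one shown in the log above.

```python
import gzip
import os
from typing import Iterable, Iterator, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def count_matches(seqs: Iterable[str], target: str) -> Tuple[int, int, int]:
    """Return (reads, reads containing target, reads containing its reverse complement)."""
    target = target.upper()
    rc = target.translate(_COMPLEMENT)[::-1]
    total = fwd = rev = 0
    for seq in seqs:
        total += 1
        fwd += target in seq
        rev += rc in seq
    return total, fwd, rev


if __name__ == "__main__":
    path = "CRISPRessoPooled_on_AMPLICONS_S106_Par/out.extendedFrags.fastq.gz"
    if os.path.exists(path):
        with gzip.open(path, "rt") as fh:
            amplicon = "TGGAGTACGTGTGGATCTTAAACTTGTGTTTGCTGCGGGGGTCCTGGAGCCAAGAG"
            print(count_matches(fastq_sequences(fh), amplicon))
```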
Many thanks!
How long are your reads? Perhaps bowtie2 has a hard time aligning reads to references shorter than the read length. Yes, you could try running in mixed mode, or using longer amplicon sequences in your AMPLICONS_FILE.txt.
You should also check that your AMPLICONS_FILE.txt is correct. In your CRISPRessoPooled command you reference AMPLICONS_FILE.txt, but you cat ../AMPLICONS_FILE.txt.
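A quick way to check the amplicons file is to scan each line for the problems raised in this thread: amplicons shorter than the reads, missing tab separators, stray characters (such as a carriage return, which can appear if the file was saved on Windows or from a spreadsheet program), and guides that do not occur in their amplicon in either orientation. A minimal sketch; `check_amplicons` is a hypothetical helper, and the three-column layout (name, amplicon, guide or NA) is assumed from the file shown above.

```python
import re
import sys
from typing import Iterable, List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
_DNA = re.compile(r"[ACGTN]+")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def check_amplicons(lines: Iterable[str], read_length: int) -> List[Tuple[int, str]]:
    """Return (line_number, message) warnings for a tab-separated amplicons file."""
    warnings: List[Tuple[int, str]] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")  # deliberately keep any '\r' so it is flagged below
        if not line.strip():
            continue
        cols = line.split("\t")
        if len(cols) < 2:
            warnings.append((lineno, "fewer than 2 tab-separated columns"))
            continue
        name, amplicon = cols[0], cols[1].upper()
        guide = cols[2].upper() if len(cols) > 2 else "NA"
        if not _DNA.fullmatch(amplicon):
            warnings.append((lineno, f"{name}: amplicon has non-ACGTN characters"))
        elif len(amplicon) < read_length:
            warnings.append((lineno, f"{name}: amplicon is {len(amplicon)} bp, "
                                     f"shorter than {read_length} bp reads"))
        if guide == "NA":
            continue
        if not _DNA.fullmatch(guide):
            warnings.append((lineno, f"{name}: guide {guide!r} has non-ACGTN characters"))
        elif guide not in amplicon and reverse_complement(guide) not in amplicon:
            warnings.append((lineno, f"{name}: guide {guide} not found in amplicon"))
    return warnings


if __name__ == "__main__":
    for path in sys.argv[1:]:
        # newline="" keeps '\r' characters instead of silently translating them
        with open(path, newline="") as fh:
            for lineno, msg in check_amplicons(fh, read_length=150):
                print(f"{path}:{lineno}: {msg}")
```

Passing the path explicitly (e.g. `python check.py AMPLICONS_FILE.txt`) also makes it obvious which copy of the file is being checked.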
My reads are 150bp. I will try making amplicon sequences that are longer than the read length.
The previous AMPLICONS_FILE.txt looks like this:

| Amplicon name | Amplicon sequence | Guide sequence |
| -- | -- | -- |
| aroundExon4_1 | TGGAGTACGTGTGGATCTTAAACTTGTGTTTGCTGCGGGGGTCCTGGAGCCAAGAG | TTAAACTTGTGTTTGCTGCG |
| aroundExon5_9 | GCCACCTACCGAGGACAATGAGGACGTCCCTGTCGATGTGGGCCTGGATGTAGATG | GAGGACGTCCCTGTCGATGT |
| deletion | CTACCGAGGACAATGAGGACGTCCCTGTCGATGCGGGGGTCCTGGAGCCAAGAGAA | NA |
I am working on CRISPR data from cells treated with two sgRNAs and analyzed by targeted sequencing. The distance between the two sgRNAs is 3000bp, which is much longer than an amplicon. The targeted area is the whole gene (380Kbp). Which mode should I use?
I tried running CRISPRessoWGS but got errors. My WGS-mode command:
CRISPRessoWGS -b S106_D.sorted.bam -f focused.region.txt -r refs/human/GRCh38/processed/2015_04_04/seqs_for_alignment_pipelines.ucsc_ids/bwa_index/GCA_000001405.15_GRCh38_no_alt_short_headers_nonACTG_to_N.fa --name CRISPR_S106_D -g TTAAACTTGTGTTTGCTGCG,GAGGACGTCCCTGTCGATGT
The error message asked me to try the amplicon mode:
ERROR: CRISPResso region #0 failed. For more information, try running the command:
" CRISPResso -r1 CRISPRessoWGS_on_CRISPR_S106_D/ANALYZED_REGIONS/REGION_0.fastq.gz -a CTGGCTCCTTCTGTTGTTTCTCTTGGCTCCAGGACCCCCGCAGCAAACACAAGTTTAAGATCCACACGTACTCCAGCCCCAC -o CRISPRessoWGS_on_CRISPR_S106_D --name exon4-1 --max_rows_alleles_around_cut_to_plot 50 --prime_editing_pegRNA_scaffold_min_match_length 1 --needleman_wunsch_gap_extend -2 --needleman_wunsch_aln_matrix_loc EDNAFULL --conversion_nuc_to T --n_processes 1 --aln_seed_len 10 --trimmomatic_command trimmomatic --min_frequency_alleles_around_cut_to_plot 0.2 --min_bp_quality_or_N 0 --aln_seed_min 2 --flash_command flash --default_min_aln_score 60 --flexiguide_homology 80 --min_average_read_quality 0 --prime_editing_pegRNA_extension_quantification_window_size 5 --min_paired_end_reads_overlap 10 --needleman_wunsch_gap_incentive 1 --plot_window_size 20 --quantification_window_center -3 --quantification_window_size 1 --min_single_bp_quality 0 --max_paired_end_reads_overlap 100 --needleman_wunsch_gap_open -20 --conversion_nuc_from C --exclude_bp_from_left 15 --aln_seed_count 5 --exclude_bp_from_right 15 --guide_seq TTAAACTTGTGTTTGCTGCG,GAGGACGTCCCTGTCGATGT &> lo
I tried that, and the error message is:
ERROR: The guide sequence 1 (GAGGACGTCCCTGTCGATGT) provided is not present in the amplicon sequences!
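The suggested command passes both guides (`--guide_seq TTAAACTTGTGTTTGCTGCG,GAGGACGTCCCTGTCGATGT`) to the run for region #0, and that region's amplicon appears to contain only the first guide (on the reverse strand), which is consistent with the "guide sequence 1 ... is not present" error. A small sketch to check which guide occurs in which region sequence; `guides_in_region` and `assign_guides` are hypothetical helpers, and whether your CRISPRessoWGS version lets you give each region its own guide (e.g. through a per-region column in the region file) is something to confirm in its documentation.

```python
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def guides_in_region(region_seq: str, guides: List[str]) -> List[str]:
    """Guides whose sequence, or its reverse complement, occurs in region_seq."""
    seq = region_seq.upper()
    found = []
    for guide in guides:
        g = guide.upper()
        if g in seq or g.translate(_COMPLEMENT)[::-1] in seq:
            found.append(guide)
    return found


def assign_guides(regions: Dict[str, str], guides: List[str]) -> Dict[str, List[str]]:
    """Map each region name to the guides found in its sequence."""
    return {name: guides_in_region(seq, guides) for name, seq in regions.items()}


if __name__ == "__main__":
    # Region #0 amplicon copied from the error message above
    regions = {"exon4-1": "CTGGCTCCTTCTGTTGTTTCTCTTGGCTCCAGGACCCCCGCAGCAAACACAAG"
                          "TTTAAGATCCACACGTACTCCAGCCCCAC"}
    print(assign_guides(regions, ["TTAAACTTGTGTTTGCTGCG", "GAGGACGTCCCTGTCGATGT"]))
    # prints {'exon4-1': ['TTAAACTTGTGTTTGCTGCG']}
```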
I don't know how the code generated the amplicon sequence (CTGGCTCCTTCTGTTGTTTCTCTTGGCTCCAGGACCCCCGCAGCAAACACAAGTTTAAGATCCACACGTACTCCAGCCCCAC). Could you please give some suggestions on how to work with my targeted sequencing data?
Thank you very much!