pinellolab / CRISPResso2

Analysis of deep sequencing data for rapid and intuitive interpretation of genome editing experiments

My alignment is 0 #312

Closed yuanyuan12543 closed 1 year ago

yuanyuan12543 commented 1 year ago

Hi, I ran the command below, but 0 reads aligned and I don't know why. Could you please help me find the reason?

module load crispresso2

CRISPResso -r1 gAE1-H_R1_001.fastq.gz -a AAGTTTGAATGTAATTTATTATATCCCGTTCCAGAACCTGCGGCAATTGTGGAAAAGCCAGAATCCATAA,ATTAAGATAAAAGTCCAACCAACCAACCTGAAACTTGTACTGAGGCTGTACAGCTGTCTT -an AE,Cas9 -g GTTCCAGAACCTGCGGCAAT,ACCAACCAACCTGAAACTTG -gn spAE,spCas9g

I am attaching the log file here as well.

Thank you! CRISPResso_RUNNING_LOG.txt

kclem commented 1 year ago

Hi @yuanyuan12543

It appears from your log that none of the reads in your fastq file aligned to the reference sequences you have provided. Can you view the first few reads from your gAE1-H_R1_001.fastq.gz file and try to manually align them to your reference sequences to make sure you've amplified the correct region? If you can't open the file you can upload it here and I can take a look.
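The manual check suggested above can be sketched in a few lines of Python. This is only an illustration: the helper name `first_reads` is hypothetical and not part of CRISPResso, and it assumes a standard four-line-per-record gzipped FASTQ file.

```python
import gzip
from itertools import islice


def first_reads(fastq_gz_path, n=5):
    """Return the sequence lines of the first n records in a gzipped FASTQ file."""
    sequences = []
    with gzip.open(fastq_gz_path, "rt") as handle:
        # A FASTQ record spans four lines: header, sequence, '+', qualities.
        for line_number, line in enumerate(islice(handle, 4 * n)):
            if line_number % 4 == 1:
                sequences.append(line.strip())
    return sequences
```

Calling `first_reads("gAE1-H_R1_001.fastq.gz")` and printing each sequence with its length gives a handful of reads that can be pasted into an alignment tool next to the reference amplicons.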

yuanyuan12543 commented 1 year ago

Thank you for the help. I am attaching the fastq file here. gAE1-H_R1_001.fastq.gz

Best,

Junli

yuanyuan12543 commented 1 year ago

I did a BLAST alignment manually, and the reads should be able to map to the reference sequences.

yuanyuan12543 commented 1 year ago

Actually, 10% of the reads should be able to map to the reference sequences.

kclem commented 1 year ago

The reads in your fastq file appear to be much longer (some as long as 250bp) than the given amplicons (70bp). By default, CRISPResso discards reads with less than 60% homology (60% of bases must match exactly), and even if all 70bp of the amplicon match within a 250bp read, this is only 70/250=28% homology. If you'd like to include lower homology alignments, please use the --default_min_aln_score parameter.
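To make the arithmetic concrete, here is a rough sketch of the check described above. It assumes the homology score is simply the number of matched bases divided by the length of the longer sequence; that is a simplification for illustration, not CRISPResso's actual alignment scoring, and both function names are hypothetical.

```python
def approx_homology_percent(matched_bases, read_length, amplicon_length):
    """Rough percent homology: matched bases over the longer of the two sequences."""
    # Assumption: the alignment spans roughly the longer sequence.
    aligned_length = max(read_length, amplicon_length)
    return 100.0 * matched_bases / aligned_length


def passes_threshold(matched_bases, read_length, amplicon_length, min_aln_score=60):
    """True if the approximate homology reaches the minimum alignment score."""
    score = approx_homology_percent(matched_bases, read_length, amplicon_length)
    return score >= min_aln_score


# A 250bp read containing a perfect 70bp amplicon match.
print(approx_homology_percent(70, 250, 70))  # 28.0
print(passes_threshold(70, 250, 70))         # False with the default of 60
```

Under this approximation the read is discarded unless the threshold is lowered to 28 or below, or the reads are trimmed closer to the amplicon length.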

CRISPResso is designed to analyze reads from targeted amplicon sequencing. Are you sure you're using amplicon sequencing? Are there adapters that need to be trimmed to get the reads down to 70bp?

yuanyuan12543 commented 1 year ago

Hi, thank you for the suggestion. I will try modifying --default_min_aln_score. Yes, this is targeted amplicon sequencing, but for an exon deletion. I am using CRISPResso to detect single mutations in reads that do not carry the complete deletion. I am sorry for the confusion. Thank you for your help! I will get back to you on whether the program works after adjusting --default_min_aln_score.

kclem commented 1 year ago

Hi @yuanyuan12543 Did this solve your problem? If it didn't work, let us know and we'll re-open this issue.

tenlives commented 1 year ago

Hi, I set --default_min_aln_score to 70. I tested 150bp reads that have 120bp similar to the reference, but the output says no reads aligned. Should I set other parameters?

kclem commented 1 year ago

Hi @tenlives,

Make sure you're using the correct reference sequence and fastq input file. I'd pull out the first few reads from your fastq file and manually align them to your expected amplicon to make sure the amplicon sequence or the read file haven't been mixed up.

You can also try setting --default_min_aln_score 0 to accept all alignments if you think your amplicon sequence is correct but your reads may have lower homology.

tenlives commented 1 year ago

I may have misunderstood this parameter. If the amplicon sequence is 300bp and I set default_min_aln_score=70, should the reads be at least 210bp long? And if the reads are 150bp and 120bp of them match, will they not be aligned?

kclem commented 1 year ago

Yes, this is correct. If the amplicon sequence is 300bp, this will accept deletions up to 90bp long (210/300bp match).

I'm not sure I understand your second part of the question. If your reads are 150bp and have 120bp matched to a 300bp reference amplicon they will not align (120/300=40% alignment).
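The two numbers above follow from the same calculation, sketched here in Python. The sketch assumes the score is the fraction of amplicon bases matched and that the threshold is a whole-number percent; the function names are hypothetical and this is not CRISPResso's internal code.

```python
def min_matching_bases(amplicon_length, min_aln_score):
    """Smallest number of matching bases that reaches min_aln_score percent."""
    # Integer ceiling of amplicon_length * min_aln_score / 100.
    return -(-amplicon_length * min_aln_score // 100)


def max_tolerated_deletion(amplicon_length, min_aln_score):
    """Longest deletion that still leaves enough matching bases."""
    return amplicon_length - min_matching_bases(amplicon_length, min_aln_score)


print(min_matching_bases(300, 70))      # 210
print(max_tolerated_deletion(300, 70))  # 90
```

A 150bp read with 120 matching bases falls short of the 210 bases needed for a 300bp amplicon at a threshold of 70, which is why it is not reported as aligned.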

If you want me to look at this you can email me at k.clement@utah.edu and I can look at your fastq sequences and your reference sequences.

tenlives commented 1 year ago

Thank you very much! I figured it out thanks to your kind explanation. You fully solved my problem. Very useful tool!

tenlives commented 1 year ago

Sorry to bother you again. If I use CRISPRessoPooled, how do I set a different --default_min_aln_score for each amplicon alignment?

Colelyman commented 1 year ago

Hi @tenlives,

No bother at all. Unfortunately you can't set different alignment scores for each amplicon using CRISPRessoPooled, but you can do that using CRISPRessoBatch (by adding a column to the input batch file called default_min_aln_score).
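As an illustration of the batch file idea, the following sketch writes a small tab-separated file with a default_min_aln_score column. Only that column name comes from this thread; the other column names (name, fastq_r1, amplicon_seq), the tab-separated layout, and all row values are assumptions or placeholders that should be checked against the CRISPRessoBatch documentation.

```python
import csv

# Assumed column layout; only default_min_aln_score is confirmed in this thread.
COLUMNS = ["name", "fastq_r1", "amplicon_seq", "default_min_aln_score"]


def write_batch_file(path, rows):
    """Write one line per sample, each with its own minimum alignment score."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)


# Placeholder rows: file names and sequences are made up for illustration.
example_rows = [
    {"name": "geneA", "fastq_r1": "geneA.fastq.gz",
     "amplicon_seq": "ACGTACGTACGT", "default_min_aln_score": 70},
    {"name": "geneB", "fastq_r1": "geneB.fastq.gz",
     "amplicon_seq": "TTGGCCAATTGG", "default_min_aln_score": 40},
]
```

Calling `write_batch_file("batch.txt", example_rows)` produces a header line followed by one row per amplicon, each carrying its own threshold.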

Thanks, Cole

tenlives commented 1 year ago

Thanks for the prompt reply. I have one pooled fastq containing reads from different amplicon sequences. If I use CRISPRessoBatch, is fastq_r1 the same fastq file for every gene?

Colelyman commented 1 year ago

This is the tricky part: you may have to run CRISPRessoPooled with --default_min_aln_score 0 to essentially just demultiplex your single fastq file into the separate amplicons, and then run CRISPRessoBatch with the desired default_min_aln_score value for each amplicon to refine the alignments.

In the CRISPRessoPooled output folder, you can find a file called AMPL_{amplicon name}.fastq.gz (where {amplicon name} is replaced with the name of the amplicon) that has the reads associated with the amplicon. It would be these fastq files that would be used as input to CRISPRessoBatch.
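Collecting those per-amplicon files can be sketched as below. The file name pattern AMPL_{amplicon name}.fastq.gz is taken from the description above; the helper name and the assumption that the files sit directly in the given folder (rather than in subfolders) are mine.

```python
from pathlib import Path


def amplicon_fastqs(pooled_output_dir):
    """Map amplicon name -> path for every AMPL_{amplicon name}.fastq.gz file."""
    prefix, suffix = "AMPL_", ".fastq.gz"
    found = {}
    # Non-recursive search; use rglob instead if the files are in subfolders.
    for path in sorted(Path(pooled_output_dir).glob(f"{prefix}*{suffix}")):
        amplicon_name = path.name[len(prefix):-len(suffix)]
        found[amplicon_name] = path
    return found
```

Each entry in the returned dictionary can then supply the fastq input for one line of the CRISPRessoBatch batch file.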

tenlives commented 1 year ago

cool! I will try this method.

Colelyman commented 1 year ago

Good luck!

tenlives commented 1 year ago

Hi, I set --default_min_aln_score 0, but 10% of the reads are still unaligned. Can I reduce the unaligned rate? Is it possible for each read to get a best alignment no matter what its alignment score is? Also, 32.45% of the reads aligned more than once. Will these reads be assigned more than once? (Screenshot attached: 1694138102427)

Colelyman commented 1 year ago

These statistics are actually the results from bowtie2, so --default_min_aln_score isn't going to affect them. I'm not sure how to get that last 10% to align, but you can try copying the bowtie2 alignment command and changing its parameters so that more of your reads align. When you have suitable parameters, you can pass them to CRISPRessoPooled with --bowtie2_options_string. Good luck!