lucapinello / CRISPResso

Software pipeline for the analysis of CRISPR-Cas9 genome editing outcomes from sequencing data

No reads aligned? #26

Closed milw closed 6 years ago

milw commented 6 years ago

I'm getting this error with my data when trying to align just one of the paired-end read files. The amplicon input is a single line of text (I don't think it's terminated by a newline character). Aligning the same reads to this sequence in CLC works without problems, and the CRISPResso test run also completed successfully.

[Command used]:
CRISPResso /usr/local/bin/CRISPResso -r1 2_S2_L001_R1_001.fastq.gz -a GATCGGAGAATAAGCATGAGTAGTTATTGAGATCTGGGTCTGACTGCAGGTAGCGTGGTCTTCTAGACGTTTAAGTGGGAGATTTGGAGGGGATGAGGAATGAAGGAACTTCAGGATAG AAAAGGGCTGAAGTCAAGTTCAGCTCCTAAAATGGATGTGGGAGCAAACTTTGAAGATAAACTGAATGACCCAGAGGATGAAACAGCGCAGATCAAAGAGGGGCCTGGAGCTCTGAGAAGAGAAGGAGACTCATCCGTGTTGAGTTTCCACAAGTACTGTCTTGAGTTTTGCAATAAAAGTGGGATAGC AGAGTTGAGTGAGCCGTAGGCTGAGTTCTCTCTTTTGTCTCCTAAGTTTTTATGACTACAAAAATCAGTAGTATGTCCTGAAATAATCATTAAGCTGTTTGAAAGTATGACTGCTTGCCATGTAGATACCATGGCTTGCTGAATAATCAGAAGAGGTGTGACTCTTATTCTAAAATTTGTCACAAAATG TCAAAATGAGAGACTCTGTAGGAACG

[Execution log]:
Preparing files for the alignment...
Done!
Aligning sequences...
Needleman-Wunsch global alignment of two sequences
Align sequences to reverse complement of the amplicon...
Done!
Needleman-Wunsch global alignment of two sequences
Quantifying indels/substitutions...
Alignment error, please check your input.

ERROR: Zero sequences aligned, please check your amplicon sequence

lucapinello commented 6 years ago

@milw if the provided example runs fine you should double check the amplicon sequence you are providing.

1) Take a look at the first reads in your FASTQ file and check whether they match the amplicon you are providing. You can do this with:

zcat 2_S2_L001_R1_001.fastq.gz | head

2) Also, if you have paired-end reads you should provide both files. Otherwise the alignment may fail, because the amplicon you are providing is longer than the part covered by a single file: if the sequence homology is less than 60%, you don't get any alignment. This threshold can be changed with the parameter:

--min_identity_score Minimum identity score for the alignment (default: 60.0)

3) It seems you have a space in your amplicon (I have added a # so you can easily see where):

GATCGGAGAATAAGCATGAGTAGTTATTGAGATCTGGGTCTGACTGCAGGTAGCGTGGTCTTCTAGACGTTTAAGTGGGAGATTTGGAGGGGATGAGGAATGAAGGAACTTCAGGATAG AAAAGGGCTGAAGTCAAGTTCAGCTCCTAAAATGGATGTGGGAGCAAACTTTGAAGATAAACTGAATGACCCAGAGGATGAAACAGCGCAGATCAAAGAGGGGCCTGGAGCTCTGAGAAGAGAAGGAGACTCATCCGTGTTGAGTTTCCACAAGTACTGTCTTGAGTTTTGCAATAAAAGTGGGATAGC AGAGTTGAGTGAGCCGTAGGCTGAGTTCTCTCTTTTGTCTCCTAAGTTTTTATGACTACAAAAATCAGTAGTATGTCCTGAAATAATCATTAAGCTGTTTGAAAGTATGACTGCTTGCCATGTAGATACCATGGCTTGCTGAATAATCAGAAGAGGTGTGACTCTTATTCTAAAATTTGTCACAAAATG#TCAAAATGAGAGACTCTGTAGGAACG
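The checks in points 1 and 3 above can also be scripted. The following is a minimal sketch, not part of CRISPResso: the helper names (`clean_amplicon`, `first_reads`, `seed_orientation`), the allowed alphabet A/C/G/T/N, and the 20-base seed length are all assumptions made for illustration. It strips whitespace from a pasted amplicon, reads the first few sequences from a gzipped FASTQ file, and reports whether the start of a read occurs exactly in the amplicon or in its reverse complement.

```python
import gzip
import re
from itertools import islice


def clean_amplicon(raw):
    """Remove all whitespace and upper-case the sequence.

    Raises ValueError if anything other than A/C/G/T/N remains
    (the allowed alphabet is an assumption of this sketch).
    """
    seq = re.sub(r"\s+", "", raw).upper()
    bad = sorted(set(seq) - set("ACGTN"))
    if bad:
        raise ValueError(f"unexpected characters in amplicon: {bad}")
    return seq


def first_reads(fastq_gz_path, n=5):
    """Return the sequence lines of the first n records of a gzipped FASTQ file."""
    with gzip.open(fastq_gz_path, "rt") as handle:
        # A FASTQ record is 4 lines; the sequence is the 2nd line of each record.
        lines = islice(handle, 4 * n)
        return [line.strip() for i, line in enumerate(lines) if i % 4 == 1]


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def seed_orientation(read, amplicon, seed_len=20):
    """Say where the first seed_len bases of the read occur exactly.

    Returns "forward", "reverse" (reverse complement), or "none".
    An exact seed match is a crude check, not an alignment.
    """
    seed = read[:seed_len].upper()
    amplicon = amplicon.upper()
    if seed in amplicon:
        return "forward"
    if seed in reverse_complement(amplicon):
        return "reverse"
    return "none"
```

For example, calling `seed_orientation(read, clean_amplicon(pasted_text))` for each read returned by `first_reads("2_S2_L001_R1_001.fastq.gz")` gives a quick indication of whether the reads come from this amplicon at all. A result of "none" for every read would point to the amplicon sequence rather than to the alignment threshold.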

lucapinello commented 6 years ago

Closing this; if you have any other questions, please reply below.

milw commented 6 years ago

Hi Luca, thanks for looking into that. I do have paired reads, but they are mostly non-overlapping, so they can't be merged during that stage. For the alignment identity, is that % calculated on the amplicon length or as a % of the read length? I thought it would be the latter, but if it's a % of the amplicon, then I'll need to use 0.1 or so to allow detection of reads that only partially align. I'll double check the amplicon sequence too, but I think the space might be an artifact of pasting into these GitHub post windows: if I open it in nano, it's only a single line of text. Cheers, Scott
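To make the question about the denominator concrete, here is a small arithmetic sketch. The read and amplicon lengths are hypothetical, and which denominator CRISPResso actually uses is not established in this thread. It only shows how far apart the two readings are for a read that matches perfectly but covers part of the amplicon. Note also that the help text quoted above gives the default as 60.0, which suggests a 0 to 100 scale rather than a fraction.

```python
def identity_percent(matches, denominator):
    """Identity as a percentage of the chosen denominator."""
    return 100.0 * matches / denominator


# Hypothetical numbers, chosen only for illustration.
read_len = 150        # length of a single read
amplicon_len = 600    # length of the reference amplicon
matches = read_len    # the read matches the amplicon perfectly over its length

over_read = identity_percent(matches, read_len)          # 100.0
over_amplicon = identity_percent(matches, amplicon_len)  # 25.0

print(f"identity over read length:     {over_read:.1f}%")
print(f"identity over amplicon length: {over_amplicon:.1f}%")
```

Under the second reading, a perfectly matching read would still fall below a 60% threshold whenever the read covers less than 60% of the amplicon.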

milw commented 6 years ago

When I try to use both read files as paired-end input, I get a FLASH error, and its suggestion of manually setting the min and max overlap doesn't work (how do I pass those parameters when calling CRISPResso?).

[Execution log]:
Estimating average read length...
Merging paired sequences with Flash...
[FLASH] ERROR: Maximum overlap (-121) cannot be less than the minimum overlap (4).
Please make sure you have provided the read length and fragment length correctly. Or, alternatively, specify the minimum and maximum overlap manually with the --min-overlap and --max-overlap options.
[FLASH] FLASH did not complete successfully; exiting with failure status (1)
Merging error, please check your input.
ERROR: Flash failed to run, please check the log file.
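A negative maximum overlap is what one would expect when the amplicon is longer than the two reads combined. The sketch below is plain geometry, not FLASH's internal formula (which may also account for fragment-length variation), and the function names and lengths are hypothetical. Only the minimum overlap of 4 is taken from the log above.

```python
def expected_overlap(read_len, fragment_len):
    """Bases shared by two mates of length read_len that are sequenced
    inward from opposite ends of a fragment of length fragment_len.

    A negative value is the size of the unsequenced gap between the mates.
    """
    return 2 * read_len - fragment_len


def mates_can_merge(read_len, fragment_len, min_overlap=4):
    """True if the expected overlap reaches min_overlap (4 is the value
    reported in the FLASH error message above)."""
    return expected_overlap(read_len, fragment_len) >= min_overlap


# Hypothetical lengths for illustration.
print(expected_overlap(150, 250))   # 50: mates overlap and can be merged
print(expected_overlap(150, 600))   # -300: mates are separated by a gap
print(mates_can_merge(150, 600))    # False
```

When the expected overlap is negative, no choice of minimum and maximum overlap makes the mates mergeable, which is consistent with the reads being described as mostly non-overlapping.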