pinellolab / CRISPResso2

Analysis of deep sequencing data for rapid and intuitive interpretation of genome editing experiments

alignment problem #21

Closed chrispam416 closed 5 years ago

chrispam416 commented 5 years ago

Describe the bug: Alignment fails. My main concern is what to supply as the amplicon sequence: is it the sgRNA sequence (a.k.a. the recognition sequence), or do the adapter/index sequences need to be included?

ERROR: Error: No alignments were found

Expected behavior: The reads align to the amplicon sequence. I am not sure which sequence is supposed to be used: the sgRNA region or the scaffold region of the amplicon product.

To reproduce:
docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPResso --fastq_r1 G1.fastq.gz --amplicon_seq ACACTCTTTCCCTACACGACGCTCTTCCGATCT

Debug output:
Traceback (most recent call last):
  File "/opt/conda/lib/python2.7/site-packages/CRISPResso2-2.0.30-py2.7-linux-x86_64.egg/CRISPResso2/CRISPRessoCORE.py", line 419, in main
    CRISPRessoShared.check_file(args.fastq_r1)
  File "/opt/conda/lib/python2.7/site-packages/CRISPResso2-2.0.30-py2.7-linux-x86_64.egg/CRISPResso2/CRISPRessoShared.py", line 259, in check_file
    raise BadParameterException("The specified file '"+filename + "' cannot be opened.\nAvailable files in current directory: " + str(files_in_dir))
BadParameterException: The specified file 'G1.fastq.gz' cannot be opened.
Available files in current directory: ['nhej.r1.fastq.gz', 'CRISPResso_on_nhej', 'Homo_sapiens.zip', 'hg19', 'CRISPResso_on_nhej.html', 'Test_data_20-8-19', 'nhej.r2.fastq.gz']

kclem commented 5 years ago

CRISPResso is designed to run on amplicon sequencing data -- where you have designed primers to amplify a region around your predicted edit site (usually 100-200bp). The reference (wildtype, unedited) sequence for that 100-200bp region (the amplicon product) is provided using the -a parameter.

The sgRNA sequence (recognition sequence) is provided using the -g parameter.
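
To illustrate how the two parameters relate, here is a minimal Python sketch (not part of CRISPResso) that checks whether a guide occurs within an amplicon on either strand before assembling the arguments. The helper names and both sequences are made up for illustration; only --fastq_r1, -a and -g come from this thread.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def guide_in_amplicon(guide, amplicon):
    """True if the guide occurs in the amplicon on either strand."""
    guide, amplicon = guide.upper(), amplicon.upper()
    return guide in amplicon or reverse_complement(guide) in amplicon


def build_crispresso_args(fastq, amplicon, guide):
    """Assemble an argument list, refusing a guide that is absent from the amplicon."""
    if not guide_in_amplicon(guide, amplicon):
        raise ValueError("guide sequence not found in amplicon")
    return ["CRISPResso", "--fastq_r1", fastq, "-a", amplicon, "-g", guide]


# Hypothetical sequences: a 20 nt guide embedded in a short toy amplicon.
example_guide = "GGATCCGATTACGGCATTGC"
example_amplicon = "TTGACCGTAAGGCTAGCTAG" + example_guide + "AAGTCCGTAACGGTTAGC"

print(build_crispresso_args("G1.fastq.gz", example_amplicon, example_guide))
```

A real amplicon would be the full 100-200bp reference described above, not a toy sequence like this one.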
Please trim the adapter/index sequences from your reads (in G1.fastq.gz) before running CRISPResso, or use the --trim_sequences parameter to run Trimmomatic inside CRISPResso. If you do, make sure to specify --trimmomatic_options_string, e.g. "ILLUMINACLIP:NexteraPE-PE.fa:0:90:10:0:true", where NexteraPE-PE.fa is a file containing the adapter sequences to be trimmed.
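
As a rough way to see whether trimming is needed, the sketch below counts how many reads still contain a given adapter. It is only an illustration: the function name and the toy reads are invented, and the sequence passed as --amplicon_seq in the report is used as the example adapter on the assumption that it is an adapter rather than an amplicon.

```python
def adapter_fraction(reads, adapter):
    """Fraction of reads that contain the adapter sequence anywhere."""
    reads = list(reads)
    if not reads:
        return 0.0
    adapter = adapter.upper()
    hits = sum(1 for read in reads if adapter in read.upper())
    return hits / len(reads)


# Assumption: this sequence (taken from the command in the report) is an adapter.
adapter = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"

# Made-up reads: two carry the adapter, two do not.
toy_reads = [
    adapter + "GGATCCGATTACGG",
    "TTGACCGTAAGGCTAGC",
    "AACC" + adapter,
    "GGCATTGCAAGTCCG",
]

print(adapter_fraction(toy_reads, adapter))
```

An exact substring match like this misses partial or mismatched adapters, so a dedicated trimming tool remains the safer choice.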

After trimming, make sure you provide the absolute path to the fastq file. In your debug output, it says that the file 'G1.fastq.gz' can't be found -- the only files in the current directory are listed (['nhej.r1.fastq.gz', 'CRISPResso_on_nhej', 'Homo_sapiens.zip', 'hg19', 'CRISPResso_on_nhej.html', 'Test_data_20-8-19', 'nhej.r2.fastq.gz']).

Please run CRISPResso from the directory where your G1.fastq.gz file is.
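
A quick pre-flight check along these lines can catch a missing file before CRISPResso is launched. This is a minimal sketch, not CRISPResso's own check_file: the function name first_read_sequence and the toy FASTQ contents are invented for illustration.

```python
import gzip
import os
import tempfile


def first_read_sequence(path):
    """Return the first read's sequence, or raise listing what the directory holds."""
    if not os.path.isfile(path):
        directory = os.path.dirname(os.path.abspath(path))
        available = sorted(os.listdir(directory)) if os.path.isdir(directory) else []
        raise FileNotFoundError(
            f"'{path}' cannot be opened. Available files in {directory}: {available}"
        )
    # FASTQ files may be gzip-compressed or plain text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        handle.readline()  # skip the '@' header line
        return handle.readline().strip()


# Demonstrate on a throwaway directory holding one tiny gzipped FASTQ.
with tempfile.TemporaryDirectory() as workdir:
    fastq_path = os.path.join(workdir, "toy.fastq.gz")
    with gzip.open(fastq_path, "wt") as handle:
        handle.write("@read1\nACGTACGT\n+\nIIIIIIII\n")
    print(first_read_sequence(fastq_path))
    try:
        first_read_sequence(os.path.join(workdir, "G1.fastq.gz"))
    except FileNotFoundError as error:
        print(error)
```

With the docker command from the report, -v ${PWD}:/DATA mounts only the host's current directory into the container, so the check would have to pass for the directory the command is launched from.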