Closed KellenX closed 3 years ago
Hi @KellenX,
Thanks for using CRISPResso!
Have you tried relaxing the --min_reads_to_use_region
parameter? The default requires 1000 reads to align to an amplicon before it is analyzed by CRISPResso. If you think that reads are aligning to these regions, try setting this value to a lower number (e.g. --min_reads_to_use_region 1
) and see if it processes those reads.
However, more concerning is the error gzip: stdin: unexpected end of file
and the fact that your region files are empty (0 bytes). Are your input fastq files gzipped? Or are they plain text files that end in the suffix .gz?
Can you run CRISPResso Pooled on the test data? Download the test inputs here: https://raw.githubusercontent.com/pinellolab/CRISPResso2/master/tests/Cas9.amplicons.txt https://raw.githubusercontent.com/pinellolab/CRISPResso2/master/tests/Both.Cas9.fastq
And if you run CRISPRessoPooled -r1 Both.Cas9.fastq -f Cas9.amplicons.txt -p 2 --keep_intermediate --min_reads_to_use_region 100
do you get output?
Hi @kclem Thank you for you guidance. However, your sample code didn't work on the platform I'm using.
I downloaded these three test inputs: https://raw.githubusercontent.com/pinellolab/CRISPResso2/master/tests/Cas9.amplicons.txt https://raw.githubusercontent.com/pinellolab/CRISPResso2/master/tests/Both.Cas9.fastq https://raw.githubusercontent.com/pinellolab/CRISPResso2/master/tests/Cas9.regions.txt
Then ran your command CRISPRessoPooled -r1 Both.Cas9.fastq -f Cas9.amplicons.txt -p 2 --keep_intermediate --min_reads_to_use_region 100
This is what I got: $ !CRISPRessoPooled CRISPRessoPooled -r1 Both.Cas9.fastq -f Cas9.amplicons.txt -p 2 --keep_intermediate --min_reads_to_use_region 100
~~~CRISPRessoPooled~~~
-Analysis of CRISPR/Cas9 outcomes from POOLED deep sequencing data-
_ _
' ) ' )
.-' _______________________ .-'
(____ | __ __ __ __ __ | (____
C)| \ ||__)/ \/ \| |_ | \ | C)| \
\ / || \__/\__/|__|__|__/ | \ /
\___/ |_______________________| \___/
[CRISPResso version 2.1.2]
[Note that starting in version 2.1.0 insertion quantification has been changed to only include insertions completely contained by the quantification window. To use the legacy quantification method (i.e. include insertions directly adjacent to the quantification window) please use the parameter --use_legacy_insertion_quantification] [For support contact kclement@mgh.harvard.edu]
INFO @ Fri, 23 Jul 2021 16:11:26: Checking dependencies...
INFO @ Fri, 23 Jul 2021 16:11:26: All the required dependencies are present!
INFO @ Fri, 23 Jul 2021 16:11:26: Only the Amplicon description file was provided. The analysis will be perfomed using only the provided amplicons sequences.
INFO @ Fri, 23 Jul 2021 16:11:26: Creating Folder CRISPRessoPooled_on_Both.Cas9
INFO @ Fri, 23 Jul 2021 16:11:26: Done!
INFO @ Fri, 23 Jul 2021 16:11:26: Processing input
INFO @ Fri, 23 Jul 2021 16:11:26: Creating a custom index file with all the amplicons...
INFO @ Fri, 23 Jul 2021 16:11:27: Align reads to the amplicons...
INFO @ Fri, 23 Jul 2021 16:11:27: Alignment command: bowtie2 -x CRISPRessoPooled_on_Both.Cas9/CUSTOM_BOWTIE2_INDEX -p 2 --end-to-end -N 0 --np 0 --mp 3,2 --score-min L,-5,-1.2 -U CRISPRessoPooled_on_Both.Cas9/Both.Cas9.fastq 2>>CRISPRessoPooled_on_Both.Cas9/CRISPRessoPooled_RUNNING_LOG.txt | samtools view -bS - > CRISPRessoPooled_on_Both.Cas9/CRISPResso_AMPLICONS_ALIGNED.bam
INFO @ Fri, 23 Jul 2021 16:11:28: Demultiplex reads and run CRISPResso on each amplicon...
INFO @ Fri, 23 Jul 2021 16:11:28:
Processing:FANC
gzip: stdin: unexpected end of file WARNING @ Fri, 23 Jul 2021 16:11:28: Skipping amplicon [FANC] because no reads align to it
INFO @ Fri, 23 Jul 2021 16:11:28:
Processing:HEK3
gzip: stdin: unexpected end of file WARNING @ Fri, 23 Jul 2021 16:11:28: Skipping amplicon [HEK3] because no reads align to it
INFO @ Fri, 23 Jul 2021 16:11:28: Running CRISPResso with 2 processes
INFO @ Fri, 23 Jul 2021 16:11:28: Finished all amplicons
WARNING @ Fri, 23 Jul 2021 16:11:28: Skipping the folder CRISPResso_on_FANC: not enough reads, incomplete, or empty folder.
WARNING @ Fri, 23 Jul 2021 16:11:28: Skipping the folder CRISPResso_on_HEK3: not enough reads, incomplete, or empty folder.
WARNING @ Fri, 23 Jul 2021 16:11:29: Less than half (0/500) of reads aligned to amplicons. Finding most frequent unaligned reads.
WARNING @ Fri, 23 Jul 2021 16:11:29: Perhaps one or more of the given amplicon sequences were incomplete or incorrect. Below is a list of the most frequent unaligned reads (in the first 10000 unaligned reads). Check this list to see if an amplicon is among these reads.
INFO @ Fri, 23 Jul 2021 16:11:29: All Done!
_
' )
.-'
(____
C)| \
\ /
\___/
Hm.. those files work for me on a couple of systems and should work for you as well.
How have you installed CRISPResso, or how are you running it?
I used conda 4.10.3 to install CRISPResso2 on my university computing cluster Operating System: CentOS Linux 7 (Core) CPE OS Name: cpe:/o:centos:centos:7 Kernel: Linux 3.10.0-1160.25.1.el7.x86_64
(base) Me@Cluster:~$ conda create -n crispresso2_env -c bioconda crispresso2 python=2.7
(base) Me@Cluster:~$ conda activate crispresso2_env
(crispresso2_env) Me@Cluster:~$ CRISPResso -h
This installation allowed me to run CRISPResso on one pair of fastq.gz inputs at a time. Can you please provide me a list of inputs to test running CRISPRessoBatch?
For batch: Download these files: https://raw.githubusercontent.com/pinellolab/CRISPResso2/master/tests/FANC.local.batch https://raw.githubusercontent.com/pinellolab/CRISPResso2/master/tests/FANC.Cas9.fastq https://raw.githubusercontent.com/pinellolab/CRISPResso2/master/tests/FANC.Untreated.fastq
Then you should be able to run:
CRISPRessoBatch -bs FANC.local.batch -a CGGATGTTCCAATCAGTACGCAGAGAGTCGCCGTCTCCAAGGTGAAAGCGGAAGTAGGGCCTTCGCGCACCTCATGGAATCCCTTCTGCAGCACCTGGATCGCTTTTCCGAGCTTCTGGCGGTCTCAAGCACTACCTACGTCAGCACCTGGGACCCCGCCACCGTGCGCCGGGCCTTGCAGTGGGCGCGCTACCTGCGCCACATCCATCGGCGCTTTGGTCGG -g GGAATCCCTTCTGCAGCACC -p 2 --debug --base_editor
I just tested CRISPRessoBatch
and it worked perfectly! It seems that this mysterious bug impaired CRISPRessoPooled
only. I'm willing to help with finding it out.
tldr; in your conda env run:
conda install tbb=2020.2
and then run your CRISPRessoPooled command. I hope that works.
Full answer: For the CRISPRessoPooled bit, I think I may know what is going on. I don't have a centos machine, but I tried the following:
docker run -it centos:centos7 /bin/bash
yum install -y wget unzip gcc openssl
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p /root/miniconda3
/root/miniconda3/condabin/conda init
source /root/.bashrc
conda create -y -n crispresso2_env -c bioconda crispresso2 python=2.7
conda activate crispresso2_env
conda install -y -c conda-forge openssl=1.0.2 #for samtools
wget https://github.com/pinellolab/CRISPResso2/archive/refs/heads/master.zip
unzip master.zip
cd CRISPResso2-master/
python setup.py install
wget https://raw.githubusercontent.com/pinellolab/CRISPResso2/master/tests/Cas9.amplicons.txt
wget https://raw.githubusercontent.com/pinellolab/CRISPResso2/master/tests/Both.Cas9.fastq
wget https://raw.githubusercontent.com/pinellolab/CRISPResso2/master/tests/Cas9.regions.txt
CRISPRessoPooled -r1 Both.Cas9.fastq -f Cas9.amplicons.txt -p 2 --keep_intermediate --min_reads_to_use_region 100
Then, in the CRISPRessoPooled_RUNNING_LOG.txt I see:
/root/miniconda3/envs/crispresso2_env/bin/bowtie2-build-s: error while loading shared libraries: libtbb.so.2: cannot open shared object file: No such file or directory
This can be solved by running
conda install tbb=2020.2
Now, everything works correctly on my end. Does this solve your problem as well?
Thank you for this reply! This problem is eventually solved by downgrading tbb.
Before, CRISPRessoPooled
failed on my end with
Name Version Build Channel tbb 2021.3.0 h4bd325d_0 conda-forge
By executing conda install tbb=2020.2
I get
Name Version Build Channel tbb 2020.2 h4bd325d_4 conda-forge
Everything works well now.
Describe the bug Hello, we are using CRISPRessoPooled(amplicon mode), to map reads to multiple amplicons and analyze base editing. However, no reads was mapped to any of my five reference amplicon sequences, therefore CRISPResso ended with "Skipping amplicon [*] because no reads align to it"
However, when I tried to separate reads into five pairs of FASTQ files with homemade script then run CRISPResso on each pair, CRISPResso worked perfectly.
With --keep_intermediate, AMPLICONS.fa captured all my five reference amplicon sequences, but MAPPING_STATISTICS.txt says: """ READS IN INPUTS:51049 READS AFTER PREPROCESSING:50004 READS ALIGNED:0 """
Five .fastq.gz files were generated but were all empty (size = 0b)
Expected behavior Pooled reads get sorted into five FASTQ files.
To reproduce Python 2.7.15 [CRISPResso version 2.1.2] installed on July 9, 2021 with "$ conda create -n crispresso2_env -c bioconda crispresso2 python=2.7"
The shell command: "$ CRISPRessoPooled -r1 VT01-mix5_R1_001.fastq.gz -r2 VT01-mix5_R2_001.fastq.gz -f VT01-mix5.txt -n VT01-mix5 -o crispressoPooled_BE_VT01-mix5 --base_editor_output --conversion_nuc_from C --conversion_nuc_to T -w 10 -wc -10"
In VT01-mix5.txt: """ Site2 CCTGGCTGAGCTAACTGTGACAGCATGTGGTAATTTTCCAGCCCGCTGGCCCTGTAAAGGAAACTGGAACACAAAGCATAGACTGCGGGGCGGGCCAGCCTGAATAGCTGCAAACAAGTGCAGAATATCTGATGATGTCATACGCACAGTTTGAC GAACACAAAGCATAGACTGC NA NA Site4 CTGGGTGGAAGGAAGGGAGGAAGGGCGAGGCAGAGGGTCCAAAGCAGGATGACAGGCAGGGGCACCGCGGCGCCCCGGTGGCACTGCGGCTGGAGGTGGGGGTTAAAGCGGAGACTCTGGTGCTGTGTGACTACAGTGGGGGCCCTGCCCTCTCTGAGCCCCCGCCTCCAGGCCTGTGTGTGTGTCTCCGTTCGGGTTGA GGCACTGCGGCTGGAGGTGG NA NA HBG115 ccccttccccacactatctcaatgcaaatatctgtctgaaacggtccctggctaaactccacccatgggttggccagccttgccttgaccaatagccttgacaaggcaaacttgaccaatagtcttagagtatccagtgaggccaggggccggcggctggctagggatgaagaat cttgaccaatagccttgaca NA NA EMX GCGGCTGCGACCATGTTCCAGCCCGCGGCCAAGCGCGGCTTTACCATAGAGTCCTTGGTGGCCAAGGACGGCGGCACCGGCGGGGGCACTGGCGGCGGGGGCGCGGGCTCCCATCTCCTGGCGGCGGCCGCCTCCGAGGAACCGCTCCGGCCCACGGCGCTCAACTACCCTCACCCCAG AAGGACGGCGGCACCGGCGG NA NA FGF6 CTGCTGACATGAAACCAAAGCCTCCATCGGGCACTCGGGTTGAGAGCAGAGGGACCCAGGCTGAGCCGCGGCCGGTAGAGACCATGGCTCGGGGACGCTCTCTAGCTCGCCCGCTTGCTCCGTCCCTAGTTGATGATATTTGATTTGACCTATCTTTATAGTTGCTTCAGCAGCCCGGCCCTCCCCTCCACCCATCATCGCCCTGACGTCAACCCGCCCAGCTCACTTATCCAGGGCTGTAAC CCCGCTTGCTCCGTCCCTAG NA NA """
I'd like to send the input files VT01-mix5_R1_001.fastq.gz, VT01-mix5_R2_001.fastq.gz and intermediate files to you by email if you need them.
Debug output Paste the entire output when you run CRISPResso with the flag
--debug
.""" $ CRISPRessoPooled --base_editor_output --conversion_nuc_from C --conversion_nuc_to T -w 10 -wc -10 -r1 VT01-mix5_R1_001.fastq.gz -r2 VT01-mix5_R2_001.fastq.gz -f VT01-mix5.txt -n VT01-mix5 -o crispressoPooled_BE_VT01-mix5 --keep_intermediate --debug
[Note that starting in version 2.1.0 insertion quantification has been changed to only include insertions completely contained by the quantification window. To use the legacy quantification method (i.e. include insertions directly adjacent to the quantification window) please use the parameter --use_legacy_insertion_quantification] [For support contact kclement@mgh.harvard.edu]
INFO @ Thu, 22 Jul 2021 16:17:27: Checking dependencies...
INFO @ Thu, 22 Jul 2021 16:17:27: All the required dependencies are present!
INFO @ Thu, 22 Jul 2021 16:17:27: Only the Amplicon description file was provided. The analysis will be perfomed using only the provided amplicons sequences.
INFO @ Thu, 22 Jul 2021 16:17:27: Creating Folder /cache/home/Data20210512/crispressoPooled_BE_VT01-mix5/CRISPRessoPooled_on_VT01-mix5
WARNING @ Thu, 22 Jul 2021 16:17:27: Folder /cache/home/Data20210512/crispressoPooled_BE_VT01-mix5/CRISPRessoPooled_on_VT01-mix5 already exists.
INFO @ Thu, 22 Jul 2021 16:17:27: Processing input
INFO @ Thu, 22 Jul 2021 16:17:27: Merging paired sequences with Flash...
INFO @ Thu, 22 Jul 2021 16:17:28: Done!
INFO @ Thu, 22 Jul 2021 16:17:29: Creating a custom index file with all the amplicons...
INFO @ Thu, 22 Jul 2021 16:17:29: Align reads to the amplicons...
INFO @ Thu, 22 Jul 2021 16:17:29: Alignment command: bowtie2 -x /cache/home/Data20210512/crispressoPooled_BE_VT01-mix5/CRISPRessoPooled_on_VT01-mix5/CUSTOM_BOWTIE2_INDEX -p 1 --end-to-end -N 0 --np 0 --mp 3,2 --score-min L,-5,-1.2 -U /cache/home/Data20210512/crispressoPooled_BE_VT01-mix5/CRISPRessoPooled_on_VT01-mix5/out.extendedFrags.fastq.gz 2>>/cache/home/Data20210512/crispressoPooled_BE_VT01-mix5/CRISPRessoPooled_on_VT01-mix5/CRISPRessoPooled_RUNNING_LOG.txt | samtools view -bS - > /cache/home/Data20210512/crispressoPooled_BE_VT01-mix5/CRISPRessoPooled_on_VT01-mix5/CRISPResso_AMPLICONS_ALIGNED.bam
INFO @ Thu, 22 Jul 2021 16:17:29: Demultiplex reads and run CRISPResso on each amplicon...
INFO @ Thu, 22 Jul 2021 16:17:29:
Processing:Site2
gzip: stdin: unexpected end of file WARNING @ Thu, 22 Jul 2021 16:17:29: Skipping amplicon [Site2] because no reads align to it
INFO @ Thu, 22 Jul 2021 16:17:29:
Processing:Site4
gzip: stdin: unexpected end of file WARNING @ Thu, 22 Jul 2021 16:17:29: Skipping amplicon [Site4] because no reads align to it
INFO @ Thu, 22 Jul 2021 16:17:29:
Processing:HBG115
gzip: stdin: unexpected end of file WARNING @ Thu, 22 Jul 2021 16:17:29: Skipping amplicon [HBG115] because no reads align to it
INFO @ Thu, 22 Jul 2021 16:17:29:
Processing:EMX
gzip: stdin: unexpected end of file WARNING @ Thu, 22 Jul 2021 16:17:29: Skipping amplicon [EMX] because no reads align to it
INFO @ Thu, 22 Jul 2021 16:17:29:
Processing:FGF6
gzip: stdin: unexpected end of file WARNING @ Thu, 22 Jul 2021 16:17:29: Skipping amplicon [FGF6] because no reads align to it
INFO @ Thu, 22 Jul 2021 16:17:29: Running CRISPResso with 1 processes
INFO @ Thu, 22 Jul 2021 16:17:29: Finished all amplicons
WARNING @ Thu, 22 Jul 2021 16:17:30: Skipping the folder CRISPResso_on_Site2: not enough reads, incomplete, or empty folder.
WARNING @ Thu, 22 Jul 2021 16:17:30: Skipping the folder CRISPResso_on_Site4: not enough reads, incomplete, or empty folder.
WARNING @ Thu, 22 Jul 2021 16:17:30: Skipping the folder CRISPResso_on_HBG115: not enough reads, incomplete, or empty folder.
WARNING @ Thu, 22 Jul 2021 16:17:30: Skipping the folder CRISPResso_on_EMX: not enough reads, incomplete, or empty folder.
WARNING @ Thu, 22 Jul 2021 16:17:30: Skipping the folder CRISPResso_on_FGF6: not enough reads, incomplete, or empty folder.
WARNING @ Thu, 22 Jul 2021 16:17:30: Less than half (0/51049) of reads aligned to amplicons. Finding most frequent unaligned reads.
WARNING @ Thu, 22 Jul 2021 16:17:30: Perhaps one or more of the given amplicon sequences were incomplete or incorrect. Below is a list of the most frequent unaligned reads (in the first 10000 unaligned reads). Check this list to see if an amplicon is among these reads.
INFO @ Thu, 22 Jul 2021 16:17:30: All Done!
"""