pinellolab / CRISPResso2

Analysis of deep sequencing data for rapid and intuitive interpretation of genome editing experiments

Crispressopooled giving ERROR: type object 'object' has no attribute 'dtype' #234

Closed. Rajeefa closed this issue 1 year ago.

Rajeefa commented 2 years ago

I ran this command:

CRISPRessoPooled -r1 TREATED_R1.fastq.gz -r2 TREATED_R2.fastq.gz -f crispressoexp_config.txt -x Genomes/hg38/genome --name AMPLICONS_AND_GENOME --gene_annotations gencodev41.gz --debug

and it gave me this error:

INFO @ Sun, 24 Jul 2022 06:51:12: Loading gene coordinates from annotation file: gencodev41.gz...

INFO @ Sun, 24 Jul 2022 06:51:14: Failed to load the gene annotations file.

INFO @ Sun, 24 Jul 2022 06:51:14: Mapping amplicons to the reference genome...

Traceback (most recent call last):
  File "/home/medtech/miniconda3/envs/crispresso2_env/lib/python3.8/site-packages/CRISPResso2/CRISPRessoPooledCORE.py", line 845, in main
    additional_columns_df = pd.DataFrame(additional_columns, columns=['Name', 'chr_id', 'bpstart', 'bpend', 'strand', 'Reference_Sequence']).set_index('Name')
  File "/home/medtech/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 490, in __init__
    mgr = init_dict({}, index, columns, dtype=dtype)
  File "/home/medtech/.local/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 239, in init_dict
    val = construct_1d_arraylike_from_scalar(np.nan, len(index), nan_dtype)
  File "/home/medtech/.local/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 1440, in construct_1d_arraylike_from_scalar
    dtype = dtype.dtype
AttributeError: type object 'object' has no attribute 'dtype'

CRITICAL @ Sun, 24 Jul 2022 06:51:29:

ERROR: type object 'object' has no attribute 'dtype'

Please let me know what is causing this issue.
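In the traceback, CRISPResso2 itself is loaded from the crispresso2_env conda environment, but pandas is loaded from /home/medtech/.local/lib/python3.8/site-packages, the user site directory. A pandas installed there may not be the version CRISPResso2 2.2.9 was installed with, so it is worth confirming which copies are actually imported. A minimal diagnostic sketch using only the standard library follows; describe_package is a hypothetical helper (not part of CRISPResso2), and it should be run with the same python interpreter as the crispresso2_env environment.

```python
import importlib.metadata
import importlib.util
import site
from pathlib import Path


def describe_package(name: str) -> dict:
    """Report the installed version and import location of a package.

    'version' and 'origin' are None when this interpreter cannot find
    the package (or, for 'version', when it has no distribution metadata).
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return {"name": name, "version": None, "origin": None, "from_user_site": False}

    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        version = None

    origin = Path(spec.origin).resolve()
    user_site = Path(site.getusersitepackages()).resolve()
    # True when the module would be imported from ~/.local/... rather than the environment
    from_user_site = user_site in origin.parents
    return {"name": name, "version": version, "origin": str(origin), "from_user_site": from_user_site}


if __name__ == "__main__":
    for pkg in ("pandas", "numpy"):
        info = describe_package(pkg)
        print(f"{info['name']}: version={info['version']} "
              f"origin={info['origin']} user_site={info['from_user_site']}")
```

If pandas reports user_site=True, one way to test whether the environment's own pandas avoids the error is to rerun with the environment variable PYTHONNOUSERSITE=1 set (or with python -s); both tell Python to skip the user site-packages directory.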

Colelyman commented 2 years ago

Thanks for using CRISPResso2 @Rajeefa! Would you mind posting the version of CRISPResso2 that you are using?

Also, it looks like there is an error reading the gene annotation file. Did you happen to follow these instructions to download it before running CRISPResso2?

--gene_annotations: Gene Annotation Table from UCSC Genome Browser Tables http://genome.ucsc.edu/cgi-bin/hgTables?command=start, please select as table "knownGene", as output format "all fields from selected table" and as file returned "gzip compressed". (default: '')

Rajeefa commented 2 years ago

@Colelyman The version is CRISPResso2 2.2.9. As mentioned in the instructions, I downloaded the GENCODE v41 gene annotations and tried again, but it gave me the same error.

Colelyman commented 2 years ago

Thanks for the information. Do you happen to know if the file gencodev41.gz is in the current directory? If not, try specifying the path to the file.
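Both points (where the file is and what format it is in) can be checked before rerunning. The sketch below assumes the annotation file is a gzip-compressed, tab-separated table whose first line is a header, as produced by the UCSC Table Browser export described above. The required column names (chrom, txStart, txEnd) are an assumption based on the UCSC knownGene schema, not taken from CRISPResso2's source, and check_annotation_header is a hypothetical helper.

```python
import gzip
from pathlib import Path

# Assumed subset of UCSC knownGene columns; adjust if CRISPResso2 expects others.
REQUIRED_COLUMNS = {"chrom", "txStart", "txEnd"}


def check_annotation_header(path: str) -> list:
    """Return a list of problems found in a gzipped, tab-separated annotation table."""
    file_path = Path(path)
    if not file_path.is_file():
        return [f"{file_path.resolve()} does not exist; pass the full path instead"]

    try:
        with gzip.open(file_path, "rt") as handle:
            header = handle.readline().rstrip("\n")
    except OSError as err:  # gzip.BadGzipFile is a subclass of OSError
        return [f"could not read {file_path} as gzip: {err}"]

    # UCSC Table Browser exports prefix the header line with '#'
    columns = {col.lstrip("#") for col in header.split("\t")}
    missing = sorted(REQUIRED_COLUMNS - columns)
    return [f"missing column(s): {', '.join(missing)}"] if missing else []


if __name__ == "__main__":
    print(check_annotation_header("gencodev41.gz") or "header looks usable")
```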

Also, would you mind sending the crispressoexp_config.txt file? I will try and reproduce this error using those inputs.
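A quick structural check of the amplicon description file can rule out malformed rows before it is shared or rerun. This sketch assumes a tab-separated file with an optional header row (the later run log shows the header amplicon_name, amplicon_seq, guide_seq, expected_hdr_amplicon_seq being detected). The helper name, and the rule that sequence columns may contain only A, C, G, T, N and commas (to allow several guides in one cell), are assumptions for illustration.

```python
import csv

SEQUENCE_COLUMNS = {"amplicon_seq", "guide_seq", "expected_hdr_amplicon_seq"}
VALID_CHARS = set("ACGTN,")  # assumed: commas separate multiple guides


def check_amplicon_file(path: str) -> list:
    """Return (row_number, message) tuples for rows that look malformed.

    Row numbers are 1-based and count non-empty rows, header included.
    Sequence characters are only checked when a header row is present.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        rows = [row for row in reader if row]
    if not rows:
        return [(0, "file is empty")]

    header = rows[0]
    has_header = "amplicon_name" in header
    width = len(header)
    start = 1 if has_header else 0
    problems = []
    for row_number, row in enumerate(rows[start:], start=start + 1):
        if len(row) != width:
            problems.append((row_number, f"expected {width} fields, found {len(row)}"))
            continue
        if has_header:
            for name, value in zip(header, row):
                if name in SEQUENCE_COLUMNS and set(value.upper()) - VALID_CHARS:
                    problems.append((row_number, f"{name} contains unexpected characters"))
    return problems
```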

Rajeefa commented 2 years ago

Hi Colelyman, I have now switched to the Docker version of CRISPRessoPooled. I changed the number of fields in the crispressoexp_config.txt file, renamed it to test.txt, and ran this command:

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPRessoPooled -r1 TREATED_R1.fastq.gz -r2 TREATED_R2.fastq.gz -f test.txt -x Genomes/hg38/genome --name AMPLICONS_AND_GENOME2 --gene_annotations gencodev41.gz --debug

INFO @ Sun, 07 Aug 2022 08:12:58: Checking dependencies...

INFO @ Sun, 07 Aug 2022 08:12:58: All the required dependencies are present!

INFO @ Sun, 07 Aug 2022 08:12:58: Amplicon description file and bowtie2 reference genome index files provided. The analysis will be perfomed using the reads that are aligned only to the amplicons provided and not to other genomic regions.

INFO @ Sun, 07 Aug 2022 08:12:58: Creating Folder CRISPRessoPooled_on_AMPLICONS_AND_GENOME2

WARNING @ Sun, 07 Aug 2022 08:12:58: Folder CRISPRessoPooled_on_AMPLICONS_AND_GENOME2 already exists.

INFO @ Sun, 07 Aug 2022 08:12:58: Processing input

INFO @ Sun, 07 Aug 2022 08:12:58: Merging paired sequences with Flash...

INFO @ Sun, 07 Aug 2022 08:12:58: Flash command: flash --allow-outies TREATED_R1.fastq.gz TREATED_R2.fastq.gz --max-overlap 100 --min-overlap 10 -z -d CRISPRessoPooled_on_AMPLICONS_AND_GENOME2 >>CRISPRessoPooled_on_AMPLICONS_AND_GENOME2/CRISPRessoPooled_RUNNING_LOG.txt 2>&1

INFO @ Sun, 07 Aug 2022 08:26:41: Done!

INFO @ Sun, 07 Aug 2022 08:30:51: Loading gene coordinates from annotation file: gencodev41.gz...

INFO @ Sun, 07 Aug 2022 08:30:54: Failed to load the gene annotations file.

INFO @ Sun, 07 Aug 2022 08:30:54: Matching header amplicon_name with amplicon_name.

INFO @ Sun, 07 Aug 2022 08:30:54: Matching header amplicon_seq with amplicon_seq.

INFO @ Sun, 07 Aug 2022 08:30:54: Matching header guide_seq with guide_seq.

INFO @ Sun, 07 Aug 2022 08:30:54: Matching header expected_hdr_amplicon_seq with expected_hdr_amplicon_seq.

INFO @ Sun, 07 Aug 2022 08:30:54: Header variable names in order: ['amplicon_name', 'amplicon_seq', 'guide_seq', 'expected_hdr_amplicon_seq']

INFO @ Sun, 07 Aug 2022 08:30:54: Detected header in amplicon file.

INFO @ Sun, 07 Aug 2022 08:30:54: Mapping amplicons to the reference genome...

INFO @ Sun, 07 Aug 2022 08:31:19: The amplicon [EMX1_1] was mapped to: chr1:40305668-40305894

INFO @ Sun, 07 Aug 2022 08:31:19: The amplicon [EMX1_2] was mapped to: chr1:162169736-162169950

INFO @ Sun, 07 Aug 2022 08:31:19: The amplicon [EMX1_3] was mapped to: chr3:137688812-137689057

INFO @ Sun, 07 Aug 2022 08:31:19: The amplicon [EMX1_4] was mapped to: chr5:173847924-173848172

INFO @ Sun, 07 Aug 2022 08:31:19: The amplicon [EMX1_5] was mapped to: chr6:23925087-23925344

INFO @ Sun, 07 Aug 2022 08:31:19: The amplicon [EMX1_6] was mapped to: chr11:36124906-36125163

INFO @ Sun, 07 Aug 2022 08:31:19: The amplicon [EMX1_7] was mapped to: chr12:13256461-13256702

INFO @ Sun, 07 Aug 2022 08:31:19: The amplicon [EMX1_8] was mapped to: chr12:122062588-122062788

INFO @ Sun, 07 Aug 2022 08:31:19: The amplicon [EMX1_9] was mapped to: chr13:42630573-42630807

INFO @ Sun, 07 Aug 2022 08:31:19: The amplicon [EMX1_10] was mapped to: chr17:41571401-41571632

INFO @ Sun, 07 Aug 2022 08:31:19: The amplicon [EMX1_11] was mapped to: chr20:20468382-20468610

INFO @ Sun, 07 Aug 2022 08:31:19: The amplicon [EMX2_1] was mapped to: chr4:29739156-29739406

INFO @ Sun, 07 Aug 2022 08:31:19: The amplicon [EMX2_2] was mapped to: chr7:22073609-22073864

INFO @ Sun, 07 Aug 2022 08:31:19: The amplicon [EMX2_3] was mapped to: chr12:45695907-45696134

INFO @ Sun, 07 Aug 2022 08:31:19: The amplicon [EMX2_4] was mapped to: chr17:41571364-41571620

INFO @ Sun, 07 Aug 2022 08:31:19: The uncompressed reference fasta file for Genomes/hg38/genome is already present! Skipping generation.

INFO @ Sun, 07 Aug 2022 08:31:19: Aligning reads to the provided genome index...

INFO @ Sun, 07 Aug 2022 08:31:19: Aligning with command: bowtie2 -x Genomes/hg38/genome -p 1 --end-to-end -N 0 --np 0 --mp 3,2 --score-min L,-5,-1.2000000000000002 -U CRISPRessoPooled_on_AMPLICONS_AND_GENOME2/out.extendedFrags.fastq.gz 2>>CRISPRessoPooled_on_AMPLICONS_AND_GENOME2/CRISPRessoPooled_RUNNING_LOG.txt| samtools view -bS - | samtools sort -@ 1 - -o CRISPRessoPooled_on_AMPLICONS_AND_GENOME2/AMPLICONS_AND_GENOME2_GENOME_ALIGNED.bam

[bam_sort_core] merging from 23 files and 1 in-memory blocks...

INFO @ Sun, 07 Aug 2022 14:33:07: Deleting partially-completed demultiplexing in CRISPRessoPooled_on_AMPLICONS_AND_GENOME2/MAPPED_REGIONS/...

INFO @ Sun, 07 Aug 2022 14:33:07: Preparing to demultiplex reads aligned to the genome...

INFO @ Sun, 07 Aug 2022 14:33:18: Wrote demultiplexing commands to CRISPRessoPooled_on_AMPLICONS_AND_GENOME2/DEMUX_COMMANDS.txt

INFO @ Sun, 07 Aug 2022 14:33:18: Demultiplexing reads by location (732 genomic regions)...

and then it gave me this error:

CRITICAL @ Wed, 10 Aug 2022 02:33:18:

ERROR:

CRISPResso2 failed
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/CRISPResso2-2.2.9-py3.10-linux-x86_64.egg/CRISPResso2/CRISPRessoPooledCORE.py", line 1111, in main
    CRISPRessoMultiProcessing.run_parallel_commands(chr_commands, n_processes=n_processes_for_pooled, descriptor='Demultiplexing reads by location', continue_on_fail=args.skip_failed)
  File "/opt/conda/lib/python3.10/site-packages/CRISPResso2-2.2.9-py3.10-linux-x86_64.egg/CRISPResso2/CRISPRessoMultiProcessing.py", line 194, in run_parallel_commands
    raise e
  File "/opt/conda/lib/python3.10/site-packages/CRISPResso2-2.2.9-py3.10-linux-x86_64.egg/CRISPResso2/CRISPRessoMultiProcessing.py", line 181, in run_parallel_commands
    ret_vals = res.get(60*60*60) # Without the timeout this blocking call ignores all signals.
  File "/opt/conda/lib/python3.10/multiprocessing/pool.py", line 767, in get
    raise TimeoutError
multiprocessing.context.TimeoutError

Please find attached the files test.txt and CRISPRessoPooled_RUNNING_LOG.txt.

Could you please help me understand the cause of this error?
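The log timestamps are consistent with the demultiplexing step hitting the timeout in run_parallel_commands rather than crashing on its own. Reading the timeout passed to res.get as 60*60*60 seconds (an assumption: the traceback as posted may have lost its multiplication signs in formatting), the gap between "Demultiplexing reads by location" and the CRITICAL message can be checked with a short calculation:

```python
from datetime import datetime

LOG_FORMAT = "%a, %d %b %Y %H:%M:%S"  # matches e.g. "Sun, 07 Aug 2022 14:33:18"

# Timestamps copied from the log in this comment
demux_started = datetime.strptime("Sun, 07 Aug 2022 14:33:18", LOG_FORMAT)
failure_logged = datetime.strptime("Wed, 10 Aug 2022 02:33:18", LOG_FORMAT)

elapsed_seconds = int((failure_logged - demux_started).total_seconds())
assumed_timeout = 60 * 60 * 60  # assumed res.get() timeout, in seconds (60 hours)

print(elapsed_seconds, assumed_timeout, elapsed_seconds == assumed_timeout)
# prints: 216000 216000 True
```

If that reading is correct, the demultiplexing of the 732 genomic regions was still running when the 60-hour limit expired, rather than failing on a bad input.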

Colelyman commented 2 years ago

Thanks for the update, I think there are a few ways that you can proceed: