metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

Error in rule rename_gene_catalog, not gzipped file #299

Closed MaxRubinBlum closed 4 years ago

MaxRubinBlum commented 4 years ago

Dear Silas, thank you for putting together a great pipeline. While running a test on my data, I got the following error:

[Mon Jul 6 19:09:53 2020]
Finished job 35.
147 of 211 steps (70%) done

[Mon Jul 6 19:09:53 2020]
localrule rename_gene_catalog:
    input: Genecatalog/all_genes/predicted_genes.fna, Genecatalog/all_genes/predicted_genes.faa, Genecatalog/clustering/orf2gene.tsv.gz, Genecatalog/representatives_of_clusters.fasta
    output: Genecatalog/gene_catalog.fna, Genecatalog/gene_catalog.faa
    jobid: 34
    resources: mem=60, time=5

Job counts:
    count   jobs
    1       rename_gene_catalog
    1
[Mon Jul 6 19:09:54 2020]
Error in rule rename_gene_catalog:
    jobid: 0
    output: Genecatalog/gene_catalog.fna, Genecatalog/gene_catalog.faa

RuleException:
OSError in line 300 of /home/bioinf/miniconda3/envs/atlasenv/lib/python3.6/site-packages/atlas/rules/genecatalog.snakefile:
Not a gzipped file (b'OR')
  File "/home/bioinf/miniconda3/envs/atlasenv/lib/python3.6/site-packages/atlas/rules/genecatalog.snakefile", line 300, in __rule_rename_gene_catalog
  File "/home/bioinf/miniconda3/envs/atlasenv/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
  File "/home/bioinf/miniconda3/envs/atlasenv/lib/python3.6/site-packages/pandas/io/parsers.py", line 440, in _read
  File "/home/bioinf/miniconda3/envs/atlasenv/lib/python3.6/site-packages/pandas/io/parsers.py", line 787, in init
  File "/home/bioinf/miniconda3/envs/atlasenv/lib/python3.6/site-packages/pandas/io/parsers.py", line 1014, in _make_engine
  File "/home/bioinf/miniconda3/envs/atlasenv/lib/python3.6/site-packages/pandas/io/parsers.py", line 1708, in init
  File "pandas/_libs/parsers.pyx", line 539, in pandas._libs.parsers.TextReader.cinit
  File "pandas/_libs/parsers.pyx", line 737, in pandas._libs.parsers.TextReader._get_header
  File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2101, in pandas._libs.parsers.raise_parser_error
  File "/home/bioinf/miniconda3/envs/atlasenv/lib/python3.6/_compression.py", line 68, in readinto
  File "/home/bioinf/miniconda3/envs/atlasenv/lib/python3.6/gzip.py", line 463, in read
  File "/home/bioinf/miniconda3/envs/atlasenv/lib/python3.6/gzip.py", line 411, in _read_gzip_header
  File "/home/bioinf/miniconda3/envs/atlasenv/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
[Mon Jul 6 19:50:51 2020]
Finished job 75.
148 of 211 steps (70%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Note the path to the log file for debugging.
Documentation is available at: https://metagenome-atlas.readthedocs.io
Issues can be raised at: https://github.com/metagenome-atlas/atlas/issues
Complete log: /media/bioinf/Data12/Maxim/Zeev/AtlasTest/.snakemake/log/2020-07-06T144251.631985.snakemake.log
[2020-07-06 19:50 CRITICAL] Command 'snakemake --snakefile /home/bioinf/miniconda3/envs/atlasenv/lib/python3.6/site-packages/atlas/Snakefile --directory /media/bioinf/Data12/Maxim/Zeev/AtlasTest --jobs 64 --rerun-incomplete --configfile '/media/bioinf/Data12/Maxim/Zeev/AtlasTest/config.yaml' --nolock --use-conda --conda-prefix /media/bioinf/Data/ATLAS/conda_envs all ' returned non-zero exit status 1.

Please help with identifying the problem. Many thanks!

SilasK commented 4 years ago

Hey @MaxRubinBlum,

The error is saying that the file Genecatalog/clustering/orf2gene.tsv.gz is not a gzipped file.

Can you check what it is, e.g. with:

file Genecatalog/clustering/orf2gene.tsv.gz
head Genecatalog/clustering/orf2gene.tsv.gz
gunzip -c Genecatalog/clustering/orf2gene.tsv.gz | head

Maybe it was not written correctly, or it is empty. You may also remove it and restart atlas. What are the values in your config file for:

genecatalog:
  source: contigs               
  clustermethod: linclust  
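
The `(b'OR')` in the error message is the pair of bytes that Python's gzip reader found where it expected the gzip magic number `1f 8b`. Here it looks like the start of a plain-text header. As a minimal sketch of the same check in Python, assuming only the standard library (`looks_gzipped` is a hypothetical helper name, not part of ATLAS):

```python
# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic bytes.

    Only the first two bytes are inspected, so an empty or truncated file
    is reported as not gzipped.
    """
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC
```

Calling `looks_gzipped("Genecatalog/clustering/orf2gene.tsv.gz")` from the working directory would report whether the file is really compressed, regardless of its `.gz` extension.
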

MaxRubinBlum commented 4 years ago

Many thanks for the swift reply!

The file is indeed not gzipped:

bioinf@mrblab-ws:/media/bioinf/Data12/Maxim/Zeev/AtlasTest$ file Genecatalog/clustering/orf2gene.tsv.gz
Genecatalog/clustering/orf2gene.tsv.gz: ASCII text

head Genecatalog/clustering/orf2gene.tsv.gz
ORF       Gene
Ein_9_10  Gene000001
Ein_9_11  Gene000002
Ein_9_12  Gene000003
Ein_9_17  Gene000004
Ein_9_18  Gene000005
Ein_9_20  Gene000006
Ein_9_21  Gene000007
Ein_9_24  Gene000008
Ein_9_25  Gene000009

For genecatalog I used the default settings:

genecatalog:
  source: contigs
  clustermethod: linclust
  minlength_nt: 100
  minid: 0.95
  coverage: 0.9
  extra: ''
  SubsetSize: 500000

Shall I just gzip the file and restart atlas? What would be the best practice for resuming the analysis? Thanks!

amojarro commented 4 years ago

Hi @SilasK,

I had exactly the same error as @MaxRubinBlum. I simply renamed orf2gene.tsv.gz to orf2gene.tsv, then gzipped it with:

gzip orf2gene.tsv

I then ran:

atlas run genecatalog -w work-dir/
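
The rename-then-gzip workaround can also be sketched in Python, assuming only the standard library. `gzip_in_place` is a hypothetical helper name and is not part of ATLAS. It compresses the file under its existing `.gz` name, but only if it does not already start with the gzip magic bytes, so running it twice is harmless:

```python
import gzip
import os
import shutil

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def gzip_in_place(path):
    """Compress `path` under the same name if it is not already gzipped.

    Returns True if the file was rewritten, False if it was left untouched.
    """
    with open(path, "rb") as handle:
        if handle.read(2) == GZIP_MAGIC:
            return False  # already a gzip file, nothing to do

    # Write the compressed copy next to the original, then swap it in,
    # so an interrupted run never leaves a half-written file at `path`.
    tmp_path = path + ".tmp"
    with open(path, "rb") as src, gzip.open(tmp_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.replace(tmp_path, path)
    return True
```

After something like `gzip_in_place("Genecatalog/clustering/orf2gene.tsv.gz")`, the pipeline can be resumed with the `atlas run genecatalog` command quoted in this comment.
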

MaxRubinBlum commented 4 years ago

Hi @amojarro,

many thanks! I will run it as you suggested.