metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

Error in predict_genes_genomes.py #254

Closed spleonard1 closed 4 years ago

spleonard1 commented 4 years ago

I can't make any sense of this error in the "predict_genes_genomes" rule. Any ideas? Running on a Linux cluster. Thanks!

Executing: snakemake --snakefile /stor/work/Ochman/sean/projects/atlas-tutorial/atlas/atlas/Snakefile --directory /stor/work/Ochman/sean/projects/bee_metagenomes/full_run_1 --jobs 72 --rerun-incomplete --configfile '/stor/work/Ochman/sean/projects/bee_metagenomes/full_run_1/config.yaml' --nolock --use-conda --conda-prefix /stor/work/Ochman/sean/projects/bee_metagenomes/databases/conda_envs all
Building DAG of jobs...
Updating job 239 (run_all_checkm_lineage_wf).
Updating job 6761 (build_db_genomes).
Updating job 240 (combine_bined_coverages_MAGs).
Updating job 241 (combine_coverages_MAGs).
Updating job 243 (predict_genes_genomes).
Updating job 2659 (identify).
Updating job 245 (classify).
Updating job 4 (genomes).
Using shell: /bin/bash
Provided cores: 72
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1       add_eggNOG_header
    1       align
    1       all
    1       bam_2_sam_MAGs
    1       classify
    1       cluster_genes
    1       combine_bined_coverages_MAGs
    1       combine_coverages_MAGs
    1       combine_egg_nogg_annotations
    1       concat_genes
    1       filter_genes
    1       gene_subsets
    1       genecatalog
    1       genomes
    1       get_rep_proteins
    1       identify
    1       infer
    1       pileup_MAGs
    1       predict_genes_genomes
    1       rename_gene_catalog
    1       rename_protein_catalog
    21

[Tue Oct 22 11:34:00 2019]
rule bam_2_sam_MAGs:
    input: genomes/alignments/SRR7986791.bam
    output: genomes/alignments/SRR7986791.sam
    jobid: 2599
    wildcards: sample=SRR7986791
    threads: 8
    resources: mem=80

[Tue Oct 22 11:34:00 2019]
rule predict_genes_genomes:
    input: genomes/genomes
    output: genomes/annotations/genes
    log: logs/genomes/prodigal.log
    jobid: 243
    threads: 8

[Tue Oct 22 11:34:00 2019]
rule identify:
    input: genomes/genomes, /stor/work/Ochman/sean/projects/bee_metagenomes/databases/GTDB-TK/downloaded_success
    output: genomes/taxonomy/identify
    log: logs/taxonomy/gtdbtk_identify.txt, genomes/taxonomy/gtdbtk.log
    jobid: 2659
    threads: 8

Job counts:
    count   jobs
    1       bam_2_sam_MAGs
    1
Activating conda environment: /stor/work/Ochman/sean/projects/bee_metagenomes/databases/conda_envs/96f9accc
java -ea -Xmx200m -cp /stor/work/Ochman/sean/projects/bee_metagenomes/databases/conda_envs/f6f0da5b/opt/bbmap-37.78/current/ jgi.ReformatReads in=genomes/alignments/SRR7986791.bam out=genomes/alignments/SRR7986791.sam sam=1.3
Executing jgi.ReformatReads [in=genomes/alignments/SRR7986791.bam, out=genomes/alignments/SRR7986791.sam, sam=1.3]

Found samtools 1.3.1
Input is being processed as unpaired
Waiting on header to be read from a sam file.
Input:  1540316 reads   183523890 bases
Output: 1540316 reads (100.00%)   183523890 bases (100.00%)

Time:   8.030 seconds.
Reads Processed:    1540k   191.82k reads/sec
Bases Processed:    183m    22.85m bases/sec
[Tue Oct 22 11:34:12 2019]
Finished job 2599.
1 of 21 steps (5%) done

[Tue Oct 22 11:34:12 2019]
rule pileup_MAGs:
    input: genomes/alignments/SRR7986791.sam
    output: genomes/alignments/SRR7986791_base_coverage.txt.gz, genomes/alignments/SRR7986791_coverage_histogram.txt, genomes/alignments/SRR7986791_coverage.txt, genomes/alignments/SRR7986791_coverage_binned.txt
    log: logs/genomes/alignments/pilup_SRR7986791.log
    jobid: 1768
    wildcards: sample=SRR7986791
    threads: 8
    resources: mem=80, java_mem=68

Activating conda environment: /stor/work/Ochman/sean/projects/bee_metagenomes/databases/conda_envs/f6f0da5b Job counts: count jobs 1 predict_genes_genomes 1 Traceback (most recent call last): File "/stor/work/Ochman/sean/projects/bee_metagenomes/full_run_1/.snakemake/shadow/tmp8mzrmw3k/.snakemake/scripts/tmp7l63qsgu.predict_genes_of_genomes.py", line 3, in import sys; sys.path.extend(["/stor/home/spl552/miniconda3/envs/atlas/lib/python3.6/site-packages", "/stor/work/Ochman/sean/projects/atlas-tutorial/atlas/atlas/rules"]); import pickle; snakemake = pickle.loads(b'\x80\x03csnakemake.script\nSnakemake\nq\x00)\x81q\x01}q\x02(X\x05\x00\x00\x00inputq\x03csnakemake.io\nInputFiles\nq\x04)\x81q\x05X\x0f\x00\x00\x00genomes/genomesq\x06a}q\x07(X\x06\x00\x00\x00_namesq\x08}q\tX\x03\x00\x00\x00dirq\nK\x00N\x86q\x0bsh\nh\x06ubX\x06\x00\x00\x00outputq\x0ccsnakemake.io\nOutputFiles\nq\r)\x81q\x0eX\x19\x00\x00\x00genomes/annotations/genesq\x0fa}q\x10h\x08}q\x11sbX\x06\x00\x00\x00paramsq\x12csnakemake.io\nParams\nq\x13)\x81q\x14}q\x15h\x08}q\x16sbX\t\x00\x00\x00wildcardsq\x17csnakemake.io\nWildcards\nq\x18)\x81q\x19}q\x1ah\x08}q\x1bsbX\x07\x00\x00\x00threadsq\x1cK\x08X\t\x00\x00\x00resourcesq\x1dcsnakemake.io\nResources\nq\x1e)\x81q\x1f(K\x08K\x01e}q (h\x08}q!(X\x06\x00\x00\x00_coresq"K\x00N\x86q#X\x06\x00\x00\x00_nodesq$K\x01N\x86q%uh"K\x08h$K\x01ubX\x03\x00\x00\x00logq&csnakemake.io\nLog\nq\')\x81q(X\x19\x00\x00\x00logs/genomes/prodigal.logq)a}q*h\x08}q+sbX\x06\x00\x00\x00configq,}q-(X\t\x00\x00\x00data_typeq.X\n\x00\x00\x00metagenomeq/X\x06\x00\x00\x00tmpdirq0X\x04\x00\x00\x00/tmpq1h\x1cK\x08X\r\x00\x00\x00simplejob_memq2K\nX\x11\x00\x00\x00simplejob_threadsq3K\x04X\x03\x00\x00\x00memq4KPX\t\x00\x00\x00large_memq5K\xfaX\r\x00\x00\x00large_threadsq6K\x10X\x18\x00\x00\x00preprocess_adapter_min_kq7K\x08X\x1f\x00\x00\x00preprocess_minimum_base_qualityq8K\nX$\x00\x00\x00preprocess_allowable_kmer_mismatchesq9K\x01X&\x00\x00\x00preprocess_reference_kmer_match_lengthq:K\x1bX&\x00\x00\x00preprocess_minimum_passing_read_lengthq;K3X!\x00\x00\x00preprocess_minimum_base_frequencyq<G?\xa9\x99\x99\x99\x99\x99\x9aX\x0b\x00\x00\x00deduplicateq=\x88X"\x00\x00\x00error_correction_overlapping_pairsq>\x88X\x15\x00\x00\x00contaminant_max_indelq?K\x14X\x15\x00\x00\x00contaminant_min_ratioq@G?\xe4\xcc\xcc\xcc\xcc\xcc\xcdX\x17\x00\x00\x00contaminant_kmer_lengthqAK\rX\x18\x00\x00\x00contaminant_minimum_hitsqBK\x01X\x15\x00\x00\x00contaminant_ambiguousqCX\x04\x00\x00\x00bestqDX\x17\x00\x00\x00duplicates_only_opticalqE\x89X\x1e\x00\x00\x00duplicates_allow_substitutionsqFK\x02X\x1f\x00\x00\x00normalize_reads_before_assemblyqG\x89X\x19\x00\x00\x00normalization_kmer_lengthqHK\x15X\x1a\x00\x00\x00normalization_target_depthqIM\x10\'X\x1b\x00\x00\x00normalization_minimum_kmersqJK\x03X \x00\x00\x00error_correction_before_assemblyqK\x88X\x1b\x00\x00\x00merge_pairs_before_assemblyqL\x88X\t\x00\x00\x00merging_kqMK>X\x0f\x00\x00\x00merging_extend2qNK(X\r\x00\x00\x00merging_flagsqOX\x11\x00\x00\x00ecct 
iterations=5qPX\t\x00\x00\x00assemblerqQX\x06\x00\x00\x00spadesqRX\x0f\x00\x00\x00assembly_memoryqSK\xfaX\x10\x00\x00\x00assembly_threadsqTK\x08X\x11\x00\x00\x00megahit_min_countqUK\x02X\r\x00\x00\x00megahit_k_minqVK\x15X\r\x00\x00\x00megahit_k_maxqWKyX\x0e\x00\x00\x00megahit_k_stepqXK\x14X\x13\x00\x00\x00megahit_merge_levelqYX\x07\x00\x00\x0020,0.98qZX\x13\x00\x00\x00megahit_prune_levelq[K\x02X\x17\x00\x00\x00megahit_low_local_ratioq\G?\xc9\x99\x99\x99\x99\x99\x9aX\x0e\x00\x00\x00megahit_presetq]X\x07\x00\x00\x00defaultq^X\x15\x00\x00\x00minimum_contig_lengthq_M,\x01X\x1f\x00\x00\x00prefilter_minimum_contig_lengthq`K\xc8X\x08\x00\x00\x00spades_kqaX\x04\x00\x00\x00autoqbX\x14\x00\x00\x00spades_use_scaffoldsqc\x89X\r\x00\x00\x00spades_presetqdX\x04\x00\x00\x00metaqeX\x0c\x00\x00\x00spades_extraqfX\x00\x00\x00\x00qgX\x17\x00\x00\x00spades_skip_BayesHammerqh\x88X\r\x00\x00\x00longread_typeqiX\x04\x00\x00\x00noneqjX\x0e\x00\x00\x00filter_contigsqk\x88X\x18\x00\x00\x00minimum_average_coverageqlK\x01X\x1d\x00\x00\x00minimum_percent_covered_basesqmK\x14X\x14\x00\x00\x00minimum_mapped_readsqnK\x00X\x0e\x00\x00\x00contig_trim_bpqoK\x00X\x16\x00\x00\x00minimum_region_overlapqpK\x01X\x1c\x00\x00\x00feature_counts_allow_overlapqq\x88X\x1f\x00\x00\x00contig_count_multi_mapped_readsqr\x89X\r\x00\x00\x00contig_min_idqsG?\xe8Q\xeb\x85\x1e\xb8RX\x16\x00\x00\x00contig_map_paired_onlyqt\x88X!\x00\x00\x00contig_max_distance_between_pairsquM\xe8\x03X\x19\x00\x00\x00maximum_counted_map_sitesqvK\nX\x0b\x00\x00\x00genecatalogqw}qx(X\x06\x00\x00\x00sourceqyX\x07\x00\x00\x00genomesqzX\r\x00\x00\x00clustermethodq{X\x08\x00\x00\x00linclustq|X\t\x00\x00\x00minlengthq}KdX\x05\x00\x00\x00minidq~G?\xec\xcc\xcc\xcc\xcc\xcc\xcdX\x08\x00\x00\x00coverageq\x7fG?\xec\xcc\xcc\xcc\xcc\xcc\xcdX\x05\x00\x00\x00extraq\x80hgX\n\x00\x00\x00SubsetSizeq\x81J 
\xa1\x07\x00X\x0c\x00\x00\x00minlength_ntq\x82KduX\x16\x00\x00\x00perform_genome_binningq\x83\x88X\x0c\x00\x00\x00final_binnerq\x84X\x07\x00\x00\x00DASToolq\x85X\x06\x00\x00\x00binnerq\x86]q\x87(X\x07\x00\x00\x00metabatq\x88X\x06\x00\x00\x00maxbinq\x89eX\x07\x00\x00\x00metabatq\x8a}q\x8b(X\x0b\x00\x00\x00sensitivityq\x8cX\t\x00\x00\x00sensitiveq\x8dX\x11\x00\x00\x00min_contig_lengthq\x8eM\xdc\x05uX\x07\x00\x00\x00concoctq\x8f}q\x90(X\x12\x00\x00\x00Nexpected_clustersq\x91K\xc8X\x0b\x00\x00\x00read_lengthq\x92KdX\x0b\x00\x00\x00Niterationsq\x93M\xf4\x01h\x8eM\xe8\x03uX\x06\x00\x00\x00maxbinq\x94}q\x95(X\r\x00\x00\x00max_iterationq\x96K2X\x0e\x00\x00\x00prob_thresholdq\x97G?\xe9\x99\x99\x99\x99\x99\x9ah\x8eM\xe8\x03uX\x07\x00\x00\x00DASToolq\x98}q\x99(X\r\x00\x00\x00search_engineq\x9aX\x07\x00\x00\x00diamondq\x9bX\x0f\x00\x00\x00score_thresholdq\x9cG?\xd9\x99\x99\x99\x99\x99\x9aX\x11\x00\x00\x00duplicate_penaltyq\x9dG?\xe3333333X\x0f\x00\x00\x00megabin_penaltyq\x9eG?\xe0\x00\x00\x00\x00\x00\x00uX\x0b\x00\x00\x00annotationsq\x9f]q\xa0(X\t\x00\x00\x00gtdb_treeq\xa1X\r\x00\x00\x00gtdb_taxonomyq\xa2X\x0f\x00\x00\x00checkm_taxonomyq\xa3eX\x14\x00\x00\x00genome_dereplicationq\xa4}q\xa5(X\x06\x00\x00\x00filterq\xa6}q\xa7(X\x08\x00\x00\x00noFilterq\xa8\x89X\x06\x00\x00\x00lengthq\xa9M\x88\x13X\x0c\x00\x00\x00completenessq\xaaK2X\r\x00\x00\x00contaminationq\xabK\nuX\x05\x00\x00\x00scoreq\xac}q\xad(h\xaaK\x01h\xabK\x05X\x14\x00\x00\x00strain_heterogeneityq\xaeK\x00X\x03\x00\x00\x00N50q\xafG?\xe0\x00\x00\x00\x00\x00\x00h\xa9K\x00uX\x03\x00\x00\x00ANIq\xb0G?\xeeffffffX\x07\x00\x00\x00overlapq\xb1G?\xe3333333X\x0b\x00\x00\x00sketch_sizeq\xb2M\x88\x13X\x0e\x00\x00\x00opt_parametersq\xb3hguX\t\x00\x00\x00cat_rangeq\xb4K\x05X\x0c\x00\x00\x00cat_fractionq\xb5G?\xd3333333X\x08\x00\x00\x00java_memq\xb6K@X\x0c\x00\x00\x00database_dirq\xb7X9\x00\x00\x00/stor/work/Ochman/sean/projects/bee_metagenomes/databasesq\xb8X\x12\x00\x00\x00interleaved_fastqsq\xb9\x89X\x13\x00\x00\x00preprocess_adaptersq\xbaXE\x00\x00\x00/stor/work/Ochman/sean/projects/bee_metagenomes/databases/adapters.faq\xbbX\x16\x00\x00\x00contaminant_referencesq\xbc}q\xbd(X\x04\x00\x00\x00PhiXq\xbeXJ\x00\x00\x00/stor/work/Ochman/sean/projects/bee_metagenomes/databases/phiX174_virus.faq\xbfX\x04\x00\x00\x00apisq\xc0XF\x00\x00\x00/stor/work/Ochman/sean/projects/bee_metagenomes/databases/apis_mell.faq\xc1X\x05\x00\x00\x00nos_cq\xc2XK\x00\x00\x00/stor/work/Ochman/sean/projects/bee_metagenomes/databases/nosema_ceranae.faq\xc3X\x05\x00\x00\x00nos_aq\xc4XH\x00\x00\x00/stor/work/Ochman/sean/projects/bee_metagenomes/databases/nosema_apis.faq\xc5X\x03\x00\x00\x00lotq\xc6XL\x00\x00\x00/stor/work/Ochman/sean/projects/bee_metagenomes/databases/lotmaria_passim.faq\xc7uX\x0b\x00\x00\x00diamond_memq\xc8KdX\x0f\x00\x00\x00diamond_threadsq\xc9K\x0cuX\x04\x00\x00\x00ruleq\xcaX\x15\x00\x00\x00predict_genes_genomesq\xcbX\x0f\x00\x00\x00bench_iterationq\xccNX\t\x00\x00\x00scriptdirq\xcdX@\x00\x00\x00/stor/work/Ochman/sean/projects/atlas-tutorial/atlas/atlas/rulesq\xceub.'); from snakemake.logging import logger; logger.printshellcmds = False; real_file = file; file__ = '/stor/work/Ochman/sean/projects/atlas-tutorial/atlas/atlas/rules/predict_genes_of_genomes.py'; File "/stor/home/spl552/miniconda3/envs/atlas/lib/python3.6/site-packages/snakemake/init__.py", line 21, in from snakemake.workflow import Workflow File "/stor/home/spl552/miniconda3/envs/atlas/lib/python3.6/site-packages/snakemake/workflow.py", line 30, in from snakemake.dag import DAG File 
"/stor/home/spl552/miniconda3/envs/atlas/lib/python3.6/site-packages/snakemake/dag.py", line 1618 return f"#{hex_r:0>2X}{hex_g:0>2X}{hex_b:0>2X}" ^ SyntaxError: invalid syntax [Tue Oct 22 11:34:23 2019] Error in rule predict_genes_genomes: jobid: 0 output: genomes/annotations/genes log: logs/genomes/prodigal.log (check log file(s) for error message) conda-env: /stor/work/Ochman/sean/projects/bee_metagenomes/databases/conda_envs/f6f0da5b

RuleException:
CalledProcessError in line 451 of /stor/work/Ochman/sean/projects/atlas-tutorial/atlas/atlas/rules/genomes.smk:
Command 'source /stor/home/spl552/miniconda3/bin/activate '/stor/work/Ochman/sean/projects/bee_metagenomes/databases/conda_envs/f6f0da5b'; set -euo pipefail; python /stor/work/Ochman/sean/projects/bee_metagenomes/full_run_1/.snakemake/shadow/tmp8mzrmw3k/.snakemake/scripts/tmp7l63qsgu.predict_genes_of_genomes.py' returned non-zero exit status 1.
  File "/stor/work/Ochman/sean/projects/atlas-tutorial/atlas/atlas/rules/genomes.smk", line 451, in __rule_predict_genes_genomes
  File "/stor/home/spl552/miniconda3/envs/atlas/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
Removing temporary output file genomes/alignments/SRR7986791.sam.
Removing temporary output file genomes/alignments/SRR7986791_base_coverage.txt.gz.
Removing temporary output file genomes/alignments/SRR7986791_coverage_histogram.txt.
[Tue Oct 22 11:34:28 2019]
Finished job 1768.
2 of 21 steps (10%) done
[Tue Oct 22 11:36:16 2019]
Finished job 2659.
3 of 21 steps (14%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Note the path to the log file for debugging.
Documentation is available at: https://metagenome-atlas.readthedocs.io
Issues can be raised at: https://github.com/metagenome-atlas/atlas/issues
Complete log: /stor/work/Ochman/sean/projects/bee_metagenomes/full_run_1/.snakemake/log/2019-10-22T113336.321339.snakemake.log
[2019-10-22 11:36 CRITICAL] Command 'snakemake --snakefile /stor/work/Ochman/sean/projects/atlas-tutorial/atlas/atlas/Snakefile --directory /stor/work/Ochman/sean/projects/bee_metagenomes/full_run_1 --jobs 72 --rerun-incomplete --configfile '/stor/work/Ochman/sean/projects/bee_metagenomes/full_run_1/config.yaml' --nolock --use-conda --conda-prefix /stor/work/Ochman/sean/projects/bee_metagenomes/databases/conda_envs all ' returned non-zero exit status 1.

jmtsuji commented 4 years ago

@spleonard1 Which version of ATLAS are you using? I was recently involved in updating this rule to work across multiple threads, but this change is only in the development version thus far.

spleonard1 commented 4 years ago

I'm up to date on the master branch, version 2.2.1.

jmtsuji commented 4 years ago

@spleonard1 I was able to reproduce it on my Linux server.

This might be a Snakemake issue. I reverted to Snakemake 5.5.4 in the ATLAS conda env (instead of 5.7.1), and that fixed the error.

@SilasK Any idea what might be going on? It appears that snakemake is having trouble executing its internal scripts (see the cryptic error message provided by @spleonard1 -- I got something similar).
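For what it's worth, my reading of the cryptic message (an educated guess on my part, not something I have confirmed) is this: the traceback ends in a SyntaxError on an f-string at line 1618 of snakemake's dag.py, and f-strings only exist in Python 3.6+. So the interpreter activated from the rule's conda env may be older than 3.6 and choke while importing the snakemake package that lives in the atlas env's python3.6 site-packages. A minimal, purely illustrative sketch of that failure mode (the helper name to_hex is made up):

```python
# Illustration only: an interpreter older than Python 3.6 cannot even compile
# a module that contains an f-string, e.g. the line quoted from dag.py above.
src = 'def to_hex(hex_r, hex_g, hex_b):\n    return f"#{hex_r:0>2X}{hex_g:0>2X}{hex_b:0>2X}"\n'

try:
    compile(src, "dag.py", "exec")
    print("compiled fine (this interpreter understands f-strings, i.e. >= 3.6)")
except SyntaxError as err:
    # On Python 3.5 this prints the same bare message seen in the log:
    # "invalid syntax"
    print("SyntaxError:", err.msg)
```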

SilasK commented 4 years ago

@jmtsuji Do you think it might be related to the multithreading? In the script, you are using a Pool from multiprocessing.dummy, which is only a wrapper around the threading module. Is there a reason not to use multiprocessing.Pool instead?

Or could it be related to the logging? Do you get the same error when you remove the logging altogether?

jmtsuji commented 4 years ago

@SilasK Indeed, even when I reverted predict_genes_of_genomes.py to the version that ran in ATLAS 2.0.6 without multithreading, I still get the same cryptic error. It works with Snakemake 5.5.4, however (just not Snakemake 5.7.1), which makes me think the issue lies with Snakemake itself.

Regarding multiprocessing: the main difference is that multiprocessing spawns processes whereas multiprocessing.dummy spawns threads (e.g., see https://chriskiehl.com/article/parallelism-in-one-line); otherwise they expose the same API. You're right, though; I think we can switch to multiprocessing in this case.
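As a minimal sketch (not the actual ATLAS script, just an illustration of why the swap is easy), multiprocessing.Pool and multiprocessing.dummy.Pool are called identically; one is backed by processes, the other by threads:

```python
from multiprocessing import Pool                      # process-backed pool
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed pool, same API

def predict_genes(genome_fasta):
    # Stand-in for the per-genome Prodigal call; purely illustrative.
    return "predicted genes for " + genome_fasta

if __name__ == "__main__":
    genomes = ["bin1.fasta", "bin2.fasta", "bin3.fasta"]

    # Thread pool (what multiprocessing.dummy provides):
    with ThreadPool(4) as pool:
        print(pool.map(predict_genes, genomes))

    # Process pool: identical call pattern, but the work runs in separate processes.
    with Pool(4) as pool:
        print(pool.map(predict_genes, genomes))
```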

spleonard1 commented 4 years ago

I updated my snakemake through conda and no longer get this error. I recommend closing this issue.

SilasK commented 4 years ago

Snakemake 5.7.4?

spleonard1 commented 4 years ago

Uhh, never mind. It turns out it worked with Snakemake 5.5.4 (I had my versions confused).

SilasK commented 4 years ago

The bug should be fixed in the latest Snakemake release.