metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

problem with pplacer #92

Closed mykophile closed 6 years ago

mykophile commented 6 years ago

I'm trying to start an assembly and keep getting an error that seems to indicate that pplacer is missing. I've tried to install pplacer through Bioconda, and it seems the package is missing there too. Any suggestions on how to proceed?

BIO-C02TX0H0HV2V:Bioinformatics kpeay$ atlas assemble configKP.yaml
[2018-02-09 15:35 INFO] Executing: snakemake --snakefile /Users/kpeay/miniconda3/lib/python3.5/site-packages/atlas/Snakefile --directory /Users/kpeay/Documents/Kabir_Documents/Bioinformatics --printshellcmds --jobs 4 --rerun-incomplete --configfile '/Users/kpeay/Documents/Kabir_Documents/Bioinformatics/configKP.yaml' --nolock --use-conda --config workflow=complete --
Building DAG of jobs...
Creating conda environment /Users/kpeay/miniconda3/lib/python3.5/site-packages/atlas/envs/optional_genome_binning.yaml...
Downloading remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /Users/kpeay/miniconda3/lib/python3.5/site-packages/atlas/envs/optional_genome_binning.yaml:
Fetching package metadata .................

ResolvePackageNotFound:

[2018-02-09 15:35 CRITICAL] Command 'snakemake --snakefile /Users/kpeay/miniconda3/lib/python3.5/site-packages/atlas/Snakefile --directory /Users/kpeay/Documents/Kabir_Documents/Bioinformatics --printshellcmds --jobs 4 --rerun-incomplete --configfile '/Users/kpeay/Documents/Kabir_Documents/Bioinformatics/configKP.yaml' --nolock --use-conda --config workflow=complete -- ' returned non-zero exit status 1

SilasK commented 6 years ago

What platform are you on?

Atlas creates a separate conda environment for each step. Your log shows a problem installing the environment for genome binning, which comes after the assembly.

You could skip this step in the config file to get the assembly running.
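For illustration, disabling binning might look like the fragment below; the key name here is a guess, not taken from the atlas docs, so check the option names your config file actually contains:

```yaml
# Hypothetical config fragment -- the exact key for disabling genome
# binning may differ in your atlas version; look for a binning-related
# option in your configKP.yaml.
perform_genome_binning: false
```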

You can also try to install the environment yourself with something like conda env install -n env_test --file /Users/kpeay/miniconda3/lib/python3.5/site-packages/atlas/envs/optional_genome_binning.yaml to debug it.
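(For reference, the conda-env subcommand is create, not install. A sketch of the debug step, using the yaml path from the log above:)

```shell
# Recreate the failing environment under a throwaway name so that
# conda reports which package cannot be resolved.
conda env create -n env_test \
    --file /Users/kpeay/miniconda3/lib/python3.5/site-packages/atlas/envs/optional_genome_binning.yaml

# If creation succeeds, remove the test environment again.
conda env remove -n env_test
```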

mykophile commented 6 years ago

I'm on macOS High Sierra. I'll see if I can skip the genome binning step. When I try the suggested debug command I get the same error (install is not an option):

BIO-C02TX0H0HV2V:Bioinformatics kpeay$ conda env create -n env_test --file /Users/kpeay/miniconda3/lib/python3.5/site-packages/atlas/envs/optional_genome_binning.yaml
Fetching package metadata .................

ResolvePackageNotFound:

It's possible I'm missing something here as I'm new to conda.

mykophile commented 6 years ago

So the rest of the environments install if I skip genome binning, but I got a new error once I started the assembly (see below): the "reformat.sh" command is missing. I seem to have fixed this by manually installing bbmap with conda, although I'm not sure why it wasn't installed in the first place.
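The manual fix described above, in sketch form (bbmap is distributed on the bioconda channel; your channel configuration may vary):

```shell
# Install bbmap (which provides reformat.sh) into the currently
# active conda environment from the bioconda channel.
conda install -c bioconda bbmap

# Verify that reformat.sh is now on the PATH.
command -v reformat.sh
```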

BIO-C02TX0H0HV2V:Bioinformatics kpeay$ conda env list -n env_test --file /Users/kpeay/miniconda3/lib/python3.5/site-packages/atlas/envs/optional_genome_binning.yaml
usage: conda-env [-h] {attach,create,export,list,remove,upload,update} ...
conda-env: error: unrecognized arguments: -n env_test --file /Users/kpeay/miniconda3/lib/python3.5/site-packages/atlas/envs/optional_genome_binning.yaml
BIO-C02TX0H0HV2V:Bioinformatics kpeay$ atlas assemble configKP.yaml
[2018-02-12 08:45 INFO] Executing: snakemake --snakefile /Users/kpeay/miniconda3/lib/python3.5/site-packages/atlas/Snakefile --directory /Users/kpeay/Documents/Kabir_Documents/Bioinformatics --printshellcmds --jobs 4 --rerun-incomplete --configfile '/Users/kpeay/Documents/Kabir_Documents/Bioinformatics/configKP.yaml' --nolock --use-conda --config workflow=complete --
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 4
Rules claiming more threads will be scaled down.
Unlimited resources: mem
Job counts:
    count  jobs
    1      QC_report
    1      add_contig_metadata
    1      align_reads_to_final_contigs
    1      all
    1      build_decontamination_db
    2      calculate_contigs_stats
    1      calculate_insert_size
    1      calculate_prefiltered_contig_coverage_stats
    1      combine_insert_stats
    1      combine_read_counts
    1      combine_read_length_stats
    1      convert_gff_to_gtf
    1      convert_sam_to_bam
    1      decontamination
    1      deduplicate
    1      error_correction
    1      filter_by_coverage
    1      finalize_QC
    1      finalize_contigs
    1      find_counts_per_region
    1      init_QC
    1      merge_pairs
    1      merge_sample_tables
    1      normalize_coverage_across_kmers
    1      parse_blastp
    1      postprocess_after_decontamination
    1      quality_filter
    5      read_stats
    1      rename_contigs
    1      rename_megahit_output
    1      run_diamond_blastp
    1      run_megahit
    1      run_prokka_annotation
    1      sort_munged_blast_hits
    1      update_prokka_tsv
    40

rule init_QC:
    input: /Users/kpeay/Documents/Kabir_Documents/Bioinformatics/MASS/P_309_S13_L008_R1_001.fastq, /Users/kpeay/Documents/Kabir_Documents/Bioinformatics/MASS/P_309_S13_L008_R2_001.fastq
    output: P-309-S13-L008-001/sequence_quality_control/P-309-S13-L008-001_raw_R1.fastq.gz, P-309-S13-L008-001/sequence_quality_control/P-309-S13-L008-001_raw_R2.fastq.gz
    log: P-309-S13-L008-001/logs/P-309-S13-L008-001_init.log
    jobid: 26
    wildcards: sample=P-309-S13-L008-001
    priority: 80
    threads: 4
    resources: mem=32

reformat.sh in=/Users/kpeay/Documents/Kabir_Documents/Bioinformatics/MASS/P_309_S13_L008_R1_001.fastq in2=/Users/kpeay/Documents/Kabir_Documents/Bioinformatics/MASS/P_309_S13_L008_R2_001.fastq interleaved=f out1=P-309-S13-L008-001/sequence_quality_control/P-309-S13-L008-001_raw_R1.fastq.gz out2=P-309-S13-L008-001/sequence_quality_control/P-309-S13-L008-001_raw_R2.fastq.gz qout=33 overwrite=true verifypaired=t addslash=t trimreaddescription=t threads=4 -Xmx32G 2> P-309-S13-L008-001/logs/P-309-S13-L008-001_init.log

Activating conda environment /Users/kpeay/Documents/Kabir_Documents/Bioinformatics/.snakemake/conda/2ae81ce7.
Finished job 26.
1 of 40 steps (2%) done

rule read_stats:
    input: P-309-S13-L008-001/sequence_quality_control/P-309-S13-L008-001_raw_R1.fastq.gz, P-309-S13-L008-001/sequence_quality_control/P-309-S13-L008-001_raw_R2.fastq.gz
    output: P-309-S13-L008-001/sequence_quality_control/read_stats/raw.zip, P-309-S13-L008-001/sequence_quality_control/read_stats/raw_read_counts.tsv
    log: P-309-S13-L008-001/logs/read_stats.log
    jobid: 9
    wildcards: step=raw, sample=P-309-S13-L008-001
    priority: 30
    threads: 4
    resources: mem=32

/bin/bash: line 3: reformat.sh: command not found
Error in rule read_stats:
    jobid: 0
    output: P-309-S13-L008-001/sequence_quality_control/read_stats/raw.zip, P-309-S13-L008-001/sequence_quality_control/read_stats/raw_read_counts.tsv
    log: P-309-S13-L008-001/logs/read_stats.log

RuleException: CalledProcessError in line 162 of /Users/kpeay/miniconda3/lib/python3.5/site-packages/atlas/rules/qc.snakefile: Command ' set -euo pipefail;
mkdir -p P-309-S13-L008-001/sequence_quality_control/read_stats/raw/pe

                reformat.sh in1=P-309-S13-L008-001/sequence_quality_control/P-309-S13-L008-001_raw_R1.fastq.gz in2=P-309-S13-L008-001/sequence_quality_control/P-309-S13-L008-001_raw_R2.fastq.gz                     bhist=P-309-S13-L008-001/sequence_quality_control/read_stats/raw/pe/base_hist.txt                     qhist=P-309-S13-L008-001/sequence_quality_control/read_stats/raw/pe/quality_by_pos.txt                     lhist=P-309-S13-L008-001/sequence_quality_control/read_stats/raw/pe/readlength.txt                     gchist=P-309-S13-L008-001/sequence_quality_control/read_stats/raw/pe/gc_hist.txt                     gcbins=auto                     bqhist=P-309-S13-L008-001/sequence_quality_control/read_stats/raw/pe/boxplot_quality.txt                     threads=4                     overwrite=true                     -Xmx32G                     2> >(tee -a P-309-S13-L008-001/logs/read_stats.log P-309-S13-L008-001/sequence_quality_control/read_stats/raw/pe/read_stats.tmp ) ' returned non-zero exit status 127

File "/Users/kpeay/miniconda3/lib/python3.5/site-packages/atlas/rules/qc.snakefile", line 178, in __rule_read_stats
File "/Users/kpeay/miniconda3/lib/python3.5/site-packages/atlas/rules/qc.snakefile", line 162, in get_read_stats
File "/Users/kpeay/miniconda3/lib/python3.5/concurrent/futures/thread.py", line 55, in run
Exiting because a job execution failed. Look above for error message
Will exit after finishing currently running jobs.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /Users/kpeay/Documents/Kabir_Documents/Bioinformatics/.snakemake/log/2018-02-12T084537.524673.snakemake.log
[2018-02-12 08:47 CRITICAL] Command 'snakemake --snakefile /Users/kpeay/miniconda3/lib/python3.5/site-packages/atlas/Snakefile --directory /Users/kpeay/Documents/Kabir_Documents/Bioinformatics --printshellcmds --jobs 4 --rerun-incomplete --configfile '/Users/kpeay/Documents/Kabir_Documents/Bioinformatics/configKP.yaml' --nolock --use-conda --config workflow=complete -- ' returned non-zero exit status 1

SilasK commented 6 years ago

Yes, bbmap should be a requirement for atlas.

I will integrate that into a bioconda package.