metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

Error in rule get_read_stats and run_decontamination during run qc #645

Closed: lineroager closed this issue 1 year ago

lineroager commented 1 year ago

Hello there,

I have been successfully setting up atlas on a Torque cluster, but ran into this error while trying to run qc on a single sample:

atlas run qc --working-dir data/atlas_workflow/ --profile cluster -c data/atlas_workflow/config.yaml --conda-prefix data/99_databases/conda_envs/

[Atlas] INFO: Atlas version: 2.15.1
Config file /services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/config/default_config.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
Creating conda environment /services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/rules/../envs/required_packages.yaml...
Downloading and installing remote packages.
Environment for /services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/rules/../envs/required_packages.yaml created (location: data/99_databases/conda_envs/9549dec21e9f8371b353aebcd66776ec_)
Creating conda environment /services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/rules/../envs/report.yaml...
Downloading and installing remote packages.
Environment for /services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/rules/../envs/report.yaml created (location: data/99_databases/conda_envs/2a1e961ee3625284022303553f52c511_)
Using shell: /usr/bin/bash
Provided cluster nodes: 40
Singularity containers: ignored
Job stats:
job                          count    min threads    max threads
-------------------------  -------  -------------  -------------
build_qc_report                  1              1              1
calculate_insert_size            1             10             10
combine_insert_stats             1              1              1
combine_read_counts              1              1              1
combine_read_length_stats        1              1              1
finalize_sample_qc               1              1              1
get_read_stats                   3             10             10
qc                               1              1              1
qcreads                          1              1              1
run_decontamination              1             10             10
write_read_counts                1              1              1
total                           13              1             10

Select jobs to execute...

[Mon May  8 14:25:48 2023]
rule run_decontamination:
    input: CwM1/sequence_quality_control/CwM1_filtered_R1.fastq.gz, CwM1/sequence_quality_control/CwM1_filtered_R2.fastq.gz, CwM1/sequence_quality_control/CwM1_filtered_se.fastq.gz, ref/genome/1/summary.txt
    output: CwM1/sequence_quality_control/CwM1_clean_R1.fastq.gz, CwM1/sequence_quality_control/CwM1_clean_R2.fastq.gz, CwM1/sequence_quality_control/CwM1_clean_se.fastq.gz, CwM1/sequence_quality_control/contaminants/PhiX_R1.fastq.gz, CwM1/sequence_quality_control/contaminants/PhiX_R2.fastq.gz, CwM1/sequence_quality_control/contaminants/PhiX_se.fastq.gz, CwM1/sequence_quality_control/contaminants/C_weissflogii_R1.fastq.gz, CwM1/sequence_quality_control/contaminants/C_weissflogii_R2.fastq.gz, CwM1/sequence_quality_control/contaminants/C_weissflogii_se.fastq.gz, CwM1/sequence_quality_control/CwM1_decontamination_reference_stats.txt
    log: CwM1/logs/QC/decontamination.log
    jobid: 3
    benchmark: logs/benchmarks/QC/decontamination/CwM1.txt
    reason: Missing output files: CwM1/sequence_quality_control/CwM1_clean_R1.fastq.gz, CwM1/sequence_quality_control/CwM1_clean_R2.fastq.gz, CwM1/sequence_quality_control/CwM1_clean_se.fastq.gz
    wildcards: sample=CwM1
    threads: 10
    resources: mem_mb=120000, mem_mib=114441, disk_mb=9679, disk_mib=9231, tmpdir=<TBD>, mem=120, java_mem=102, time_min=60, runtime=3600

CLUSTER: 2023-05-08 14:25:48 Automatically choose best queue to submit
CLUSTER: 2023-05-08 14:25:50 Choose queue: thinnode
CLUSTER: 2023-05-08 14:25:50 parameter 'nodes' not in keymapping! It would be better if you add the key to the file: /home/projects/dtu_00055/people/linroa/cluster/key_mapping.yaml 
 I try without the key!
CLUSTER: 2023-05-08 14:25:50 submit command: qsub -A dtu_00055 -N run_decontamination -l nodes=1:ppn=10 -l walltime=6000 -l mem=120000mb -l nodes=1:thinnode /home/projects/dtu_00055/data/atlas_workflow/.snakemake/tmp.i1uolz6g/snakejob.run_decontamination.3.sh
Submitted job 3 with external jobid '42521368'.

[Mon May  8 14:25:50 2023]
rule get_read_stats:
    input: CwM1/sequence_quality_control/CwM1_filtered_R1.fastq.gz, CwM1/sequence_quality_control/CwM1_filtered_R2.fastq.gz
    output: CwM1/sequence_quality_control/read_stats/filtered.zip, CwM1/sequence_quality_control/read_stats/filtered_read_counts.tsv
    log: CwM1/logs/QC/read_stats/filtered.log
    jobid: 6
    reason: Missing output files: CwM1/sequence_quality_control/read_stats/filtered.zip, CwM1/sequence_quality_control/read_stats/filtered_read_counts.tsv
    wildcards: sample=CwM1, step=filtered
    priority: 30
    threads: 10
    resources: mem_mb=120000, mem_mib=114441, disk_mb=9664, disk_mib=9217, tmpdir=<TBD>, mem=120, java_mem=102, time_min=60, runtime=3600

CLUSTER: 2023-05-08 14:25:50 Automatically choose best queue to submit
CLUSTER: 2023-05-08 14:25:51 Choose queue: thinnode
CLUSTER: 2023-05-08 14:25:51 parameter 'nodes' not in keymapping! It would be better if you add the key to the file: /home/projects/dtu_00055/people/linroa/cluster/key_mapping.yaml 
 I try without the key!
CLUSTER: 2023-05-08 14:25:51 submit command: qsub -A dtu_00055 -N get_read_stats -l nodes=1:ppn=10 -l walltime=6000 -l mem=120000mb -l nodes=1:thinnode /home/projects/dtu_00055/data/atlas_workflow/.snakemake/tmp.i1uolz6g/snakejob.get_read_stats.6.sh
Submitted job 6 with external jobid '42521370'.
[Mon May  8 14:26:38 2023]
Error in rule run_decontamination:
    jobid: 3
    input: CwM1/sequence_quality_control/CwM1_filtered_R1.fastq.gz, CwM1/sequence_quality_control/CwM1_filtered_R2.fastq.gz, CwM1/sequence_quality_control/CwM1_filtered_se.fastq.gz, ref/genome/1/summary.txt
    output: CwM1/sequence_quality_control/CwM1_clean_R1.fastq.gz, CwM1/sequence_quality_control/CwM1_clean_R2.fastq.gz, CwM1/sequence_quality_control/CwM1_clean_se.fastq.gz, CwM1/sequence_quality_control/contaminants/PhiX_R1.fastq.gz, CwM1/sequence_quality_control/contaminants/PhiX_R2.fastq.gz, CwM1/sequence_quality_control/contaminants/PhiX_se.fastq.gz, CwM1/sequence_quality_control/contaminants/C_weissflogii_R1.fastq.gz, CwM1/sequence_quality_control/contaminants/C_weissflogii_R2.fastq.gz, CwM1/sequence_quality_control/contaminants/C_weissflogii_se.fastq.gz, CwM1/sequence_quality_control/CwM1_decontamination_reference_stats.txt
    log: CwM1/logs/QC/decontamination.log (check log file(s) for error details)
    conda-env: /home/projects/dtu_00055/data/atlas_workflow/data/99_databases/conda_envs/9549dec21e9f8371b353aebcd66776ec_
    shell:

                if [ "true" = true ] ; then
                    bbsplit.sh in1=CwM1/sequence_quality_control/CwM1_filtered_R1.fastq.gz in2=CwM1/sequence_quality_control/CwM1_filtered_R2.fastq.gz                         outu1=CwM1/sequence_quality_control/CwM1_clean_R1.fastq.gz outu2=CwM1/sequence_quality_control/CwM1_clean_R2.fastq.gz                         basename="CwM1/sequence_quality_control/contaminants/%_R#.fastq.gz"                         maxindel=20 minratio=0.65                         minhits=1 ambiguous=best refstats=CwM1/sequence_quality_control/CwM1_decontamination_reference_stats.txt                        threads=10 k=13 local=t                         pigz=t unpigz=t ziplevel=9                         -Xmx102G 2> CwM1/logs/QC/decontamination.log
                fi

                bbsplit.sh in=CwM1/sequence_quality_control/CwM1_filtered_se.fastq.gz                      outu=CwM1/sequence_quality_control/CwM1_clean_se.fastq.gz                     basename="CwM1/sequence_quality_control/contaminants/%_se.fastq.gz"                     maxindel=20 minratio=0.65                     minhits=1 ambiguous=best refstats=CwM1/sequence_quality_control/CwM1_decontamination_reference_stats.txt append                     interleaved=f threads=10 k=13 local=t                     pigz=t unpigz=t ziplevel=9                     -Xmx102G 2>> CwM1/logs/QC/decontamination.log

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: 42521368

Error executing rule run_decontamination on cluster (jobid: 3, external: 42521368, jobscript: /home/projects/dtu_00055/data/atlas_workflow/.snakemake/tmp.i1uolz6g/snakejob.run_decontamination.3.sh). For error details see the cluster log and the log files of the involved rule(s).
[Mon May  8 14:26:38 2023]
Error in rule get_read_stats:
    jobid: 6
    input: CwM1/sequence_quality_control/CwM1_filtered_R1.fastq.gz, CwM1/sequence_quality_control/CwM1_filtered_R2.fastq.gz
    output: CwM1/sequence_quality_control/read_stats/filtered.zip, CwM1/sequence_quality_control/read_stats/filtered_read_counts.tsv
    log: CwM1/logs/QC/read_stats/filtered.log (check log file(s) for error details)
    cluster_jobid: 42521370

Error executing rule get_read_stats on cluster (jobid: 6, external: 42521370, jobscript: /home/projects/dtu_00055/data/atlas_workflow/.snakemake/tmp.i1uolz6g/snakejob.get_read_stats.6.sh). For error details see the cluster log and the log files of the involved rule(s).
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Note the path to the log file for debugging.
Documentation is available at: https://metagenome-atlas.readthedocs.io
Issues can be raised at: https://github.com/metagenome-atlas/atlas/issues
Complete log: .snakemake/log/2023-05-08T140746.785390.snakemake.log
[Atlas] CRITICAL: Command 'snakemake --snakefile /services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/Snakefile --directory /home/projects/dtu_00055/data/atlas_workflow  --rerun-triggers mtime --jobs 40 --rerun-incomplete --configfile '/home/projects/dtu_00055/data/atlas_workflow/config.yaml' --nolock  --profile /home/projects/dtu_00055/people/linroa/cluster/ --use-conda --conda-prefix /home/projects/dtu_00055/data/99_databases/conda_envs     --scheduler greedy  qc  --conda-prefix data/99_databases/conda_envs/ ' returned non-zero exit status 1.

The error log of get_read_stats looks like this:

Config file /services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/config/default_config.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=120000, mem_mib=114441, disk_mb=9664, disk_mib=9217, mem=120, java_mem=102, time_min=60
Singularity containers: ignored
Select jobs to execute...

[Mon May  8 14:26:31 2023]
rule get_read_stats:
    input: CwM1/sequence_quality_control/CwM1_filtered_R1.fastq.gz, CwM1/sequence_quality_control/CwM1_filtered_R2.fastq.gz
    output: CwM1/sequence_quality_control/read_stats/filtered.zip, CwM1/sequence_quality_control/read_stats/filtered_read_counts.tsv
    log: CwM1/logs/QC/read_stats/filtered.log
    jobid: 0
    reason: Missing output files: CwM1/sequence_quality_control/read_stats/filtered_read_counts.tsv, CwM1/sequence_quality_control/read_stats/filtered.zip
    wildcards: sample=CwM1, step=filtered
    priority: 30
    resources: mem_mb=120000, mem_mib=114441, disk_mb=9664, disk_mib=9217, tmpdir=/scratch/42521370, mem=120, java_mem=102, time_min=60, runtime=3600

[Mon May  8 14:26:31 2023]
Error in rule get_read_stats:
    jobid: 0
    input: CwM1/sequence_quality_control/CwM1_filtered_R1.fastq.gz, CwM1/sequence_quality_control/CwM1_filtered_R2.fastq.gz
    output: CwM1/sequence_quality_control/read_stats/filtered.zip, CwM1/sequence_quality_control/read_stats/filtered_read_counts.tsv
    log: CwM1/logs/QC/read_stats/filtered.log (check log file(s) for error details)

RuleException:
CalledProcessError in file /services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/rules/qc.smk, line 148:
Command 'set -euo pipefail;   mkdir -p CwM1/sequence_quality_control/read_stats/filtered/pe 2> CwM1/logs/QC/read_stats/filtered.log  ;  reformat.sh in1=CwM1/sequence_quality_control/CwM1_filtered_R1.fastq.gz in2=CwM1/sequence_quality_control/CwM1_filtered_R2.fastq.gz  bhist=CwM1/sequence_quality_control/read_stats/filtered/pe/base_hist.txt  qhist=CwM1/sequence_quality_control/read_stats/filtered/pe/quality_by_pos.txt  lhist=CwM1/sequence_quality_control/read_stats/filtered/pe/readlength.txt  gchist=CwM1/sequence_quality_control/read_stats/filtered/pe/gc_hist.txt  gcbins=auto  bqhist=CwM1/sequence_quality_control/read_stats/filtered/pe/boxplot_quality.txt  threads=1  overwrite=true  -Xmx102G  2> >(tee -a CwM1/logs/QC/read_stats/filtered.log CwM1/sequence_quality_control/read_stats/filtered/pe/read_stats.tmp )' returned non-zero exit status 127.
  File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/rules/qc.smk", line 178, in __rule_get_read_stats
  File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/rules/qc.smk", line 148, in get_read_stats
  File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

The error log for run_decontamination looks like this:

Config file /services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/atlas/workflow/config/default_config.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
/bin/bash: conda: command not found
Traceback (most recent call last):
  File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/snakemake/__init__.py", line 757, in snakemake
    success = workflow.execute(
  File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/snakemake/workflow.py", line 935, in execute
    dag.create_conda_envs(
  File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/snakemake/dag.py", line 351, in create_conda_envs
    env.create(dryrun)
  File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/snakemake/deployment/conda.py", line 405, in create
    pin_file = self.pin_file
  File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/snakemake/common/__init__.py", line 221, in __get__
    value = self.method(instance)
  File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/snakemake/deployment/conda.py", line 109, in pin_file
    f".{self.conda.platform}.pin.txt"
  File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/snakemake/common/__init__.py", line 221, in __get__
    value = self.method(instance)
  File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/snakemake/deployment/conda.py", line 102, in conda
    return Conda(
  File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/snakemake/deployment/conda.py", line 667, in __init__
    shell.check_output(self._get_cmd(f"conda info --json"), text=True)
  File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/site-packages/snakemake/shell.py", line 63, in check_output
    return sp.check_output(cmd, shell=True, executable=executable, **kwargs)
  File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/services/tools/metagenome-atlas/2.15.1/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'conda info --json' returned non-zero exit status 127.

The cluster log is empty.

I am using atlas version 2.15.1

I saw this issue, which had errors in the same rules, but no solution was mentioned before it was closed.

I hope you can help, since I am quite stuck and was excited to use atlas on my samples!

Best regards,

SilasK commented 1 year ago

Both errors are related: the software, e.g. conda, is no longer found once the job is submitted to the cluster.

But it is strange that this happens specifically for these rules.
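One quick check (a sketch, not part of atlas: it assumes Torque's qsub accepts a job script on stdin, and reuses the account flag from the log above) is to submit a trivial job that reports whether conda is on PATH on a compute node:

```shell
# Diagnostic sketch: does a compute node see conda at all?
# The -A / -l options mirror the Torque submission lines in the log above.
check_conda_job='echo "conda is: $(command -v conda || echo NOT FOUND)"'
if command -v qsub >/dev/null 2>&1; then
    # assumes qsub accepts a job script on stdin
    printf '%s\n' "$check_conda_job" | qsub -A dtu_00055 -N check_conda -l nodes=1:ppn=1 -l walltime=300
else
    printf 'qsub not found here; run this from the cluster login node\n'
fi
```

If the job's output file says "NOT FOUND", the batch shells on the compute nodes simply do not inherit the conda setup from the login session.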

Could you try to simply update to atlas v2.15.2 ?

lineroager commented 1 year ago

Yes, this is what I suspected too. It seems the software (conda, atlas) that I load in my submission script is perhaps not loaded in the subsequent jobs that atlas submits. I have very limited rights for software installation on the cluster, in case that could have an impact on something like this.

I will try to update to atlas 2.15.2 and see if it helps. Will be back then!

Edit: Oh, also I think it is happening for these rules specifically because I ran part of the qc locally before moving on to the cluster, so these are probably the next rules in the pipeline after the point where the local execution stopped. Sorry for not mentioning this earlier; I kind of forgot I did it.
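Since the batch jobs apparently do not inherit the login environment, one possible workaround is to put conda on PATH in a file that non-interactive shells read (e.g. ~/.bashrc on many Torque setups). A minimal sketch, assuming a user-level conda install at $HOME/miniconda3 (this path is hypothetical; `conda info --base` on the login node prints the real one):

```shell
# Workaround sketch: make conda visible to non-interactive batch shells.
# CONDA_BASE is a hypothetical install location -- substitute the actual
# path reported by `conda info --base` on the login node.
CONDA_BASE="$HOME/miniconda3"
export PATH="$CONDA_BASE/bin:$PATH"
```

With these lines in a startup file that the compute-node shells source, the `conda: command not found` error in the run_decontamination log should at least change into something more informative.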

SilasK commented 1 year ago

There is also a snakemake argument that you can supply to the atlas command. It is intended for internal use, but it might fix your problem.

--conda-base-path abs path of conda base installation (home of conda, mamba, activate)
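Applied to the qc command from the report above, that would look roughly like this (a sketch: $HOME/miniconda3 stands in for the real conda base, and the command is only executed if atlas is actually on PATH):

```shell
# Sketch: the earlier qc invocation with --conda-base-path appended.
# $HOME/miniconda3 is hypothetical; `conda info --base` prints the real base.
conda_base="$HOME/miniconda3"
atlas_cmd="atlas run qc --working-dir data/atlas_workflow/ --profile cluster \
 -c data/atlas_workflow/config.yaml --conda-prefix data/99_databases/conda_envs/ \
 --conda-base-path $conda_base"
if command -v atlas >/dev/null 2>&1; then
    eval "$atlas_cmd"
else
    printf '%s\n' "$atlas_cmd"   # atlas not installed here; just show the command
fi
```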

lineroager commented 1 year ago

I updated to atlas 2.15.2 with no improvement, and also tried adding --conda-base-path, to no avail.

In the end I managed to get it running in single-machine execution mode, which is not ideal, but works. No conda environments or software are available once atlas submits a new job to the queue, and I did not manage to find a way to fix that.

Going to close this issue since I found a workaround, but feel free to reopen any time if a real solution turns up.