metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

run_decontamination crash #583

Closed CMagnoBR closed 1 year ago

CMagnoBR commented 1 year ago

Hi ATLAS-dev team. I'm a beginner in bioinformatics analysis.

I have been trying to run ATLAS on some metagenomic samples that were sequenced single-end. I am using an 8-core i7 with 32 GB RAM (~16 GB free) in a WSL-Ubuntu environment.

For assembly, I chose to use megahit. Please, help me understand what is going on! Below is the error message.

Error in rule run_decontamination:
    jobid: 31
    output: SRR19257239/sequence_quality_control/SRR19257239_clean_se.fastq.gz, SRR19257239/sequence_quality_control/contaminants/PhiX_se.fastq.gz, SRR19257239/sequence_quality_control/SRR19257239_decontamination_reference_stats.txt
    log: SRR19257239/logs/QC/decontamination.log (check log file(s) for error message)
    conda-env: /mnt/d/Centroflora/database/condaenvs/ebfa74c6e7a4e6d90959161f1b32e514
    shell:

            if [ "false" = true ] ; then
                bbsplit.sh in1=SRR19257239/sequence_quality_control/SRR19257239_filtered_se.fastq.gz in2=ref/genome/1/summary.txt                         outu1=SRR19257239/sequence_quality_control/SRR19257239_clean_se.fastq.gz outu2=SRR19257239/sequence_quality_control/contaminants/PhiX_se.fastq.gz                         basename="SRR19257239/sequence_quality_control/contaminants/%_R#.fastq.gz"                         maxindel=20 minratio=0.65                         minhits=1 ambiguous=best refstats=SRR19257239/sequence_quality_control/SRR19257239_decontamination_reference_stats.txt                        threads=8 k=13 local=t                         pigz=t unpigz=t ziplevel=9                         -Xmx20G 2> SRR19257239/logs/QC/decontamination.log
            fi

            bbsplit.sh in=SRR19257239/sequence_quality_control/SRR19257239_filtered_se.fastq.gz                      outu=SRR19257239/sequence_quality_control/SRR19257239_clean_se.fastq.gz                     basename="SRR19257239/sequence_quality_control/contaminants/%_se.fastq.gz"                     maxindel=20 minratio=0.65                     minhits=1 ambiguous=best refstats=SRR19257239/sequence_quality_control/SRR19257239_decontamination_reference_stats.txt append                     interleaved=f threads=8 k=13 local=t                     pigz=t unpigz=t ziplevel=9                     -Xmx20G 2>> SRR19257239/logs/QC/decontamination.log

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job run_decontamination since they might be corrupted:
SRR19257239/sequence_quality_control/SRR19257239_clean_se.fastq.gz, SRR19257239/sequence_quality_control/contaminants/PhiX_se.fastq.gz, SRR19257239/sequence_quality_control/SRR19257239_decontamination_reference_stats.txt
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Note the path to the log file for debugging.
Documentation is available at: https://metagenome-atlas.readthedocs.io
Issues can be raised at: https://github.com/metagenome-atlas/atlas/issues
Complete log: .snakemake/log/2022-11-10T155059.848861.snakemake.log
[Atlas] CRITICAL: Command 'snakemake --snakefile /home/isiqv/anaconda3/envs/atlasenv/lib/python3.10/site-packages/atlas/workflow/Snakefile --directory /mnt/d/Centroflora/Amostras_teste --rerun-triggers mtime --jobs 8 --rerun-incomplete --configfile '/mnt/d/Centroflora/Amostras_teste/config.yaml' --nolock --use-conda --conda-prefix /mnt/d/Centroflora/database/conda_envs --resources mem=23 mem_mb=24222 java_mem=20 --scheduler greedy all ' returned non-zero exit status 1.

ATLAS version 2.12.0

Thank you so much!

Regards!

SilasK commented 1 year ago

Could you check the SRR19257239/logs/QC/decontamination.log

You are trying to run everything with no more than 20 GB; I'm not sure this is enough for assembly. The log will tell. Do you have access to a cluster or something similar?

If you cannot do the assembly, would mapping to an existing reference be an option? What microbiome (host) do you have?

philippbayer commented 1 year ago

I believe I have the same error. My decontamination.log is empty, but my SLURM job stdout has content:

  File "/scratch/pawsey0390/pbayer/seagrass_atlas/atlasenv/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/scratch/pawsey0390/pbayer/seagrass_atlas/atlasenv/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/scratch/pawsey0390/pbayer/seagrass_atlas/atlasenv/lib/python3.10/site-packages/snakemake/__main__.py", line 4, in <module>
    main()
  File "/scratch/pawsey0390/pbayer/seagrass_atlas/atlasenv/lib/python3.10/site-packages/snakemake/__init__.py", line 2504, in main
    parser = get_argument_parser()
  File "/scratch/pawsey0390/pbayer/seagrass_atlas/atlasenv/lib/python3.10/site-packages/snakemake/__init__.py", line 1469, in get_argument_parser
    import pulp
  File "/scratch/pawsey0390/pbayer/seagrass_atlas/atlasenv/lib/python3.10/site-packages/pulp/__init__.py", line 34, in <module>
    from .pulp import *
  File "/scratch/pawsey0390/pbayer/seagrass_atlas/atlasenv/lib/python3.10/site-packages/pulp/pulp.py", line 99, in <module>
    from .apis import LpSolverDefault, PULP_CBC_CMD
  File "/scratch/pawsey0390/pbayer/seagrass_atlas/atlasenv/lib/python3.10/site-packages/pulp/apis/__init__.py", line 38, in <module>
    elif GLPK_CMD().available():
  File "/scratch/pawsey0390/pbayer/seagrass_atlas/atlasenv/lib/python3.10/site-packages/pulp/apis/glpk_api.py", line 70, in available
    return self.executable(self.path)
  File "/scratch/pawsey0390/pbayer/seagrass_atlas/atlasenv/lib/python3.10/site-packages/pulp/apis/core.py", line 497, in executable
    for path in os.environ.get("PATH", []).split(os.pathsep):
AttributeError: 'list' object has no attribute 'split'

Funnily enough, running os.environ.get("PATH", []).split(os.pathsep) manually in the atlasenv Python installation works fine, so I have a hunch this is a Python version issue somewhere. In my case I suspect SLURM is picking up the system Python rather than the conda Python, but I could not reproduce the error with either Python 3.10.8 (from atlasenv) or Python 2.7.18.
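One way to check a hunch like this is to print, from inside the job itself, which interpreter is running and whether PATH is visible at all. The following is only a minimal sketch: the helper name describe_runtime is hypothetical and not part of ATLAS, snakemake, or pulp.

```python
import os
import sys


def describe_runtime(environ=None):
    """Summarize the interpreter and PATH visible to the current process.

    `environ` defaults to os.environ; passing a plain dict makes the
    helper easy to try out without touching the real environment.
    """
    if environ is None:
        environ = os.environ
    path_is_set = "PATH" in environ
    return {
        "executable": sys.executable,          # which Python binary is running
        "version": sys.version.split()[0],     # e.g. "3.10.8"
        "path_is_set": path_is_set,            # False means PATH is missing entirely
        "path_entries": environ["PATH"].split(os.pathsep) if path_is_set else [],
    }


if __name__ == "__main__":
    for key, value in describe_runtime().items():
        print(f"{key}: {value}")
```

Running this both in an interactive shell and inside the SLURM job script would show whether the two environments differ.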

@CMagnoBR do you see the same error in your logs, or is it a different error for you?

Edit: Oh, I can replicate the bug now. It happens when $PATH is completely unset. To get around the error for now, I made a minor hack inside pulp's core.py script, changing the line to

        for path in os.environ.get("PATH", '').split(os.pathsep):

The old expression returns [] if PATH is unset, and calling split on a list crashes. The changed version returns an empty string instead, which can be split. No idea why PATH is unset in the first place.
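The difference between the two defaults can be reproduced outside pulp with a plain dict standing in for os.environ. This is only an illustrative sketch: the function names path_entries_original and path_entries_patched are hypothetical and mirror the one line from the traceback, not the rest of pulp's executable() method.

```python
import os


def path_entries_original(environ):
    # Same expression as in the traceback: the fallback is a list,
    # which has no .split() method.
    return environ.get("PATH", []).split(os.pathsep)


def path_entries_patched(environ):
    # Workaround: the fallback is an empty string, which can be split.
    return environ.get("PATH", "").split(os.pathsep)


env_without_path = {}  # stands in for an environment where PATH is unset

try:
    path_entries_original(env_without_path)
except AttributeError as exc:
    print("original:", exc)

print("patched:", path_entries_patched(env_without_path))
```

With PATH missing, the original expression raises the AttributeError seen above, while the patched one returns a list holding a single empty string. When PATH is set, even to an empty string, both versions behave the same, which would explain why the expression works when typed into an interactive Python session.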

SilasK commented 1 year ago

So there is an error in pulp?

SilasK commented 1 year ago

Would you mind updating to ATLAS 2.13 and trying again? If you still get errors, you can report them at #586.