nf-core / mag

Assembly and binning of metagenomes
https://nf-co.re/mag
MIT License

CHECKM_LINEAGEWF failing with exit status 1 #600

Closed carleton-envbiotech closed 3 months ago

carleton-envbiotech commented 4 months ago

Description of the bug

When running nf-core/mag v2.5.4 with the --binqc_tool checkm flag, the CHECKM_LINEAGEWF process fails with exit status 1 and reports an unexpected error, <class 'EOFError'>. The command I used and an excerpt of the output are below. The run completes if I remove the flag and continue with BUSCO instead, so the problem seems to be specific to the CheckM step.

Command used and terminal output

nextflow run nf-core/mag -r 2.5.4 -c mag-memory-increase.conf -profile apptainer  \
--input '/datastore/researchdata/sequencing_data_archive/nitrifying_consortia_illumina_short_read_data/NitrifyingPelletDNA*_R{1,2}*.fastq.gz' \
--outdir  Nitrifying_consortia_analyses \
--refine_bins_dastool \
--binqc_tool checkm \
--postbinning_input refined_bins_only \
--skip_spades \
--skip_concoct \
--gtdb_db "/datastore/researchdata/gtdbtk/gtdbtk_data.tar.gz" \
--busco_db "busco_nextflow/bacteria_odb10.2024-01-08.tar.gz"

#Terminal output
MEGAHIT-DASTool-unclassified-dastool_refined_unbinned-NitrifyingPelletDNA_Week4_Sulphatereduction_DNARNAkit_rep3_S14_wf

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_MAG:MAG:CHECKM_QC:CHECKM_LINEAGEWF":
      checkm: $( checkm 2>&1 | grep '...:::' | sed 's/.*CheckM v//;s/ .*//' )
  END_VERSIONS

Command exit status:
  1

Command output:
  [2024-02-22 09:26:07] INFO: CheckM v1.2.1
  [2024-02-22 09:26:07] INFO: checkm lineage_wf -t 10 -f MEGAHIT-DASTool-unclassified-dastool_refined_unbinned-NitrifyingPelletDNA_Week4_Sulphatereduction_DNARNAkit_rep3_S14_wf.tsv --tab_table --pplacer_threads 10 -x fa input_bins/ MEGAHIT-DASTool-unclassified-dastool_refined_unbinned-NitrifyingPelletDNA_Week4_Sulphatereduction_DNARNAkit_rep3_S14_wf
  [2024-02-22 09:26:07] INFO: CheckM data: checkm_data_2015_01_16
  [2024-02-22 09:26:07] INFO: [CheckM - tree] Placing bins in reference genome tree.
  [2024-02-22 09:26:08] INFO: Identifying marker genes in 1 bins with 10 threads:

  Unexpected error: <class 'EOFError'>

Command error:
  Matplotlib created a temporary config/cache directory at /tmp/matplotlib-55r4atjf because the default path (/files/home/dgregoire/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
  [2024-02-22 09:26:07] INFO: CheckM v1.2.1
  [2024-02-22 09:26:07] INFO: checkm lineage_wf -t 10 -f MEGAHIT-DASTool-unclassified-dastool_refined_unbinned-NitrifyingPelletDNA_Week4_Sulphatereduction_DNARNAkit_rep3_S14_wf.tsv --tab_table --pplacer_threads 10 -x fa input_bins/ MEGAHIT-DASTool-unclassified-dastool_refined_unbinned-NitrifyingPelletDNA_Week4_Sulphatereduction_DNARNAkit_rep3_S14_wf
  [2024-02-22 09:26:07] INFO: CheckM data: checkm_data_2015_01_16
  [2024-02-22 09:26:07] INFO: [CheckM - tree] Placing bins in reference genome tree.
  [2024-02-22 09:26:08] INFO: Identifying marker genes in 1 bins with 10 threads:
  Process SyncManager-1:
  Traceback (most recent call last):
    File "/usr/local/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
      self.run()
    File "/usr/local/lib/python3.10/multiprocessing/process.py", line 108, in run
      self._target(*self._args, **self._kwargs)
    File "/usr/local/lib/python3.10/multiprocessing/managers.py", line 591, in _run_server
      server = cls._Server(registry, address, authkey, serializer)
    File "/usr/local/lib/python3.10/multiprocessing/managers.py", line 156, in __init__
      self.listener = Listener(address=address, backlog=16)
    File "/usr/local/lib/python3.10/multiprocessing/connection.py", line 453, in __init__
      self._listener = SocketListener(address, family, backlog)
    File "/usr/local/lib/python3.10/multiprocessing/connection.py", line 596, in __init__
      self._socket.bind(address)
  OSError: [Errno 98] Address already in use
  Traceback (most recent call last):
    File "/usr/local/bin/checkm", line 856, in <module>

  Unexpected error: <class 'EOFError'>
      checkmParser.parseOptions(args)
    File "/usr/local/lib/python3.10/site-packages/checkm/main.py", line 979, in parseOptions
      self.tree(options)
    File "/usr/local/lib/python3.10/site-packages/checkm/main.py", line 157, in tree
      binIdToModels = mgf.find(binFiles,
    File "/usr/local/lib/python3.10/site-packages/checkm/markerGeneFinder.py", line 67, in find
      binIdToModels = mp.Manager().dict()
    File "/usr/local/lib/python3.10/multiprocessing/context.py", line 57, in Manager
      m.start()
    File "/usr/local/lib/python3.10/multiprocessing/managers.py", line 566, in start
      self._address = reader.recv()
    File "/usr/local/lib/python3.10/multiprocessing/connection.py", line 255, in recv
      buf = self._recv_bytes()
    File "/usr/local/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
      buf = self._recv(4)
    File "/usr/local/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
      raise EOFError
  EOFError

Work dir:
  /datastore/userdata/daniel/work/7f/22e0bcdd238749fc9a385e696c1bcf

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

Relevant files

No response

System information

Running on an HPC with 1.0 TB RAM and 48 CPUs. Executed locally. Container engine: Apptainer. OS: CentOS Linux. nf-core/mag v2.5.4.

jfy133 commented 4 months ago

Thanks for the report!

EOFError implies to me that there is an empty input file or a corrupted database somewhere...

If you go into the reported work directory, can you inspect the input files to see if they do have something in them?
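A minimal sketch of that check, assuming the bins are staged under input_bins/ with the .fa extension, as the `-x fa input_bins/` arguments in the log suggest. The helper names `list_empty_bins` and `count_sequences` are hypothetical and not part of the pipeline.

```shell
# Hypothetical helper: print any bin FASTA file under a directory that is empty.
# -L follows the symlinks that Nextflow stages into the work directory.
list_empty_bins() {
    find -L "$1" -type f -name '*.fa' -size 0
}

# Hypothetical helper: count the sequences (header lines) in one FASTA file.
# grep -c exits non-zero when the count is 0, so the status is ignored.
count_sequences() {
    grep -c '^>' "$1" || true
}

# Example, run from inside the reported work directory:
#   list_empty_bins input_bins
#   for f in input_bins/*.fa; do echo "$f: $(count_sequences "$f")"; done
```

No output from `list_empty_bins` and a non-zero sequence count for every bin would suggest the inputs themselves are not the problem.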

carleton-envbiotech commented 4 months ago

Working through the work directory, I see the following:

[2024-02-22 09:26:07] INFO: CheckM v1.2.1
[2024-02-22 09:26:07] INFO: checkm lineage_wf -t 10 -f MEGAHIT-DASTool-unclassified-dastool_refined_unbinned-NitrifyingPelletDNA_Week4_Sulphatereduction_DNARNAkit_rep3_S14_wf.tsv --tab_table --pplacer_threads 10 -x fa input_bins/ MEGAHIT-DASTool-unclassified-dastool_refined_unbinned-NitrifyingPelletDNA_Week4_Sulphatereduction_DNARNAkit_rep3_S14_wf
[2024-02-22 09:26:07] INFO: CheckM data: checkm_data_2015_01_16
[2024-02-22 09:26:07] INFO: [CheckM - tree] Placing bins in reference genome tree.
[2024-02-22 09:26:08] INFO: Identifying marker genes in 1 bins with 10 threads:

jfy133 commented 4 months ago

Looking at the CheckM issues, I think you may have run out of memory for the CheckM process.

You should increase the memory for that errored process in your custom config file too, as it seems you've already done for others.

carleton-envbiotech commented 4 months ago

I obtained this error message even after adjusting the configuration to look like the following excerpt:

process { withName: GTDBTK_CLASSIFYWF { cpus = 32 memory = 256.GB } withName: CHECKM_QC { cpus = 32 memory = 256.GB } }

jfy133 commented 4 months ago

Gah. Could you try running the command manually (.command.sh) with a local copy of CheckM? That way we can isolate whether the error comes from the pipeline doing something wrong or from the tool...
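A rough sketch of such a manual re-run, assuming a local `checkm` is already on the PATH (for example from a conda environment). The helper name `rerun_task` and the log file name manual_rerun.log are hypothetical.

```shell
# Hypothetical helper: re-run a failed task's script outside Nextflow and keep the log.
# The subshell keeps the caller's working directory unchanged; the function
# returns the exit status of the task script.
rerun_task() {
    (
        cd "$1" || exit 1
        bash .command.sh > manual_rerun.log 2>&1
    )
}

# Example, using the work directory from the error message:
#   rerun_task /datastore/userdata/daniel/work/7f/22e0bcdd238749fc9a385e696c1bcf
#   echo "exit status: $?"
```

If the same EOFError shows up in manual_rerun.log, the problem is more likely in CheckM or its environment than in the pipeline.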

jfy133 commented 3 months ago

@carleton-envbiotech my feeling is that it is still memory; this seems to be a REALLY common issue with CheckM, and it results in very similar errors.

I note that your configuration in the excerpt wouldn't work without new lines - was that just a quick type out?

process { 
        withName: GTDBTK_CLASSIFYWF { 
                cpus = 32
                memory = 256.GB 
        } 
        withName: CHECKM_QC { 
                cpus = 32 
                memory = 256.GB 
        }
}

Works for me for example
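One way to check whether the resource settings actually reached the failing task is to look at the thread count CheckM reports in the task log. The log above shows `checkm lineage_wf -t 10`. Assuming the pipeline passes the task's cpus value to `-t` (an assumption, not confirmed in this thread), a value other than 32 after the config change would suggest that the selector did not apply to the CHECKM_LINEAGEWF process named in the error. The helper name `checkm_threads` is hypothetical.

```shell
# Hypothetical helper: print the thread count from the first
# "checkm lineage_wf -t N" line found in a task log.
checkm_threads() {
    sed -n 's/.*lineage_wf -t \([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Example, run from inside the task's work directory:
#   checkm_threads .command.log
```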

Otherwise, maybe the wrong database file is being passed to it... the nf-core/mag docs for --checkm_db say it should be the one below, but it looks like you have a different name in the command above (it might be the same contents, IDK)


default: https://data.ace.uq.edu.au/public/gtdb/data/releases/release214/214.1/auxillary_files/gtdbtk_r214_data.tar.gz

jfy133 commented 3 months ago

Going to close for now, as I think it's a memory issue rather than a pipeline error.