Closed: carleton-envbiotech closed this issue 3 months ago.
Thanks for the report!
An EOFError implies to me that there is an empty input file or a corrupted database somewhere...
If you go into the reported work directory, can you inspect the input files to see if they do have something in them?
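The check being suggested can be sketched as a quick shell pass over the staged bins. The directory layout below is fabricated for illustration; in a real run you would `cd` into the hashed work directory that Nextflow reports for the failed task.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Fabricated stand-in for a task work directory (for illustration only)
mkdir -p input_bins
printf '>contig1\nACGTACGT\n' > input_bins/bin1.fa   # a normal, non-empty bin
: > input_bins/bin2.fa                               # a zero-byte bin

# Zero-byte FASTA inputs are the classic trigger for an EOFError;
# this prints the path of any empty .fa file in the staged inputs
find input_bins -name '*.fa' -size 0
```

If this prints anything, the upstream binning step handed CheckM an empty bin.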
Working through the work directory, I see the following:
```
[2024-02-22 09:26:07] INFO: CheckM v1.2.1
[2024-02-22 09:26:07] INFO: checkm lineage_wf -t 10 -f MEGAHIT-DASTool-unclassified-dastool_refined_unbinned-NitrifyingPelletDNA_Week4_Sulphatereduction_DNARNAkit_rep3_S14_wf.tsv --tab_table --pplacer_threads 10 -x fa input_bins/ MEGAHIT-DASTool-unclassified-dastool_refined_unbinned-NitrifyingPelletDNA_Week4_Sulphatereduction_DNARNAkit_rep3_S14_wf
[2024-02-22 09:26:07] INFO: CheckM data: checkm_data_2015_01_16
[2024-02-22 09:26:07] INFO: [CheckM - tree] Placing bins in reference genome tree.
[2024-02-22 09:26:08] INFO: Identifying marker genes in 1 bins with 10 threads:
```
Looking at the CheckM issues, I think maybe you have run out of memory for the CheckM process.
You should increase the memory for that errored process in your custom config file too, as you've already done for others, it seems.
I obtained this error message even after adjusting the configuration to look like the following excerpt:
```
process { withName: GTDBTK_CLASSIFYWF { cpus = 32 memory = 256.GB } withName: CHECKM_QC { cpus = 32 memory = 256.GB } }
```
Gah. Could you try running the command manually (`.command.sh`) with a local copy of CheckM? That way we can isolate whether it's the pipeline doing something wrong or the tool...
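Replaying a failed task by hand looks roughly like the sketch below. In a real run you would `cd` into the hashed work directory from the error report; here a stand-in `.command.sh` is created just to show the mechanics.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Fabricated stand-in for a failed task's work directory
mkdir -p demo_workdir && cd demo_workdir
printf '#!/bin/bash\necho "checkm lineage_wf ... would run here"\n' > .command.sh

# .command.sh holds the bare task command; Nextflow also writes a
# .command.run wrapper that re-creates the container/staging environment
bash .command.sh
```

Running `.command.sh` directly uses whatever CheckM is on your PATH, which is exactly what lets you separate a tool failure from a pipeline failure.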
@carleton-envbiotech my feeling is it's still memory; this seems to be a REALLY common issue with CheckM, and it results in very similar errors.
I note that your configuration in the excerpt wouldn't work without new lines - was that just a quick type-out?
```
process {
    withName: GTDBTK_CLASSIFYWF {
        cpus = 32
        memory = 256.GB
    }
    withName: CHECKM_QC {
        cpus = 32
        memory = 256.GB
    }
}
```
Works for me for example
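If a fixed 256 GB still isn't enough, one option is to let the failed process retry with scaled memory, using Nextflow's `errorStrategy`/`maxRetries` directives and a dynamic memory closure. A sketch, keeping the `CHECKM_QC` label from the excerpt above (worth double-checking that this label matches the actual process name in your mag version):

```
process {
    withName: CHECKM_QC {
        cpus          = 32
        memory        = { 256.GB * task.attempt }  // 256 GB first try, 512 GB on retry
        errorStrategy = 'retry'
        maxRetries    = 1
    }
}
```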
Otherwise, maybe it's the wrong database file being passed to it... the nf-core/mag docs for `--checkm_db` say it should be the default below, but it looks like you have a different name in the command above (it might be the same contents, IDK):

default: https://data.ace.uq.edu.au/public/gtdb/data/releases/release214/214.1/auxillary_files/gtdbtk_r214_data.tar.gz
Going to close for now, as I think it's a memory issue rather than a pipeline error.
Description of the bug
When running nf-core/mag v2.5.4, I have run into an issue when including the `--binqc_tool checkm` flag: the run returns an error with an exit status of 1 and further indicates there is an unexpected error `<class 'EOFError'>`. I have provided the input code below in addition to an excerpt of the output. I can resolve this issue by removing the flag and continuing with BUSCO instead, so it seems to be specific to the CheckM step.
Command used and terminal output
Relevant files
No response
System information
Running on HPC with 1.0 TB RAM and 48 CPUs
Executed locally
Container engine: Apptainer
OS: CentOS Linux
nf-core/mag v2.5.4