nf-core / mag

Assembly and binning of metagenomes
https://nf-co.re/mag
MIT License

BUSCO_PLOT error while running TEST and cannot proceed #213

Closed. Clairmontl closed this issue 3 years ago.

Clairmontl commented 3 years ago


Description of the bug

.nextflow.log

Steps to reproduce

Steps to reproduce the behaviour:

  1. Command line: nextflow run nf-core/mag -profile test,conda --no_r -c nextflow.conf -r 2.0.0 -resume

  2. See the error. First run (without --no_r):

Error executing process > 'MAG:BUSCO_QC:BUSCO_PLOT (SPAdes-test_minigut_sample2)'

Caused by: Process MAG:BUSCO_QC:BUSCO_PLOT (SPAdes-test_minigut_sample2) terminated with an error exit status (1)

Command executed:

if [ -n "short_summary.specific_lineage.bacteria_odb10.SPAdes-test_minigut_sample2.unbinned.2.fa.txt short_summary.specific_lineage.bacteria_odb10.SPAdes-test_minigut_sample2.unbinned.1.fa.txt" ]
then
  # replace dots in bin names within summary file names by underscores
  # currently (BUSCO v5.1.0) generate_plot.py does not allow further dots
  for sum in short_summary.specific_lineage.bacteria_odb10.SPAdes-test_minigut_sample2.unbinned.2.fa.txt short_summary.specific_lineage.bacteria_odb10.SPAdes-test_minigut_sample2.unbinned.1.fa.txt; do
      [[ ${sum} =~ short_summary.([_[:alnum:]]+).([_[:alnum:]]+).SPAdes-test_minigut_sample2.(.+).txt ]];
      mode=${BASH_REMATCH[1]}
      db_name=${BASH_REMATCH[2]}
      bin="SPAdes-test_minigut_sample2.${BASH_REMATCH[3]}"
      bin_new="${bin//./_}"
      mv ${sum} short_summary.${mode}.${db_name}.${bin_new}.txt
  done
  generate_plot.py --working_directory .

  mv busco_figure.png "SPAdes-test_minigut_sample2.${mode}.${db_name}.busco_figure.png"
  mv busco_figure.R "SPAdes-test_minigut_sample2.${mode}.${db_name}.busco_figure.R"

fi

busco --version | sed "s/BUSCO //" > busco.version.txt

Command exit status: 1

Command output:

INFO: ** Start plot generation at 06/16/2021 13:00:10 **
INFO: Load data ...
INFO: Loaded ./short_summary.specific_lineage.bacteria_odb10.SPAdes-test_minigut_sample2_unbinned_2_fa.txt successfully
INFO: Loaded ./short_summary.specific_lineage.bacteria_odb10.SPAdes-test_minigut_sample2_unbinned_1_fa.txt successfully
INFO: Generate the R code ...
INFO: Run the R code ...
WARNING: Impossible to run R. The package ggplot2 does not seem to be installed. Please check your R installation. See also the --no_r option to avoid this message
INFO: Plot generation done with WARNING(s). Total running time: 0.45272064208984375 seconds
INFO: Results written in ./

Command error: mv: cannot stat ‘busco_figure.png’: No such file or directory

Work dir: /isilon/users/nextflow/work/c4/7b648e91e21f81add869c00ca8fbe8

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh
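To make the failure easier to follow: the loop in the command above only renames the BUSCO summary files, replacing dots in the bin names with underscores because generate_plot.py (as of BUSCO v5.1.0) does not tolerate extra dots. A minimal sketch of that substitution, using one of the file names from this run:

```shell
# One of the summary file names from the failing task above
sum="short_summary.specific_lineage.bacteria_odb10.SPAdes-test_minigut_sample2.unbinned.2.fa.txt"

# Same capture pattern as the pipeline script: mode, database name, bin name
[[ ${sum} =~ short_summary.([_[:alnum:]]+).([_[:alnum:]]+).SPAdes-test_minigut_sample2.(.+).txt ]]
mode=${BASH_REMATCH[1]}       # specific_lineage
db_name=${BASH_REMATCH[2]}    # bacteria_odb10
bin="SPAdes-test_minigut_sample2.${BASH_REMATCH[3]}"

# Replace every dot in the bin name with an underscore
bin_new="${bin//./_}"
echo "short_summary.${mode}.${db_name}.${bin_new}.txt"
# → short_summary.specific_lineage.bacteria_odb10.SPAdes-test_minigut_sample2_unbinned_2_fa.txt
```

The renaming itself succeeds, as the "Loaded ... successfully" lines in the command output confirm; the exit status 1 comes later, when mv cannot find busco_figure.png because the R/ggplot2 step never produced it.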

Second run:

Execution cancelled -- Finishing pending tasks before exit
Creating Conda env: bioconda::multiqc=1.9 [cache /isilon/users/nextflow/work/conda/env-4a042df8307c1458fddcec83f7dc9ead]
-[nf-core/mag] Pipeline completed with errors-
Error executing process > 'MAG:GET_SOFTWARE_VERSIONS'

Caused by: Process MAG:GET_SOFTWARE_VERSIONS terminated with an error exit status (1)

Command executed:

echo 2.0.0 > pipeline.version.txt
echo 21.04.1 > nextflow.version.txt
scrape_software_versions.py &> software_versions_mqc.yaml

Command exit status: 1

Command output: (empty)

Expected behaviour

The test run should finish successfully through the pipeline. When the error mentioned an issue with ggplot2, I tried installing it directly into the conda environment, but that did not fix the issue. When I tried to skip the BUSCO_PLOT step, I ended up with a different error in GET_SOFTWARE_VERSIONS.
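For what it's worth, the immediate exit status 1 is the unconditional mv of busco_figure.png, which generate_plot.py never wrote because R/ggplot2 could not run. A hypothetical defensive variant (not the pipeline's actual code) would check for the figure first:

```shell
# Illustrative values matching the failing task
mode="specific_lineage"
db_name="bacteria_odb10"

# Only move the figure if generate_plot.py actually produced it
if [ -f busco_figure.png ]; then
    mv busco_figure.png "SPAdes-test_minigut_sample2.${mode}.${db_name}.busco_figure.png"
else
    echo "busco_figure.png not found; plot generation likely failed" >&2
fi
```

This would only silence the mv error, of course; the underlying problem of R/ggplot2 being unavailable would remain.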


Additional context

I have run this successfully before; I am not sure what is causing the issue. I get the same error with multiple different input files, not just the test run.

d4straub commented 3 years ago

Hi there!

When the error said there was an issue with ggplot2 I tried to install it directly into the conda environment, but it did not fix the issue.

This cannot work: the conda environment used by the pipeline is set up to always work, and ggplot2 is definitely in there, so that's not the issue. But it seems to be inaccessible.

When I tried to skip the BUSCO_PLOT step, I ended up with a different error in GET_SOFTWARE_VERSIONS.

That indicates that all/many of the conda environments are not accessible at that time.

I have run this successfully before, I am not sure what is causing the issue. I get the same error using multiple different input files, not just running the test.

Yes, that seems unrelated to the pipeline code, the pipeline software or the input data. To me it looks like a computation resource (i.e. cluster) problem.

Clairmontl commented 3 years ago

.nextflow.log testscript2.sh.o1465804.txt testscript2.txt nextflow.conf.txt

I don't seem to have the old log file anymore; I must have deleted it after a few dozen additional tries. I have uploaded here the log from a new run I just did without the --no_r option (the option that skips the BUSCO plot in R). The second file is the output file, the third is the script I used to run the pipeline, and the last is the conf file. Our cluster currently does not support Singularity or Docker, unfortunately. If there is anything you can do to help me troubleshoot this I would be very grateful.

d4straub commented 3 years ago

This run seems quite fine except for the BUSCO plotting. Could you resume after adding --skip_busco? I am fairly certain that --no_r doesn't work.

Clairmontl commented 3 years ago

.nextflow.log testscript2.sh.txt testscript2.sh.o1484814.txt .nextflow.log testscript.sh.txt testscript.sh.o1484815.txt

Good morning, I attempted resuming the run with --skip_busco on two different sets of data (the first was the test dataset, the second a minimal dataset). I have attached the log files, the script used to run the program via qsub, and the output files for both runs. Both times it did not let me proceed.

d4straub commented 3 years ago

The error messages are crystal clear; in the first case, add --busco_reference false.

script to run the program in qsub

That seems odd; Nextflow is supposed to be run on the head node rather than on a compute node (i.e. not via qsub). Could you please look into this?

Clairmontl commented 3 years ago

We are not allowed to run jobs on the biocluster head node, only via qlogin or qsub, so unfortunately that is not something I can change. With the added --busco_reference false, I ended up with another GET_SOFTWARE_VERSIONS error... sorry. I have attached the log file, output file and command file. .nextflow.log testscript2.sh.o1484828.txt testscript2.sh.txt

d4straub commented 3 years ago

Running Nextflow is not a job itself; it's the workflow manager that submits the jobs. I am surprised that the workflow runs at all like this. Speak with your system administrator to solve this. It might or might not solve your current software issue.

Edit: Just in this minute a colleague reported weird non-reproducible problems when submitting the nextflow command to a node instead of running it on the headnode. So please discuss this with your sys-admin!

ropolomx commented 3 years ago

Hi everyone! I will contribute to this issue thread as I work with the same cluster as @Clairmontl, and I have run previous versions of mag submitting nextflow as a job via qsub. I see the comments about running it on the head node instead of submitting Nextflow as a job. We are following the recommendations from the 5 Nextflow Tips for HPC Users blog post on Nextflow's page, which encourages users to submit nextflow itself as a job.

I just tested launching Nextflow from the head node to run nf-core/mag -r 2.0.0 in that cluster, and it works. This is the command that I used:

nextflow run nf-core/mag -r 2.0.0 -profile test,conda -c sge.conf -bg

It is very important to note that there are two key elements for running nextflow itself safely in the head node:

  1. As recommended in nf-core/mag's documentation, it is important to set Nextflow's memory requirements to prevent it from hogging the head node. The suggestion from the documentation is:

In some cases, the Nextflow Java virtual machines can start to request a large amount of memory. We recommend adding the following line to your environment to limit this (typically in ~/.bashrc or ~/.bash_profile):

NXF_OPTS='-Xms1g -Xmx4g'

  2. If you run nextflow on the head node of our cluster, you must set the executor to sge. This can be done in the configuration file, or by exporting a variable in your ~/.bashrc file. More information is in the 5 Nextflow tips for HPC users blog.
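Under the assumptions above, both settings could go into ~/.bashrc, for example (sketch only; NXF_OPTS and NXF_EXECUTOR are the environment variables Nextflow reads for JVM options and the default executor):

```shell
# Cap the memory of the Nextflow head JVM (values from the nf-core docs)
export NXF_OPTS='-Xms1g -Xmx4g'

# Make SGE the default executor so tasks are submitted via qsub
export NXF_EXECUTOR='sge'
```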

I have been testing something else in the cluster that we use. I have made a Conda cache directory in a shared folder to which everyone has access. The goal of this cache is to centralize the conda environments for all the nf-core pipelines so that each user doesn't need to download environments every time, and this way we keep the same environments for all users running the same pipelines and version. We can chat more about this and I can share the path where I have this, and how to edit your ~/.bashrc to export this variable so that your installation of nextflow knows where it is.
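A sketch of how such a shared cache might be wired up: Nextflow reads the NXF_CONDA_CACHEDIR environment variable as the location for its conda environments, so each user's ~/.bashrc could export it (the path below is a placeholder, not the actual shared folder):

```shell
# Placeholder path to the shared, group-readable conda environment cache
export NXF_CONDA_CACHEDIR=/shared/nf-core/conda-cache
```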

ropolomx commented 3 years ago

@Clairmontl, @d4straub: that said, I was also able to run nextflow as a job in the cluster (also testing the mag test profile). nextflow still orchestrates the submission of jobs to the cluster with the sge executor (qsub):

With the test data profile. Below is the output from qstat; the first line is the main nextflow job, which launches the others. I've done this with nf-core/rnaseq on another cluster that has SLURM instead of SGE, and it works too.

1485126 0.00500 nf-core_ma ortegapoloro r     06/22/2021 00:37:06 all.q@biocomp-2-3.local            1
1485160 0.00627 nf-MAG_MET ortegapoloro r     06/22/2021 00:42:06 all.q@biocomp-2-4.local            2
1485161 0.00627 nf-MAG_MET ortegapoloro r     06/22/2021 00:42:06 all.q@biocomp-1-2.local            2
1485162 0.00627 nf-MAG_MET ortegapoloro r     06/22/2021 00:42:06 all.q@biocomp-2-3.local            2

ropolomx commented 3 years ago

If there is nothing else to add, I might close this issue. The nf-core pipeline run seems to work fine when Nextflow is submitted as a job, just as it worked well when running Nextflow on the head node with the correct executor.

d4straub commented 3 years ago

Happy to hear that submitting nextflow as a job works for you. In my and my colleagues' experience it can cause trouble, so I am wary of running Nextflow as a job when problems occur. But it is good to know that it can indeed work. Thanks for helping out here!