Closed Clairmontl closed 3 years ago
Hi there!
What is the --no_r parameter supposed to be doing? I cannot find it / remember it for this pipeline, and without the nextflow.conf that's hard to troubleshoot.
When the error said there was an issue with ggplot2 I tried to install it directly into the conda environment, but it did not fix the issue.
This cannot work: the conda environment used by the pipeline is set up to always work, and ggplot2 is definitely in there, so that's not the issue. But it seems not to be accessible.
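One way to check whether ggplot2 is actually reachable inside the pipeline's conda environment (the environment path below is a placeholder; substitute the env directory from the conda cache shown in your log):

```shell
# Placeholder path; use the actual env dir from your work/conda cache
ENV=/isilon/users/nextflow/work/conda/env-XXXXXXXX
# Try to load ggplot2 inside that environment;
# a non-zero exit status means it is not accessible there
conda run -p "$ENV" Rscript -e 'library(ggplot2)'
```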
When I tried to skip the BUSCO_PLOT step, I ended up with a different error in GET_SOFTWARE_VERSIONS.
That indicates that all/many of the conda environments are not accessible at that time.
I have run this successfully before, I am not sure what is causing the issue. I get the same error using multiple different input files, not just running the test.
Yes, that seems unrelated to the pipeline code, the pipeline software or the input data. To me it looks like a computation resource (i.e. cluster) problem.
.nextflow.log testscript2.sh.o1465804.txt testscript2.txt nextflow.conf.txt
I don't seem to have the old log file anymore; I must have deleted it after a few dozen additional tries. I have uploaded here the log from a new run I just did without the --no_r option (which skips the BUSCO plot in R). The second file is the output file, the third is the script I used to run the pipeline, and the last is the conf file. Our cluster currently does not support Singularity or Docker, unfortunately. If there is anything you can do to help me troubleshoot this I would be very grateful.
This run seems quite fine except for the BUSCO plotting. Could you resume, adding --skip_busco?
I am somewhat certain that --no_r doesn't work.
.nextflow.log testscript2.sh.txt testscript2.sh.o1484814.txt .nextflow.log testscript.sh.txt testscript.sh.o1484815.txt
Good morning. I attempted resuming the run with --skip_busco on two different sets of data (the first was the test dataset, the second a minimal dataset). I have attached the log files, the scripts used to run the program via qsub, and the output files for both runs. Both times it did not let me proceed.
The error messages are crystal clear; in the first case, add a --busco_reference false.
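Assuming the command from the original bug report is reused, the suggested resume with both flags would look something like this (flag names as given above; not verified against the mag 2.0.0 parameter list):

```shell
nextflow run nf-core/mag -r 2.0.0 -profile test,conda -c nextflow.conf \
    --skip_busco --busco_reference false -resume
```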
script to run the program in qsub
That seems odd; Nextflow is supposed to be run on the head node instead of on a compute node (not with qsub). Could you please look into this?
We are not allowed to run jobs on the biocluster head node, only via qlogin or qsub, so unfortunately that is not something I can change. With the added --busco_reference false, I ended up with another GET_SOFTWARE_VERSIONS error... sorry. I have attached the log file, output file and command file. .nextflow.log testscript2.sh.o1484828.txt testscript2.sh.txt
Running Nextflow is not a job itself; it is the workflow manager that submits the jobs. I am surprised that the workflow runs at all like this. Speak with your system administrator to solve this; it might or might not solve your current software issue.
Edit: Just this minute a colleague reported weird, non-reproducible problems when submitting the nextflow command to a node instead of running it on the head node. So please discuss this with your sysadmin!
Hi everyone! I will contribute to this issue thread as I work on the same cluster as @Clairmontl, and I have run previous versions of mag by submitting nextflow as a job with qsub. I see the comments about running it on the head node instead of submitting Nextflow as a job. We are following recommendations from the "5 Nextflow Tips for HPC Users" blog post on Nextflow's page, which encourages users to submit nextflow itself as a job.
I just tested launching Nextflow from the head node to run nf-core/mag -r 2.0.0 on that cluster, and it works. This is the command that I used:
nextflow run nf-core/mag -r 2.0.0 -profile test,conda -c sge.conf -bg
It is very important to note that there are two key elements for running nextflow itself safely on the head node:

1. As mentioned in nf-core/mag's documentation, it is important to set nextflow's memory requirements to keep it from hoarding the head node. The suggestion from the documentation is: "In some cases, the Nextflow Java virtual machines can start to request a large amount of memory. We recommend adding the following line to your environment to limit this (typically in ~/.bashrc or ~/.bash_profile):"

NXF_OPTS='-Xms1g -Xmx4g'

2. To run nextflow on the head node of our cluster, you must set the executor to sge. This can be done in the configuration file, or by exporting a variable in your ~/.bashrc file. More information is in the "5 Nextflow Tips for HPC Users" blog post.

I have been testing something else in the cluster that we use: I have made a conda cache directory in a shared folder to which everyone has access. The goal of this cache is to centralize the conda environments for all the nf-core pipelines, so that each user doesn't need to download environments every time and we keep the same environments for all users running the same pipeline versions. We can chat more about this; I can share the path where I have it, and how to edit your ~/.bashrc to export this variable so that your installation of nextflow knows where it is.
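Putting those pieces together, the ~/.bashrc additions might look like the sketch below. NXF_OPTS, NXF_EXECUTOR and NXF_CONDA_CACHEDIR are standard Nextflow environment variables; the cache path is a placeholder for the shared folder mentioned above:

```shell
# Cap the memory of the Nextflow JVM so it does not hoard the head node
export NXF_OPTS='-Xms1g -Xmx4g'
# Default executor for the jobs that Nextflow submits
export NXF_EXECUTOR=sge
# Shared conda cache so every user reuses the same environments (placeholder path)
export NXF_CONDA_CACHEDIR=/shared/nf-core/conda-cache
```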
@Clairmontl, @d4straub: that said, I was also able to run nextflow as a job in the cluster (also testing the mag test profile). nextflow still orchestrates the submission of jobs to the cluster with the sge executor (qsub):
With the test data profile, below is the output from qstat. The first line is the main nextflow job, which launches the others. I've done this with nf-core/rnaseq on another cluster that has SLURM instead of SGE, and it works too.
1485126 0.00500 nf-core_ma ortegapoloro r 06/22/2021 00:37:06 all.q@biocomp-2-3.local 1
1485160 0.00627 nf-MAG_MET ortegapoloro r 06/22/2021 00:42:06 all.q@biocomp-2-4.local 2
1485161 0.00627 nf-MAG_MET ortegapoloro r 06/22/2021 00:42:06 all.q@biocomp-1-2.local 2
1485162 0.00627 nf-MAG_MET ortegapoloro r 06/22/2021 00:42:06 all.q@biocomp-2-3.local 2
If there is nothing else to add, I might close this issue. The nf-core pipeline run seems to work fine when Nextflow is submitted as a job, just as it worked when running Nextflow on the head node with the correct executor.
Happy to hear that submitting nextflow as a job works for you. In my and my colleagues' experience it can cause trouble, so I am wary of running nextflow as a job when problems occur. But good to know that it can indeed work. Thanks for helping out here!
Description of the bug
.nextflow.log .nextflow.log
Steps to reproduce
Steps to reproduce the behaviour:
Command line: nextflow run nf-core/mag -profile test,conda --no_r -c nextflow.conf -r 2.0.0 -resume
See error: First (without --no_r):
Error executing process > 'MAG:BUSCO_QC:BUSCO_PLOT (SPAdes-test_minigut_sample2)'
Caused by: Process MAG:BUSCO_QC:BUSCO_PLOT (SPAdes-test_minigut_sample2) terminated with an error exit status (1)

Command executed:

if [ -n "short_summary.specific_lineage.bacteria_odb10.SPAdes-test_minigut_sample2.unbinned.2.fa.txt short_summary.specific_lineage.bacteria_odb10.SPAdes-test_minigut_sample2.unbinned.1.fa.txt" ]; then
    # replace dots in bin names within summary file names by underscores
fi
busco --version | sed "s/BUSCO //" > busco.version.txt
Command exit status: 1
Command output:

INFO: ** Start plot generation at 06/16/2021 13:00:10 **
INFO: Load data ...
INFO: Loaded ./short_summary.specific_lineage.bacteria_odb10.SPAdes-test_minigut_sample2_unbinned_2_fa.txt successfully
INFO: Loaded ./short_summary.specific_lineage.bacteria_odb10.SPAdes-test_minigut_sample2_unbinned_1_fa.txt successfully
INFO: Generate the R code ...
INFO: Run the R code ...
WARNING: Impossible to run R. The package ggplot2 does not seem to be installed. Please check your R installation. See also the --no_r option to avoid this message
INFO: Plot generation done with WARNING(s). Total running time: 0.45272064208984375 seconds
INFO: Results written in ./
Command error: mv: cannot stat ‘busco_figure.png’: No such file or directory
Work dir: /isilon/users/nextflow/work/c4/7b648e91e21f81add869c00ca8fbe8
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh
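Following that tip, a quick inspection of the failed task might look like this (the work dir is the one from the error above; the .command.* files are the standard per-task files Nextflow writes):

```shell
cd /isilon/users/nextflow/work/c4/7b648e91e21f81add869c00ca8fbe8
cat .command.sh    # the exact script Nextflow executed for this task
cat .command.err   # stderr, e.g. the ggplot2 warning and the failed mv
bash .command.run  # re-run the task in place, inside its environment
```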
Second:

Execution cancelled -- Finishing pending tasks before exit
Creating Conda env: bioconda::multiqc=1.9 [cache /isilon/users/nextflow/work/conda/env-4a042df8307c1458fddcec83f7dc9ead]
-[nf-core/mag] Pipeline completed with errors-
Error executing process > 'MAG:GET_SOFTWARE_VERSIONS'
Caused by: Process MAG:GET_SOFTWARE_VERSIONS terminated with an error exit status (1)

Command executed:

echo 2.0.0 > pipeline.version.txt
echo 21.04.1 > nextflow.version.txt
scrape_software_versions.py &> software_versions_mqc.yaml
Command exit status: 1
Command output: (empty)
Expected behaviour
Successfully finish the test run through the pipeline. When the error said there was an issue with ggplot2 I tried to install it directly into the conda environment, but it did not fix the issue. When I tried to skip the BUSCO_PLOT step, I ended up with a different error in GET_SOFTWARE_VERSIONS.
Log files
Have you provided the following extra information/files:
.nextflow.log file
System
Nextflow Installation
Container engine
Additional context
I have run this successfully before, I am not sure what is causing the issue. I get the same error using multiple different input files, not just running the test.