nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Detached / background processes #937

Closed toniher closed 5 years ago

toniher commented 5 years ago

I do not know if this goes somewhat against the general dataflow philosophy, but I wonder whether it would be useful to have some kind of detached / background processes: processes that keep running throughout the whole pipeline and are only stopped when the pipeline ends.

A use case could be a database (or a web server, etc.) that is launched, used by the rest of the pipeline's processes, and shut down only at the end. Currently I start and stop this kind of process (using Singularity and the SGE queue system) outside of Nextflow.

pditommaso commented 5 years ago

This raises an interesting point: whether NF should also be used as a service orchestrator. However, I still don't understand the value of keeping an external service up and running once the main workflow has ended.

toniher commented 5 years ago

Sorry if I wasn't clear... The (external) service (e.g. a DBMS) should keep running after the process that launches it is triggered, but I think it would make sense to stop it at the end of the workflow...

pditommaso commented 5 years ago

Maybe at some point there will be an NF back-end; however, for the scenario you describe, it looks to me like even a simple bash wrapper that 1) launches the DB service, 2) runs the workflow, and 3) stops the DB should work.
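A minimal sketch of that wrapper, assuming the service can be started by a single command that stays in the foreground (i.e. does not daemonize itself). run_with_service is a hypothetical helper, not a Nextflow feature; the service and workflow commands are whatever you pass in.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: 1) launch a long-running service, 2) run the workflow,
# 3) stop the service.
#
# Usage: run_with_service 'SERVICE COMMAND' -- WORKFLOW COMMAND [ARGS...]
# The function body is a ( ... ) subshell, so the EXIT trap fires when the
# function returns, not when the calling script ends.
run_with_service() (
  service_cmd=$1
  shift
  if [ "${1:-}" = "--" ]; then shift; fi

  # 1) start the service in the background and remember its PID
  bash -c "$service_cmd" &
  service_pid=$!

  # 3) stop the service however step 2 ends (success or failure);
  #    the trap's own status does not replace the workflow's exit status
  trap 'kill "$service_pid" 2>/dev/null; wait "$service_pid" 2>/dev/null' EXIT

  # 2) run the workflow in the foreground; its exit status is returned
  "$@"
)
```

For example: run_with_service './start_db.sh' -- nextflow run main.nf -resume, where start_db.sh and main.nf are placeholders. If the service forks into the background and its launching command exits, killing that PID will not stop it; in that case, call the service's own stop command in the trap instead.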

toniher commented 5 years ago

Thanks @pditommaso, I will do it that way for now: I will run a bash script containing the nextflow command with nohup...

pditommaso commented 5 years ago

👍

Heroico commented 4 years ago

Good day!

I tried running nextflow with nohup, and it didn't work for me. I tried nohup because on my current cluster, ssh sessions terminate after 1 hour without user input, and I hoped nohup would keep the workflow running.

I have a bash script like the following:

#!/bin/bash

NXF_VER=19.10.0 nextflow run atacseq.nf -resume

And when I run with nohup, nextflow gets stopped:

$ nohup ./run.sh > log.txt 2>&1 &
[1] 3276
$ 
[1]+  Stopped                 nohup ./run.sh > log.txt 2>&1

I see nothing amiss in nextflow's log:

May-19 08:00:39.683 [main] DEBUG nextflow.cli.Launcher - $> nextflow run atacseq.nf -resume
May-19 08:00:39.790 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 19.10.0
May-19 08:00:39.807 [main] INFO  nextflow.cli.CmdRun - Launching `atacseq.nf` [extravagant_jones] - revision: ee02651712
May-19 08:00:39.832 [main] DEBUG nextflow.config.ConfigBuilder - Found config local: /scratch/abarbeira3/kk/nextflow.config
May-19 08:00:39.833 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /scratch/abarbeira3/kk/nextflow.config
May-19 08:00:39.861 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
May-19 08:00:40.503 [main] WARN  nextflow.config.ConfigBuilder - It appears you have never run this project before -- Option `-resume` is ignored
May-19 08:00:40.546 [main] DEBUG nextflow.extension.OperatorEx - Dataflow extension methods: branch,buffer,chain,choice,collate,collect,collectFile,combine,concat,count,countBy,countFasta,countFastq,countLines,countText,cross,distinct,filter,first,flatMap,flatten,fork,groupBy,groupTuple,ifEmpty,into,join,last,map,max,mean,merge,min,mix,phase,print,println,randomSample,reduce,separate,set,splitCsv,splitFasta,splitFastq,splitText,spread,subscribe,sum,take,tap,toDouble,toFloat,toInteger,toList,toLong,toSortedList,transpose,unique,until,view
May-19 08:00:40.553 [main] DEBUG nextflow.Session - Session uuid: 2b520fae-6bf8-4476-ae87-4c4aef7b94bf
May-19 08:00:40.553 [main] DEBUG nextflow.Session - Run name: extravagant_jones
May-19 08:00:40.554 [main] DEBUG nextflow.Session - Executor pool size: 28
May-19 08:00:40.571 [main] DEBUG nextflow.cli.CmdRun - 
  Version: 19.10.0 build 5170
  Created: 21-10-2019 15:07 UTC (10:07 CDT)
  System: Linux 2.6.32-573.12.1.el6.x86_64
  Runtime: Groovy 2.5.8 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_51-b16
  Encoding: UTF-8 (ANSI_X3.4-1968)
  Process: 3280@cri16in002 [10.50.84.251]
  CPUs: 28 - Mem: 125.9 GB (56.2 GB) - Swap: 128 GB (67.6 GB)
May-19 08:00:40.611 [main] DEBUG nextflow.Session - Work-dir: /scratch/abarbeira3/kk/work [gpfs]
May-19 08:00:40.612 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /scratch/abarbeira3/kk/bin
May-19 08:00:40.736 [main] DEBUG nextflow.Session - Observer factory: TowerFactory
May-19 08:00:40.738 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
May-19 08:00:40.960 [main] DEBUG nextflow.Session - Session start invoked
May-19 08:00:41.303 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
May-19 08:00:41.351 [PathVisitor-1] DEBUG nextflow.file.PathVisitor - files for syntax: glob; folder: /gpfs/data/bioinformatics/abarbeira3/atac_seq/atac_seq_example/ATAC-seq-cfn-v1-NaturePaper/seqfiles/ATAC-seq_Testdata/; pattern: *_{1,2}.fastq.gz; options: [:]
May-19 08:00:41.548 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:fqj` matches label `fqj` for process with name fastqc
May-19 08:00:41.553 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: pbs
May-19 08:00:41.553 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'pbs'
May-19 08:00:41.563 [main] DEBUG nextflow.executor.Executor - [warm up] executor > pbs
May-19 08:00:41.570 [main] DEBUG n.processor.TaskPollingMonitor - Creating task monitor for executor 'pbs' > capacity: 10000; pollInterval: 5s; dumpInterval: 5m 
May-19 08:00:41.574 [main] DEBUG n.executor.AbstractGridExecutor - Creating executor 'pbs' > queue-stat-interval: 1m
May-19 08:00:41.614 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > fastqc -- maxForks: 28
May-19 08:00:41.650 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:small_long` matches label `small_long` for process with name trim_galore
May-19 08:00:41.651 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: pbs
May-19 08:00:41.651 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'pbs'
May-19 08:00:41.656 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > trim_galore -- maxForks: 28
May-19 08:00:41.676 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:midf` matches label `midf` for process with name bowtie2_alignment
May-19 08:00:41.677 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: pbs
May-19 08:00:41.677 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'pbs'
May-19 08:00:41.679 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > bowtie2_alignment -- maxForks: 28
May-19 08:00:41.738 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:mid` matches label `mid` for process with name multiqc
May-19 08:00:41.748 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: pbs
May-19 08:00:41.749 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'pbs'
May-19 08:00:41.755 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > multiqc -- maxForks: 28
May-19 08:00:41.785 [main] DEBUG nextflow.script.BaseScript - No entry workflow defined
May-19 08:00:41.788 [main] DEBUG nextflow.script.ScriptRunner - > Await termination 
May-19 08:00:41.788 [main] DEBUG nextflow.Session - Session await

Any suggestions for getting nextflow to run with nohup? Any other alternative that keeps nextflow running after the session terminates on a cluster would be appreciated too.

Thanks in advance!
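A job reported as Stopped right after being put in the background has usually received SIGTTIN or SIGTTOU, meaning it tried to read from, or reconfigure, the controlling terminal. Redirecting stdin from /dev/null removes the read trigger. Whether that alone is enough for this Nextflow version is an assumption; the -bg option suggested in the reply is the supported route. launch_detached below is a hypothetical helper sketching the redirection.

```shell
#!/usr/bin/env bash
# launch_detached LOGFILE CMD [ARGS...]
# Hypothetical helper, not a Nextflow option. It starts CMD:
#   - under nohup, so SIGHUP from a dropped ssh session is ignored;
#   - with stdin from /dev/null, so it never reads the terminal (no SIGTTIN);
#   - with stdout and stderr appended to LOGFILE.
# It prints the PID of the background job.
launch_detached() {
  local log=$1
  shift
  nohup "$@" >>"$log" 2>&1 </dev/null &
  echo "$!"
}
```

For the script above, the equivalent one-liner would be nohup ./run.sh > log.txt 2>&1 < /dev/null &. When the helper is called as pid=$(launch_detached log.txt ./run.sh), the job belongs to the $( ) subshell, so jobs and wait in your shell will not see it; use the printed PID instead.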

pditommaso commented 4 years ago

Use the -bg option. See nextflow run -h for details.

Heroico commented 4 years ago

Thanks, -bg worked for me.

Unfortunately I don't see that option in the help:

nextflow run -h

Execute a pipeline project
Usage: run [options] Project name or repository url
  Options:
    -E
       Exports all current system environment
       Default: false
    -ansi-log
       Enable/disable ANSI console logging
    -bucket-dir
       Remote bucket where intermediate result files are stored
    -cache
       Enable/disable processes caching
    -dump-channels
       Dump channels for debugging purpose
    -dump-hashes
       Dump task hash keys for debugging purpose
       Default: false
    -e.
       Add the specified variable to execution environment
       Syntax: -e.key=value
       Default: {}
    -entry
       Entry workflow name to be executed
    -h, -help
       Print the command usage
       Default: false
    -hub
       Service hub where the project is hosted
    -latest
       Pull latest changes before run
       Default: false
    -lib
       Library extension path
    -name
       Assign a mnemonic name to the a pipeline run
    -offline
       Do not check for remote project updates
       Default: false
    -params-file
       Load script parameters from a JSON/YAML file
    -process.
       Set process options
       Syntax: -process.key=value
       Default: {}
    -profile
       Choose a configuration profile
    -qs, -queue-size
       Max number of processes that can be executed in parallel by each executor
    -resume
       Execute the script using the cached results, useful to continue
       executions that was stopped by an error
    -r, -revision
       Revision of the project to run (either a git branch, tag or commit SHA
       number)
    -test
       Test a script function with the name specified
    -user
       Private repository user name
    -with-conda
       Use the specified Conda environment package or file (must end with
       .yml|.yaml suffix)
    -with-dag
       Create pipeline DAG file
    -with-docker
       Enable process execution in a Docker container
    -N, -with-notification
       Send a notification email on workflow completion to the specified
       recipients
    -with-podman
       Enable process execution in a Podman container
    -with-report
       Create processes execution html report
    -with-singularity
       Enable process execution in a Singularity container
    -with-timeline
       Create processes execution timeline file
    -with-tower
       Monitor workflow execution with Seqera Tower service
    -with-trace
       Create processes execution tracing file
    -with-weblog
       Send workflow status messages via HTTP to target URL
    -without-docker
       Disable process execution with Docker
       Default: false
    -without-podman
       Disable process execution in a Podman container
    -w, -work-dir
       Directory where intermediate result files are stored

pditommaso commented 4 years ago

Oh, so we have found an issue then :D

jambler24 commented 3 years ago

Can confirm the -bg option is now found in the help :)

raf64flo commented 2 years ago

Is there any way to re-attach the process once -bg has been used to launch it, similar to a screen session?

Or any way to get information on the run's progress?

pditommaso commented 2 years ago

Like any other Linux process => https://stackoverflow.com
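Unlike a screen or tmux session, a process started with -bg (or with & / nohup) has no terminal to re-attach to. What you can do, as with any Linux process, is check whether it is still alive and read its log. Nextflow writes .nextflow.log in the launch directory, and tail -f .nextflow.log follows it live. progress_snapshot is a hypothetical helper combining both checks.

```shell
#!/usr/bin/env bash
# progress_snapshot PID LOGFILE [N]
# Hypothetical helper: report whether PID is still alive and print the last
# N lines (default 5) of LOGFILE. For a Nextflow run, LOGFILE would typically
# be .nextflow.log in the launch directory.
progress_snapshot() {
  local pid=$1 log=$2 n=${3:-5}
  # kill -0 sends no signal; it only checks that the process exists
  # and that we are allowed to signal it.
  if kill -0 "$pid" 2>/dev/null; then
    echo "PID $pid: running"
  else
    echo "PID $pid: not running"
  fi
  tail -n "$n" "$log"
}
```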

cpommier commented 2 years ago

Hi, thanks @pditommaso. The problem we have is that once I launch nextflow with -bg, the process doesn't end. I need to send a kill -15 to end it, even though everything looks complete (e.g. all files are generated as expected). It might be because I skip the last step using a when directive. Any hints? Scrint hasn't been very successful, but I might need to play with it a bit more.

pditommaso commented 2 years ago

Don't use -bg then; put it in the background with & instead.
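A minimal sketch of that approach: start the run with a plain &, keep its PID, and collect the real exit status with wait. run_in_background is a hypothetical helper; it stores the PID in the global variable bg_pid rather than printing it, because a job started inside $( ) would belong to a subshell and could not be waited for by the calling shell.

```shell
#!/usr/bin/env bash
# run_in_background LOGFILE CMD [ARGS...]
# Hypothetical helper: start CMD with plain & (no -bg), stdin from /dev/null,
# stdout and stderr to LOGFILE. Sets the global variable bg_pid.
run_in_background() {
  local log=$1
  shift
  "$@" >"$log" 2>&1 </dev/null &
  bg_pid=$!
}
```

For example, run_in_background log.txt nextflow run main.nf (main.nf is a placeholder), then later wait "$bg_pid"; echo "exit status: $?". A job started with & alone typically still receives SIGHUP when the login session ends, so if the ssh session may drop, prefix the command with nohup.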

cpommier commented 2 years ago

OK, will give it a try