nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Changes to process directives should invalidate task cache on execution resume #2382

Closed mahesh-panchal closed 2 years ago

mahesh-panchal commented 3 years ago

Bug report

Expected behavior and actual behavior

When configuration settings are provided via process.ext.<something>, updating the config and running with -resume still uses the cached task instead of rerunning it with the updated ext.<something> value. The expected behaviour is that the affected process is rerun when its config changes.

Steps to reproduce the problem

main.nf:

#! /usr/bin/env nextflow

nextflow.enable.dsl = 2

workflow {

    FOO(params.message)

}

process FOO {

    input:
    val str

    script:
    def prefix = task.ext.prefix ?: 'prefix'
    """
    printf "%s\\n" "$str" > ${prefix}.txt
    """
}

nextflow.config:

params.message = 'hello world'
process {
    withName: FOO {
        ext.prefix = 'hello'
    }
}

Steps to reproduce:

  1. Run nextflow run main.nf
  2. Update nextflow.config to set ext.prefix = 'hola' for FOO
  3. Run nextflow run main.nf -resume
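Step 2 can also be scripted. The following is a minimal sketch, assuming the nextflow.config shown above; it works on a copy in a scratch directory (the `workdir` variable is introduced here for illustration) and only rewrites the ext.prefix value.

```shell
# Scratch directory so the sketch does not touch a real project.
workdir=$(mktemp -d)

# The nextflow.config from the report.
cat > "$workdir/nextflow.config" <<'EOF'
params.message = 'hello world'
process {
    withName: FOO {
        ext.prefix = 'hello'
    }
}
EOF

# Step 2: change only the ext.prefix value, leaving params.message alone.
sed "s/ext.prefix = 'hello'/ext.prefix = 'hola'/" \
    "$workdir/nextflow.config" > "$workdir/nextflow.config.new"
mv "$workdir/nextflow.config.new" "$workdir/nextflow.config"

# Show the edited line.
grep 'ext.prefix' "$workdir/nextflow.config"
```

After an edit like this, steps 1 and 3 are run from the directory that holds main.nf and the edited config.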

Program output

$ nextflow run main.nf 
N E X T F L O W  ~  version 21.04.0
Launching `main.nf` [astonishing_mayer] - revision: 9021b457ad
executor >  local (1)
[b6/ca37d6] process > FOO [100%] 1 of 1 ✔

(edit nextflow.config) then:

$ nextflow run main.nf -resume
N E X T F L O W  ~  version 21.04.0
Launching `main.nf` [nice_mandelbrot] - revision: 9021b457ad
[b6/ca37d6] process > FOO [100%] 1 of 1, cached: 1 ✔

Environment

Additional context

The aim was to try and use process.ext.args in nf-core workflows to provide tool-specific parameters. See https://github.com/nf-core/rnaseq/pull/701 for more info.

pditommaso commented 2 years ago

This is the point where task.xxx variables get ignored

https://github.com/nextflow-io/nextflow/blob/da2a5ff1a5fbfde3de4228150fb5482769bd5f9b/modules/nextflow/src/main/groovy/nextflow/processor/TaskRun.groovy#L778-L780

pditommaso commented 2 years ago

This has been implemented as an experimental feature in 62ded3421.

The cache is invalidated when any task directive variable is modified by setting a new value via the nextflow config.
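To illustrate why folding directive values into the task key forces a rerun, here is a minimal shell sketch. It is not Nextflow's actual hashing code: `task_key` is a hypothetical helper, and `cksum` merely stands in for whatever hash Nextflow uses internally.

```shell
# Hypothetical helper: derive a key from the script text, the input value,
# and (optionally) a directive value. An empty third argument models the
# old behaviour, where directive values were left out of the key.
task_key() {
    printf '%s\n%s\n%s\n' "$1" "$2" "$3" | cksum | cut -d' ' -f1
}

script='echo $str > ${prefix}.txt'

# Directive ignored: the key is identical before and after the config edit,
# so -resume finds a cached task.
old_without=$(task_key "$script" 'hello world' '')
new_without=$(task_key "$script" 'hello world' '')

# Directive included: editing ext.prefix changes the key, so the task reruns.
old_with=$(task_key "$script" 'hello world' 'ext.prefix=hello')
new_with=$(task_key "$script" 'hello world' 'ext.prefix=hola')

echo "ignored:  $old_without vs $new_without"
echo "included: $old_with vs $new_with"
```

The first pair of keys matches, which corresponds to the `cached: 1` result in the report; the second pair differs, which corresponds to the task being executed again.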

This feature is considered experimental and needs to be enabled by setting the following environment variable:

export NXF_ENABLE_CACHE_INVALIDATION_ON_TASK_DIRECTIVE_CHANGE=true

mahesh-panchal commented 2 years ago

Is this in a new snapshot or how can I test this?

pditommaso commented 2 years ago

Yes. First run this command to refresh your snapshot copy:

NXF_VER=21.10.0-SNAPSHOT CAPSULE_RESET=true nextflow info

then

NXF_VER=21.10.0-SNAPSHOT nextflow run .. etc

mahesh-panchal commented 2 years ago

This works for me using the script above.

$ NXF_VER=21.10.0-SNAPSHOT CAPSULE_RESET=true nextflow info
CAPSULE: Downloading dependency io.nextflow:nf-httpfs:jar:21.10.0-20211101.112009-4
CAPSULE: Downloading dependency io.nextflow:nextflow:jar:21.10.0-20211101.112009-4
CAPSULE: Downloading dependency io.nextflow:nf-commons:jar:21.10.0-20211101.112009-4
  Version: 21.10.0-SNAPSHOT build 5639
  Created: 01-11-2021 11:20 UTC (12:20 CEST)
  System: Mac OS X 10.16
  Runtime: Groovy 3.0.9 on OpenJDK 64-Bit Server VM 11.0.9.1+1-LTS
  Encoding: UTF-8 (UTF-8)

$ NXF_VER=21.10.0-SNAPSHOT nextflow run main.nf -resume
N E X T F L O W  ~  version 21.10.0-SNAPSHOT
Launching `main.nf` [sleepy_williams] - revision: 9021b457ad
[b6/62c24b] process > FOO [100%] 1 of 1, cached: 1 ✔

$ export NXF_ENABLE_CACHE_INVALIDATION_ON_TASK_DIRECTIVE_CHANGE=true
$ NXF_VER=21.10.0-SNAPSHOT nextflow run main.nf -resume
N E X T F L O W  ~  version 21.10.0-SNAPSHOT
Launching `main.nf` [dreamy_sax] - revision: 9021b457ad
executor >  local (1)
[fa/6bb5ed] process > FOO [100%] 1 of 1 ✔

Thank you.

pditommaso commented 2 years ago

This is now the default behaviour in the upcoming 21.12.0-edge: changing any directive invalidates the task cache on resume. See 967c1adff.

The NXF_ENABLE_CACHE_INVALIDATION_ON_TASK_DIRECTIVE_CHANGE environment variable can still be used to turn this behaviour off.
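
If the new default is unwanted, the variable can be set in the shell before resuming. The thread does not state which value disables the behaviour; `false` below is an assumption, mirroring the `true` used to enable it earlier.

```shell
# Assumption: 'false' disables directive-based cache invalidation.
export NXF_ENABLE_CACHE_INVALIDATION_ON_TASK_DIRECTIVE_CHANGE=false

# Confirm the value that child processes such as nextflow would inherit.
printenv NXF_ENABLE_CACHE_INVALIDATION_ON_TASK_DIRECTIVE_CHANGE
```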