nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0
2.61k stars 605 forks

Allow dynamic `cache` directives using closures #5044

Open greenberga opened 3 weeks ago

greenberga commented 3 weeks ago

Closes #5022

netlify[bot] commented 3 weeks ago

Deploy Preview for nextflow-docs-staging canceled.

| Name | Link |
| --- | --- |
| Latest commit | 23c5b647931dad07a75612638ae166d75e1791c4 |
| Latest deploy log | https://app.netlify.com/sites/nextflow-docs-staging/deploys/665e1123560c2e000878ef53 |

greenberga commented 3 weeks ago

@bentsherman I spent some time looking into this issue. There's a good chance I'm missing something, but as far as I can tell, the TaskConfig isn't aware of the hash mode. I can pull the cache property from task.config, but it's still a closure (i.e., it hasn't been evaluated).

Further, isn't the hash mode slightly different from whether or not to cache? The mode itself can be standard, lenient, deep, or sha256, but you can also set a boolean, in which case the mode will be null:

https://github.com/nextflow-io/nextflow/blob/4c54db6ac634cc80a820752d8f4edb959fc832e4/modules/nf-commons/src/main/nextflow/util/CacheHelper.java#L45-L50
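To make the distinction concrete, here is a minimal Java sketch of the mapping described above. It is a simplified stand-in, not the linked Nextflow source: the class name `HashModeSketch`, the nested `Mode` enum, and the lowercase string matching are all assumptions made for illustration. The only behavior taken from the discussion is that the four mode names map to a mode, while a boolean maps to null.

```java
import java.util.Locale;

// Hypothetical, simplified model of the mode lookup discussed above.
// It is NOT the code in CacheHelper.java; names and matching rules are assumptions.
class HashModeSketch {

    enum Mode { STANDARD, LENIENT, DEEP, SHA256 }

    // Returns the mode for a recognised lowercase mode name.
    // Returns null for null, for booleans, and for anything unrecognised.
    static Mode of(Object value) {
        if (value == null || value instanceof Boolean) {
            // A boolean only answers "cache or not"; it carries no hash mode.
            return null;
        }
        if (value instanceof CharSequence) {
            String name = value.toString();
            for (Mode mode : Mode.values()) {
                if (mode.name().toLowerCase(Locale.ROOT).equals(name)) {
                    return mode;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(of("lenient")); // a mode name selects a mode
        System.out.println(of(true));      // booleans yield no mode
        System.out.println(of(false));
    }
}
```

In this sketch, `true` and `false` are indistinguishable once they pass through `of`, which is why the hash mode alone cannot say whether caching is enabled.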

Assuming there is some step where a script's process directives are merged with the directives from a config's process scope, it seems like you'd want to evaluate the closure at that point, in the context of the task. Let me know if that doesn't make sense. Thanks!

bentsherman commented 3 weeks ago

I think you just need to move the getHashMode() method from the process config to the task config, something like this:

    HashMode getHashMode() {
        HashMode.of(get('cache')) ?: HashMode.DEFAULT()
    }

The get('cache') will handle the closure evaluation.

greenberga commented 3 weeks ago

@bentsherman thanks for the point in the right direction. get('cache') does indeed seem to evaluate the closure properly. The only thing I can't quite figure out is why it's enough to move HashMode retrieval from the ProcessConfig to the TaskConfig. Whether get('cache') returns true or false, HashMode will be STANDARD.
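The observation above can be reproduced with a small Java model. This is only a sketch under stated assumptions: `TaskConfigSketch`, its `put`/`get` methods, and `DEFAULT_MODE` are hypothetical names, a `Supplier` stands in for a Groovy closure, and the default is assumed to be STANDARD as the comment reports. It does not reflect the actual Nextflow `TaskConfig` implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model of a task-level config with lazily evaluated directives.
// All names here are illustrative assumptions, not Nextflow's API.
class TaskConfigSketch {

    enum Mode { STANDARD, LENIENT, DEEP, SHA256 }

    // Assumed default, matching the STANDARD result reported in the thread.
    static final Mode DEFAULT_MODE = Mode.STANDARD;

    private final Map<String, Object> directives = new HashMap<>();

    void put(String name, Object value) {
        directives.put(name, value);
    }

    // Evaluates a lazy value on access; a Supplier plays the role of a closure.
    Object get(String name) {
        Object value = directives.get(name);
        return (value instanceof Supplier) ? ((Supplier<?>) value).get() : value;
    }

    // Maps a lowercase mode name to a mode; booleans and unknown values give null.
    static Mode modeOf(Object value) {
        if (value instanceof String) {
            for (Mode mode : Mode.values()) {
                if (mode.name().toLowerCase().equals(value)) {
                    return mode;
                }
            }
        }
        return null;
    }

    // Mirrors the shape of the suggested getHashMode(): look up, else fall back.
    Mode getHashMode() {
        Mode mode = modeOf(get("cache"));
        return mode != null ? mode : DEFAULT_MODE;
    }

    public static void main(String[] args) {
        TaskConfigSketch config = new TaskConfigSketch();

        config.put("cache", (Supplier<Object>) () -> true);
        System.out.println(config.getHashMode()); // falls back to the default

        config.put("cache", (Supplier<Object>) () -> false);
        System.out.println(config.getHashMode()); // same fallback as for true

        config.put("cache", (Supplier<Object>) () -> "lenient");
        System.out.println(config.getHashMode()); // a mode name is honoured
    }
}
```

In this model the lazy value is resolved inside `get`, so `getHashMode` never sees the closure itself, and a boolean result always falls through to the default mode. Whether caching is enabled would therefore have to be decided by some check other than the hash mode.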