nf-core / modules

Repository to host tool-specific module files for the Nextflow DSL2 community!
https://nf-co.re/modules
MIT License

[FEATURE] Add `stub` support to every module #4570

Open ewels opened 7 months ago

ewels commented 7 months ago

Using command stub blocks in Nextflow is useful when developing syntax and quickly testing pipelines. However, its use within nf-core is limited because the vast majority of nf-core modules do not have stub blocks.

My suggestion is to require that every module have a stub block. All 1095 of them :trollface:

We can check for the presence of a stub block in linting and also potentially run with -stub-run in the module CI.
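A minimal sketch of what such a presence check could look like, written as plain shell rather than as part of nf-core tools. The function names `has_stub` and `lint_stubs` are hypothetical, and the check assumes that a stub block always starts on a line containing only `stub:`.

```shell
# Hypothetical helper: succeed only if the given main.nf contains a stub block.
# Assumes the block starts on a line that holds nothing but "stub:".
has_stub() {
    grep -Eq '^[[:space:]]*stub:[[:space:]]*$' "$1"
}

# Hypothetical lint pass: print every main.nf under a directory that lacks a
# stub block and return non-zero if at least one was found.
lint_stubs() {
    missing=$(find "$1" -name main.nf | sort | while IFS= read -r file; do
        has_stub "$file" || echo "missing stub: $file"
    done)
    [ -z "$missing" ] && return 0
    echo "$missing"
    return 1
}
```

Calling `lint_stubs ./modules/nf-core` in CI would then fail the job while any module still lacks a stub. This only checks that the block exists; whether the stub produces the right outputs would still need a run with -stub-run.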

Although adding a stub block to one or two modules is not difficult, adding it to all of them will be a significant effort. This could be a nice project focus for a future hackathon.

GallVp commented 6 months ago

Hi @ewels

This is such a good idea and can be immensely helpful when implementing dataflows.

Here is something that I have been wondering about, and I thought I'd share it here to get your feedback. With the nf-test snapshot feature, it may be possible to partially automate stub generation and testing.

If there is a function in nf-test (snapshot.matchStub("test-name")) which compares the structure of process outputs against a snapshot while ignoring md5sums, it may be able to suggest stub code such as:

touch ${prefix}.gff3
touch ${prefix}.report.html

...versions.yml...

It won't be able to tackle complex situations but can provide starter code. Moreover, the stub testing will become more robust by expecting a complete match of output structure across script and stub for the same inputs and configuration.

Here is a pull request where I have tried to implement something similar but manually: https://github.com/nf-core/modules/pull/4627
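The starter-code idea above can be sketched in shell, independently of nf-test. The helper `suggest_stub` is hypothetical: it assumes the concrete prefix used in the test run is known and that the output file names have already been extracted from a snapshot or an output directory.

```shell
# Hypothetical helper: turn observed output file names into starter stub lines.
# Usage: suggest_stub <prefix-used-in-the-test> <file-name>...
suggest_stub() {
    prefix="$1"
    shift
    for name in "$@"; do
        case "$name" in
            "$prefix"*)
                # Swap the concrete prefix for the literal Nextflow variable
                suffix=${name#"$prefix"}
                printf 'touch ${prefix}%s\n' "$suffix"
                ;;
            *)
                # The name does not depend on the prefix; keep it verbatim
                printf 'touch %s\n' "$name"
                ;;
        esac
    done
}
```

For example, `suggest_stub test test.gff3 test.report.html` prints the two `touch` lines shown earlier. Files that need real content, such as versions.yml, would still have to be written by hand.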

ewels commented 6 months ago

That would be great 👌🏻 @mashehu / @mirpedrol - what do you think: could we make a #tools command to scaffold a stub from nf-test outputs / snapshots?

GallVp commented 6 months ago

I think if snapshot.matchStub("test-name") is implemented by nf-test, nf-core tools might not have to change at all. I am hesitant to commit time at this point but keen to contribute after March 2024. Someone will probably pick it up before then; otherwise, I am keen to give it a go.

GallVp commented 6 months ago

I have tried to implement something similar in logic to snapshot.matchStub for the fastp module here: https://github.com/nf-core/modules/pull/4637

Each test creates a sorted list of outputs which should be matched by the same test with the -stub option.

{
    assert snapshot(
        (
            [process.out.reads[0][0].toString()] + // meta
            process.out.reads.collect { file(it[1]).getName() } +
            process.out.json.collect { file(it[1]).getName() } +
            process.out.html.collect { file(it[1]).getName() } +
            process.out.log.collect { file(it[1]).getName() } +
            process.out.reads_fail.collect { file(it[1]).getName() } +
            process.out.reads_merged.collect { file(it[1]).getName() }
        ).sort()
    ).match("test_fastp_single_end-for_stub_match")
}
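
The same comparison can be illustrated outside nf-test with a small shell sketch. It assumes the outputs of a normal run and of a -stub run have been collected into two separate directories; the helper names `output_names` and `same_output_structure` are hypothetical. Only file names are compared, so empty stub files match real outputs.

```shell
# Hypothetical helper: print the relative paths of all files under a
# directory, sorted, one per line.
output_names() {
    (cd "$1" && find . -type f | sed 's|^\./||' | sort)
}

# Succeed only if both directories contain the same set of relative file
# paths. File contents and checksums are deliberately ignored.
same_output_structure() {
    [ "$(output_names "$1")" = "$(output_names "$2")" ]
}
```

A call such as `same_output_structure results_script results_stub` (both directory names are placeholders) returns non-zero as soon as the stub omits an output or creates one that the real script does not.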
GallVp commented 6 months ago

On @sateeshperi's suggestion, I also raised it on nf-test: https://github.com/askimed/nf-test/issues/168

famosab commented 2 weeks ago

Can we create a nice comprehensive list of the modules where the stub is still missing (as was done for other batch changes)?

GallVp commented 2 weeks ago

Good idea @famosab

As of https://github.com/nf-core/modules/tree/d5b47a24314cab9f64593f29cf97a64b0acc7dce, there are 512 modules without stub:

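# Print the process name of every module whose main.nf has no stub block
mods=($(find ./modules/nf-core -name main.nf))
# "${mods[@]}" iterates over all array elements; a bare $mods yields only the first one in bash
for file in "${mods[@]}"; do grep -q 'stub:' "$file" || sed -n 's/process \(.*\) {/- \1/p' "$file"; done | sort -V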