Open ewels opened 7 months ago
Hi @ewels
This is such a good idea and can be immensely helpful when implementing dataflows.
Here is something that I have been wondering about, and I thought I'd share it here to get your feedback. With the nf-test snapshot feature, it may be possible to partially automate `stub` generation and testing.
If there were a function in nf-test (say `snapshot.matchStub("test-name")`) which compares the structure of process outputs against a snapshot while ignoring md5sums, it might be able to suggest stub code such as:
touch ${prefix}.gff3
touch ${prefix}.report.html
...versions.yml...
It won't be able to tackle complex situations but can provide starter code. Moreover, `stub` testing will become more robust by expecting a complete match of output structure across `script` and `stub` for the same inputs and configuration.
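To make the starter-code idea concrete, here is a minimal shell sketch. It assumes you already have a directory of outputs from a real run and know the prefix value used. `suggest_stub` is a hypothetical helper, not part of nf-test or nf-core tools, and it only handles the simple one-file-per-output case.

```shell
# Hypothetical helper: print starter stub lines for each file in a directory
# of real process outputs. Arguments: <output_dir> <prefix_value>
suggest_stub() {
    outdir=$1
    prefix=$2
    for path in "$outdir"/*; do
        # Only plain files are considered; directories are skipped.
        [ -f "$path" ] || continue
        name=${path##*/}
        case $name in
            # versions.yml needs real content, so it is flagged rather than touched.
            versions.yml) printf '# TODO: write versions.yml by hand\n' ;;
            # Swap the concrete prefix back to the ${prefix} placeholder.
            "$prefix"*)   printf 'touch ${prefix}%s\n' "${name#"$prefix"}" ;;
            *)            printf 'touch %s\n' "$name" ;;
        esac
    done
}

# Demo on a throwaway directory that mimics the outputs of a real run.
demo=$(mktemp -d)
touch "$demo/sample1.gff3" "$demo/sample1.report.html" "$demo/versions.yml"
suggest_stub "$demo" sample1
rm -rf "$demo"
```

The suggestion is only as good as the example run: optional outputs that were not produced for that input will be missing from the list.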
Here is a pull request where I have tried to implement something similar but manually: https://github.com/nf-core/modules/pull/4627
That would be great 👌🏻 @mashehu / @mirpedrol - what do you think: could we make a #tools command to scaffold a `stub` from nf-test outputs / snapshots?
I think if `snapshot.matchStub("test-name")` is implemented by nf-test, nf-core tools might not have to change at all. I am hesitant to commit time at this point but keen to contribute after March 2024. Someone will probably pick it up before then; otherwise, I am keen to give it a go.
I have tried to implement something similar in logic to `snapshot.matchStub` for the fastp module here: https://github.com/nf-core/modules/pull/4637
Each test creates a sorted list of outputs, which should be matched by the same test with the `-stub` option.
{
assert snapshot(
(
[process.out.reads[0][0].toString()] + // meta
process.out.reads.collect { file(it[1]).getName() } +
process.out.json.collect { file(it[1]).getName() } +
process.out.html.collect { file(it[1]).getName() } +
process.out.log.collect { file(it[1]).getName() } +
process.out.reads_fail.collect { file(it[1]).getName() } +
process.out.reads_merged.collect { file(it[1]).getName() }
).sort()
).match("test_fastp_single_end-for_stub_match")
}
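Outside nf-test, the same "compare names, ignore contents" check can be sketched in plain shell. This assumes the outputs of the script run and the stub run have each been collected into a directory. `list_outputs` and `same_structure` are hypothetical helper names used for illustration only.

```shell
# Hypothetical helper: list the relative paths of all files under a directory, sorted.
list_outputs() {
    ( cd "$1" && find . -type f | LC_ALL=C sort )
}

# Succeeds only when both directories hold the same set of file paths.
# File contents (and therefore md5sums) are never read.
same_structure() {
    script_files=$(list_outputs "$1")
    stub_files=$(list_outputs "$2")
    if [ "$script_files" = "$stub_files" ]; then
        return 0
    fi
    # Print both listings so the mismatch is easy to spot.
    printf 'script outputs:\n%s\nstub outputs:\n%s\n' "$script_files" "$stub_files"
    return 1
}
```

Call it as `same_structure <script_output_dir> <stub_output_dir>`. A non-zero exit status means the stub creates a different set of files than the real script did.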
On @sateeshperi's suggestion, I also raised it on nf-test: https://github.com/askimed/nf-test/issues/168
Can we create a nice comprehensive list of the modules where the stub is still missing (such as was done for other batch changes)?
Good idea @famosab
As of https://github.com/nf-core/modules/tree/d5b47a24314cab9f64593f29cf97a64b0acc7dce, there are 512 modules without stub:
# Print the process name of every module whose main.nf has no stub: block
mods=($(find ./modules/nf-core -name main.nf))
for file in "${mods[@]}"; do grep -q 'stub:' "$file" || sed -n 's/process \(.*\) {/- \1/p' "$file"; done | sort -V
Using command `stub` blocks in Nextflow is useful when developing syntax and quickly testing pipelines. However, its use within nf-core is limited because the vast majority of nf-core modules do not have `stub` blocks.

My suggestion is to require that every module should have a `stub` block. All 1095 of them :trollface:

We can lint for the presence of this in linting and also potentially run with `-stub-run` in the module CI.

Although adding a stub block for one or two modules is not difficult, adding it for all will be a significant effort. This could be a nice project focus for a future hackathon(s).
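As a rough idea of what the lint check could look like, here is a minimal shell sketch. `lint_stubs` is a hypothetical name and this is not how nf-core tools implements linting. It only greps for the text `stub:`, so a match inside a comment would also count as a pass.

```shell
# Hypothetical lint check: print every main.nf under a directory that has no
# 'stub:' section, and return non-zero if any such file is found.
lint_stubs() {
    # grep -L lists the files that contain no match for the pattern.
    missing=$(find "$1" -name main.nf -type f -exec grep -L 'stub:' {} + | LC_ALL=C sort)
    if [ -z "$missing" ]; then
        return 0
    fi
    printf '%s\n' "$missing"
    return 1
}
```

Running `lint_stubs ./modules/nf-core` in CI would then fail the job whenever a module without a stub is present.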