Open ewels opened 5 years ago
My understanding of the user story:
A developer writes a workflow and wishes to add unit tests for one or more processes.
She prepares one or more sets of inputs for the processes being tested.
She then writes a test file that would look something like:
```groovy
// First test
process firstTest {
    // List of processes that are allowed to run
    runProcesses 'bqsr bwa'
    // A list of upstream process outputs that should be used
    upstreamOutputs {
        'fastqc': ['testdata/firstTest/input/data_R1.fastq.gz', 'testdata/firstTest/input/data_R2.fastq.gz'],
        'multiqc': ['testdata/firstTest/input/multiqc.html']
    }
    // Code to run to test the output
    test:
    '''
    #!/usr/bin/env python
    # Insert test code here
    '''
}
```
The developer can then either run the test manually, or incorporate it into CI, with a command like:

```
nextflow test tests/firstTest.nft
```
I have a few suggested ideas for consideration:
Along with a superlenient hash method, we could implement a new process executor, `none`, that would cause the pipeline to fail if any process assigned to it is launched. This would allow us to use a Nextflow config file to assign this executor to all processes that should not be executed, preventing accidental run-away execution.
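For illustration, such a config could pin everything to the proposed `none` executor and allow only the processes under test to run. This is a sketch only: the `none` executor does not exist in Nextflow, and the process names are taken from the example above:

```groovy
// Sketch: assumes the proposed (non-existent) 'none' executor, which
// would abort the run if any matching process were launched.
process {
    executor = 'none'           // default: no process may run

    withName: 'bqsr|bwa' {
        executor = 'local'      // only the processes under test may run
    }
}
```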
Above I suggested a Nextflow-based test script, but I think we could re-use an existing test framework instead. For example, we could extend Python's unittest classes to create an nfunittest class. This would have a setup step that generates a Nextflow config file and runs the pipeline according to the test specification, and then runs tests on the resulting data in Python. For example:
```python
import nfunittest

class FirstTest(nfunittest.TestCase):
    def setUp(self):
        self.runProcesses = ['bqsr', 'bwa']
        self.upstreamOutputs = {
            'fastqc': ['testdata/firstTest/input/data_R1.fastq.gz', 'testdata/firstTest/input/data_R2.fastq.gz'],
            'multiqc': ['testdata/firstTest/input/multiqc.html'],
        }

    def test_bwa(self):
        out_file = self.outputDir + '/data.bam'
        ...
```
Such test scripts can make use of the large amount of existing test profiling tools and methodologies, rather than writing something new.
We would probably keep the nftest tools/code in a separate repo since it's not written in groovy.
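To make the idea above concrete, here is a rough sketch of what the hypothetical `nfunittest.TestCase` base class could look like, built on the standard `unittest` module. Everything here is an assumption from this discussion, not an existing API: `nfunittest` is not a real package, the `none` executor does not exist in Nextflow, and the method and attribute names are invented for illustration.

```python
import shlex
import subprocess
import tempfile
import unittest

class NfTestCase(unittest.TestCase):
    """Hypothetical base class for Nextflow process unit tests."""

    runProcesses = []  # processes allowed to execute, set by subclasses

    @staticmethod
    def build_config(run_processes):
        """Generate a Nextflow config that disables every process except
        those under test (assumes the proposed 'none' executor)."""
        selector = '|'.join(run_processes)
        return (
            "process {\n"
            "    executor = 'none'\n"
            f"    withName: '{selector}' {{ executor = 'local' }}\n"
            "}\n"
        )

    def run_pipeline(self, pipeline='main.nf'):
        """Write the generated config to a temp dir and launch Nextflow."""
        self.outputDir = tempfile.mkdtemp(prefix='nftest_')
        config_path = self.outputDir + '/test.config'
        with open(config_path, 'w') as fh:
            fh.write(self.build_config(self.runProcesses))
        cmd = f'nextflow run {pipeline} -c {config_path}'
        subprocess.run(shlex.split(cmd), check=True)
```

Subclasses would set `runProcesses` in `setUp()`, call `run_pipeline()`, and then write ordinary `unittest` assertions against files under `self.outputDir`.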
I also don't think we're going to be able to get the correct hash name for each process, so I prefer the injection of a `storeDir` in a Nextflow config file for all processes. We would, however, need a way for the process's name to be included in the `storeDir` directive, and I can't see a way to do that currently with config files. That might need to be added to Nextflow.
The user would then have to create empty/dummy files for all preceding, unused processes in the form `testdata/my_test/input/[PROCESS_NAME]/[OUTPUT_FILENAME]`. The contents of `testdata/my_test/input` would then be copied to a temporary working directory that will be injected as the `storeDir` (+ process name) for each process via a config file.
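That staging step could look something like the following sketch. The directory layout is the one proposed above; `stage_inputs` is a hypothetical helper name, not part of any existing tool:

```python
import shutil
import tempfile
from pathlib import Path

def stage_inputs(test_input_dir):
    """Copy testdata/<test>/input/<PROCESS_NAME>/<OUTPUT_FILENAME> into a
    temporary working directory, returning a map of process name ->
    staged directory, ready to be injected as each process's storeDir."""
    work_dir = Path(tempfile.mkdtemp(prefix='nftest_'))
    store_dirs = {}
    for process_dir in sorted(Path(test_input_dir).iterdir()):
        if not process_dir.is_dir():
            continue
        dest = work_dir / process_dir.name
        shutil.copytree(process_dir, dest)
        store_dirs[process_dir.name] = dest
    return store_dirs
```

Each entry of the returned map would then become the `storeDir` for the corresponding process in the generated config file.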
All processes that aren't to be executed should then have the `none` executor (mentioned above), and the test script will run Nextflow.
Thoughts?
Great! Makes a lot of sense :+1:
One nitpick: I love the `executor: none` idea, but maybe Nextflow should exit successfully instead of with a failure? That would be more helpful for the test exit-status check.
Is there a comparable unit testing framework in Java? Nextflow already has unit tests, so I guess there must be. It would be nice to keep this inside Nextflow rather than in a separate program, if possible.
Phil
Also: instead of telling downstream processes not to run, it could be better to squash the output channels of the selected process that will run. Then we don’t need to know the shape of the DAG before writing the config - nextflow can just pick one process at a time and squash its output channels.
Note that I think `executor: none` could still be a generally useful thing to have, though. It would make it easy for people to write a custom config that selectively disables parts of other people's pipelines, for example. At the moment we have tonnes of `when: !params.skipProcessFoo` in a few pipelines which could be removed with this.
Has anyone given any thought to how this looks with DSL-2 being on the table? I'd like to be able to unit-test a process in a module.
There was quite a bit of discussion around this at the 2019 meeting. However, I've not seen any working examples yet.
just cross-referencing as this popped up in the same search https://code.askimed.com/nf-test/getting-started/
Disclaimer: I had at best a 5-minute glimpse at nf-test.
Thanks @sfehrmann! This GitHub issue is documenting a Nextflow user meeting from 4 years ago; it's not an issue for active development :) nf-test didn't exist at the time, but you're absolutely right that it's a great tool 👍🏻 So, good for future Googlers!
There's also this: https://github.com/LUMC/pytest-workflow
Notes from our discussion about how a new generic unit testing module could work:
Testing would essentially act as a wrapper, with three steps:
`storeDir`
This tool could then be run, specifying which process should be tested, and run in parallel for each process of interest.
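A sketch of that parallel fan-out, one test invocation per process, using the standard library. The `runner` callable stands in for shelling out to the hypothetical test tool, which does not exist yet:

```python
from concurrent.futures import ThreadPoolExecutor

def run_all(process_names, runner):
    """Run one test invocation per process, in parallel threads.

    `runner` is a callable taking a process name and returning an exit
    code; in practice it would shell out to the hypothetical wrapper
    tool with the process name as an argument.
    """
    with ThreadPoolExecutor() as pool:
        return dict(zip(process_names, pool.map(runner, process_names)))
```

A CI job could then fail if any entry in the returned map is non-zero.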