nf-core / demultiplex

Demultiplexing pipeline for sequencing data
https://nf-co.re/demultiplex
MIT License

Add option to run CheckQC after demultiplexing with bcl2fastq #143

Closed matrulda closed 3 months ago

matrulda commented 1 year ago

Description of feature

Hi!

We have an application that checks whether our sequencing runs fulfill certain (customisable) QC criteria after demultiplexing: https://github.com/Molmed/checkQC. We are considering using the demultiplex pipeline for our demultiplexing in the future, and I'm wondering if it would be in scope to include an option to run CheckQC as part of the pipeline. Right now CheckQC only supports bcl2fastq output, but it will support BclConvert in the future. My colleagues and I are willing to work on this feature.

Let me know what you think :)

edmundmiller commented 1 year ago

That sounds awesome! It supports more people, and worst case, if someone doesn't want to run checkQC, they just skip it? Am I missing a drawback?

Would you or your colleagues be interested in creating an nf-core module and then opening the PR to add it to this pipeline?

matrulda commented 1 year ago

Yeah, exactly! Totally skippable. I don't see any drawbacks with it; my only concern was whether you would find it out of scope for this pipeline.

We can definitely look into making an nf-core module for checkQC :+1:

apeltzer commented 4 months ago

We will work on adding this to the demultiplex pipeline, stay tuned. We already have it running internally and will now contribute it upstream here.

atrigila commented 3 months ago

There is a PR for the nf-core module, https://github.com/nf-core/modules/pull/4158, that depends on an update to Biocontainers. checkQC was last updated yesterday, and that update includes changes to avoid the incompatibility with interop.

atrigila commented 3 months ago

The new biocontainers update (checkqc 4.0.4) does not solve the issue with interop. I will test if a community wave container can solve this issue.

atrigila commented 3 months ago

The internal working checkqc module uses a custom Dockerfile that installs checkqc==3.8.0 and interop==1.2.4 from pip.

I tested an older Biocontainers image with checkqc==3.8, but the error observed in the module still persists. This is because the bioconda recipe for checkqc does not correctly install the interop module.
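A quick way to confirm this in any candidate container is to try importing both packages. This is only a minimal sketch, not part of the module: `check_imports` is a hypothetical helper, and the import names `checkQC` and `interop` are assumptions based on the site-packages path in the checkQC log and on the name of the interop module.

```python
import importlib


def check_imports(module_names):
    """Try to import each module and report whether it succeeded.

    Returns a dict mapping module name -> True (imported) or False (failed).
    Any exception counts as a failure, since a half-installed package can
    fail with errors other than ImportError.
    """
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = True
        except Exception:
            results[name] = False
    return results


if __name__ == "__main__":
    # Assumed import names; adjust if the packages expose different ones.
    for name, ok in check_imports(["checkQC", "interop"]).items():
        print(f"{name}: {'ok' if ok else 'NOT importable'}")
```

Running this script with the container's Python should show whether interop is importable next to checkQC. Note that it only tests the import, not that the two versions are compatible.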

I used wave to build a working public community image that can be used instead of biocontainers: community.wave.seqera.io/library/python_pip_interop_checkqc:d76c912c8fadc561. This can be used in the module developed by @matrulda to overcome the biocontainers issue. Next, I will try to incorporate this container into the module and test it.

atrigila commented 3 months ago

The module was merged and I am now working on adding it to the pipeline. The test data for the pipeline needs to include more sequencing run metadata, such as runParameters.xml.

This file was recently added to the sample data in test-datasets. I used that samplesheet with the bcl2fastq demultiplexer and collected the results into a checkqc directory. However, I encountered an error when running checkqc:

INFO     ------------------------
INFO     Starting checkQC (3.8.2)
INFO     ------------------------
INFO     Runfolder is: checkqc_dir/
INFO     No config file specified, using default config from /opt/conda/lib/python3.12/site-packages/checkQC/default_config/config.yaml.
ERROR    No reagent version specified for this instrument type

Comparing the runParameters.xml file from the nf-core test-datasets (left) with the runParameters.xml resource file provided by checkqc (right), I think the issue might be in this line:

[Image: side-by-side comparison of the two runParameters.xml files]

I will see if we can get some other test files with the correct specification.
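The comparison above was done by eye. As a rough sketch, it could also be scripted by listing the elements that the checkqc resource file has but the test file lacks, plus the shared elements whose values differ. The function names here are hypothetical, and the sketch assumes both files are plain XML without a default namespace; it does not know which element checkQC actually reads for the reagent version.

```python
import xml.etree.ElementTree as ET


def element_paths(xml_text):
    """Map each element path (e.g. 'RunParameters/Setup') to its stripped text.

    If a path occurs more than once, only the first occurrence is kept.
    """
    paths = {}

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        paths.setdefault(path, (elem.text or "").strip())
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def compare_run_parameters(test_xml, reference_xml):
    """Return (missing, differing) element paths, both sorted.

    missing:   paths present in the reference but absent from the test file
    differing: paths present in both files whose text values differ
    """
    test = element_paths(test_xml)
    ref = element_paths(reference_xml)
    missing = sorted(set(ref) - set(test))
    differing = sorted(p for p in set(ref) & set(test) if ref[p] != test[p])
    return missing, differing


if __name__ == "__main__":
    import sys

    # Usage: python compare_run_parameters.py <test.xml> <reference.xml>
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as t, open(sys.argv[2]) as r:
            missing, differing = compare_run_parameters(t.read(), r.read())
        print("missing in test file:", missing)
        print("different values:", differing)
```

Any reagent-related path that shows up under "missing in test file" would be a candidate explanation for the "No reagent version specified" error.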