Open grst opened 1 year ago
While MultiQC is indeed a natural choice for aggregation of QC metrics and statistics, I wonder if it's appropriate here. MultiQC usually takes the pre-defined output of specific tools; the output of the reports and their content is not from a single tool and might change in the future. I'm thinking that an additional Quarto report might be more appropriate here, but this can certainly be discussed. Thoughts, @cavenel ?
I'm open to both solutions. It will definitely be quicker to get something up and running with another Quarto report. It might also need quite some custom code for parsing summary statistics, and at some point it would be better to have a separate package.
For additional context, for single-cell data, there's the checkatlas package, which generates a MultiQC report, and we're considering adding it to the #scrnaseq pipeline (https://github.com/nf-core/scrnaseq/issues/80). Here's an example report: https://checkatlas.readthedocs.io/en/stable/CheckAtlas_example_2/CheckAtlas_example_2.html
So one option could be to make a similar package, or add spatial support to checkatlas. CC @drbecavin
I do like the general idea of using MultiQC, if only because it's so common for nf-core pipelines, but it's probably a lot more effort. While getting another Quarto report might be a good solution for now, it can certainly be discussed whether to spend the effort to create some interface for MultiQC in the future. As it is, the current Quarto reports could also get some additional work and prettifying in addition to what we already have.
I hadn't seen checkatlas before, thanks for sharing! Using something that already exists is always a good solution, if it can interface with spatial stuff.
I like the idea of using checkatlas, as it can generate the MultiQC output directly from the AnnData outputs of all samples. I wonder what kind of extra QC we could add from spatial though. Maybe as a first step, adding checkatlas as is would already be nice!
I wonder, if the multiqc cellranger module could directly work with spaceranger html reports as well. This multiqc module focuses more on alignment statistics, so it would be valuable to have in addition to checkatlas.
> I wonder, if the multiqc cellranger module could directly work with spaceranger html reports as well
It doesn't, but it should be relatively straightforward to adapt its code to add a separate spaceranger module to multiqc. Maybe I can look into that.
My plan for improving QC for spatialtranscriptomics:
Getting a Space Ranger module for MultiQC would be nice! That coupled with some custom content from the reports would be a nice starting point to build on.
MultiQC module is ready, waiting for review: https://github.com/ewels/MultiQC/pull/1945
I recently tried your spaceranger multiqc module, and the multiqc report looks very nice. Would it be possible to also include the "Gene and UMI Distribution" violin plot from the 10X report and parse it into the multiqc_report as a normal boxplot? Thank you. @grst
Multiqc module now released: https://github.com/MultiQC/MultiQC/releases/tag/v1.21
Where are we on this issue at the moment? The MultiQC Space Ranger module works great for that, but obviously only contains the Space Ranger-specific QC metrics. Given that checkatlas does not seem to be a simple solution and is seemingly not being maintained (last update was 9 months ago), that's probably not the way to go. Adding another Quarto report to collect some of the plots could help provide an overview of the downstream analyses. Other ideas?
I've seen some activity on checkatlas in the issue tracker lately, but I agree it's not a short-term solution. I think it would be nice if any other QC metrics were part of the multiqc report, to have a single location to check. This would be possible via a custom script + multiqc custom content.
Overall I think the spaceranger metrics are already quite helpful, more would be nice, but not sure it's high priority.
I have now added functionality to get the QC metrics from the quality control report into MultiQC as custom content (see mention above), so now the question becomes if we're happy with this for now or if we want to add more.
Adding other interesting metrics should be easy to do in a similar manner if we want to, but I was wondering about figures. Since we can't know how many samples somebody would run at the same time, I'm not sure if adding images (e.g. QC violin plots, UMAPs or spatial visualisations) would be scalable.
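For anyone following along, here is a minimal sketch of the "custom script + MultiQC custom content" idea discussed above. It is not the code that was added to the pipeline. It assumes the per-sample metrics are already available as a plain dictionary, and the function name `write_mqc_table`, the section id `spatial_qc` and the metric names (`n_spots`, `median_genes_per_spot`) are all hypothetical. It relies only on MultiQC picking up files ending in `_mqc.json` and rendering `plot_type: table` data keyed by sample name.

```python
import json
import tempfile
from pathlib import Path


def write_mqc_table(metrics, out_dir, section_id="spatial_qc"):
    """Write per-sample metrics as a MultiQC custom-content table.

    `metrics` maps sample name -> {metric name: value}.
    The section id and title are illustrative assumptions.
    """
    payload = {
        "id": section_id,
        "section_name": "Spatial QC metrics",
        "description": "Per-sample summary statistics (hypothetical example).",
        "plot_type": "table",
        "data": metrics,
    }
    # MultiQC detects custom content by the `_mqc` filename suffix.
    out_path = Path(out_dir) / f"{section_id}_mqc.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(payload, indent=2))
    return out_path


if __name__ == "__main__":
    # Hypothetical metrics for two samples; real values would be parsed
    # from each sample's quality control report.
    example = {
        "sample_A": {"n_spots": 2500, "median_genes_per_spot": 3100},
        "sample_B": {"n_spots": 1800, "median_genes_per_spot": 2750},
    }
    with tempfile.TemporaryDirectory() as tmp:
        print(write_mqc_table(example, tmp).name)
```

Because a table has one row per sample, it stays readable however many samples are run together, which is why this sketch sticks to numeric metrics rather than images.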
Description of feature
Having individual QC reports for each sample is nice, but it would be cool to have one aggregated report that gives an overview of all samples to quickly identify problematic ones.
MultiQC is an obvious choice here, but depending on how much customization is necessary, a custom notebook would also be an option.