nf-core / spatialvi

Pipeline for processing spatially-resolved gene counts with spatial coordinates and image data. Designed for 10x Genomics Visium transcriptomics.
https://nf-co.re/spatialvi
MIT License

Add MultiQC report #40

Open grst opened 1 year ago

grst commented 1 year ago

Description of feature

Having individual QC reports for each sample is nice, but it would be cool to have one aggregated report that gives an overview of all samples to quickly identify problematic ones.

MultiQC is an obvious choice here, but depending on how much customization is necessary, a custom notebook would also be an option.

fasterius commented 1 year ago

While MultiQC is indeed a natural choice for aggregation of QC metrics and statistics, I wonder if it's appropriate here. MultiQC usually takes the pre-defined output of specific tools; the output of the reports and their content is not from a single tool and might change in the future. I'm thinking that an additional Quarto report might be more appropriate here, but this can certainly be discussed. Thoughts, @cavenel ?

grst commented 1 year ago

I'm open to both solutions. It will definitely be quicker to get something up and running with another Quarto report. It might also need quite a bit of custom code for parsing summary statistics, though, and at some point it would be better to have a separate package.

For additional context, for single-cell data, there's the checkatlas package, which generates a MultiQC report, and we're considering adding it to the #scrnaseq pipeline (https://github.com/nf-core/scrnaseq/issues/80). Here's an example report: https://checkatlas.readthedocs.io/en/stable/CheckAtlas_example_2/CheckAtlas_example_2.html

So one option could be to make a similar package, or add spatial support to checkatlas. CC @drbecavin

fasterius commented 1 year ago

I do like the general idea of using MultiQC, if only because it's so common for nf-core pipelines, but it's probably a lot more effort. While getting another Quarto report might be a good solution for now, it can certainly be discussed whether to spend the effort to create some interface for MultiQC in the future. As it is, the current Quarto reports could also get some additional work and prettifying in addition to what we already have.

I hadn't seen checkatlas before, thanks for sharing! Using something that already exists is always a good solution, if it can interface with spatial stuff.

cavenel commented 1 year ago

I like the idea of using checkatlas, as it can generate the MultiQC output directly from the AnnData outputs of all samples. I wonder what kind of extra QC we could add from spatial though. Maybe as a first step, adding checkatlas as is would already be nice!

grst commented 1 year ago

I wonder if the multiqc cellranger module could directly work with spaceranger html reports as well. This multiqc module focuses more on alignment statistics, so it would be valuable to have in addition to checkatlas.

grst commented 1 year ago

I wonder if the multiqc cellranger module could directly work with spaceranger html reports as well

It doesn't, but it should be relatively straightforward to adapt its code to add a separate spaceranger module to multiqc. Maybe I can look into that.

My plan for improving QC for spatialtranscriptomics:

  1. Add MultiQC and FastQC to set up a basic QC workflow
  2. Implement a spaceranger module in MultiQC
  3. Look into custom content for MultiQC reports or checkatlas

fasterius commented 1 year ago

Getting a Space Ranger module for MultiQC would be nice! That coupled with some custom content from the reports would be a nice starting point to build on.

grst commented 1 year ago

MultiQC module is ready, waiting for review: https://github.com/ewels/MultiQC/pull/1945

ducminhnguyenle commented 11 months ago

I have recently tried your spaceranger multiqc module. The multiqc report looks very nice. I just want to ask whether it is possible to also include the "Gene and UMI Distribution" violin plot from the 10X report and parse it into the multiqc_report as a normal boxplot. Thank you. @grst

grst commented 7 months ago

Multiqc module now released: https://github.com/MultiQC/MultiQC/releases/tag/v1.21

fasterius commented 4 months ago

Where are we on this issue at the moment? The MultiQC Space Ranger module works great, but obviously only contains the Space Ranger-specific QC metrics. Given that checkatlas does not seem to be a simple solution and seemingly isn't being maintained (last update was 9 months ago), it's probably not the way to go. Adding another Quarto report to collect some of the plots could help provide an overview of the downstream analyses. Other ideas?

grst commented 4 months ago

I've seen some activity on checkatlas in the issue tracker lately, but I agree it's not a short-term solution. I think it would be nice if any other QC metrics were part of the multiqc report, so that there is a single location to check. This would be possible via a custom script + multiqc custom content.

Overall, I think the spaceranger metrics are already quite helpful; more would be nice, but I'm not sure it's high priority.
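
To make the "custom script + multiqc custom content" idea concrete, here is a minimal sketch, not what the pipeline actually does. It assumes the per-sample metrics have already been collected into a dictionary; the function name `render_mqc_table`, the section id and the metric names are hypothetical, and the numbers are made up for illustration. The output follows the MultiQC custom content convention of a TSV file with `#`-prefixed configuration lines at the top.

```python
import csv
import io


def render_mqc_table(metrics, section_id="spatial_qc", section_name="Spatial QC metrics"):
    """Render per-sample metrics as MultiQC custom-content TSV text.

    `metrics` maps sample name -> {metric name: value}. The names used here
    are placeholders, not metrics the pipeline is known to export.
    """
    # Union of all metric names, so samples with missing metrics still fit.
    columns = sorted({key for row in metrics.values() for key in row})

    buffer = io.StringIO()
    # Header comments configure how MultiQC displays the section.
    buffer.write(f"# id: '{section_id}'\n")
    buffer.write(f"# section_name: '{section_name}'\n")
    buffer.write("# plot_type: 'table'\n")

    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Sample", *columns])
    for sample in sorted(metrics):
        # Missing metrics become empty cells.
        writer.writerow([sample, *(metrics[sample].get(col, "") for col in columns)])
    return buffer.getvalue()


if __name__ == "__main__":
    # Illustrative values only.
    example = {
        "sample_A": {"n_spots": 2987, "median_genes_per_spot": 3120},
        "sample_B": {"n_spots": 3411, "median_genes_per_spot": 2875},
    }
    print(render_mqc_table(example), end="")
```

Saving the returned text to a file whose name ends in `_mqc.tsv` inside the directory MultiQC scans should be enough for it to be picked up as an extra table section next to the spaceranger metrics.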

fasterius commented 3 weeks ago

I have now added functionality to get the QC metrics from the quality control report into MultiQC as custom content (see mention above), so now the question becomes whether we're happy with this for now or whether we want to add more.

Adding other interesting metrics should be easy to do in a similar manner if we want to, but I was wondering about figures. Since we can't know how many samples somebody would run at the same time, I'm not sure if adding images (e.g. QC violin plots, UMAPs or spatial visualisations) would be scalable.
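
One possible way around the scalability concern, sketched here purely as an assumption and not as something the pipeline implements, is to reduce each per-spot distribution to a handful of numbers per sample instead of embedding one image per sample. The helper names `summarise_metric` and `summarise_samples` are hypothetical.

```python
import statistics


def summarise_metric(values):
    """Reduce per-spot values to a five-number summary.

    Needs at least two values, because statistics.quantiles cannot compute
    quartiles from a single data point on older Python versions.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


def summarise_samples(per_sample_values):
    """Map sample name -> five-number summary of its per-spot values."""
    return {
        sample: summarise_metric(values)
        for sample, values in sorted(per_sample_values.items())
    }


if __name__ == "__main__":
    # Illustrative per-spot gene counts for two made-up samples.
    demo = {
        "sample_A": [1, 2, 3, 4, 5],
        "sample_B": [10, 20, 30, 40, 50],
    }
    for sample, summary in summarise_samples(demo).items():
        print(sample, summary)
```

With this kind of summary the report grows by one table row per sample, so it stays readable regardless of how many samples are run, at the cost of losing the full shape of the distribution that a violin plot or spatial image would show.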