nf-core / scrnaseq

A single-cell RNAseq pipeline for 10X genomics data
https://nf-co.re/scrnaseq
MIT License

Clean up mtx conversion code #310

Open grst opened 3 months ago

Description of feature

The mtx conversion has grown organically over time and seems error-prone and overly complex.

Also, with the new emptydrops cell calling (#301), matrices need to be transposed and then transposed back, which is very annoying.

I suggest streamlining this by creating one "standardized" single-sample output per aligner (in either mtx or h5 format) that is then read by the downstream processes, such as emptydrops and the conversion processes. Since this pipeline is going multimodal, we'll also want to support MuData output in addition to AnnData.
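To make the idea concrete, here is a rough sketch of what "standardized" could mean for the mtx case: every aligner's output is parsed once into a single fixed orientation (cells x genes here), so downstream steps never have to transpose. This is only an illustration using the Python standard library, not existing pipeline code. The names `CountMatrix`, `parse_mtx` and `rows_are` are hypothetical, and the choice of cells x genes as the standard orientation is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CountMatrix:
    """Sparse counts in one fixed orientation: rows are cells, columns are genes."""
    n_cells: int
    n_genes: int
    entries: Dict[Tuple[int, int], float]  # (cell_idx, gene_idx) -> count, 0-based


def parse_mtx(lines: List[str], rows_are: str) -> CountMatrix:
    """Parse Matrix Market coordinate text into a cells x genes CountMatrix.

    rows_are says how the *input* is oriented ("genes" or "cells"); the
    result always has the same orientation regardless of the input.
    """
    if rows_are not in ("genes", "cells"):
        raise ValueError("rows_are must be 'genes' or 'cells'")

    # Drop blank lines and the '%' header/comment lines.
    body = [ln for ln in lines if ln.strip() and not ln.startswith("%")]
    n_rows, n_cols, n_entries = (int(x) for x in body[0].split())

    entries: Dict[Tuple[int, int], float] = {}
    for ln in body[1:]:
        r, c, v = ln.split()
        i, j = int(r) - 1, int(c) - 1  # Matrix Market indices are 1-based
        key = (j, i) if rows_are == "genes" else (i, j)
        entries[key] = float(v)

    if len(entries) != n_entries:
        raise ValueError("entry count does not match the size line")

    if rows_are == "genes":
        return CountMatrix(n_cells=n_cols, n_genes=n_rows, entries=entries)
    return CountMatrix(n_cells=n_rows, n_genes=n_cols, entries=entries)


# Hypothetical example input: 3 genes x 2 cells, genes in rows.
genes_by_cells = """%%MatrixMarket matrix coordinate integer general
3 2 3
1 1 5
3 1 2
2 2 7
""".splitlines()

standardized = parse_mtx(genes_by_cells, rows_are="genes")
```

With something like this, the orientation is declared once per aligner at the point where its output is read, and emptydrops and the conversion steps can all assume the same layout.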

We can also explore whether it's easier to obtain Seurat and SingleCellExperiment objects by converting from AnnData (e.g. with anndataR) instead of building them from scratch in different languages.

Related issues: