Open gerbeldo opened 1 year ago
I think the latter approach (set row.names as gene symbols for the downloaded Seurat object) would be preferred for the following reasons:
It improves the experience for users that download the Seurat object:
@misc$gene_annotations
row.names to be gene symbols (this is the default for Seurat)
It keeps Seurat object upload quite a bit simpler:
@misc$gene_annotations
The unfortunate aspect of the above approach is that it is not straightforward to rename row.names in a Seurat object. There is a function, Seurat.utils::RenameGenesSeurat, that would be worth trying out. We could also consider re-creating the Seurat object prior to download and transferring things over (same as Seurat object upload).
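A minimal sketch of the "re-create the object" route, using only base R on the counts matrix. It assumes the annotation table has an `input` column (Ensembl ID) and a `name` column (gene symbol); those column names and the helper `symbols_as_rownames` are assumptions for illustration, not confirmed by this issue.

```r
# Hypothetical helper: swap Ensembl ID row names for gene symbols.
# Assumes `annot` has columns `input` (Ensembl ID) and `name` (symbol).
symbols_as_rownames <- function(counts, annot) {
  idx <- match(rownames(counts), annot$input)
  symbols <- as.character(annot$name[idx])

  # Fall back to the Ensembl ID when no symbol is available.
  missing <- is.na(symbols) | symbols == ""
  symbols[missing] <- rownames(counts)[missing]

  # Row names must be unique, so repeated symbols get a numeric suffix.
  rownames(counts) <- make.unique(symbols)
  counts
}

# Toy data standing in for the counts matrix and the annotation slot.
counts <- matrix(
  1:6, nrow = 3,
  dimnames = list(c("ENSG01", "ENSG02", "ENSG03"), c("cell1", "cell2"))
)
annot <- data.frame(
  input = c("ENSG01", "ENSG02", "ENSG03"),
  name = c("TP53", "TP53", NA),
  stringsAsFactors = FALSE
)

renamed <- symbols_as_rownames(counts, annot)

# With Seurat available, the object could then be rebuilt along these lines
# (untested here; `obj` is the existing Seurat object):
# new_obj <- Seurat::CreateSeuratObject(counts = renamed, meta.data = obj@meta.data)
```

Duplicate symbols are the main wrinkle: two Ensembl IDs can share a symbol, so some de-duplication is needed. Seurat also replaces underscores in feature names with dashes when an object is created, so the row names may still change slightly after the rebuild.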
Background
rds objects downloaded from Cellenics contain Ensembl IDs as row names (the default and most prevalent case), with the gene symbols stored in the @misc$gene_annotations slot.
This implies that if a user downloads a Seurat object from Cellenics and uploads it again, the gene symbols will not be present. The only workaround is for the user to set the gene symbols as the row names of the matrix before uploading, but that could prove complicated due to Seurat limitations.
Since the Seurat pipeline essentially re-creates a Seurat object, if the @misc$gene_annotations slot were present, it could be used to set the Ensembl IDs as row names and keep the symbols available to the rest of the platform, as currently happens.
Another approach is to have the rds download worker task replace the Ensembl IDs with the gene symbols before downloading, but that could prove complicated due to Seurat limitations.
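The upload-side idea could look roughly like the sketch below: use the annotations from the `misc` slot when they exist, and otherwise fall back to the row names. The function name `resolve_gene_annotations` and the `input`/`name` column names are assumptions for illustration; `misc` stands in for the object's `@misc` list.

```r
# Hypothetical helper: build the ID-to-symbol table for an uploaded object.
# `feature_ids` are the object's row names; `misc` mimics the @misc list.
resolve_gene_annotations <- function(feature_ids, misc = list()) {
  annot <- misc$gene_annotations

  # No stored annotations: the row names double as IDs and display names.
  if (is.null(annot)) {
    return(data.frame(input = feature_ids, name = feature_ids,
                      stringsAsFactors = FALSE))
  }

  # Align the stored annotations to the object's row order.
  idx <- match(feature_ids, annot$input)
  name <- as.character(annot$name[idx])

  # Features missing from the table keep their ID as the display name.
  unmatched <- is.na(name)
  name[unmatched] <- feature_ids[unmatched]

  data.frame(input = feature_ids, name = name, stringsAsFactors = FALSE)
}

misc <- list(gene_annotations = data.frame(
  input = c("ENSG01", "ENSG02"),
  name = c("TP53", "BRCA1"),
  stringsAsFactors = FALSE
))

with_slot <- resolve_gene_annotations(c("ENSG02", "ENSG03"), misc)
without_slot <- resolve_gene_annotations(c("ENSG01"))
```

Keeping the fallback path means objects that never went through Cellenics (and so have no annotation slot) would be handled the same way as today.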
Goal