Support gene symbols for rds files downloaded from Cellenics and uploaded using the Seurat object pipeline

Background

RDS objects downloaded from Cellenics contain Ensembl IDs as row names (this is the default and most common case), with the gene symbols stored in the @misc$gene_annotations slot.

This means that if a user downloads a Seurat object from Cellenics and uploads it again, the gene symbols will not be present. The only workaround is for the user to set the gene symbols as the row names of the matrix before uploading, but that could prove complicated due to Seurat limitations.

Since the Seurat pipeline essentially re-creates a Seurat object, the @misc$gene_annotations slot, if present, could be used to keep the Ensembl IDs as row names and make the symbols available to the rest of the platform, as currently happens.
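As a rough illustration of the checks this first approach would need on upload, here is a minimal base-R sketch. The helper name `annotations_usable` and the column names `input` (Ensembl ID) and `name` (gene symbol) are assumptions made for the example, not a confirmed description of the slot's layout.

```r
# Hypothetical helper: decide whether an annotation table can be trusted
# to restore gene symbols for a matrix whose row names are Ensembl IDs.
# The column names `input` and `name` are assumptions for this sketch.
annotations_usable <- function(row_ids, annot) {
  is.data.frame(annot) &&                          # slot is present and tabular
    all(c("input", "name") %in% colnames(annot)) && # expected columns exist
    !anyDuplicated(annot$input) &&                  # each ID is listed once
    all(row_ids %in% annot$input)                   # every row has an entry
}

row_ids <- c("ENSG00000141510", "ENSG00000012048")
annot <- data.frame(
  input = c("ENSG00000141510", "ENSG00000012048"),
  name = c("TP53", "BRCA1"),
  stringsAsFactors = FALSE
)

annotations_usable(row_ids, annot)       # complete table
annotations_usable(row_ids, annot[1, ])  # one ID has no annotation
annotations_usable(row_ids, NULL)        # slot missing altogether
```

Only objects that pass checks like these could safely take the annotation-based path; anything else would have to fall back to the regular upload behaviour.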

Another approach is to have the rds download worker task replace the Ensembl IDs with the gene symbols before downloading, but that could prove complicated due to Seurat limitations.

I think the latter approach (setting row.names to gene symbols for the downloaded Seurat object) would be preferred for the following reasons:

It improves the experience for users that download the Seurat object:

No one outside our team will expect to find feature metadata in @misc$gene_annotations
Most programmers who interact with a Seurat object expect row.names to be gene symbols (this is the default for Seurat)

It keeps Seurat object upload quite a bit simpler:

it will "just work"
no need to check for presence, safety, and correctness of @misc$gene_annotations
otherwise we would also need to create the final Seurat object with Ensembl gene IDs instead of gene symbols, for this specific case only

The drawback of this approach is that renaming row.names in a Seurat object is not straightforward. The function Seurat.utils::RenameGenesSeurat would be worth trying out. We could also consider re-creating the Seurat object prior to download and transferring everything over (the same as Seurat object upload does).
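Whichever renaming route is taken, the new row names have to be unique and non-missing. The base-R sketch below shows one possible mapping on a plain matrix rather than a Seurat object. The helper `symbols_for_ids`, the toy IDs, and the `input`/`name` columns are assumptions for illustration, and it does not call Seurat.utils::RenameGenesSeurat.

```r
# Hypothetical helper: map Ensembl IDs to unique gene symbols.
symbols_for_ids <- function(ids, annot) {
  symbols <- annot$name[match(ids, annot$input)]
  # Fall back to the Ensembl ID when no symbol is recorded.
  missing <- is.na(symbols) | symbols == ""
  symbols[missing] <- ids[missing]
  # Row names must be unique, so repeated symbols get a numeric suffix.
  make.unique(symbols)
}

# Toy count matrix with placeholder IDs as row names.
counts <- matrix(
  1:6,
  nrow = 3,
  dimnames = list(c("ENSG_A", "ENSG_B", "ENSG_C"), c("cell1", "cell2"))
)
annot <- data.frame(
  input = c("ENSG_A", "ENSG_B"),
  name = c("GeneX", "GeneX"),
  stringsAsFactors = FALSE
)

rownames(counts) <- symbols_for_ids(rownames(counts), annot)
rownames(counts)
# "GeneX" "GeneX.1" "ENSG_C"
```

The same vector of names could then be used when re-creating the Seurat object from its count matrix before download, if that route turns out simpler than renaming in place.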

hms-dbmi-cellenics / issues #41
