hms-dbmi-cellenics / issues

This repository is used to report and track issues

Support gene symbols for rds files downloaded from Cellenics and uploaded using the Seurat object pipeline #41

Open gerbeldo opened 1 year ago

gerbeldo commented 1 year ago

Background

RDS objects downloaded from Cellenics contain Ensembl IDs as row names (the default and most common case), with the gene symbols stored in the @misc$gene_annotations slot.

This means that if a user downloads a Seurat object from Cellenics and uploads it again, the gene symbols will not be present. The only workaround is for the user to set the gene symbols as the row names of the matrix before uploading, but that could prove complicated due to Seurat limitations.

Since the Seurat pipeline essentially re-creates a Seurat object, it could check for the @misc$gene_annotations slot and, when it is present, use it to set the Ensembl IDs as row names and make the symbols available to the rest of the platform, as currently happens.
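A minimal base R sketch of the lookup this would involve, not the pipeline's actual code. The helper name `symbols_from_annotations` is hypothetical, `annot` stands in for the table in @misc$gene_annotations, and its column names `input` (Ensembl ID) and `name` (symbol) are assumptions. The third ID in the example is a made-up placeholder with no annotation.

```r
# Hypothetical helper: return one gene symbol per Ensembl ID, in row order.
# Assumes `annot` has columns `input` (Ensembl ID) and `name` (gene symbol).
symbols_from_annotations <- function(ensembl_ids, annot) {
  # Align the annotation table to the row names of the matrix
  idx <- match(ensembl_ids, annot$input)
  symbols <- annot$name[idx]

  # Fall back to the Ensembl ID where no symbol is recorded
  missing <- is.na(symbols) | symbols == ""
  symbols[missing] <- ensembl_ids[missing]
  symbols
}

annot <- data.frame(
  input = c("ENSG00000141510", "ENSG00000012048"),
  name = c("TP53", "BRCA1"),
  stringsAsFactors = FALSE
)

# Row names as they would appear in the uploaded object (last one is a placeholder)
ids <- c("ENSG00000012048", "ENSG00000141510", "ENSG00000000000")
symbols <- symbols_from_annotations(ids, annot)
```

With a lookup like this, the row names stay as Ensembl IDs and the symbols travel alongside them, so nothing in the object itself has to be renamed.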

Another approach is to have the RDS download worker task replace the Ensembl IDs with the gene symbols before the download, but that could prove complicated due to Seurat limitations.

Goal

alexvpickering commented 1 year ago

I think the latter approach (set row.names as gene symbols for downloaded Seurat object) would be preferred for the following reasons:

It improves the experience for users that download the Seurat object.

It keeps Seurat object upload quite a bit simpler.

The unfortunate aspect of the above approach is that it is not straightforward to rename row.names in a Seurat object. There is a function Seurat.utils::RenameGenesSeurat that would be worth trying out. Could also consider re-creating the Seurat object prior to download and transferring things over (same as Seurat object upload).
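As a rough illustration of the "re-create and transfer" option, here is a base R sketch that renames the rows of a plain count matrix before a new object is built from it. It does not touch a Seurat object directly. The helper name `rename_rows_to_symbols`, the annotation column names `input` and `name`, and the toy IDs are all assumptions. Duplicate symbols are disambiguated with `make.unique`, since feature names need to be unique.

```r
# Hypothetical helper: swap Ensembl-ID row names for gene symbols on a matrix.
# Assumes `annot` has columns `input` (Ensembl ID) and `name` (gene symbol).
rename_rows_to_symbols <- function(counts, annot) {
  symbols <- annot$name[match(rownames(counts), annot$input)]

  # Keep the original ID where no symbol is available
  no_symbol <- is.na(symbols)
  symbols[no_symbol] <- rownames(counts)[no_symbol]

  # Two IDs can map to the same symbol, so make the names unique
  rownames(counts) <- make.unique(symbols)
  counts
}

# Toy data: two IDs share a symbol, one ID has no annotation
counts <- matrix(
  1:6,
  nrow = 3,
  dimnames = list(c("ensA", "ensB", "ensC"), c("cell1", "cell2"))
)
annot <- data.frame(
  input = c("ensA", "ensB"),
  name = c("GENE1", "GENE1"),
  stringsAsFactors = FALSE
)

renamed <- rename_rows_to_symbols(counts, annot)
```

The renamed matrix could then be passed to something like `CreateSeuratObject(counts = renamed)`, with metadata and reductions copied over from the original object. Whether that is less work than `Seurat.utils::RenameGenesSeurat` would need to be tried out.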