satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Renaming features in a Seurat Object #2617

Closed saeedfc closed 4 years ago

saeedfc commented 4 years ago

Hi, this is in reference to https://github.com/satijalab/seurat/issues/1201#issue-417843748. I was wondering if there is a way to rename all the genes of a Seurat object holding mouse data to their human orthologs, in order to integrate it with a Seurat object holding human data. A convenient function for getting the orthologs is available in the nichenetr package; it returns a vector of ortholog gene names (with NAs for genes that have no ortholog). So subsetting and renaming are both required before integrating. However, I couldn't find a function that renames all features across the whole Seurat object (something similar to the 'RenameCells' function).

Are there any solutions apart from redoing the integration of the human and mouse data from scratch, keeping only the genes with available orthologs in the mouse data (by subsetting and renaming genes in the matrices before creating the Seurat objects)?

Or, put simply: if I want to transfer labels from the human data to the mouse data, I will have to subset and rename the features in one of the datasets.

Thanks

timoast commented 4 years ago

We don't support renaming features. You should instead create a new assay with the renamed features you want to use in the integration and add it to the object, then perform the integration or label transfer using this assay.

Sophia409 commented 4 years ago

@saeedfc Hi, I ran into the same problem as you. How did you solve it? I modified the counts, data and scale.data slots for every assay I have in my Seurat object.

saeedfc commented 4 years ago

@Sophia409 You can make a new assay:

Hum_gene_assay <- CreateAssayObject(counts)
SeuratObj[['hum_gene_assay']] <- Hum_gene_assay
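The matrix passed to CreateAssayObject has to carry the renamed features already. As a rough sketch of that preparation step in base R (the toy matrix, the `orthologs` vector and the helper `rename_features` are illustrative assumptions, not part of Seurat or nichenetr), rows without an ortholog are dropped and the remaining rows are renamed:

```r
# Toy counts matrix standing in for the mouse RNA counts (genes x cells).
counts <- matrix(
  c(5, 0, 2,
    1, 3, 0,
    0, 4, 4,
    7, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Samd11", "Noc2l", "Gm1234", "Klhl17"),
                  c("cell1", "cell2", "cell3"))
)

# Hypothetical ortholog vector: one entry per row of `counts`,
# NA where no ortholog was found.
orthologs <- c("SAMD11", "NOC2L", NA, "KLHL17")

# Hypothetical helper: drop rows with no ortholog, keep only the first row
# for any repeated new name, then rename the remaining rows.
rename_features <- function(mat, new_names) {
  stopifnot(nrow(mat) == length(new_names))
  keep <- !is.na(new_names) & !duplicated(new_names)
  out <- mat[keep, , drop = FALSE]
  rownames(out) <- new_names[keep]
  out
}

hum_counts <- rename_features(counts, orthologs)

# With Seurat loaded, the result could then be added as a new assay,
# following the two lines in the comment above:
# SeuratObj[["hum_gene_assay"]] <- CreateAssayObject(counts = hum_counts)
```

Keeping only the first row for a repeated name is just one possible choice; summing the rows is another option when working with raw counts.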

MinjieHu commented 3 years ago

@saeedfc @timoast Hi, I have a very similar situation. I have two integrated Seurat objects from two different species. If I want to integrate them by creating a new assay for the orthologs, which slot/assay should I use: the raw counts in the RNA assay, or the data slot in the integrated assay? Thanks!

vertesy commented 3 years ago

For renaming in place, you can modify the function that I added to #1049 (read more there).

# RenameGenesSeurat  ------------------------------------------------------------------------------------
RenameGenesSeurat <- function(obj = ls.Seurat[[i]], newnames = HGNC.updated[[i]]$Suggested.Symbol) {
  # Replace gene names in different slots of a Seurat object. Run this before integration.
  # It only changes obj@assays$RNA@counts, @data and @scale.data.
  print("Run this before integration. It only changes obj@assays$RNA@counts, @data and @scale.data.")
  RNA <- obj@assays$RNA

  if (nrow(RNA) == length(newnames)) {
    if (length(RNA@counts)) RNA@counts@Dimnames[[1]]            <- newnames
    if (length(RNA@data)) RNA@data@Dimnames[[1]]                <- newnames
    if (length(RNA@scale.data)) RNA@scale.data@Dimnames[[1]]    <- newnames
  } else {
    # A bare string here would be silently discarded, so emit a warning instead.
    warning("Unequal gene sets: nrow(RNA) != length(newnames); gene names were not changed.")
  }
  obj@assays$RNA <- RNA
  return(obj)
}
# RenameGenesSeurat(obj = SeuratObj, newnames = HGNC.updated.genes)

*Edit: bug fixed, sry.

Pinpin-u1111 commented 3 years ago

I found a solution by directly modifying the original Seurat object (I mean one with only the RNA assay and no transformation, normalisation, dimension reduction or visualization). My list of new gene names is in a data.frame (mgi) with one variable (genes), and the Seurat object is filt. The code is:

filt@assays$RNA@counts@Dimnames[[1]] <- mgi$genes
filt@assays$RNA@data@Dimnames[[1]] <- mgi$genes
filt@assays$RNA@meta.features <- mgi

Then you can re-run any downstream work. For me it was much less work than rebuilding the object from the matrices.
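Note that these assignments are purely positional: the i-th entry of mgi$genes replaces the i-th existing feature name, so the new names must be in the same order as the current row names. A minimal sketch of building such an aligned data.frame from a lookup table with match() follows; the `lookup` table and its contents are made up for illustration, and falling back to the old name when no mapping exists is an assumption rather than something the thread prescribes.

```r
# Current feature names, in the order they appear in the object
# (for the object in this thread, presumably rownames(filt)).
old_names <- c("SAMD11", "NOC2L", "KLHL17", "PLEKHN1")

# Hypothetical lookup table, deliberately in a different order
# and missing one gene.
lookup <- data.frame(
  old = c("PLEKHN1", "SAMD11", "KLHL17"),
  new = c("Plekhn1", "Samd11", "Klhl17"),
  stringsAsFactors = FALSE
)

# match() gives, for each old name, its row in the lookup (NA if absent),
# which puts the new names in the same order as old_names.
idx <- match(old_names, lookup$old)
new_names <- lookup$new[idx]

# Assumption: keep the old name where no mapping exists.
missing <- is.na(new_names)
new_names[missing] <- old_names[missing]

# One-column data.frame in the shape used above.
mgi <- data.frame(genes = new_names, stringsAsFactors = FALSE)

# Sanity checks before assigning the names positionally.
stopifnot(nrow(mgi) == length(old_names), !anyDuplicated(mgi$genes))
```

If the lengths or the order do not line up, the assignment still runs but silently attaches the wrong names to the rows, which is why the checks at the end are worth keeping.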

Ivalyne commented 2 years ago

@Pinpin-u1111 Hi. Would you tell me how to make the data.frame (mgi) ?

Elo-mars commented 2 years ago

@Pinpin-u1111 Hello, thanks for that, but I don't get how it knows which "old" gene names to transform ...

thanks

yloskove commented 2 years ago

Hi @Pinpin-u1111 Thanks for the code. I tried using this to modify the rownames of the RNA assay of my Seurat Object, but when I tried implementing SCTransform normalization with the edited object, it wouldn't work. Have you come across this issue/ have any suggestions on how to change rownames and successfully run SCTransform?

acihanckr commented 1 year ago

I am getting the same issue with SCTransform, were you able to find a fix for this?

kusunoky commented 1 year ago

Quoting @Ivalyne: "@Pinpin-u1111 Hi. Would you tell me how to make the data.frame (mgi) ?"

I am also stuck at this step. Could anyone please suggest how to get a one-to-one gene list for conversion?

saeedfc commented 1 year ago

Hi All,

I keep seeing notifications on this thread. In case it helps anyone, here is a script that you can use to convert a human Seurat object to mouse gene names. Here I use a function from the nichenetr package to do the conversion. If you want other conversions, you may have to use the biomartr package to obtain a 'con_df' data frame like the one demonstrated below.

library(nichenetr)
library(Seurat)
egc <-  readRDS("SeuObj_EGC.rds")
# > dim(egc)
# [1] 33538   513
exp_mtx <- as.matrix(egc@assays$RNA@data)

### Make a data frame mapping human genes to mouse genes.
### Here we use the function 'convert_human_to_mouse_symbols' from the
### nichenetr package to easily convert human to mouse.
### For details, you can type in ?geneinfo_2022 after loading the nichenetr package.
con_df <- data.frame(hum_orig = rownames(exp_mtx),
                     mouse = convert_human_to_mouse_symbols(rownames(exp_mtx)),
                     stringsAsFactors = FALSE)
# > dim(con_df)
# [1] 33538     2

# > head(con_df,10)
#       hum_orig    mouse
# 1  MIR1302-2HG     <NA>
# 2      FAM138A     <NA>
# 3        OR4F5     <NA>
# 4   AL627309.1     <NA>
# 5   AL627309.3     <NA>
# 6   AL627309.2     <NA>
# 7   AL627309.4     <NA>
# 8   AL732372.1     <NA>
# 9       OR4F29 Olfr1303
# 10  AC114498.1     <NA>

## As you can see, there are a lot of NAs: human genes with no matching mouse gene.
## Remove the rows with NAs.

con_df <- con_df[!is.na(con_df$mouse), , drop = FALSE]
# > head(con_df)
# hum_orig    mouse
# 9    OR4F29 Olfr1303
# 21   SAMD11   Samd11
# 22    NOC2L    Noc2l
# 23   KLHL17   Klhl17
# 24  PLEKHN1  Plekhn1
# 25    PERM1    Perm1
# > dim(con_df)
# [1] 17172     2
## Filter the expression matrix for genes for which a mouse counterpart is available
exp_mtx <- exp_mtx[con_df$hum_orig,]
## Now change the rownames of the matrix to the mouse gene names
rownames(exp_mtx) <- con_df$mouse

## Create the seurat object with mouse genes.
mouse_SO <- CreateSeuratObject(counts = exp_mtx, meta.data = egc@meta.data )
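One thing the script above does not check is whether several human genes were mapped to the same mouse symbol, which would leave repeated row names in the matrix. A rough base-R sketch of one way to handle that is to sum the rows that share a symbol with rowsum(). The gene names and values below are invented for illustration, and summing is only sensible for raw counts, not for log-normalised values such as those in the data slot.

```r
# Toy raw-count matrix (genes x cells) with hypothetical gene names.
exp_counts <- matrix(
  c(1, 2,
    3, 4,
    5, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"), c("cell1", "cell2"))
)

# Hypothetical mapping in row order: the first two human genes
# share one mouse symbol.
mouse_symbols <- c("Gene1", "Gene1", "Gene2")

# Detect repeated target names before renaming.
any_repeats <- anyDuplicated(mouse_symbols) > 0

# Sum the rows that map to the same symbol; reorder = FALSE keeps the
# groups in order of first appearance instead of sorting them.
collapsed <- rowsum(exp_counts, group = mouse_symbols, reorder = FALSE)
```

rowsum() expects a dense numeric matrix, so a sparse matrix would need as.matrix() first, as the script already does for exp_mtx.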

Of course, you can add other details to the object, such as dimensionality reductions, e.g.:

mouse_SO[["umap"]] <- egc[["umap"]]

woffe-chen commented 1 year ago

I used the RenameGenesSeurat function posted above and ran into an error: in my Seurat object, seurat@assays$RNA@scale.data is a plain matrix rather than an S4 sparse matrix, so it has no @Dimnames slot. I changed the function a little:

RenameGenesSeurat <- function(obj = ls.Seurat[[i]], newnames = HGNC.updated[[i]]$Suggested.Symbol) {
  # Replace gene names in different slots of a Seurat object. Run this before integration.
  # It only changes obj@assays$RNA@counts, @data and @scale.data.
  print("Run this before integration. It only changes obj@assays$RNA@counts, @data and @scale.data.")
  RNA <- obj@assays$RNA

  if (nrow(RNA) == length(newnames)) {
    if (length(RNA@counts)) RNA@counts@Dimnames[[1]]    <- newnames
    if (length(RNA@data)) RNA@data@Dimnames[[1]]        <- newnames
    if (length(RNA@scale.data)) rownames(RNA@scale.data) <- newnames
  } else {
    # A bare string here would be silently discarded, so emit a warning instead.
    warning("Unequal gene sets: nrow(RNA) != length(newnames); gene names were not changed.")
  }
  obj@assays$RNA <- RNA
  return(obj)
}
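The difference can be reproduced outside Seurat. The sketch below uses the Matrix package, which ships with standard R installations, and toy matrices with made-up names. It illustrates that a sparse matrix is an S4 object with a Dimnames slot while a base matrix is not, and that rownames<- works on both.

```r
library(Matrix)  # recommended package included with standard R installations

new_names <- c("GeneA", "GeneB")

# Sparse matrix, similar in kind to what the counts and data slots
# hold in the object described above.
sparse_m <- sparseMatrix(i = c(1, 2), j = c(1, 2), x = c(3, 4),
                         dims = c(2, 2),
                         dimnames = list(c("g1", "g2"), c("c1", "c2")))

# Plain base matrix, like the scale.data described above.
dense_m <- matrix(c(0.5, -0.5, 1, -1), nrow = 2,
                  dimnames = list(c("g1", "g2"), c("c1", "c2")))

sparse_is_s4 <- isS4(sparse_m)  # S4 object, so sparse_m@Dimnames exists
dense_is_s4  <- isS4(dense_m)   # not S4, so dense_m@Dimnames would fail

# rownames<- works for both kinds of matrix, which makes it the safer
# choice when the storage class is not known in advance.
rownames(sparse_m) <- new_names
rownames(dense_m)  <- new_names
```

Using rownames<- for all three slots would therefore avoid depending on how each matrix happens to be stored.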