satijalab / azimuth

A Shiny web app for mapping datasets using Seurat v4
https://satijalab.org/azimuth
GNU General Public License v3.0

ConvertEnsembleToSymbol #204

Open GischD opened 7 months ago

GischD commented 7 months ago

Dear Azimuth team,

I am running into a problem with the function

Azimuth:::ConvertEnsembleToSymbol(mat = mat, species = "human")

during Seurat integration with BPCells. I get this error:

Error: Your query has been redirected to http://status.ensembl.org indicating this Ensembl service is unavailable.
Look at ?useEnsembl for details on how to try a mirror site.

Could you add an option to change the Ensembl mirror? Right now the "asia" mirror is working.

Thank you, Debora

GischD commented 7 months ago

For now, I am using this workaround, which adds a mirror argument and passes it to useEnsembl():

ConvertEnsembleToSymbol2 <- function(
  mat,
  mirror,
  species = c('human', 'mouse')
) {
  species <- match.arg(arg = species)
  if (species == 'human') {
    database <- 'hsapiens_gene_ensembl'
    symbol <- 'hgnc_symbol'
  } else if (species == 'mouse') {
    database <- 'mmusculus_gene_ensembl'
    symbol <- 'mgi_symbol'
  } else {
    stop('species name not found')
  }

  library("biomaRt")
  library("dplyr")

  name_df <- data.frame(gene_id = c(rownames(mat)))
  name_df$orig.id <- name_df$gene_id
  # Make this a character; otherwise left_join() will throw errors
  name_df$gene_id <- as.character(name_df$gene_id)
  # Strip the version suffix: this mostly works for GENCODE IDs,
  # and plain Ensembl IDs (no ".") are left alone
  name_df$gene_id <- sub("[.][0-9]*", "", name_df$gene_id)
  # Connect to the requested Ensembl mirror; useEnsembl() already selects
  # the dataset, so no separate useDataset() call is needed
  mart <- useEnsembl(biomart = "ensembl",
                     dataset = database,
                     mirror = mirror)
  genes <- name_df$gene_id
  # Look up the gene symbol for each Ensembl gene ID
  gene_IDs <- getBM(filters = "ensembl_gene_id",
                    attributes = c("ensembl_gene_id", symbol),
                    values = genes,
                    mart = mart)
  gene.df <- left_join(name_df, gene_IDs, by = c("gene_id" = "ensembl_gene_id"))
  rownames(gene.df) <- make.unique(gene.df$orig.id)
  gene.df <- gene.df[rownames(mat), ]
  # Drop genes with an empty symbol; genes without any match
  # turn into all-NA rows here and are removed by the next filter
  gene.df <- gene.df[gene.df[, symbol] != '', ]
  gene.df <- gene.df[!is.na(gene.df$orig.id), ]
  # Keep only the genes that were mapped and rename the rows to symbols
  mat.filter <- mat[gene.df$orig.id, ]
  rownames(mat.filter) <- make.unique(gene.df[, symbol])
  return(mat.filter)
}
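
A minimal usage sketch for the function above. It assumes that mat is a genes-by-cells matrix whose row names are Ensembl gene IDs, and that mirror is one of the mirror names accepted by biomaRt::useEnsembl(), such as "www", "useast", or "asia" (see ?useEnsembl for the current list). The toy counts and the ".5" version suffix are made up for illustration. The real call needs biomaRt, dplyr, and network access, so it is guarded and only runs in an interactive session where the function is already defined.

```r
# Toy input: a genes-by-cells count matrix with Ensembl gene IDs as row names.
# The second ID carries an illustrative GENCODE-style version suffix (".5").
mat <- matrix(
  data = c(5, 0, 2, 1, 3, 0),
  nrow = 3,
  dimnames = list(
    c("ENSG00000141510", "ENSG00000012048.5", "ENSG00000139618"),
    c("cell1", "cell2")
  )
)

# Same version-stripping step as inside the function, shown on its own:
# IDs without a "." are returned unchanged.
stripped_ids <- sub("[.][0-9]*", "", rownames(mat))
print(stripped_ids)

# The actual conversion queries Ensembl over the network, so it is only
# attempted when the function exists and the session is interactive.
if (exists("ConvertEnsembleToSymbol2") && interactive()) {
  mat_symbols <- ConvertEnsembleToSymbol2(
    mat = mat,
    mirror = "asia",      # the mirror reported as working in this thread
    species = "human"     # or "mouse"
  )
  print(rownames(mat_symbols))
}
```

The returned matrix has the same columns as mat, with rows renamed to gene symbols. Rows whose IDs have no symbol are dropped, so it can have fewer rows than the input.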

annaborchers commented 5 months ago

Hi @GischD, could you please explain how to use this function, i.e. what the inputs are? I'm new to this. Thanks!