lianos / multiGSEA

NOTE: This package has been renamed to sparrow and will be submitted to Bioconductor 3.14. Please use that package instead. This is kept here for posterity.
https://lianos.github.io/multiGSEA/
MIT License

Support gene identifier conversion in getReactomeGeneSetDb #16

Open tomsing1 opened 5 years ago

tomsing1 commented 5 years ago

https://github.com/lianos/multiGSEA/blob/19006d1053db8de807b80faa2a967a49d0b2ab38/R/get-reactome.R#L15

It looks like id.col is not used here, e.g. specifying ensembl as the desired featureType doesn't seem to have any effect?

lianos commented 5 years ago

Indeed!

If the reactome.db has ensembl identifiers in there, this should be a straightforward fix ...

In the meantime, if you want to put some elbow grease into this, something like the below should work:

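library(multiGSEA)
library(dplyr)
# Start from the default (entrez-keyed) Reactome GeneSetDb
gdb.entrez <- getReactomeGeneSetDb(...)
gdb.ens <- gdb.entrez %>%
  as.data.frame() %>%
  # entrez2ensembl() is user-supplied: one ensembl id (or NA) per entrez id
  mutate(ensembl = entrez2ensembl(featureId)) %>%
  filter(!is.na(ensembl)) %>%
  # drop duplicates created when several entrez ids map to the same ensembl id
  distinct(collection, name, ensembl) %>%
  transmute(collection, name, featureId = ensembl) %>%
  GeneSetDb()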

Where you provide your favorite implementation of something like the entrez2ensembl function, which maps a vector of entrez ids 1:1 to a vector of ensembl ids (returning NA where there is no match).
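As a rough illustration of what such an entrez2ensembl function could look like, here is a minimal base-R sketch. It is not part of multiGSEA: the make_entrez2ensembl helper, the lookup table, and its column names (entrez, ensembl) are all hypothetical, and the ids in the toy table are made up. It assumes that keeping the first ensembl id listed for each entrez id is an acceptable way to force a 1:1 mapping.

```r
# Hypothetical helper (not part of multiGSEA): builds a 1:1 entrez -> ensembl
# mapper from a lookup table. `lookup` is assumed to be a data.frame with
# character columns `entrez` and `ensembl`.
make_entrez2ensembl <- function(lookup) {
  # Keep only the first ensembl id listed for each entrez id (assumption:
  # "first one wins" is good enough to resolve one-to-many mappings).
  lookup <- lookup[!duplicated(lookup$entrez), , drop = FALSE]
  function(entrez) {
    # match() gives NA for ids missing from the table, so unmapped inputs
    # come back as NA and can be filtered out afterwards.
    lookup$ensembl[match(as.character(entrez), lookup$entrez)]
  }
}

# Toy lookup table with made-up ids, for illustration only.
lookup <- data.frame(
  entrez  = c("1", "1", "2"),
  ensembl = c("ENSG_A", "ENSG_A_alt", "ENSG_B"),
  stringsAsFactors = FALSE
)

entrez2ensembl <- make_entrez2ensembl(lookup)
entrez2ensembl(c("2", "1", "999"))
```

With a real annotation source, the lookup table would be replaced by whatever entrez-to-ensembl mapping you trust for your organism; the only property the pipeline above relies on is that the function returns exactly one value (or NA) per input id, in the same order.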

... "simple" :-)

tomsing1 commented 5 years ago

Thanks a lot for the workaround!