igordot / msigdbr

MSigDB gene sets for multiple organisms in a tidy data format
https://igordot.github.io/msigdbr

Problem with loading several categories #5

Closed laurie-tonon closed 5 years ago

laurie-tonon commented 5 years ago

In our work we often want to test our gene lists against several categories of gene sets at once. Until now we would load the gene sets like this:

msigdb.genes.sets <- msigdbr(species = "Homo sapiens", category = c("H", "C2"))

We noticed that in doing so, the gene sets are truncated: the number of genes remaining in a gene set varies with the number of categories requested and with their order. After looking at the R code, it seems the problem is that the categories are filtered with "==" and not "%in%", which means a vector of categories cannot be used in the command. No warning or error is thrown and everything downstream works, but the background ratio values are obviously wrong.
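To illustrate the kind of silent truncation described above, here is a minimal base R sketch. It does not use the package's actual code or data: the `gene_sets` table, its column names, and the gene symbols are made up for the example. It only shows how `==` recycles a vector of categories along a column, while `%in%` tests membership.

```r
# Toy stand-in for a gene set table (hypothetical data, not real MSigDB content)
gene_sets <- data.frame(
  gs_cat = c("H", "H", "H", "C2", "C2", "C1"),
  gene_symbol = c("A", "B", "C", "D", "E", "F"),
  stringsAsFactors = FALSE
)

wanted <- c("H", "C2")

# `==` recycles `wanted` along the column, so row 1 is compared to "H",
# row 2 to "C2", row 3 to "H", and so on. Rows are dropped silently.
by_equals <- gene_sets[gene_sets$gs_cat == wanted, ]

# Reversing the order of the categories changes which rows survive.
by_reversed <- gene_sets[gene_sets$gs_cat == rev(wanted), ]

# `%in%` checks each row for membership in `wanted`, independent of order.
by_in <- gene_sets[gene_sets$gs_cat %in% wanted, ]

nrow(by_equals)   # 3
nrow(by_reversed) # 2
nrow(by_in)       # 5
```

Because the column length here is a multiple of the length of `wanted`, R recycles without any warning, which would match the silent behaviour reported above.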

Would it be possible to correct this or to forbid requesting more than one category in the command?

igordot commented 5 years ago

Thank you for pointing this out. I did not expect multiple categories to be requested, but that scenario should definitely be handled better.

igordot commented 5 years ago

I disabled the ability to request more than one category/subcategory in the latest release, which is now on CRAN.

You can still use dplyr::filter() or dplyr::bind_rows() to combine any subsets.
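A minimal sketch of the dplyr::bind_rows() approach, assuming the msigdbr and dplyr packages are installed. The `species` and `category` arguments are taken from the call earlier in this thread and may be named differently in other releases; the variable names `hallmark` and `curated` are only illustrative.

```r
library(msigdbr)
library(dplyr)

# Request each category separately, one per call
hallmark <- msigdbr(species = "Homo sapiens", category = "H")
curated <- msigdbr(species = "Homo sapiens", category = "C2")

# Stack the two tables into a single data frame of gene sets
msigdb.genes.sets <- bind_rows(hallmark, curated)
```

The combined table should have as many rows as the two separate tables together, which is a quick check that no gene set was truncated.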