Open jokergoo opened 1 year ago
Thank you for the suggestion. I don't usually work with Entrez IDs. I didn't notice that Bioconductor annotation packages treat them as characters. I assume it's to be consistent with all other ID types. Do you know if they are treated as integers in other packages?
Since Entrez IDs have always been integers in msigdbr, I hope the change wouldn't have any unintended consequences for existing users.
Yeah, they are digits and R by default reads them as numbers.
I have just checked:
These standard bioc packages store Entrez IDs as characters:
These standard bioc packages store them as integers:
I guess there won't be any conflict with other code. People won't use the IDs for math calculations, e.g. id1 + id2 or mean(entrez_id) :)
Thank you for looking into it. It's reassuring to know that there is not a clear standard, so neither option is "wrong". However, org.*.db and TxDb.* packages are probably more authoritative.
First, thanks for this great package! I especially like that it directly outputs three different gene ID types, which saves a lot of time when switching between them.
I have a small suggestion. In the output table, the columns related to "entrez_gene" are stored as integers. I would suggest changing them to characters, as other Bioconductor annotation packages do (e.g. org.Hs.eg.db).
Imagine we want to convert Entrez IDs to RefSeq IDs, and we have a mapping vector (map) where Entrez IDs are the names and RefSeq IDs are the values. Naturally, to convert, we would do map[gene_sets$entrez_gene]. This causes a problem because gene_sets$entrez_gene holds integers, so they are treated as numeric indices into the map vector rather than being matched against the names of map. To do it correctly, we need to explicitly convert gene_sets$entrez_gene to characters before indexing.

The more severe consequence is that if the maximal numeric value in gene_sets$entrez_gene is smaller than the length of map, executing map[gene_sets$entrez_gene] will not generate any warning or error message, and it will silently return wrong results.
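
A minimal sketch of the problem in base R. The map vector and its values here are made-up placeholders (not real RefSeq IDs and not msigdbr output), and entrez_int stands in for gene_sets$entrez_gene:

```r
# Hypothetical mapping: names are Entrez IDs, values are placeholder RefSeq labels.
# The order is deliberately not sorted by ID, as is typical for a real mapping.
map <- c("10" = "refseq_10", "2" = "refseq_2", "1" = "refseq_1", "3" = "refseq_3")

# Stand-in for gene_sets$entrez_gene stored as integers.
entrez_int <- c(1L, 2L, 3L)

# Integer subscripts select by position, so this returns elements 1, 2 and 3
# of map, which belong to Entrez IDs 10, 2 and 1. No warning is raised.
by_position <- unname(map[entrez_int])

# Character subscripts select by name, which is the intended lookup.
by_name <- unname(map[as.character(entrez_int)])

print(by_position)  # "refseq_10" "refseq_2"  "refseq_1"
print(by_name)      # "refseq_1"  "refseq_2"  "refseq_3"
```

Only the second form returns the values that belong to IDs 1, 2 and 3. If the column were stored as characters in the first place, the as.character() call would be unnecessary.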