hbctraining / DGE_workshop_salmon_online

https://hbctraining.github.io/DGE_workshop_salmon_online/

Orgdb note didn't work in Gene annotation #53

Open Gammerdinger opened 1 month ago

Gammerdinger commented 1 month ago

This is a draft of how to fix it:

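Check the available, updated databases:

library(AnnotationHub)
ah <- AnnotationHub()
# search AnnotationHub metadata for the human OrgDb SQLite file
query(ah, 'org.Hs.eg.db.sqlite')
human_orgdb <- query(ah, c("Homo sapiens", "OrgDb"))
# load the OrgDb record by its AnnotationHub ID
test <- human_orgdb[["AH111575"]]
test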

Gammerdinger commented 1 month ago

query(ah, 'org.Hs.eg.db.sqlite') is the key step. Different Bioconductor versions affect this, since each release can provide a different OrgDb record (and AH ID).

hwick commented 1 month ago

Unfortunately, while this works, the following line from the original code, which uses select(), will not work on it unless you specify which package select() comes from:

human_orgdb <- human_orgdb[["AH111575"]]
annotations_orgdb <- select(human_orgdb, res_tableOE_tb$gene, c("SYMBOL", "GENENAME", "ENTREZID"), "ENSEMBL")
Error in UseMethod("select") : 
  no applicable method for 'select' applied to an object of class "c('OrgDb', 'AnnotationDb', 'envRefClass', '.environment', 'refClass', 'environment', 'refObject', 'AssayData')"

This, however, will work (with a message):

human_orgdb <- human_orgdb[["AH111575"]]
annotations_orgdb <- AnnotationDbi::select(human_orgdb, res_tableOE_tb$gene, c("SYMBOL", "GENENAME", "ENTREZID"), "ENSEMBL")
'select()' returned 1:many mapping between keys and columns

This also implies that the note itself is wrong, since it says these 1:many mappings are removed automatically.

And in fact, there are duplicate Entrez IDs:

> annotations_orgdb<-annotations_orgdb[!is.na(annotations_orgdb$ENTREZID),]
> dim(annotations_orgdb)
[1] 36282     4
> sum(is.na(annotations_orgdb$ENTREZID))
[1] 0
> sum(duplicated(annotations_orgdb$ENTREZID))
[1] 272

That is, unless there is a different select() that produces a working result.
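A minimal base-R sketch of what "1:many" means here and one way to collapse it. The data frame is a made-up stand-in for the output of AnnotationDbi::select() with keytype "ENSEMBL" (the IDs are hypothetical, not real annotations), and keeping the first row per Ensembl ID is just one possible policy, not necessarily what the workshop note intends:

```r
# Hypothetical toy table shaped like AnnotationDbi::select() output.
# These IDs are invented for illustration only.
annotations <- data.frame(
  ENSEMBL  = c("ENSG_A", "ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
  SYMBOL   = c("GENE1",  "GENE1b", "GENE2",  "GENE3",  NA),
  ENTREZID = c("101",    "102",    "103",    "103",    NA),
  stringsAsFactors = FALSE
)

# 1:many: ENSG_A appears twice because it maps to two Entrez IDs.
# Keep only the first row per Ensembl ID so each query key appears once.
one_per_key <- annotations[!duplicated(annotations$ENSEMBL), ]

# Drop rows without an Entrez ID, as in the console session.
one_per_key <- one_per_key[!is.na(one_per_key$ENTREZID), ]

# Many:1 survives this step: ENSG_B and ENSG_C share Entrez ID "103",
# so duplicated ENTREZID values can still be present.
n_dup_entrez <- sum(duplicated(one_per_key$ENTREZID))
print(n_dup_entrez)  # 1 for this toy data
```

This separates two effects that can both show up as duplicates: one Ensembl ID mapping to several Entrez IDs (1:many, removed by deduplicating on the key) and several Ensembl IDs sharing one Entrez ID (many:1, which deduplicating on ENSEMBL does not remove). As an alternative to select(), AnnotationDbi::mapIds() returns a single value per key when called with multiVals = "first", though whether "first" is an acceptable choice depends on the downstream analysis.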