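In routine analysis we often need to convert between gene IDs and symbols within one species, or to look up homologous genes across species. The usual options are R packages such as biomaRt, easyConvert and org.Hs.eg.db, online tools such as DAVID, or direct queries against databases such as Ensembl and NCBI. Each of these has drawbacks: biomaRt depends heavily on the network connection, org.Hs.eg.db loses many genes, and many online tools have similar problems. This note records two lightweight R packages: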
The first is babelgene and its orthologs() function.
install.packages("babelgene")
packageVersion("babelgene")#[1] ‘22.9’
library(babelgene)
orthologs(genes = c("TP53", "EGFR", "IL6", "TGFB1", "CD4"), species = "mouse")
# human_symbol human_entrez human_ensembl taxon_id symbol entrez ensembl
# 1 CD4 920 ENSG00000010610 10090 Cd4 12504 ENSMUSG00000023274
# 2 EGFR 1956 ENSG00000146648 10090 Egfr 13649 ENSMUSG00000020122
# 3 IL6 3569 ENSG00000136244 10090 Il6 16193 ENSMUSG00000025746
# 4 TGFB1 7040 ENSG00000105329 10090 Tgfb1 21803 ENSMUSG00000002603
# 5 TP53 7157 ENSG00000141510 10090 Trp53 22059 ENSMUSG00000059552
# support support_n
# 1 EggNOG|Ensembl|HGNC|HomoloGene|Inparanoid|NCBI|OMA|OrthoDB|OrthoMCL|Panther|PhylomeDB|Treefam 12
# 2 EggNOG|Ensembl|HGNC|HomoloGene|Inparanoid|NCBI|OMA|OrthoDB|OrthoMCL|Panther|PhylomeDB|Treefam 12
# 3 Ensembl|HGNC|HomoloGene|Inparanoid|NCBI|OMA|OrthoMCL|Panther|PhylomeDB|Treefam 10
# 4 EggNOG|Ensembl|HGNC|HomoloGene|Inparanoid|NCBI|OMA|OrthoDB|OrthoMCL|Panther|PhylomeDB|Treefam 12
# 5 EggNOG|Ensembl|HGNC|HomoloGene|Inparanoid|NCBI|OMA|OrthoDB|OrthoMCL|Panther|PhylomeDB|Treefam 12
orthologs(genes = "Pu", species = "fruit fly", human = FALSE)
# human_symbol human_entrez human_ensembl taxon_id symbol entrez ensembl
#1 GCH1 2643 ENSG00000131979 7227 Pu 37415 FBgn0003162
# support support_n
#1 EggNOG|Ensembl|HomoloGene|Inparanoid|OMA|OrthoDB|OrthoMCL|Panther|PhylomeDB|Treefam 10
orthologs(genes = "GAPDH", species = "fruit fly", human = FALSE)
#Error in orthologs(genes = "GAPDH", species = "fruit fly", human = FALSE) :
# no orthologs found or the genes are not valid
orthologs(genes = "Gapdh", species = "mouse", human = FALSE)
#human_symbol human_entrez human_ensembl taxon_id symbol entrez ensembl
#1 GAPDH 2597 ENSG00000111640 10090 Gapdh 14433 ENSMUSG00000057666
# support support_n
#1 Ensembl|HGNC|HomoloGene|NCBI|OMA|OrthoDB|OrthoMCL|Panther|Treefam 9
orthologs(genes = "ENSG00000111640", species = "mouse", human = TRUE)
#human_symbol human_entrez human_ensembl taxon_id symbol entrez ensembl
#1 GAPDH 2597 ENSG00000111640 10090 Gapdh 14433 ENSMUSG00000057666
# support support_n
#1 Ensembl|HGNC|HomoloGene|NCBI|OMA|OrthoDB|OrthoMCL|Panther|Treefam 9
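As the fruit fly GAPDH call above shows, orthologs() stops with an error when nothing matches, which would abort a loop over several species. A minimal sketch of one way to handle this wraps the lookup in tryCatch() and returns NULL instead. Note that `safe_lookup` and `demo_lookup` are hypothetical helpers, not part of babelgene; the lookup function is passed in as an argument so the block runs without any package installed.

```r
# Hypothetical helper (not part of babelgene): call a lookup function and
# return NULL instead of stopping when it raises an error.
safe_lookup <- function(lookup, ...) {
  tryCatch(
    lookup(...),
    error = function(e) {
      message("lookup failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in lookup used only to demonstrate the wrapper; with babelgene
# attached one would pass orthologs here instead.
demo_lookup <- function(genes, species) {
  if (species != "mouse") stop("no orthologs found or the genes are not valid")
  data.frame(human_symbol = toupper(genes), symbol = genes,
             stringsAsFactors = FALSE)
}

# Query several species; failures become NULL rather than aborting the loop
hits <- lapply(
  c(mouse = "mouse", fly = "fruit fly"),
  function(sp) safe_lookup(demo_lookup, genes = "Gapdh", species = sp)
)

# Keep only the species for which the lookup succeeded
hits <- Filter(Negate(is.null), hits)
```

With babelgene attached, the same wrapper could presumably be used as safe_lookup(orthologs, genes = "GAPDH", species = "fruit fly", human = FALSE).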
species()
# taxon_id scientific_name common_name
# 1 28377 Anolis carolinensis Carolina anole, green anole
# 2 9913 Bos taurus bovine, cattle, cow, dairy cow, domestic cattle, domestic cow, ox, oxen
# 3 6239 Caenorhabditis elegans <NA>
# 4 9615 Canis lupus familiaris dog, dogs
# 5 7955 Danio rerio leopard danio, zebra danio, zebra fish, zebrafish
# 6 7227 Drosophila melanogaster fruit fly
# 7 9796 Equus caballus domestic horse, equine, horse
# 8 9685 Felis catus cat, cats, domestic cat
# 9 9031 Gallus gallus bantam, chicken, chickens, Gallus domesticus
# 10 9544 Macaca mulatta rhesus macaque, rhesus macaques, Rhesus monkey, rhesus monkeys
# 11 13616 Monodelphis domestica gray short-tailed opossum
# 12 10090 Mus musculus house mouse, mouse
# 13 9258 Ornithorhynchus anatinus duck-billed platypus, duckbill platypus, platypus
# 14 9598 Pan troglodytes chimpanzee
# 15 10116 Rattus norvegicus brown rat, Norway rat, rat, rats
# 16 4932 Saccharomyces cerevisiae baker's yeast, brewer's yeast, S. cerevisiae
# 17 284812 Schizosaccharomyces pombe 972h- <NA>
# 18 9823 Sus scrofa pig, pigs, swine, wild boar
# 19 8364 Xenopus tropicalis tropical clawed frog, western clawed frog
If you want to convert homologs between other species (not only to human genes), the homologene package can do this in batch:
install.packages('homologene')
packageVersion("homologene")#[1] ‘1.4.68.19.3.27’
library(homologene)
# Supported species:
homologene::taxData
# tax_id name_txt
#1 10090 Mus musculus
#2 10116 Rattus norvegicus
#3 28985 Kluyveromyces lactis
#4 318829 Magnaporthe oryzae
#5 33169 Eremothecium gossypii
#6 3702 Arabidopsis thaliana
#7 4530 Oryza sativa
#8 4896 Schizosaccharomyces pombe
#9 4932 Saccharomyces cerevisiae
#10 5141 Neurospora crassa
#11 6239 Caenorhabditis elegans
#12 7165 Anopheles gambiae
#13 7227 Drosophila melanogaster
#14 7955 Danio rerio
#15 8364 Xenopus (Silurana) tropicalis
#16 9031 Gallus gallus
#17 9544 Macaca mulatta
#18 9598 Pan troglodytes
#19 9606 Homo sapiens
#20 9615 Canis lupus familiaris
#21 9913 Bos taurus
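Use the homologene() function to convert. The first argument is the genes to convert, inTax is the taxon ID of the species the input genes belong to, outTax is the taxon ID of the target species, and db specifies the database used for the conversion, which matters when you need a newer version of the database.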
homologene(genes, inTax, outTax, db = homologene::homologeneData)
Take mouse genes as an example. From the table above, 10090 is mouse and 9606 is human:
genelist <- c("Acadm","Eno2","Acadvl")
homologene(genelist, inTax = 10090, outTax = 9606)
## 10090 9606 10090_ID 9606_ID
##1 Acadm ACADM 11364 34
##2 Eno2 ENO2 13807 2026
##3 Acadvl ACADVL 11370 37
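A typical next step is to relabel a mouse expression matrix with the human symbols. The sketch below uses only base R. The `mapping` table is typed in by hand from the output above so the block runs without the package, and `expr` is made-up illustrative data, not real measurements. Genes without a homolog are dropped, and if several mouse genes map to the same human symbol only the first is kept, which is one simple policy among several.

```r
# Mapping table copied from the homologene() output above
mapping <- data.frame(
  mouse = c("Acadm", "Eno2", "Acadvl"),
  human = c("ACADM", "ENO2", "ACADVL"),
  stringsAsFactors = FALSE
)

# Toy expression matrix (illustrative values only); "Xyz1" is a hypothetical
# symbol with no entry in the mapping table
expr <- matrix(
  1:8, nrow = 4,
  dimnames = list(c("Acadm", "Xyz1", "Eno2", "Acadvl"), c("s1", "s2"))
)

# Look up the human symbol for every row; unmatched rows become NA
human <- mapping$human[match(rownames(expr), mapping$mouse)]

# Drop unmatched rows and keep the first row per human symbol
keep <- !is.na(human) & !duplicated(human)
expr_human <- expr[keep, , drop = FALSE]
rownames(expr_human) <- human[keep]
```

In real use, the two mapping columns would be taken from the first two columns of the homologene() result rather than typed in.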
For a common pair such as mouse and human, the functions mouse2human() and human2mouse() convert in either direction:
mouse2human(genelist)
# mouseGene humanGene mouseID humanID
# 1 Acadm ACADM 11364 34
# 2 Eno2 ENO2 13807 2026
# 3 Acadvl ACADVL 11370 37
human2mouse(c("ACADM","ENO2","ACADVL"))
# humanGene mouseGene humanID mouseID
# 1 ACADM Acadm 34 11364
# 2 ENO2 Eno2 2026 13807
# 3 ACADVL Acadvl 37 11370
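Using a newer version of the database can help match genes that fail to match because their annotation is out of date. With the default database, none of the following genes is matched: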
mouse2human(c('Mesd','Trp53rka','Cstdc4','Ifit3b'))
## [1] mouseGene humanGene mouseID humanID
## <0 rows> (or 0-length row.names)
Passing homologeneData2 through the db argument matches all four:
mouse2human(c('Mesd','Trp53rka','Cstdc4','Ifit3b'),db = homologeneData2)
## mouseGene humanGene mouseID humanID
## 1 Mesd MESD 67943 23184
## 2 Trp53rka TP53RK 381406 112858
## 3 Cstdc4 CSTA 433016 1475
## 4 Ifit3b IFIT3 667370 3437
updateHomologene() retrieves the latest information:
options(timeout = 10000) # raise the download timeout (in seconds)
homologeneDataVeryNew = updateHomologene() # update the homologene database with the latest identifiers
mouse2human(c('Mesd','Trp53rka','Cstdc4','Ifit3b'),db = homologeneDataVeryNew)
## mouseGene humanGene mouseID humanID
##1 Mesd MESD 67943 23184
##2 Trp53rka TP53RK 381406 112858
##3 Cstdc4 CSTA 433016 1475
##4 Ifit3b IFIT3 667370 3437
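The lookups above suggest a simple fallback: try the default table first and query a newer one only for the genes that are still unmatched. The sketch below is one way to write that. Here `convert_with_fallback` is a hypothetical helper, not part of homologene. It assumes the converter accepts a `db` argument and returns the matched input symbols in its first column, as mouse2human() does in the output above. The stand-in converter and toy tables exist only so the block runs without the package.

```r
# Hypothetical helper (not part of homologene): try each database in turn,
# passing only the still-unmatched genes to the next one.
convert_with_fallback <- function(genes, dbs, convert) {
  results <- list()
  remaining <- genes
  for (db in dbs) {
    if (length(remaining) == 0) break
    res <- convert(remaining, db = db)
    results[[length(results) + 1]] <- res
    # Assumes the first column holds the input symbols that were matched
    remaining <- setdiff(remaining, res[[1]])
  }
  list(matched = do.call(rbind, results), unmatched = remaining)
}

# Stand-in converter and toy tables for demonstration only; "Xyz1" is a
# hypothetical symbol that appears in neither table.
toy_convert <- function(genes, db) {
  db[db$mouseGene %in% genes, , drop = FALSE]
}
old_db <- data.frame(mouseGene = "Acadm", humanGene = "ACADM",
                     stringsAsFactors = FALSE)
new_db <- data.frame(mouseGene = c("Acadm", "Mesd"),
                     humanGene = c("ACADM", "MESD"),
                     stringsAsFactors = FALSE)

out <- convert_with_fallback(c("Acadm", "Mesd", "Xyz1"),
                             dbs = list(old_db, new_db),
                             convert = toy_convert)
```

With homologene loaded, one would presumably pass convert = mouse2human and dbs = list(homologeneData, homologeneData2); genes left in out$unmatched have no entry in any of the tables tried.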
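In addition, the homologene package provides functions for updating outdated gene symbols and identifiers and for querying the DIOPT database for homologs and orthologs; see its GitHub page for details.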
https://mp.weixin.qq.com/s/gjMjnfoWj9rQjiFWqEnz1w