grimbough / biomaRt

R package providing query functionality to BioMart instances like Ensembl
https://bioconductor.org/packages/biomaRt/

Converting a human gene list to pig #51

Closed labixiaoxin1234 closed 2 years ago

labixiaoxin1234 commented 2 years ago

hi, biomaRt team,

I am working on a project to find the cell cycle genes in the pig genome, and I am using the biomaRt package to convert a human gene list to pig. However, I have run into a problem that I do not know how to solve.

My code is as follows:

require(Seurat)
data(cc.genes)

ConvertHumanGeneListTopig <- function(x){

    # Load human ensembl attributes
    human = biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")

    # Load pig ensembl attributes
    pig = biomaRt::useMart("ensembl", dataset = "sscrofa_gene_ensembl")

    # Link both datasets and retrieve pig genes from the human genes
    genes.list = biomaRt::getLDS(attributes = c("hgnc_symbol","chromosome_name","start_position"),
                             filters = "hgnc_symbol",
                             values = x ,
                             mart = human,
                             attributesL = c("chromosome_name","start_position"),
                             martL = pig,
                             uniqueRows = T)

    # Get unique names of genes (in case gene names are duplicated)
    pig.gene.list <- unique(genes.list[, 2])

    # Print the first 6 genes found to the screen
    print(head(pig.gene.list))

    return(pig.gene.list)
}

pig.cc.genes <- lapply(X = cc.genes, ConvertHumanGeneListTopig)

# Save list in .rds format
saveRDS(object = pig.cc.genes, file = "pig_cell_cycle_genes.rds")

The result is:

pig_gene <- readRDS("pig_cell_cycle_genes/pig_cell_cycle_genes.rds")
$s.genes
[1] "1" "2" "6" "11" "17" "15" "18" "8" "16" "3" "X" "10" "12" "7" "22" "19" "20" "21"

$g2m.genes
[1] 12 5 4 7 15 1 13 18 22 10 9 20 6 17 11 2 8 3 14 16

These results are not gene names. How should I adjust my code to convert the human gene list to pig?

Thank you! Looking forward to your reply.

grimbough commented 2 years ago

I think the issue here is with the line pig.gene.list <- unique(genes.list[, 2]). This command asks for the second column in the output of getLDS().

However, you're asking for the following 5 attributes: "hgnc_symbol","chromosome_name","start_position" from the human dataset and "chromosome_name","start_position" from the pig dataset.

By selecting the second column, you're only returning the human chromosomes.
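To make the column order concrete, here is a small offline sketch. The data frame is a hand-built stand-in for the getLDS() result, not real query output: the column names are assumed to follow the naming pattern biomaRt uses for these attributes, and the two rows reuse the RPA2 and POLA1 coordinates from the example output further down.

```r
# Stand-in for the data frame returned by the original getLDS() call.
# Column names are an assumption based on biomaRt's usual output naming.
mock.genes.list <- data.frame(
  HGNC.symbol                = c("RPA2", "POLA1"),
  Chromosome.scaffold.name   = c("1", "X"),            # human chromosome
  Gene.start..bp.            = c(27891524L, 24693873L),
  Chromosome.scaffold.name.1 = c("6", "X"),            # pig chromosome
  Gene.start..bp..1          = c(85031578L, 20756201L),
  stringsAsFactors = FALSE
)

# The second column holds human chromosome names, not pig genes
second.column.name <- names(mock.genes.list)[2]
second.column.values <- unique(mock.genes.list[, 2])

print(second.column.name)
print(second.column.values)
```

Selecting columns by name rather than by position avoids this kind of mix-up when the attribute list changes.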

Perhaps something like the following would work for your purposes:

library(biomaRt)
library(Seurat)
data(cc.genes)

## this can be really slow, let's do it once outside the function
human = biomaRt::useEnsembl("ensembl", dataset = "hsapiens_gene_ensembl")
pig = biomaRt::useEnsembl("ensembl", dataset = "sscrofa_gene_ensembl")

ConvertHumanGeneListToPig <- function(x, humanMart, pigMart){

    genes.list = biomaRt::getLDS(attributes = c("hgnc_symbol","ensembl_gene_id", "chromosome_name","start_position"), 
                             filters = "hgnc_symbol", 
                             values = x , 
                             mart = humanMart, 
                             attributesL = c("ensembl_gene_id", "chromosome_name","start_position"), 
                             martL = pigMart, 
                             uniqueRows = TRUE)

    return(genes.list)
}

conversionTable <- lapply(cc.genes, FUN = ConvertHumanGeneListToPig, humanMart = human, pigMart = pig)

## let's look at the first 6 results
head(conversionTable$s.genes)
#>   HGNC.symbol  Gene.stable.ID Chromosome.scaffold.name Gene.start..bp.
#> 1    CASP8AP2 ENSG00000118412                        6        89829894
#> 2       CDCA7 ENSG00000144354                        2       173354820
#> 3        RPA2 ENSG00000117748                        1        27891524
#> 4       CLSPN ENSG00000092853                        1        35720218
#> 5       POLA1 ENSG00000101868                        X        24693873
#> 6        NASP ENSG00000132780                        1        45583846
#>     Gene.stable.ID.1 Chromosome.scaffold.name.1 Gene.start..bp..1
#> 1 ENSSSCG00000004331                          1          57834116
#> 2 ENSSSCG00000015961                         15          79432257
#> 3 ENSSSCG00000003583                          6          85031578
#> 4 ENSSSCG00000003628                          6          91906318
#> 5 ENSSSCG00000022240                          X          20756201
#> 6 ENSSSCG00000029573                          6         165770055
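
From a table like this, the pig identifiers can be pulled out by column name. The following is a minimal offline sketch: conversionTableDemo is a hand-built stand-in that copies three rows from the output above (a real table would come from getLDS()), and ExtractPigGeneIds is a hypothetical helper name, not part of biomaRt.

```r
# Stand-in for conversionTable$s.genes, using rows copied from the output above
s.genes.table <- data.frame(
  HGNC.symbol      = c("CASP8AP2", "CDCA7", "RPA2"),
  Gene.stable.ID   = c("ENSG00000118412", "ENSG00000144354", "ENSG00000117748"),
  Gene.stable.ID.1 = c("ENSSSCG00000004331", "ENSSSCG00000015961", "ENSSSCG00000003583"),
  stringsAsFactors = FALSE
)
conversionTableDemo <- list(s.genes = s.genes.table)

# Hypothetical helper: return the unique pig gene IDs from one conversion table,
# selecting the column by name instead of by position
ExtractPigGeneIds <- function(tbl, column = "Gene.stable.ID.1") {
  unique(tbl[[column]])
}

pig.cc.gene.ids <- lapply(conversionTableDemo, ExtractPigGeneIds)
print(pig.cc.gene.ids$s.genes)
```

If pig gene symbols are needed rather than Ensembl IDs, adding "external_gene_name" to attributesL may work, assuming the pig dataset exposes that attribute; listAttributes(pig) shows what is available.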