Closed peranti closed 3 years ago
Hi Pradeep,
Thank you for your question! Let me look into that to see if that piece of information is provided by the GWAS REST API.
Hi again Pradeep,
It seems that that information about the genes is only provided at the GWAS Catalog website, not via REST API at the moment.
I looked a bit though how the front-end website retrieves that information, and I found the URL being requested. So I wrote a bit of a hack to get that info into R: the get_gene()
function. Then get_genes()
is just a vectorised version of get_gene()
.
Finally I exemplify how you could use these functions to augment the genes
table in the associations object. I hope this solution is helpful for the time being.
library(dplyr, warn.conflicts = FALSE)
library(glue, warn.conflicts = FALSE)
library(httr, warn.conflicts = FALSE)
library(tidyjson, warn.conflicts = FALSE)
library(purrr)
library(gwasrapidd, warn.conflicts = FALSE)
# For retrieving info about one gene only.
get_gene <- function(gene_name) {
gene_name <- URLencode(gene_name)
url <- glue::glue("https://www.ebi.ac.uk/gwas/api/search?q=title%3A+%22{gene_name}%22+AND+resourcename%3Agene")
response <- httr::GET(url)
json <- httr::content(response, 'text', encoding = 'UTF-8')
tbl <-
json %>%
tidyjson::enter_object('response', 'docs') %>%
tidyjson::gather_array() %>%
tidyjson::spread_values(
gene_name = tidyjson::jstring('title'),
description = tidyjson::jstring('description'),
chromosome_name = tidyjson::jstring('chromosomeName'),
chromosome_start = tidyjson::jnumber('chromosomeStart'),
chromosome_end = tidyjson::jnumber('chromosomeEnd'),
cytogenetic_band = tidyjson::jstring('cytobands'),
biotype = tidyjson::jstring('biotype')
) %>%
tidyjson::as_tibble() %>%
dplyr::select(-c('document.id', 'array.index'))
return(tbl)
}
# Maps over genes
get_genes <- function(gene_names) {
purrr::map_dfr(gene_names, get_gene)
}
# As an example, let us get some studies about rheumatoid arthritis.
my_studies <- get_studies(efo_trait = 'rheumatoid arthritis')
# Going after those associations related to the first study found only (for the
# sake of simplicity)
one_study <- my_studies[1]
my_associations <- get_associations(study_id = one_study@studies$study_id)
gene_info <- get_genes(my_associations@genes$gene_name)
my_associations@genes <- dplyr::left_join(my_associations@genes, gene_info, by = 'gene_name')
# The slot `genes` of the `associations` object is now augmented to include
# those extra details about each gene.
my_associations@genes
#> # A tibble: 5 × 9
#> association_id locus_id gene_name description chromosome_name chromosome_start
#> <chr> <int> <chr> <chr> <chr> <dbl>
#> 1 17729 1 BLK BLK proto-… 8 11486894
#> 2 17731 1 ARHGEF3 Rho guanin… 3 56727418
#> 3 17883 1 HLA-DRB1 major hist… 6 32578769
#> 4 17730 1 TRHDE thyrotropi… 12 72087266
#> 5 40454 1 PADI4 peptidyl a… 1 17308195
#> # … with 3 more variables: chromosome_end <dbl>, cytogenetic_band <chr>,
#> # biotype <chr>
Thanks a lot, @ramiromagno, for this quick hack.
I have tested these functions briefly and made the following observation. If the input contains HLA to the get_genes()
function, it fetches the information for even HLA-E, HLA-F etc. Similarly, if the input contains C10orf95, then it will fetch the C10orf95-AS1.
Indeed! It seems that the gene being queried is used as a keyword that will match any gene name containing that keyword. But do not worry, when you do the dplyr::left_join()
by gene_name
, it has to match exactly. So the final table obtained in my_associations@genes
should be as expected.
Hi Ramiro, Thanks for this amazing package.
Would it be possible to add Biotype in the association object?
For example, GATA3 is of protein_coding type. Source: https://www.ebi.ac.uk/gwas/genes/GATA3
Thanks.