jemunro closed this issue 11 months ago
Hello Jacob,
Thank you for reaching out, and I appreciate your kind words about the package.
Your suggestion to collaborate on the {hgnc} package sounds like a great idea!
Feel free to make a pull request, and add your name to the authors in the DESCRIPTION file.
What do you say to an approach based on a left join?
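    # Map HGNC IDs to Ensembl gene IDs via a left join against the HGNC dataset.
    hgnc_id_to_ensembl_id <- function(hgnc_id, file = latest_archive_url()) {
      # Use the `file` argument (the original body referred to an undefined `url`).
      hgnc_dataset <- import_hgnc_dataset(file)
      hgnc_tbl <- tibble::tibble(hgnc_id = hgnc_id)
      dplyr::left_join(
        x = hgnc_tbl,
        y = hgnc_dataset[c("hgnc_id", "ensembl_gene_id")],
        by = "hgnc_id"
      )
    }

    hgnc_id_to_ensembl_id(c("HGNC:37133", "HGNC:23336"))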
import_hgnc_dataset(url) or use a more explicit disk cache as you proposed
ensembl_gene_id to ensembl_id
...hgnc_id_to_ensembl_id()
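To make the behaviour of the left-join approach concrete without downloading anything, here is a minimal sketch on a made-up table. `toy_dataset` and `toy_id_to_ensembl_id()` are hypothetical names, and the IDs are placeholders rather than real HGNC records; it only assumes that {dplyr} is installed.

```r
library(dplyr)

# Made-up stand-in for the HGNC dataset; the IDs are placeholders, not real records.
toy_dataset <- tibble(
  hgnc_id = c("HGNC:A", "HGNC:B", "HGNC:C"),
  ensembl_gene_id = c("ENSG_A", "ENSG_B", "ENSG_C")
)

# Same shape as hgnc_id_to_ensembl_id(), but the lookup table is passed in
# as an argument instead of being imported from a URL.
toy_id_to_ensembl_id <- function(hgnc_id, dataset = toy_dataset) {
  left_join(
    x = tibble(hgnc_id = hgnc_id),
    y = dataset[c("hgnc_id", "ensembl_gene_id")],
    by = "hgnc_id"
  )
}

# Input order is preserved, repeated IDs are kept, and unknown IDs map to NA.
res <- toy_id_to_ensembl_id(c("HGNC:C", "HGNC:X", "HGNC:A", "HGNC:C"))
res$ensembl_gene_id
# "ENSG_C" NA "ENSG_A" "ENSG_C"
```

The result always has one row per input ID, so it can be bound back to the caller's data without re-sorting.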
Hi Ramiro,
Thanks for your response.
I added a PR to add disk caching - see https://github.com/ramiromagno/hgnc/pull/3. Very happy to make changes if you have suggestions.
I like your left_join approach, but sometimes a vector-based approach is more useful than a data.frame-based one. How about something like this, which implements both:
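    # Assumes the magrittr pipe (%>%) is available, e.g. via library(dplyr).
    # Build a lookup table from `key` to `columns` out of the HGNC dataset.
    get_hgnc_key_table <- function(key, columns, unique = TRUE) {
      key_table <-
        import_hgnc_dataset() %>%
        dplyr::select(key = dplyr::all_of(key),
                      dplyr::all_of(columns)) %>%
        dplyr::filter(!is.na(key)) %>%
        dplyr::distinct()

      if (unique) {
        # Keep only keys that map to exactly one row.
        key_table <-
          key_table %>%
          dplyr::add_count(key) %>%
          dplyr::filter(n == 1) %>%
          dplyr::select(-n)
      } else {
        # Collapse multiple matches per key into list-columns.
        key_table <-
          key_table %>%
          tidyr::chop(-key)
      }

      # Restore the key column's original name and return the table visibly.
      key_table %>%
        dplyr::rename_with(.fn = ~ key, .cols = "key")
    }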
    # Data-frame interface: left-join the requested HGNC columns onto `.data`.
    hgnc_join <- function(.data,
                          by = 'hgnc_id',
                          columns = 'symbol',
                          unique = TRUE) {
      key_table <- get_hgnc_key_table(
        key = unname(by),
        columns = columns,
        unique = unique
      )
      dplyr::left_join(.data, key_table, by = by)
    }
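    # Vector interface: convert `x` from one HGNC column to another.
    hgnc_convert <- function(x,
                             from = 'hgnc_id',
                             to = 'symbol',
                             unique = TRUE) {
      key_table <-
        get_hgnc_key_table(
          key = from,
          columns = to,
          unique = unique
        ) %>%
        dplyr::rename(
          from = dplyr::all_of(from),
          to = dplyr::all_of(to)
        )
      # match() preserves the order of `x`; with unique = TRUE, unmatched values become NA.
      with(key_table, to[match(x, from)])
    }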
For example:

    my_ids <- c('HGNC:15284', 'HGNC:2470', 'HGNC:6505', 'HGNC:36858', 'HGNC:45641')
    my_symbols <- hgnc_convert(my_ids, from = 'hgnc_id', to = 'symbol')
    my_symbols
    [1] "OR5D17P" "CSRP2" "RPSAP13" "RPLP1P12" "RNU7-107P"

    dplyr::tibble(symbol = my_symbols) %>%
      hgnc_join(by = 'symbol',
                columns = c('hgnc_id', 'ensembl_gene_id', 'entrez_id'))
    # A tibble: 5 × 4
      symbol    hgnc_id    ensembl_gene_id entrez_id
      <chr>     <chr>      <chr>               <int>
    1 OR5D17P   HGNC:15284 ENSG00000181837     81196
    2 CSRP2     HGNC:2470  ENSG00000175183      1466
    3 RPSAP13   HGNC:6505  ENSG00000233924      3923
    4 RPLP1P12  HGNC:36858 NA                 646566
    5 RNU7-107P HGNC:45641 ENSG00000238523 106479067
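The `unique = TRUE` behaviour is easier to see on a tiny made-up table where one key is ambiguous. The following is only a base-R sketch of the same idea, not the proposed implementation: `toy_table` and `toy_convert()` are hypothetical, and the values are placeholders.

```r
# Made-up lookup table: symbol "S2" maps to two IDs, so it is ambiguous.
toy_table <- data.frame(
  symbol = c("S1", "S2", "S2", "S3"),
  hgnc_id = c("HGNC:A", "HGNC:B", "HGNC:C", "HGNC:D"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert `x` via match(), optionally dropping ambiguous keys.
toy_convert <- function(x, table, from, to, unique = TRUE) {
  keys <- table[[from]]
  vals <- table[[to]]
  if (unique) {
    # Keep only keys that occur exactly once, roughly mirroring
    # add_count() followed by filter(n == 1).
    keep <- !(duplicated(keys) | duplicated(keys, fromLast = TRUE))
    keys <- keys[keep]
    vals <- vals[keep]
  }
  # match() returns positions in input order; missing keys give NA.
  vals[match(x, keys)]
}

ids <- toy_convert(c("S3", "S2", "S1", "S9"), toy_table,
                   from = "symbol", to = "hgnc_id")
ids
# "HGNC:D" NA "HGNC:A" NA
```

Both the ambiguous symbol ("S2") and the unknown one ("S9") come back as NA, so the output always has the same length and order as the input. With `unique = FALSE` this sketch simply returns the first match, which differs from the chop()-based list-column in the functions above.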
Let me know your thoughts, happy to make a pull request for this.
I have commented on your pull request.
Hello,
Thanks for putting together this package, it's very nicely done.
I have an unreleased hgnc package that I was thinking of releasing to CRAN. However, it might be a good idea to pool our efforts rather than having multiple hgnc packages with different functionalities.
Are you open to contributions?
Some of the features implemented in my package that could be ported over are:
- hgnc2ensembl & ensembl2hgnc (converts hgnc_id to ensembl_gene_id and vice versa)
- sym2hgnc & hgnc2sym (converts symbol to hgnc_id and vice versa)
- use_cache(dir = '~/.hgnc_cache')
- use_hgnc_version(ver = latest_monthly())
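As a rough illustration of what a disk cache along the lines of `use_cache()` could do, here is a minimal base-R sketch. It is not the code from my package or from any pull request: `cached_read()` and `slow_fetch()` are hypothetical, and the cache lives in a temporary directory rather than `~/.hgnc_cache` so the example leaves nothing behind.

```r
# Hypothetical helper: run `fetch()` once per key and reuse the saved result afterwards.
cached_read <- function(key, fetch, dir = file.path(tempdir(), "hgnc_cache")) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(dir, paste0(key, ".rds"))
  if (file.exists(path)) {
    # Cache hit: read the stored object instead of fetching again.
    return(readRDS(path))
  }
  value <- fetch()
  saveRDS(value, path)
  value
}

# Stand-in for an expensive download; counts how often it is really called.
n_calls <- 0
slow_fetch <- function() {
  n_calls <<- n_calls + 1
  data.frame(hgnc_id = "HGNC:A", symbol = "S1")
}

first <- cached_read("toy_version", slow_fetch)
second <- cached_read("toy_version", slow_fetch)
n_calls
# 1
```

Using the dataset version as the cache key would let a pinned version and the latest one sit side by side in the same directory.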