Issues with import_hgnc_dataset(): HTTP error 404

nehawali21 commented 1 month ago

Thank you so much for making such a handy package to extract the latest gene information from the HGNC! I'm a novice in R and data analysis, so this has been a wonderful way to get the newest details about genes that I'm evaluating in my data.

I didn't encounter issues when I last used this package a few months ago. However, I've run into some trouble this week despite not having edited the code.

When I run the following code to load the HGNC data:

hgnc <- import_hgnc_dataset(latest_archive_url()) %>% 
  subset(status == "Approved") %>% 
  dplyr::select(c(symbol,
                  name,
                  locus_group,
                  locus_type,
                  ensembl_gene_id,
                  entrez_id))

I receive this error message:

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'subset': HTTP error 404.

To troubleshoot step by step, I ran import_hgnc_dataset(latest_archive_url()) on its own and received the error message below, which suggests that the URL may be the problem:

Error in open.connection(5L, "rb") : HTTP error 404.

To test this, I ran latest_archive_url() on its own, which returns the following without any error:

[1] "https://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/tsv/hgnc_complete_set.txt"

However, when I try to load this page into Chrome, I get an error that the URL was not found.
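For anyone who wants to repeat this check from R rather than a browser, here is a minimal sketch using only base R connections. The helper name url_is_reachable is hypothetical and not part of the package; it only reports whether the URL can be opened, treating any warning or error (such as an HTTP 404) as "not reachable".

```r
# Hypothetical helper, not part of the package discussed in this issue.
# Returns TRUE if the URL can be opened for reading, FALSE otherwise.
url_is_reachable <- function(u) {
  con <- url(u)
  # Always release the connection, whether or not it was opened.
  on.exit(try(close(con), silent = TRUE), add = TRUE)
  tryCatch(
    {
      open(con, "rb")
      TRUE
    },
    # A missing resource typically raises a warning and then an error;
    # either one is treated here as "not reachable".
    warning = function(w) FALSE,
    error = function(e) FALSE
  )
}

# Intended usage (needs network access):
# url_is_reachable("https://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/tsv/hgnc_complete_set.txt")
```

A FALSE result here would point to the server-side path rather than to the filtering code in my script.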

Trying to manually recreate the URL in Chrome, I can only get as far as https://ftp.ebi.ac.uk/pub/databases/genenames/, as I don't see a sub-folder or file just called hgnc.

The closest file path I can generate on my side is https://ftp.ebi.ac.uk/pub/databases/genenames/out_of_date_hgnc/tsv/hgnc_complete_set.txt. However, as the package should procure the latest HGNC gene names that are continually updated, I'm not sure if this would be the ideal file.
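As a possible stopgap, the sketch below tries a list of URLs in order and keeps the first one that imports without an error. The helper import_with_fallback is hypothetical and not part of the package. It assumes only that import_hgnc_dataset() accepts a URL, as in my call above. Note that the second URL points at the out_of_date_hgnc folder, so the data it returns may well be stale.

```r
# Hypothetical helper, not part of the package discussed in this issue.
# Calls `importer` on each URL in turn and returns the first result that
# does not raise an error; stops with a summary if every URL fails.
import_with_fallback <- function(urls, importer) {
  failures <- character(0)
  for (u in urls) {
    result <- tryCatch(importer(u), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    failures <- c(failures, paste0(u, ": ", conditionMessage(result)))
  }
  stop("All URLs failed:\n", paste(failures, collapse = "\n"))
}

# Intended usage (needs the package loaded and network access):
# hgnc_raw <- import_with_fallback(
#   c(latest_archive_url(),
#     "https://ftp.ebi.ac.uk/pub/databases/genenames/out_of_date_hgnc/tsv/hgnc_complete_set.txt"),
#   import_hgnc_dataset
# )
```

Collecting the per-URL error messages keeps the original 404 visible even when a later URL succeeds or everything fails.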

The code worked as of mid-August 2024, so perhaps hgnc_complete_set.txt was moved to the out_of_date_hgnc folder shortly afterwards, which would explain why the URL returned by the package no longer resolves and the code now fails on my side.

Does the URL in the package need to be updated to this link, or to something else altogether? Or am I doing something wrong on my side?

I'd be most grateful for your kind help with this! Thank you so much once again!