nehawali21 opened this issue 1 month ago (status: Open)
hi @nehawali21:
Thanks for reporting this issue and taking the extra effort of looking into the root cause. Like you say, it could be that the URL changed. I will take a look.
Great, thank you so much for your prompt response! I've been looking at the HUGO site too (https://www.genenames.org/download/archive/) to see if there's a different link for the newest names. They list https://storage.googleapis.com/public-download-files/hgnc/tsv/tsv/hgnc_complete_set.txt as the current release file, but at least in my window this appears blank with just column names.
I'm still a beginner with this all, so I'll leave this in the hands of a far more qualified person than me! Thank you so much again!
It seems the new URL is: https://storage.googleapis.com/public-download-files/hgnc/tsv/tsv/hgnc_complete_set.txt. But the file seems to have only the header...
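For anyone who wants to check this without opening the file in a browser, here is a minimal sketch in base R. It reads only the first two lines and reports whether anything follows the header. Note that has_data_rows is a hypothetical helper name used for illustration; it is not part of the hgnc package.

```r
# Hypothetical helper (not part of the hgnc package): returns TRUE if the
# file at `path` has at least one non-empty line after the header line.
# `path` can be a local file path; readLines() also accepts a URL string.
has_data_rows <- function(path) {
  # Read at most two lines so large files are not downloaded in full
  first_lines <- readLines(path, n = 2, warn = FALSE)
  length(first_lines) == 2 && nzchar(trimws(first_lines[2]))
}
```

Calling has_data_rows() on the URL above should return FALSE for as long as the file contains only the column names.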
Thanks, our messages just crossed :) Indeed, I already sent a message to the HGNC team about the empty file.
They seem to be aware of the issue. Seemingly, by tomorrow it should be fixed.
Thank you so much again for kindly looking into this! Looks like the HGNC have indeed updated their initially empty file.
Just a quick follow up on this - I reloaded the hgnc package just now and still get an Error in open.connection(3L, "rb") : HTTP error 404. issue when I try to run my initial loading HGNC data code. Does the link used in the package need to be updated?
hi @nehawali21:
Yes, indeed, the package needs to be updated. I will do it later today so by tomorrow you should be able to use the updated version.
Thank you so much! I haven't seen an updated package in RStudio so far; perhaps I'm looking in the wrong location?
@nehawali21
probably you fixed it already, but for anyone else still having this issue:
you can import the data set using this code:
import_hgnc_dataset(file="http://ftp.ebi.ac.uk/pub/databases/genenames/out_of_date_hgnc/tsv/hgnc_complete_set.txt")
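Since the file location has moved more than once in this thread, a slightly more defensive variant of the same workaround may help. This is only a sketch: first_working_source is a hypothetical helper, not part of the hgnc package, and it assumes that import_hgnc_dataset() either throws an error or returns a data frame. It tries each candidate in order and skips any that fail or load with zero rows.

```r
# Hypothetical helper (not part of the hgnc package): try each candidate
# source in order and return the first result that loads without error
# and contains at least one row.
# `reader` is the function that loads a file, e.g. hgnc::import_hgnc_dataset.
first_working_source <- function(candidates, reader) {
  for (src in candidates) {
    # A 404 or other read error becomes NULL instead of stopping the loop
    result <- tryCatch(reader(src), error = function(e) NULL)
    if (is.data.frame(result) && nrow(result) > 0) {
      message("Loaded HGNC data from: ", src)
      return(result)
    }
  }
  stop("None of the candidate sources could be loaded.")
}

# Example usage with the two URLs mentioned in this thread (needs network):
# candidates <- c(
#   "https://storage.googleapis.com/public-download-files/hgnc/tsv/tsv/hgnc_complete_set.txt",
#   "http://ftp.ebi.ac.uk/pub/databases/genenames/out_of_date_hgnc/tsv/hgnc_complete_set.txt"
# )
# hgnc_data <- first_working_source(candidates, hgnc::import_hgnc_dataset)
```

Listing the current release first and the out_of_date_hgnc copy second means the older file is only used when the newer one is unavailable or empty.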
Thank you so much for making such a handy package to extract the latest gene information from the HGNC! I'm a novice in R and data analysis, so this has been a wonderful way to get the newest details about genes that I'm evaluating in my data.
I didn't encounter issues when I last used this package a few months ago. However, I've run into some trouble this week despite not having edited the code.
When I try to run the following code to just initially load HGNC data:
I receive this error message:
To troubleshoot line by line, when I just run
import_hgnc_dataset(latest_archive_url())
I receive the below error message, which suggested to me that perhaps the URL is an issue:
To test this, running
latest_archive_url()
yields the following without issues:
However, when I try to load this page into Chrome, I get an error that the URL was not found.
Trying to manually recreate the URL in Chrome, I can only get as far as https://ftp.ebi.ac.uk/pub/databases/genenames/, as I don't see a sub-folder or file just called hgnc.
The closest file path I can generate on my side is https://ftp.ebi.ac.uk/pub/databases/genenames/out_of_date_hgnc/tsv/hgnc_complete_set.txt. However, as the package should procure the latest HGNC gene names that are continually updated, I'm not sure if this would be the ideal file.
The code worked as of mid-August 2024, so perhaps hgnc_complete_set.txt was moved to the out_of_date_hgnc folder just afterwards, which is why the URL changed and the code no longer works on my side.
Does the URL in the package need to be updated to this link, or to something else altogether? Or am I doing something wrong on my side?
I'd be most grateful for your kind help with this! Thank you so much once again!