ramiromagno / hgnc

Download and import HGNC gene data into R
https://rmagno.eu/hgnc

Issues with import_hgnc_dataset(): HTTP error 404 #6

Open nehawali21 opened 1 month ago

nehawali21 commented 1 month ago

Thank you so much for making such a handy package to extract the latest gene information from the HGNC! I'm a novice in R and data analysis, so this has been a wonderful way to get the newest details about genes that I'm evaluating in my data.

I didn't encounter issues when I last used this package a few months ago. However, I've run into some trouble this week despite not having edited the code.

When I try to run the following code to just initially load HGNC data:

hgnc <- import_hgnc_dataset(latest_archive_url()) %>% 
  subset(status == "Approved") %>% 
  dplyr::select(c(symbol,
                  name,
                  locus_group,
                  locus_type,
                  ensembl_gene_id,
                  entrez_id))

I receive this error message:

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'subset': HTTP error 404.

To troubleshoot line by line, when I just run import_hgnc_dataset(latest_archive_url()), I receive the below error message, which suggested to me that perhaps the URL is an issue:

Error in open.connection(5L, "rb") : HTTP error 404.

To test this, running latest_archive_url() yields the following without issues:

[1] "https://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/tsv/hgnc_complete_set.txt"

However, when I try to load this page into Chrome, I get an error that the URL was not found.
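The same reachability check can be run from R instead of a browser. This is a minimal sketch using only base R; `url_is_reachable()` is a hypothetical helper name chosen for illustration and is not part of the hgnc package.

```r
# Hypothetical helper (not part of hgnc): returns TRUE if `u` can be opened
# for reading, FALSE if opening it raises an error or warning (e.g. HTTP 404).
url_is_reachable <- function(u) {
  tryCatch(
    {
      con <- url(u, open = "rb")  # fails here when the resource is missing
      close(con)
      TRUE
    },
    error = function(e) FALSE,
    warning = function(w) FALSE
  )
}
```

Calling `url_is_reachable()` on the string returned by `latest_archive_url()` would then show whether the 404 comes from the URL itself rather than from the surrounding code.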

Trying to manually recreate the URL in Chrome, I can only get as far as https://ftp.ebi.ac.uk/pub/databases/genenames/, as I don't see a sub-folder or file just called hgnc.

The closest file path I can generate on my side is https://ftp.ebi.ac.uk/pub/databases/genenames/out_of_date_hgnc/tsv/hgnc_complete_set.txt. However, as the package should procure the latest HGNC gene names that are continually updated, I'm not sure if this would be the ideal file.

The code worked as of mid-August 2024, so perhaps hgnc_complete_set.txt was moved to the out_of_date_hgnc folder shortly afterwards, which would explain why the URL changed and the code no longer works on my side.

Does the URL in the package need to be updated to this link, or to something else altogether? Or am I doing something wrong on my side?

I'd be most grateful for your kind help with this! Thank you so much once again!

ramiromagno commented 1 month ago

hi @nehawali21:

Thanks for reporting this issue and taking the extra effort of looking into the root cause. Like you say, it could be that the URL changed. I will take a look.

nehawali21 commented 1 month ago

Great, thank you so much for your prompt response! I've been looking at the HUGO site too (https://www.genenames.org/download/archive/) to see if there's a different link for the newest names. They list https://storage.googleapis.com/public-download-files/hgnc/tsv/tsv/hgnc_complete_set.txt as the current release file, but at least in my window this appears blank with just column names.

I'm still a beginner with this all, so I'll leave this in the hands of a far more qualified person than me! Thank you so much again!

ramiromagno commented 1 month ago

It seems the new URL is: https://storage.googleapis.com/public-download-files/hgnc/tsv/tsv/hgnc_complete_set.txt. But the file seems to have only the header...
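One way to tell a header-only file from a populated one is to read just its first two lines. The sketch below assumes base R only; `has_data_rows()` is a hypothetical helper, not something the hgnc package provides.

```r
# Hypothetical helper: TRUE if the file at `path` (a local path or a URL)
# has at least one non-empty line after the header line.
has_data_rows <- function(path) {
  first_lines <- readLines(path, n = 2L, warn = FALSE)
  length(first_lines) > 1L && nzchar(first_lines[2L])
}
```

It reads at most two lines, so running it against a URL does not process the whole dataset just to check whether it is empty.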

ramiromagno commented 1 month ago

Thanks, our messages just crossed :) Indeed, I already sent a message to the HGNC team about the empty file.

ramiromagno commented 1 month ago

They seem to be aware of the issue. Seemingly, by tomorrow it should be fixed.

nehawali21 commented 1 month ago

Thank you so much again for kindly looking into this! Looks like the HGNC have indeed updated their initially empty file.

Just a quick follow-up on this: I reloaded the hgnc package just now and still get the following error when I run my initial code for loading the HGNC data: Error in open.connection(3L, "rb") : HTTP error 404. Does the link used in the package need to be updated?

ramiromagno commented 1 month ago

hi @nehawali21:

Yes, indeed, the package needs to be updated. I will do it later today so by tomorrow you should be able to use the updated version.

nehawali21 commented 1 month ago

Thank you so much! I haven't seen an updated package in RStudio so far; perhaps I'm looking in the wrong location?

Charly776 commented 1 week ago

@nehawali21

You have probably fixed this already, but for anyone else still having this issue:

You can import the dataset using this code:

import_hgnc_dataset(file="http://ftp.ebi.ac.uk/pub/databases/genenames/out_of_date_hgnc/tsv/hgnc_complete_set.txt")
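Since two candidate locations are mentioned in this thread, a small fallback can pick the first one that actually serves data. This is only a sketch: `can_read_data()` and `first_usable_url()` are hypothetical helpers, and the only hgnc call assumed is `import_hgnc_dataset(file = ...)` as used above.

```r
# Candidate locations mentioned in this thread, current release first.
candidate_urls <- c(
  "https://storage.googleapis.com/public-download-files/hgnc/tsv/tsv/hgnc_complete_set.txt",
  "http://ftp.ebi.ac.uk/pub/databases/genenames/out_of_date_hgnc/tsv/hgnc_complete_set.txt"
)

# Hypothetical helper: TRUE if `u` opens and has a line after the header.
can_read_data <- function(u) {
  tryCatch(
    length(readLines(u, n = 2L, warn = FALSE)) > 1L,
    error = function(e) FALSE,
    warning = function(w) FALSE
  )
}

# Hypothetical helper: first element of `urls` accepted by `is_ok`,
# or NA_character_ if none is accepted.
first_usable_url <- function(urls, is_ok = can_read_data) {
  for (u in urls) {
    if (isTRUE(is_ok(u))) return(u)
  }
  NA_character_
}

# Usage (needs the hgnc package and network access):
# library(hgnc)
# hgnc_url <- first_usable_url(candidate_urls)
# if (!is.na(hgnc_url)) hgnc <- import_hgnc_dataset(file = hgnc_url)
```

Note that the second candidate sits in the out_of_date_hgnc folder, so the order matters if the most recent gene names are needed.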