ramiromagno / gwasrapidd

gwasrapidd: an R package to query, download and wrangle GWAS Catalog data
https://rmagno.eu/gwasrapidd/

parsing issue while using get_variants() #28

Closed peranti closed 2 years ago

peranti commented 2 years ago

Hi @ramiromagno

I called get_variants() with a gene list and got the following error.

variant_data <- get_variants(gene_name = gene_list)
#  downloading [===================>-----------------------------------------------------]  28% eta:  6m
# Error: parse error: premature EOF
#                                       {   "_embedded" : {     "single
#                     (right here) ------^

Are you aware of this issue, and do you know of any workaround?

Meanwhile, I will try finding a workaround. Thanks for this fantastic package.

ramiromagno commented 2 years ago

Hi @peranti:

This is the first time I have seen this type of error. It seems that the response from the GWAS Catalog server was incomplete, hence the premature End of File (EOF) error.

I noticed that GWAS Catalog REST API service has been pretty bumpy today. I am not sure what's going on. I have sent a couple of emails to the GWAS Dev Team, but I haven't got a response yet.

I am not sure your problem is related to these other issues. It could be.

If you want, I would be glad to look more carefully into your example, but I will need the value of the gene_list variable.

peranti commented 2 years ago

Thanks, @ramiromagno, for the quick reply.

I am now testing the command with ten genes and will update you if there are any issues. I can already tell you that the ETA is 28 minutes.

It would be better to run the command later, once the GWAS Catalog REST API service is back to normal. But how can I check that? 🤔

Otherwise, should I rerun it in a loop, with a wait time between genes?
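One hedged way to check, not something confirmed by the maintainer in this thread, is to request a REST API URL directly and look at the HTTP status code. The sketch below assumes the httr package is installed; `is_api_healthy()` and its `status_of` argument are hypothetical names, and the URL in the usage comment is assumed to be the root of the GWAS Catalog REST API.

```r
# Hypothetical helper (not part of gwasrapidd): TRUE only if `url` answers
# with an HTTP 2xx status. `status_of` is injectable so it can be stubbed.
is_api_healthy <- function(url,
                           timeout_sec = 30,
                           status_of = function(u) {
                             httr::status_code(httr::GET(u, httr::timeout(timeout_sec)))
                           }) {
  # Connection failures and timeouts are treated as "not healthy".
  code <- tryCatch(status_of(url), error = function(e) NA_integer_)
  !is.na(code) && code >= 200 && code < 300
}

# Usage (requires network access and the httr package; not run here):
# is_api_healthy("https://www.ebi.ac.uk/gwas/rest/api")
```

A 2xx answer from one URL does not guarantee that every endpoint works, so it may be worth probing the same kind of request that is going to be made.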

ramiromagno commented 2 years ago

I don't know how long your list of genes is. But I am guessing that if it is on the order of thousands, you might be hammering their server too hard, and they might have increased the time they take to respond to your queries.

As you said, if that is the case, it might be better to wait a few hours and try again. But then split your queries so that each call to get_variants() covers only a few genes (~10 genes). Make sure to pause between requests, for something like ~5 seconds. At the end, if you have a list of variants objects, you can always use the function union() to join everything together into a single object.
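The advice above could be sketched roughly as follows. This is only an illustration, not code from the package: `fetch_in_batches()` and its `fetch` argument are hypothetical names, and the batch size of 10 and the 5-second pause are simply the approximate values suggested above.

```r
# Hypothetical helper (not part of gwasrapidd): query genes in small batches,
# pausing between requests to go easy on the GWAS Catalog server.
fetch_in_batches <- function(genes,
                             batch_size = 10,
                             pause_sec = 5,
                             fetch = function(g) gwasrapidd::get_variants(gene_name = g)) {
  # Split the gene vector into consecutive chunks of at most `batch_size`.
  batches <- split(genes, ceiling(seq_along(genes) / batch_size))
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    results[[i]] <- fetch(batches[[i]])
    # No need to wait after the last batch.
    if (i < length(batches)) Sys.sleep(pause_sec)
  }
  results
}

# Usage (requires network access and gwasrapidd; not run here):
# variant_list <- fetch_in_batches(gene_list)
# variant_data <- Reduce(gwasrapidd::union, variant_list)
```

Keeping the per-batch results in a list means that a failure partway through does not discard the batches already downloaded.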

peranti commented 2 years ago

Sure. I have around 200 genes, so I suppose it should be acceptable to query them in one go once some time has passed.

ramiromagno commented 2 years ago

I would still split your queries, though. The GWAS Catalog team has not provided any details regarding API throttling, but I am guessing this might be your problem.

peranti commented 2 years ago

I encounter the following error when working with a single gene:

Warning: The request for https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/search/findByGene?geneName=PSMD5 failed: response code was 500.

Warning messages:
In gc_request_all(resource_url = resource_url, base_url = base_url,  :
The request for https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/search/findByGene?geneName=PSMD5 failed: response code was 500.
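A response code of 500 is a server-side failure, so one possible client-side mitigation is to retry after a growing pause. The sketch below is an assumption-laden illustration rather than anything gwasrapidd provides: `with_retry()` is a hypothetical helper, and it assumes the failure surfaces as an R warning or error, as the messages above suggest.

```r
# Hypothetical helper (not part of gwasrapidd): call `f()` up to `max_tries`
# times, treating any warning or error as a failed attempt and doubling the
# wait between attempts (base_wait, 2 * base_wait, 4 * base_wait, ...).
with_retry <- function(f, max_tries = 3, base_wait = 5) {
  stopifnot(max_tries >= 1)
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e, warning = function(w) w)
    # A non-condition value means the call completed cleanly.
    if (!inherits(result, "condition")) return(result)
    if (attempt < max_tries) Sys.sleep(base_wait * 2^(attempt - 1))
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(result))
}

# Usage (requires network access and gwasrapidd; not run here):
# psmd5 <- with_retry(function() gwasrapidd::get_variants(gene_name = "PSMD5"))
```

Retrying only helps with transient failures; if the service is down for hours, as it was here, the call will still fail after the last attempt.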

The server is reachable, though.

is_ebi_reachable()
[1] TRUE

ramiromagno commented 2 years ago

Hi @peranti:

I am sorry you are having all these issues. I have been experiencing them myself too. I have just sent another email to the GWAS Catalog team. I am afraid there is nothing else I can do.

If you want to put some extra pressure on them, here is their email: gwas-info@ebi.ac.uk. :)

ramiromagno commented 2 years ago

Hi Pradeep,

I got an update from the GWAS team. It seems that these downtime issues are now resolved. Please let me know if it is working for you.

peranti commented 2 years ago

Thanks, @ramiromagno, for getting in touch with the GWAS team.

I can now interact with the website using gwasrapidd's functions, albeit very slowly. However, that slowness is not a problem in itself, so I am closing this issue.