lozuponelab / AMON

Annotation of Metabolite Origin via Networks: A tool for predicting putative metabolite origins for microbes or between microbes and host with or without metabolomics data
MIT License

KEGG API timeouts #21

Open sterrettJD opened 4 months ago

sterrettJD commented 4 months ago

Documenting email conversation with @acolorado1 in case others run into this

Sofia:

I'm currently trying to run 3 pairwise comparisons with AMON (3 files total, and AMON takes 2 at a time). I have managed to run 2 of the comparisons, but the third keeps failing with a connection error. Does this have to do with the limits of the KEGG API? I thought that limitation had more to do with file size, which would not apply in this case, as the files have already been used in a previous comparison without any issue. I completely understand if this is not enough information (or if I explained it poorly); I was just hoping you might have some thoughts on the issue.

The issue miraculously resolved itself. I guess it needed time between queries? Not sure, would still be interested in your thoughts on this.

John:

What exactly was the connection error? Sometimes the KEGG API will boot you out for like an hour and then you have to wait to get back in. This "time out" used to be shorter but they've been gradually making it longer.

Sofia:

I definitely think that is what happened. There was a long message of code and pathway information that ended in a connection error, which I unfortunately did not document. This happened only after I had been running AMON for over 3 hours; when I returned about an hour later, it worked again. Maybe this would be something good to mention in the AMON README, as it was a bit unexpected.

John:

I don't necessarily think the issue is file size. It just seems like KEGG API requests are limited roughly per minute, and maybe per hour as well (so larger files are more susceptible, but if we can slow the request rate, it wouldn't be an issue). It used to just time users out for a few minutes and then you could start again, but it now seems to be timing users out for 30+ minutes, so longer wait periods are sometimes necessary. This issue only happens if AMON users rely on making requests from KEGG because they haven't paid for a local copy of the database itself.
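
To make "slow the request rate" concrete, here is a rough sketch of throttled fetching with a growing wait after connection errors. This is not AMON's code: `fetch_kegg_entry` and `fetch_all` are hypothetical names, and the `delay` and `backoff` defaults are guesses, since the actual KEGG limits are not known in this thread. The only real thing assumed is the public KEGG REST `get` endpoint.

```python
import time
import urllib.error
import urllib.request

# Public KEGG REST "get" operation; everything else here is illustrative.
KEGG_GET_URL = "https://rest.kegg.jp/get/{}"


def fetch_kegg_entry(entry_id, timeout=30):
    """Fetch one KEGG entry as text (hypothetical helper, not part of AMON)."""
    url = KEGG_GET_URL.format(entry_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def fetch_all(entry_ids, fetch=fetch_kegg_entry, delay=0.5,
              max_retries=5, backoff=60.0, sleep=time.sleep):
    """Fetch entries one at a time, pausing between requests.

    delay:   seconds to wait after every successful request (assumed value).
    backoff: first wait after a connection error; doubles on each retry
             (assumed value, chosen because lockouts seem to last a while).
    """
    results = {}
    for entry_id in entry_ids:
        for attempt in range(max_retries):
            try:
                results[entry_id] = fetch(entry_id)
                break
            except (urllib.error.URLError, ConnectionError, TimeoutError):
                # Note: this also retries HTTP errors such as a bad entry ID.
                if attempt == max_retries - 1:
                    raise  # out of retries, let the caller see the error
                sleep(backoff * 2 ** attempt)
        sleep(delay)  # throttle so we stay under the per-minute limit
    return results
```

With the defaults above, a run that keeps getting refused would wait 1, 2, 4, and then 8 minutes before giving up, which may still be too short if the lockout really is 30+ minutes, so `backoff` would need tuning.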

I do agree that mentioning this either in the README or here would be good. For quality of life and robustness, I'm also thinking that if a user passes --save_entries, AMON should export/save any parsed entries if the connection times out, and we could add something like a --resume flag to pick up where that left off, so that a user doesn't have to start from scratch after a failed connection.
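
As a starting point for discussion, the save-and-resume idea could look roughly like the sketch below. It is only an illustration under assumptions: `--resume` does not exist yet, the JSON checkpoint format and the names `load_checkpoint`, `save_checkpoint`, and `fetch_with_resume` are hypothetical, and `fetch` stands in for whatever function actually retrieves an entry from KEGG.

```python
import json
import os


def load_checkpoint(path):
    """Return previously saved entries, or an empty dict if none exist."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    return {}


def save_checkpoint(entries, path):
    """Write entries to a temp file first so a crash cannot corrupt the file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as fh:
        json.dump(entries, fh)
    os.replace(tmp_path, path)


def fetch_with_resume(entry_ids, fetch, checkpoint_path):
    """Fetch entries, skipping any already present in the checkpoint.

    Whatever has been fetched so far is saved even if `fetch` raises,
    so a later call with the same checkpoint picks up where this one stopped.
    """
    entries = load_checkpoint(checkpoint_path)
    try:
        for entry_id in entry_ids:
            if entry_id in entries:
                continue  # already retrieved in an earlier run
            entries[entry_id] = fetch(entry_id)
    finally:
        # Runs on success and on failure (e.g. a KEGG connection error).
        save_checkpoint(entries, checkpoint_path)
    return entries
```

Saving in the `finally` block means the entries are written once per run rather than after every request; saving after each entry would be safer against a hard kill, at the cost of more disk writes.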

What do you think, @acolorado1?