shurcooL / Go-Package-Store

An app that displays updates for the Go packages in your GOPATH.
MIT License

"API rate limit exceeded" errors #54

Closed rtfb closed 8 years ago

rtfb commented 8 years ago

I'm getting 403 API rate limit exceeded errors for all GH accesses after I use Go-Package-Store for a while:

2016/01/23 10:40:51 warning: gh.Repositories.CompareCommits:
GET https://api.github.com/repos/fragmenta/fragmenta-cms/compare/1693ff3207b534dfa0c30caebf85a83faa69ecd9...a7331ff3dbf26c0b59c7b067fe001a8880eb6245:
403 API rate limit exceeded for <IP address>. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.) []

Is this a glitch or is it some limitation in what GH allows? Anybody else seen this?

dmitshur commented 8 years ago

This is very normal and expected. However, it might not be very well explained and handled, so I need to improve that.

Go Package Store currently uses a GitHub presenter that does unauthenticated API calls. They have a pretty low rate limit/quota.

That means if you run Go Package Store with many, many updates available, you'll likely exhaust the unauthenticated rate limit/quota quite fast.
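As a rough illustration of how a client can tell how much quota is left: GitHub API responses carry `X-RateLimit-Limit` and `X-RateLimit-Remaining` headers. The sketch below parses them from an `http.Header`. The names `rateInfo`, `parseRate`, and `sampleHeaders` are made up for this example and are not taken from Go Package Store.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
)

// rateInfo holds the quota figures reported in GitHub's rate limit headers.
type rateInfo struct {
	Limit     int // requests allowed per window
	Remaining int // requests left in the current window
}

// parseRate extracts the quota from the X-RateLimit-Limit and
// X-RateLimit-Remaining response headers.
func parseRate(h http.Header) (rateInfo, error) {
	limit, err := strconv.Atoi(h.Get("X-RateLimit-Limit"))
	if err != nil {
		return rateInfo{}, fmt.Errorf("parsing X-RateLimit-Limit: %w", err)
	}
	remaining, err := strconv.Atoi(h.Get("X-RateLimit-Remaining"))
	if err != nil {
		return rateInfo{}, fmt.Errorf("parsing X-RateLimit-Remaining: %w", err)
	}
	return rateInfo{Limit: limit, Remaining: remaining}, nil
}

// sampleHeaders builds headers like those on a GitHub API response.
// It is a hypothetical helper that only exists to keep this demo offline.
func sampleHeaders(limit, remaining string) http.Header {
	h := http.Header{}
	h.Set("X-RateLimit-Limit", limit)
	h.Set("X-RateLimit-Remaining", remaining)
	return h
}

func main() {
	r, err := parseRate(sampleHeaders("60", "1"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("Rate: %d / %d\n", r.Remaining, r.Limit)
}
```

In a real client, `parseRate(resp.Header)` would be called on each API response, so a warning could be shown before the quota reaches zero.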

It's usually not a problem when I use it because I have few updates at a time, but the first time a new user runs it (if they have many packages that are out of date), they're almost guaranteed to run into this.

I didn't want to try to hardcode some auth into the code since it's open source and couldn't keep the secret token a secret... Any suggestions?

slimsag commented 8 years ago

How about asking users to generate their own auth token if they run into it, and setting it via a CLI flag or environment variable?

dmitshur commented 8 years ago

Yeah, I could have an env var or CLI flag (maybe less preferable because it might show up in ps) to set an auth token for GitHub client.
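A minimal sketch of the env var approach, assuming a variable named `EXAMPLE_GITHUB_TOKEN` (a placeholder, not the name Go Package Store uses). The token is attached to each outgoing request by a small `http.RoundTripper`; `tokenTransport`, `newClient`, and `seenAuthHeader` are illustrative names. The demo talks to a local test server rather than GitHub.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
)

// tokenEnvVar is a hypothetical variable name chosen for this example.
const tokenEnvVar = "EXAMPLE_GITHUB_TOKEN"

// tokenTransport adds an Authorization header to every request.
type tokenTransport struct {
	token string
	next  http.RoundTripper
}

func (t *tokenTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// A RoundTripper must not modify the caller's request, so clone it first.
	r := req.Clone(req.Context())
	r.Header.Set("Authorization", "token "+t.token)
	return t.next.RoundTrip(r)
}

// newClient returns an authenticated client when a token is given,
// and a plain unauthenticated client otherwise.
func newClient(token string) *http.Client {
	if token == "" {
		return &http.Client{}
	}
	return &http.Client{Transport: &tokenTransport{token: token, next: http.DefaultTransport}}
}

// seenAuthHeader sends one request to a local test server that echoes back
// the Authorization header it received, and returns that value.
func seenAuthHeader(token string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Header.Get("Authorization"))
	}))
	defer srv.Close()

	resp, err := newClient(token).Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	got, err := seenAuthHeader(os.Getenv(tokenEnvVar))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("server saw Authorization: %q\n", got)
}
```

Reading the token from the environment keeps it out of the process arguments, which is the concern about `ps` raised above.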

rtfb commented 8 years ago

Thanks for the explanation! I don't see any other general solution than providing an auth token. I guess the problem might be somewhat alleviated by caching the responses, but I'm not sure if that would be worth the effort.

dmitshur commented 8 years ago

> I guess the problem might be somewhat alleviated by caching the responses, but not sure if that would be worth the effort.

Do you mean caching responses within one session (this is already done), or caching across multiple runs of G-P-S (this is not done and I can consider this option)?

rtfb commented 8 years ago

Yes, I meant across multiple runs. That's exactly how I stumbled upon it, but not sure if that would be common for others (or for me outside of tinkering mode).

dmitshur commented 8 years ago

Right. I think it's definitely worth considering doing that. That way, if you run G-P-S a second time after 5 minutes, it will be faster and not use up your unauthenticated rate limit/quota.

It will still not help for initial runs when there are many, many packages. It will also not help when you're on a public or shared IP that happens to have the unauthenticated quota used up (because other people made many unauthenticated GH API requests). So an option to provide a GitHub token might still be desirable. But that's an orthogonal feature.

About saving state/cache between runs of G-P-S, the main blocker for that is I don't know what's a good location on one's disk to use for such data. It will also vary per operating system. If anyone has suggestions or pointers, that'd be great.

I am guessing I'd need to read up on how OSes do this and use the folders they recommend. For example, on Windows, it might be %appdata%. On OS X, it might be $HOME/Library, etc. It'd be great if there's a good Go package that abstracts all that behind a common interface.

dmitshur commented 8 years ago

I've experimented with using a caching http.RoundTripper implementation that can be provided by a library like github.com/gregjones/httpcache with good success. That library has multiple cache backends, including in-memory and on-disk ones.

The on-disk cache is a great fit for improving this situation, and when I tried it, it worked correctly, not using up additional rate limit quota after repeated runs.

So, I think I know how to resolve this, but this is now blocking on finding a way to come up with a good base path (on disk) for cache. In my testing, I just wrote to the current working directory, but that's not acceptable for production.
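To show the general idea of a caching `http.RoundTripper`, here is a deliberately naive, in-memory stand-in written with only the standard library. It is not `github.com/gregjones/httpcache` and it ignores HTTP cache validation headers. It simply stores the first successful GET response per URL and replays it. The names `cachingTransport` and `upstreamHits` are invented for this sketch.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"sync"
	"sync/atomic"
)

// cachingTransport replays stored responses for repeated GET requests.
type cachingTransport struct {
	next  http.RoundTripper
	mu    sync.Mutex
	cache map[string][]byte // URL -> serialized response
}

func (t *cachingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if req.Method != http.MethodGet {
		return t.next.RoundTrip(req)
	}
	key := req.URL.String()
	t.mu.Lock()
	dump, ok := t.cache[key]
	t.mu.Unlock()
	if ok {
		// Cache hit: rebuild the response from the stored bytes.
		return http.ReadResponse(bufio.NewReader(bytes.NewReader(dump)), req)
	}
	// Cache miss: ask the real transport.
	resp, err := t.next.RoundTrip(req)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return resp, nil // do not cache errors such as 403
	}
	// DumpResponse reads the body and leaves a fresh copy on resp.
	dump, err = httputil.DumpResponse(resp, true)
	if err != nil {
		resp.Body.Close()
		return nil, err
	}
	t.mu.Lock()
	t.cache[key] = dump
	t.mu.Unlock()
	return resp, nil
}

// upstreamHits makes n identical GET requests through the caching client
// against a local test server and reports how many reached the server.
func upstreamHits(n int) (int64, error) {
	var hits int64
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt64(&hits, 1)
		fmt.Fprint(w, `{"ok":true}`)
	}))
	defer srv.Close()

	client := &http.Client{Transport: &cachingTransport{
		next:  http.DefaultTransport,
		cache: make(map[string][]byte),
	}}
	for i := 0; i < n; i++ {
		resp, err := client.Get(srv.URL)
		if err != nil {
			return 0, err
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
	return atomic.LoadInt64(&hits), nil
}

func main() {
	hits, err := upstreamHits(3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("3 requests made, %d reached the server\n", hits)
}
```

An on-disk variant would write the serialized responses to files under a cache directory instead of a map, which is what makes them survive between runs.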

dmitshur commented 8 years ago

For reference, here's a sample run with a disk cache and some debug logging added.

There are two runs. The first run has no cache, so all requests are cache misses.

The second run has the cache from the previous run, so all requests happen to be cache hits. Note that the Rate response is cached too, which is why it appears to be the same as in the first run (the real rate limit is at 1 of 60 for the hour, so without caching, the second run would run into API rate limit exceeded errors).

Go-Package-Store $ go build -tags=dev -o /tmp/o && /tmp/o
Using all Go packages in GOPATH.
Go Package Store server is running at http://localhost:7043/index.html.
2016/01/25 00:32:27 cache MISS for 2ea9d01f615042a65b46791da0eddc29 of 0 bytes
2016/01/25 00:32:27 Rate: 14 / 60
2016/01/25 00:32:27 cache MISS for 55d59090c36cc4a362c872dc8dc1426a of 0 bytes
2016/01/25 00:32:29 cache MISS for 90a52dc6ca488638cace5f89a38c7355 of 0 bytes
2016/01/25 00:32:29 Rate: 12 / 60
2016/01/25 00:32:29 cache MISS for c6f891a86a0890d662a804e960a54beb of 0 bytes
2016/01/25 00:32:29 cache MISS for 2b5c288ddb6ec7f56748573610c52a6e of 0 bytes
2016/01/25 00:32:29 Rate: 10 / 60
2016/01/25 00:32:29 cache MISS for 9ece98fd062eab1bce61137e6d10d63b of 0 bytes
2016/01/25 00:32:29 cache MISS for bd7447eafcc0e8037338cbeda2abc93c of 0 bytes
2016/01/25 00:32:29 Rate: 8 / 60
2016/01/25 00:32:29 cache MISS for 8e6ec64505d7651810f1542662acea46 of 0 bytes
2016/01/25 00:32:30 cache MISS for 38ccf8f8c36197c00a84e520edee42db of 0 bytes
2016/01/25 00:32:30 Rate: 6 / 60
2016/01/25 00:32:30 cache hit for 55d59090c36cc4a362c872dc8dc1426a of 2186 bytes
2016/01/25 00:32:33 cache MISS for 273f447e6c7a9878b404528def5748ea of 0 bytes
2016/01/25 00:32:33 Rate: 5 / 60
2016/01/25 00:32:33 cache hit for 55d59090c36cc4a362c872dc8dc1426a of 2186 bytes
2016/01/25 00:32:35 cache MISS for 1c44f01f09e68917db589796f5271898 of 0 bytes
2016/01/25 00:32:35 Rate: 4 / 60
2016/01/25 00:32:35 cache MISS for 6dabe2dce8923895f2758b6a27c6eddf of 0 bytes
2016/01/25 00:32:38 cache MISS for 20eef9b6727a61f5ada37ecf41fb12a8 of 0 bytes
2016/01/25 00:32:38 Rate: 2 / 60
2016/01/25 00:32:38 cache hit for 55d59090c36cc4a362c872dc8dc1426a of 2186 bytes
2016/01/25 00:32:38 cache MISS for 9e79196ce3978baf31b082fed7a9f682 of 0 bytes
2016/01/25 00:32:38 Rate: 1 / 60
2016/01/25 00:32:38 cache MISS for d7b6517a21fa10f90fd2dc3a2cceb219 of 0 bytes
REBUILDING SOURCE for: script.js using [main.go]
goReadersToJs taken: 1.162505362s
^CGo-Package-Store $ go build -tags=dev -o /tmp/o && /tmp/o
Using all Go packages in GOPATH.
Go Package Store server is running at http://localhost:7043/index.html.
2016/01/25 00:33:15 cache hit for 2ea9d01f615042a65b46791da0eddc29 of 12303 bytes
2016/01/25 00:33:15 Rate: 14 / 60
2016/01/25 00:33:15 cache hit for 55d59090c36cc4a362c872dc8dc1426a of 2186 bytes
2016/01/25 00:33:16 cache hit for 90a52dc6ca488638cace5f89a38c7355 of 12061 bytes
2016/01/25 00:33:16 Rate: 12 / 60
2016/01/25 00:33:16 cache hit for c6f891a86a0890d662a804e960a54beb of 2155 bytes
2016/01/25 00:33:17 cache hit for 2b5c288ddb6ec7f56748573610c52a6e of 12056 bytes
2016/01/25 00:33:17 Rate: 10 / 60
2016/01/25 00:33:17 cache hit for 9ece98fd062eab1bce61137e6d10d63b of 2184 bytes
2016/01/25 00:33:17 cache hit for bd7447eafcc0e8037338cbeda2abc93c of 14071 bytes
2016/01/25 00:33:17 Rate: 8 / 60
2016/01/25 00:33:17 cache hit for 8e6ec64505d7651810f1542662acea46 of 2169 bytes
2016/01/25 00:33:18 cache hit for 38ccf8f8c36197c00a84e520edee42db of 20597 bytes
2016/01/25 00:33:18 Rate: 6 / 60
2016/01/25 00:33:18 cache hit for 55d59090c36cc4a362c872dc8dc1426a of 2186 bytes
2016/01/25 00:33:20 cache hit for 273f447e6c7a9878b404528def5748ea of 15826 bytes
2016/01/25 00:33:20 Rate: 5 / 60
2016/01/25 00:33:20 cache hit for 55d59090c36cc4a362c872dc8dc1426a of 2186 bytes
2016/01/25 00:33:22 cache hit for 1c44f01f09e68917db589796f5271898 of 15521 bytes
2016/01/25 00:33:22 Rate: 4 / 60
2016/01/25 00:33:22 cache hit for 6dabe2dce8923895f2758b6a27c6eddf of 2224 bytes
2016/01/25 00:33:25 cache hit for 20eef9b6727a61f5ada37ecf41fb12a8 of 51014 bytes
2016/01/25 00:33:25 Rate: 2 / 60
2016/01/25 00:33:25 cache hit for 55d59090c36cc4a362c872dc8dc1426a of 2186 bytes
2016/01/25 00:33:25 cache hit for 9e79196ce3978baf31b082fed7a9f682 of 37283 bytes
2016/01/25 00:33:25 Rate: 1 / 60
2016/01/25 00:33:25 cache hit for d7b6517a21fa10f90fd2dc3a2cceb219 of 2292 bytes
REBUILDING SOURCE for: script.js using [main.go]
goReadersToJs taken: 1.098184313s
^CGo-Package-Store $ 
dmitshur commented 8 years ago

There's one potential security consideration to be aware of. If we add both an on-disk cache AND a way to provide a secret GitHub API token, then the on-disk cache might contain sensitive information (the secret token).

Therefore, we should either not use cache when authentication is provided, or store the cache in a secure location so other users can't access it.

rtfb commented 8 years ago

That sounds like good news.

The locations for cache might be the ones you mentioned on Win and OSX, plus ~/.Go-Package-Store/ on Linux. I'm not sure how robust these choices would be, but it seems a decent start.

The only bit I'm unsure of is the %appdata%: I checked on a Windows 7 box today and it resolved to C:\Users\<user>\AppData\Roaming\, while I would expect ...\Local\. But that might not really matter too much.

dmitshur commented 8 years ago

> The only bit I'm unsure of is the %appdata%

Perhaps it should be %LOCALAPPDATA% then.
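The per-OS choices discussed so far could be sketched roughly as follows. This is only an illustration of the mapping from the thread, not the code of any package mentioned here. The function name `cacheDir` is hypothetical, and `Library/Caches` on OS X is an assumption, since the thread only says `$HOME/Library`.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"runtime"
)

const appName = "Go-Package-Store"

// cacheDir picks a per-user cache directory for the given OS.
// goos is expected to be runtime.GOOS; getenv is normally os.Getenv.
// Both are parameters so the mapping can be exercised without touching
// the real environment.
func cacheDir(goos string, getenv func(string) string) (string, error) {
	switch goos {
	case "windows":
		base := getenv("LOCALAPPDATA")
		if base == "" {
			return "", errors.New("LOCALAPPDATA is not set")
		}
		return filepath.Join(base, appName), nil
	case "darwin":
		home := getenv("HOME")
		if home == "" {
			return "", errors.New("HOME is not set")
		}
		// Assumption: the Caches subfolder of $HOME/Library.
		return filepath.Join(home, "Library", "Caches", appName), nil
	default:
		// Linux, and by assumption the BSDs and Solaris.
		home := getenv("HOME")
		if home == "" {
			return "", errors.New("HOME is not set")
		}
		return filepath.Join(home, "."+appName), nil
	}
}

func main() {
	dir, err := cacheDir(runtime.GOOS, os.Getenv)
	if err != nil {
		fmt.Println("no cache directory:", err)
		return
	}
	fmt.Println("cache directory:", dir)
}
```

The function only computes a path. Creating the directory and choosing its permissions is left to the caller.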

rtfb commented 8 years ago

I happened to need the same kind of location for one of my projects, so I wrote down what we discussed here. Feel free to use if you find it helpful: https://github.com/rtfb/cachedir/blob/master/cachedir.go

No docs yet, but I've tested it on Linux, OSX and Windows. I assumed all flavours of BSD and Solaris behave the same as Linux.

dmitshur commented 8 years ago

Thanks @rtfb, that will come in handy.

I'm working on a few things that will significantly improve the situation here.

First Phase

Better display of errors in the web UI (and not in terminal). It currently looks something like this:

[image: screenshot of an error displayed in the web UI]

Second Phase

Use an application data directory on local disk to store cache for github presenter. This will help avoid burning through the API rate limit quota when running Go Package Store multiple times, when not much has changed.

Third Phase

Add support for an env var to set a GitHub API token in order to do authenticated requests and have a quota of 5000 API calls per hour rather than just 60.

I've learned that, luckily, when using authenticated GitHub API clients, the cached responses don't really contain any sensitive data (the token is only set in headers on the requests, not on the responses, and only the responses are cached). So doing this should be quite feasible without security concerns.

dmitshur commented 8 years ago

First phase is mostly complete in #56, but waiting on an external PR to be merged.

dmitshur commented 8 years ago

@rtfb, I've created a similar package for Go Package Store's current needs, see ospath. I've opted to start with OS X support only for now, since I can test it well. I plan to expand coverage of other OSes later, with careful testing.

Go Package Store will only use the cache dir if it's acquired successfully, otherwise it will fall back to current behavior of not having a persistent cache.
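The "use it only if acquired successfully" behavior might look something like the sketch below. It is not the ospath package. It uses `os.UserCacheDir` from the standard library (available in newer Go releases than the ones current when this thread was written) as the default source of the base directory, and `acquireCacheDir`, `errNoBase`, and `example-app` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// errNoBase is a placeholder error for simulating a platform where no
// base cache directory can be determined.
var errNoBase = errors.New("no base cache directory")

// acquireCacheDir tries to create a private cache directory for app under
// the directory returned by base. It reports ok=false on any failure, so
// the caller can fall back to running without a persistent cache.
func acquireCacheDir(base func() (string, error), app string) (dir string, ok bool) {
	root, err := base()
	if err != nil {
		return "", false
	}
	dir = filepath.Join(root, app)
	// Mode 0700 keeps the directory readable by the current user only.
	if err := os.MkdirAll(dir, 0o700); err != nil {
		return "", false
	}
	return dir, true
}

func main() {
	// Note: on success this creates the directory on disk.
	if dir, ok := acquireCacheDir(os.UserCacheDir, "example-app"); ok {
		fmt.Println("persistent cache at", dir)
	} else {
		fmt.Println("no usable cache dir; caching for this run only")
	}
}
```

Passing the base lookup as a function keeps the fallback path easy to exercise, for example by supplying a function that always returns `errNoBase`.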

dmitshur commented 8 years ago

I've added support for providing a GitHub auth token in ee9b46d (it doesn't need any scopes, it's only used to be authenticated and receive a higher API rate limit). That's phase 3, and it should resolve this issue completely.

Please let me know if anything doesn't work as expected.

rtfb commented 8 years ago

Thanks, @shurcooL!