src-d / ghsync

GitHub API v3 > PostgreSQL
https://sourced.tech
GNU General Public License v3.0

Using multiple tokens #58

Open se7entyse7en opened 5 years ago

se7entyse7en commented 5 years ago

As we can see from here, ghsync slept for 11 of the 18 hours it ran, which is a lot. Would it make sense to use a pool of tokens? Would that violate any policy?

carlosms commented 5 years ago

Apparently, according to this part of the docs, tokens share the same limit if they belong to the same user. So it would have to be done with tokens from different users.

Authenticated requests are associated with the authenticated user, regardless of whether Basic Authentication or an OAuth token was used. This means that all OAuth applications authorized by a user share the same quota of 5000 requests per hour when they authenticate with different tokens owned by the same user.

https://developer.github.com/v3/#rate-limiting

se7entyse7en commented 5 years ago

I'm thinking about an org that could use different tokens from different users (let's say, for example, different managers).
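A rough sketch of what such a pool could look like, in Python for brevity (ghsync itself is not written this way). The `TokenPool` class and its method names are hypothetical. It assumes each token belongs to a different user, and that the caller feeds back the values of the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers after every request.

```python
import time
from dataclasses import dataclass

QUOTA = 5000  # authenticated requests per hour per user, per the docs quoted above


@dataclass
class TokenState:
    token: str
    remaining: int = QUOTA   # requests left in the current window
    reset_at: float = 0.0    # Unix time at which the window resets


class TokenPool:
    """Hypothetical pool of tokens, each owned by a different user."""

    def __init__(self, tokens):
        self._states = [TokenState(t) for t in tokens]

    def update(self, token, remaining, reset_at):
        """Record X-RateLimit-Remaining / X-RateLimit-Reset for a token."""
        for state in self._states:
            if state.token == token:
                state.remaining = remaining
                state.reset_at = reset_at

    def pick(self, now=None):
        """Return (token, seconds_to_wait) for the next request."""
        now = time.time() if now is None else now
        # A token whose reset time has passed has its full quota again.
        for state in self._states:
            if state.reset_at <= now:
                state.remaining = QUOTA
        best = max(self._states, key=lambda s: s.remaining)
        if best.remaining > 0:
            return best.token, 0.0
        # Every token is exhausted: wait for the one that resets first.
        soonest = min(self._states, key=lambda s: s.reset_at)
        return soonest.token, soonest.reset_at - now
```

With something like this, the sync only sleeps when every token in the pool is exhausted, and then only until the earliest reset.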

se7entyse7en commented 5 years ago

Another idea could be to pass requests through a pool of proxies, rotating the IP and using the non-auth API. Or at least to use that as a fallback when the limit is reached.

smola commented 5 years ago

Using a pool of proxies is not something that we can get people to do. Also, the non-auth API has a ridiculously low quota; you'd need to burn hundreds (thousands?) of proxies.

Sharing multiple users' tokens wouldn't be crazy, but it's a bit awkward.

se7entyse7en commented 5 years ago

I forgot that the non-auth quota was that low 😕 5000/h vs 60/h is a factor of around 80, so it would need ~80 proxies (which would have been fine), but I didn't consider that it also has to be multiplied by the number of users, so yes, it's crazy.
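To make the arithmetic concrete, here is a back-of-the-envelope sketch. The `proxies_needed` helper is hypothetical, and it assumes that each proxy gets its own 60 requests/hour and that every user wants to match the throughput of one authenticated token.

```python
import math

AUTH_QUOTA = 5000    # requests per hour with a token
UNAUTH_QUOTA = 60    # requests per hour without one


def proxies_needed(users):
    """Proxies required to match one token's quota for each user."""
    per_user = math.ceil(AUTH_QUOTA / UNAUTH_QUOTA)  # 5000 / 60 is about 83.3
    return users * per_user


print(proxies_needed(1))
print(proxies_needed(10))
```

Rounding up gives 84 proxies for a single user and 840 for ten, which is where the "hundreds (thousands?)" estimate comes from.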