ipfs-inactive / project-repos

[ARCHIVED] Project health metrics
http://project-repos.ipfs.io/
MIT License

github api throttles requests #5

Closed harlantwood closed 8 years ago

harlantwood commented 8 years ago

github api throttles requests with or without a username/password...

solution may be to use oauth? but may have the same issues with that.

could get the readme.md from the raw url, instead of API, would cut down requests about 30x from current load.
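As a rough sketch of the raw-URL idea: a README can be requested from `raw.githubusercontent.com` without going through the API. The `master` branch, the `README.md` filename casing, and the `example-repo` name below are assumptions for illustration, not details taken from this project.

```python
from urllib.parse import quote

RAW_HOST = "https://raw.githubusercontent.com"


def raw_readme_url(owner, repo, branch="master", filename="README.md"):
    """Build the raw-content URL for a repository's README.

    The default branch and filename are assumptions; a repo may use others.
    """
    parts = [quote(part) for part in (owner, repo, branch, filename)]
    return "/".join([RAW_HOST, *parts])


# "example-repo" is a hypothetical repository name.
print(raw_readme_url("ipfs", "example-repo"))
```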

dignifiedquire commented 8 years ago

Maybe update the repo list only on explicit request/every 24h?

harlantwood commented 8 years ago

> Maybe update the repo list only on explicit request/every 24h?

hm, good idea.... and since I want to serve this ultimately from IPFS / IPNS / dnslink... there is no server to cache the repo list... we could of course cache it in IPFS, especially with the recent (or soon) opening of write API... but I can't see how to do this writing to IPFS. I could write a cache every 24 hours to IPNS, but that implies server/cron/etc, which I'm trying to avoid... maybe CRDTs hold a possible answer, but I don't get that enough to go there, though I'm interested in learning...

any thoughts here @jbenet @whyrusleeping @diasdavid?

dignifiedquire commented 8 years ago

You could have a deploy script which pulls the repo list down and stores it in a file before you push to github
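A minimal sketch of such a deploy step, assuming the list comes from the GitHub API's organisation repos endpoint and is saved as a JSON file of repo names. The `repos.json` filename and both function names are hypothetical, and only a single page of up to 100 repos is fetched.

```python
import json
import urllib.request

# Assumption: one page of up to 100 repos covers the organisation.
ORG_REPOS_URL = "https://api.github.com/orgs/ipfs/repos?per_page=100"


def fetch_repo_names(url=ORG_REPOS_URL):
    """Fetch one page of the organisation's repos and return their names."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return [repo["name"] for repo in json.load(resp)]


def save_snapshot(names, path="repos.json"):
    """Write the sorted repo names to a JSON file shipped with the deploy."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(sorted(names), fh, indent=2)


# At deploy time one would run: save_snapshot(fetch_repo_names())
```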

harlantwood commented 8 years ago

That sounds good... but don't want to have to push to github to update the repo list...

Sounds like some kind of cron would be needed, on CI or a heroku instance ... maybe (notes to self):

harlantwood commented 8 years ago

I am planning a fallback cascade:

  1. Try to fetch the fresh repo list from github api anonymously. Throttling is per IP address, so this should succeed most of the time, especially if we cut down the API calls from 31 to 1, which is easy.
  2. Fall back on a repo list that we fetch and save as json at deploy time. Ideally trigger a nightly build so this list is fairly fresh.
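The two-step cascade above could be sketched as follows. This is an illustration only: `load_repo_list` and its two callables are hypothetical names, and the real fetch and snapshot logic are passed in rather than shown.

```python
def load_repo_list(fetch_live, read_snapshot):
    """Return (repos, source), preferring live data over the saved snapshot.

    fetch_live:    callable that returns the fresh list or raises on failure
    read_snapshot: callable that returns the list saved at deploy time
    """
    try:
        return fetch_live(), "live"
    except Exception:
        # Throttled, offline, or a malformed response: use the saved list.
        return read_snapshot(), "snapshot"


def throttled():
    # Stand-in for an anonymous API call that has hit the rate limit.
    raise RuntimeError("rate limit exceeded")


repos, source = load_repo_list(throttled, lambda: ["project-repos"])
print(source)  # snapshot
```

Passing the two steps in as callables keeps the cascade independent of how the live fetch or the snapshot file is actually implemented.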

harlantwood commented 8 years ago

More info on rate limiting from https://developer.github.com/v3/#rate-limiting:

> For requests using Basic Authentication or OAuth, you can make up to 5,000 requests per hour. For unauthenticated requests, the rate limit allows you to make up to 60 requests per hour.
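To put those limits next to the call counts mentioned earlier in the thread (31 calls per load now, 1 after the planned reduction), here is a quick calculation. It assumes every page load spends the same number of calls and that nothing else consumes the quota.

```python
def page_loads_per_hour(hourly_limit, calls_per_load):
    """Whole page loads that fit inside one hour's request quota."""
    return hourly_limit // calls_per_load


print(page_loads_per_hour(60, 31))    # 1   (anonymous, current load)
print(page_loads_per_hour(60, 1))     # 60  (anonymous, single call)
print(page_loads_per_hour(5000, 31))  # 161 (authenticated, current load)
```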

RichardLitt commented 8 years ago

Could you pull the repo list from the project-directory.md file in ipfs/ipfs? That way we could include repositories that aren't in the IPFS organisation, too. And why wouldn't we pull directly from the raw URL for each repo?

Switching to OAuth or just using Basic Auth should work - are we doing more than 5k an hour?

harlantwood commented 8 years ago

> Could you pull the repo list from the project-directory.md file in ipfs/ipfs? That way we could include repositories that aren't in the IPFS organisation, too.

We could, certainly. Not sure we want to, though. Are all the repos we point to in that list, even those outside the ipfs org, repos that we want/expect to follow our guidelines? (There are quite a few already, and likely to be more: https://github.com/ipfs/ci-status/issues/1#issue-115979968)

> And why wouldn't we pull directly from the raw URL for each repo?

Raw URLs or API fetches both work fine, and both are throttled if not logged in.

> Switching to OAuth or just using Basic Auth should work - are we doing more than 5k an hour?

Nope! Should be plenty.

harlantwood commented 8 years ago

Using raw URLs anonymously is working well. We also have a counter of remaining github calls. We run out sometimes in dev, but probably not so much IRL. Closing, please reopen if experiencing pain around this.
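The thread does not say how the counter of remaining calls is implemented. One possible approach, sketched here as an assumption, is to read the `X-RateLimit-Remaining` header that GitHub API responses carry. Whether raw URL responses include the same header is not confirmed here, so the function returns `None` when it is absent. `remaining_calls` is a hypothetical helper name.

```python
def remaining_calls(headers):
    """Return the remaining request quota, or None if the header is absent.

    headers: mapping of response header names to values. Names are
    matched case-insensitively, as HTTP header names are.
    """
    for name, value in headers.items():
        if name.lower() == "x-ratelimit-remaining":
            return int(value)
    return None


print(remaining_calls({"X-RateLimit-Remaining": "57"}))  # 57
```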