zendframework / modules.zendframework.com

Home for ZF2 module distribution
BSD 3-Clause "New" or "Revised" License
189 stars 159 forks source link

Strategies to avoid hitting the rate limit? #349

Closed adamlundrigan closed 9 years ago

adamlundrigan commented 9 years ago

At the very least we should be caching the result of API calls used to generate the module view page. A better solution might be to follow Packagist's lead: indefinitely cache the initial result during module add, then listen for repository changes via a webhook handler. Of course, this will require us to instruct the module owner on how to enable it.

ins0 commented 9 years ago

allready implement as the github client uses etags to avoid increasing the current api limit if a repository content did not changed.

https://developer.github.com/v3/#conditional-requests https://github.com/EvanDotPro/EdpGithub/blob/master/src/EdpGithub/Listener/Cache.php#L42

authenticated request to github had a rate limit of 5000 calls per hour and 50 for unauthenticated requests. i would wonder if we hit this rate limit the next time again after we use this caching and now authenticated requests on production. no?

adamlundrigan commented 9 years ago

I haven't looked into it too closely yet, but I set up my local dev env, logged in, and tried to sync my modules and after a sizable wait received the "API limit has been exceeded" message. My GitHub account has 56 repositories.

ins0 commented 9 years ago

now i got your point, there is one request to collect all the user repositorys but the old function https://github.com/zendframework/modules.zendframework.com/blob/master/module/ZfModule/src/ZfModule/Service/Module.php#L71 will use one api call against every single github repository in order to filter only zf2 modules.

i would love to see this check moved to the register process where the check will be executed. or we need to tell module owners how to name there repositorys in order to display it in the module listening eg. zf2- reponame. of course we could also save user repositorys in the db or other storage/cache

also if a user only access the user panel the ajax request will fired and force to load this heavy request machine.

other suggestions?

adamlundrigan commented 9 years ago

One optimization could be to use the search instead of loading a list of all the user's repositories. As an example, with this query:

user:adamlundrigan filename:Module.php "class Module"

I get only the repositories that match our criteria for being a module (see my result). This also works for organization names.

ins0 commented 9 years ago

@adamlundrigan awesome :+1: like

adamlundrigan commented 9 years ago

Something's fishy with the rate limit on my local clone. Invoking my awesome powers of VDD I added this to EdpGithub\Listener\Cache::postSend:

var_dump($response->getHeaders()->get('X-RateLimit-Remaining')->getFieldValue());

And this is the progression of X-RateLimit-Remaining when I try to sync my repositories:

image

ins0 commented 9 years ago

woot call decreases the rate limit from 4992 to 29 :fearful: VDD made my day - thanks :laughing:

ins0 commented 9 years ago

ok now it makes sense the search api had a rate limit from only 30 requests per minute. so i think there is no other option as to leave the search api and looking for a different way to display modules - simple hit url field?

adamlundrigan commented 9 years ago

Ah yes, a different rate limit for searches would explain that. I agree that it may be best to take the same approach as Packagist: enter your repository URL to add a module to the site (this would also make supporting sources other than GitHub easier as well). If we do add a webhook receiver to catch updates from GitHub we could have it import the module data if the module doesn't already exist on the site.

If nobody is currently working on refactoring/rewriting the module import and sync I can work on implementing the repository url and webhook method.

ins0 commented 9 years ago

:+1:

gianarb commented 9 years ago

@adamlundrigan can you try if you authenticate the request the problem of rate persist? :mag: Thanks..

ins0 commented 9 years ago

@gianarb with unauthenticated requests the limit will drop to 5 requests/minute

gianarb commented 9 years ago

Ok, thanks.. Sorry :)

ins0 commented 9 years ago

:smile: as fiddler pops up the response headers i was thinking this is a bad joke and unauthenticated too ^^ but unfortunately not :)

gianarb commented 9 years ago

I don't know but I'm building a little application to manage Github PR status. Login from Github and resume all my repos (103) and I have not this problem.. After the first request (list of repositories) I save this list into the database and exist a button to resume it from github.. mmm

ins0 commented 9 years ago

the search api make the problem as it had a different (and looooowwer) rate limit as the repository api.

main problem is this zf2 module check function https://github.com/zendframework/modules.zendframework.com/blob/8af567eaa08f2f67ecf6c0f55998065201ff0cf0/module/ZfModule/src/ZfModule/Service/Module.php#L74

gianarb commented 9 years ago

Ok.. Maybe we can change actual flow.. We can check module.php if he tries to enable repository

ins0 commented 9 years ago

https://github.com/zendframework/modules.zendframework.com/issues/349#issuecomment-71758484 - i like @adamlundrigan suggestion like packagist handels the "add repository" flow but some name standards for repository names would have positive aspects too without forced to look at the repository file content.

clearly identify a repository for zf2- or in the future for zf3- and users would see ontop "this is a zfX- module i can use for zf" e.g. when browsing github

/cc @localheinz @Ocramius @GeeH

adamlundrigan commented 9 years ago

Perhaps we could support setting the Composer package type to "zf2-module"? (see Composer docs, here). We would still need to fallback to searching for Module class for all those packages that don't have the type set properly. It also doesn't help the rate limit issue, but following a Packagist-like module add procedure should negate those issues anyway

adamlundrigan commented 9 years ago

I've moved the discussion around the module add procedure to #352

GeeH commented 9 years ago

I believe this is fixed. I added authentication. If it's not please re-open.