rdev-hackaton / GitHubTimeTracker

MIT License

Github API requests are SLOW #49

Closed: ffigiel closed this issue 8 years ago

ffigiel commented 8 years ago

We should do something about it

ppeczek commented 8 years ago

IMO we need to split this by use case. When you call it from Celery as a periodic task, speed doesn't matter that much. When you need it often, it does. But you don't need very general data quickly, and single queries are already fast. Alternatively, we could download the whole repo to some intermediate point, do the calculations there, and send back the finished result.
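The split between periodic and on-demand use can be sketched as a holder whose refresh() runs in a background job and whose get() just returns the stored result. This is a minimal sketch under assumptions: PrecomputedResult and the compute callable are hypothetical names, and the wiring of refresh() to an actual Celery periodic task is not shown.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class PrecomputedResult(Generic[T]):
    """Hypothetical holder for the last result of an expensive computation.

    refresh() is meant to run in a periodic background job (for example a
    Celery periodic task; that wiring is an assumption, not project code).
    get() returns the stored result, so on-demand callers don't wait on
    the GitHub API once a refresh has happened.
    """

    def __init__(self, compute: Callable[[], T]) -> None:
        self._compute = compute
        self._value: Optional[T] = None
        self.updated_at: Optional[float] = None  # time.time() of last refresh

    def refresh(self) -> None:
        # The slow part: in practice this would query the GitHub API.
        self._value = self._compute()
        self.updated_at = time.time()

    def get(self) -> T:
        if self.updated_at is None:
            # Nothing precomputed yet, so compute once on demand.
            self.refresh()
        return self._value  # type: ignore[return-value]
```

The trade-off is freshness: callers see data as of the last refresh, which fits the "general data doesn't need to be quick" case but not the single-query case.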

ffigiel commented 8 years ago

What about issues and PRs? You can't clone those.

ffigiel commented 8 years ago

I discovered that we make a ton of duplicate requests. You can see them by enabling github.enable_console_debug_logging().

In my case there were 193 requests to 21 unique URLs.

:clock1: 1h
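A count like the one above can be reproduced from the debug log with a short script that tallies total and unique request URLs. This is a sketch under an assumption: each logged request line contains the full https://api.github.com URL followed by whitespace. The real log format may differ, so URL_RE may need adjusting; count_requests is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumption: each logged request line contains the full API URL, and
# the URL ends at the first whitespace character.
URL_RE = re.compile(r"https://api\.github\.com/\S+")


def count_requests(lines: Iterable[str]) -> Tuple[int, int, Counter]:
    """Return (total requests, unique URLs, per-URL request counts)."""
    counts: Counter = Counter()
    for line in lines:
        match = URL_RE.search(line)
        if match:
            counts[match.group(0)] += 1
    return sum(counts.values()), len(counts), counts
```

With a saved log, `with open("gh.txt") as log: total, unique, counts = count_requests(log)` followed by `counts.most_common(5)` would list the most repeated URLs.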

lhaze commented 8 years ago

That's great info, because duplicate requests are something we can actually fix. A slow GitHub API, on the other hand, is something we couldn't do anything about.
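One way to fight duplicates is a per-run cache keyed by URL, so each unique URL hits the network only once. In ffigiel's run that would mean 21 requests instead of 193, assuming every duplicate is a plain repeat of the same GET whose response doesn't change during the run. DedupingFetcher and its fetch callable are hypothetical, and how it would hook into PyGithub's HTTP layer is not shown.

```python
from typing import Callable, Dict


class DedupingFetcher:
    """Hypothetical wrapper that sends each GET URL over the network once.

    fetch is whatever function actually performs the HTTP request. Responses
    are kept for the lifetime of this object (e.g. one collection run), so
    stale data only matters if the object is kept around for a long time.
    """

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._responses: Dict[str, str] = {}
        self.network_calls = 0

    def get(self, url: str) -> str:
        if url not in self._responses:
            # First time we see this URL: do the real request and remember it.
            self.network_calls += 1
            self._responses[url] = self._fetch(url)
        return self._responses[url]
```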

ffigiel commented 8 years ago

Even if collection still took around 5-10s, that would be a dramatic improvement over the current >50s.

Here's the full log in case you want to play with the data: gh.txt (Edit: the lines were sorted alphabetically.)

ppeczek commented 8 years ago

Here is why: all the requests for groups of items are made inside for loops: https://github.com/rdev-hackaton/GitHubTimeTracker/blob/master/time_tracker/backends/sources/github_source.py

:clock10: 10m

ffigiel commented 8 years ago

Yeah, it should do self.get_commits() instead of self._repo.get_commits(). Easy fix.
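The suggested fix works if self.get_commits() keeps the result of self._repo.get_commits() instead of asking the API again. A minimal sketch, assuming a hypothetical GitHubSource class (the real one in github_source.py may differ) whose _repo is a PyGithub Repository; commit_shas is an illustrative consumer, not existing project code.

```python
from typing import Any, List, Optional


class GitHubSource:
    """Hypothetical stand-in for the source class in github_source.py."""

    def __init__(self, repo: Any) -> None:
        self._repo = repo  # e.g. a PyGithub Repository object
        self._commits: Optional[List[Any]] = None

    def get_commits(self) -> List[Any]:
        # Materialize the paginated commit list once; later calls reuse it.
        if self._commits is None:
            self._commits = list(self._repo.get_commits())
        return self._commits

    def commit_shas(self) -> List[str]:
        # Goes through self.get_commits() rather than
        # self._repo.get_commits(), so no new API requests are made.
        return [commit.sha for commit in self.get_commits()]
```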

ffigiel commented 8 years ago

The for loops aren't the problem. I patched the github.Repository class to print attribute names from its __getattribute__ method, and there's no duplication:

Repository: rdev-hackaton/GitHubTimeTracker
Loading...
get_issues
get_commits
    Time: 1:00:00    Comment: None
    Time: 0:10:00    Comment: None
    ...
Loading took 167.28886127471924

PyGithub is the problem :disappointed:

Edit: the issue persists on current PyGithub master branch.
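The __getattribute__ patch described above can be sketched generically. This is a minimal sketch: trace_public_attributes is a hypothetical helper, and the check below uses a stand-in class rather than PyGithub (where the class to patch lives in the github.Repository module).

```python
from typing import Any, Callable


def trace_public_attributes(
    cls: type, log: Callable[[str], None] = print
) -> Callable[[], None]:
    """Patch cls.__getattribute__ so every public attribute lookup is logged.

    Returns a function that undoes the patch.
    """
    had_own = "__getattribute__" in cls.__dict__
    original = cls.__getattribute__

    def traced(self: Any, name: str) -> Any:
        if not name.startswith("_"):
            log(name)  # e.g. "get_issues", "get_commits"
        return original(self, name)

    cls.__getattribute__ = traced

    def restore() -> None:
        if had_own:
            cls.__getattribute__ = original
        else:
            del cls.__getattribute__  # fall back to the inherited lookup

    return restore
```

This only shows which methods our code calls on the object, not how many HTTP requests each call makes internally, which is why a clean trace here still leaves the duplicates inside the library.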

ffigiel commented 8 years ago

github3.py looks very promising; I'll try to adapt our existing source to use it.