ponder-lab / GitHub-Issue-Classifier

Python script to mine for GitHub issues + comments and classify them.
MIT License

Process all returned issues #19

Closed khatchad closed 3 years ago

khatchad commented 3 years ago

If we are querying 1000 issues, why are we only processing the "top 3"? Also, what does "top" mean?

y3pio commented 3 years ago

I picked the top 3 results so that we won't hit GitHub's public API query limit (roughly 1 query/sec). We need to query for the comments of each issue we look at, and making that many calls can sometimes cause GitHub to temporarily block any additional calls from our IP address.

Since we query for the issues with an order=desc param, the "top" 3 issues here basically means we're slicing off the first 3 issues that we got back (after post-processing for a string match on title/body).
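To make the "top 3" selection concrete, here is a minimal sketch of the filter-then-slice step, not the script's actual code. The function name `top_matching_issues` and the `limit` parameter are hypothetical; `title` and `body` are the fields GitHub returns on each issue, and the input list is assumed to already be in the order the API returned it.

```python
from typing import Any, Dict, List


def top_matching_issues(
    issues: List[Dict[str, Any]], keyword: str, limit: int = 3
) -> List[Dict[str, Any]]:
    """Keep issues whose title or body contains `keyword`, then slice off the first `limit`.

    Hypothetical helper: assumes `issues` is already sorted by the API (order=desc).
    """
    needle = keyword.lower()
    matches = [
        issue
        for issue in issues
        # `body` can be null in the API response, so fall back to an empty string.
        if needle in (issue.get("title") or "").lower()
        or needle in (issue.get("body") or "").lower()
    ]
    return matches[:limit]
```

Raising `limit` (or dropping the slice entirely) is what "process all returned issues" would amount to, at the cost of one extra comments query per issue.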

I'm thinking of adding an interval/delay between the comment queries, which might help keep us from hitting GitHub's public API query limit.
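A rough sketch of what that interval/delay could look like, under the assumption that comments are fetched one issue at a time. `fetch_with_delay` and its parameters are hypothetical names, and `fetch` stands in for whatever function actually calls the API; the `sleep` argument is injectable only so the behavior can be checked without really waiting.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def fetch_with_delay(
    items: Iterable[T],
    fetch: Callable[[T], R],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Call `fetch` on each item, pausing `delay_seconds` between consecutive calls."""
    results: List[R] = []
    for index, item in enumerate(items):
        if index > 0:
            # No pause before the first call, only between calls.
            sleep(delay_seconds)
        results.append(fetch(item))
    return results
```

A fixed delay only spaces the calls out; it does not raise the total number of calls allowed, so it helps with bursts but not with an hourly cap.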

y3pio commented 3 years ago

Per their documentation, it looks like we are limited to 60 requests per hour on the public, unauthenticated API. https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting
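For reference, GitHub reports the remaining quota in the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers (the latter as a Unix timestamp). Here is a small sketch of how a script could use them to decide how long to wait; the function name `seconds_until_reset` is hypothetical, and the missing-header defaults are my own assumption.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return 0 if quota remains, else the seconds left until the rate limit resets."""
    # HTTP header names are case-insensitive, so normalize before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Assumption: if the header is missing, treat the quota as not exhausted.
    remaining = int(lowered.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(lowered.get("x-ratelimit-reset", "0"))
    current = time.time() if now is None else now
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, reset_at - current)
```

Passing the headers of the last response to this function before each new call lets the script sleep exactly as long as needed instead of guessing a fixed delay.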

Will look into implementing an authenticated API call, either by supplying OAuth tokens or by asking the user to log into GitHub.
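As a starting point, a token-based call could look roughly like the sketch below. It only builds the request for an issue's comments and does not send it. The function name `build_issue_comments_request` is hypothetical, and where the token comes from (an environment variable, a config file, a login prompt) is left open here.

```python
import urllib.request
from typing import Optional


def build_issue_comments_request(
    owner: str, repo: str, issue_number: int, token: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a GET request for the comments of one issue."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{issue_number}/comments"
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # With a token the call counts against the authenticated rate limit;
        # without one it falls back to the unauthenticated limit.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)
```

The returned object can be passed to `urllib.request.urlopen` to actually make the call, for example `urllib.request.urlopen(build_issue_comments_request("ponder-lab", "GitHub-Issue-Classifier", 19, token=my_token))`, where `my_token` is whatever token the user supplies.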

UPDATE: Added this to the QA/testing issue list: #14

y3pio commented 3 years ago

Created an issue to implement authenticated queries: https://github.com/ponder-lab/GitHub-Issue-Mining/issues/27 This issue will be addressed in that enhancement effort.

Closing this issue.