turbot / steampipe-plugin-github

Use SQL to instantly query repositories, users, gists and more from GitHub. Open source CLI. No DB required.
https://hub.steampipe.io/plugins/turbot/github
Apache License 2.0

(Performance) issue with `github_outside_collaborators` table #413

Closed: olafz closed this issue 5 months ago

olafz commented 5 months ago

Describe the bug

This is a follow-up to https://github.com/turbot/steampipe-plugin-github/issues/404, as discussed with @ParthaI. In addition to the comments in that issue, I've provided more context and code below.

Additional context

Are you aware of the rough number of collaborators within your organization?

The organization where this issue appears has ~550 collaborators (both users and outside collaborators).

Could you describe how the query `SELECT * FROM github_organization_collaborator WHERE organization = '...' and affiliation = 'OUTSIDE'` behaves, both with and without the LIMIT clause?

Actually, this specific query works in all cases, both with and without LIMIT. I took the opportunity to run a COUNT to give an idea of the size:

> SELECT COUNT(*) FROM github_organization_collaborator WHERE organization = '...' AND affiliation = 'OUTSIDE';
+-------+
| count |
+-------+
| 635   |
+-------+

I notice that with this query, progress is shown:

[Screenshot, 2024-04-02 at 19:56:43: query progress indicator]

Without the additional `AND affiliation = 'OUTSIDE'`, the behavior described in https://github.com/turbot/steampipe-plugin-github/issues/404 is observed.

ParthaI commented 5 months ago

Hello @olafz, after further investigation, I am unable to reproduce the error with a smaller dataset. Reaching the secondary rate limit appears to be less likely when fewer results are returned. To reproduce the issue accurately, access to real-time data is essential, as recursive API calls have proven to be of limited assistance.

From what I've observed:

By the way, I've made an update to reduce the query's page size in the issue-413 branch. Could you test it out in the PR branch and share if it makes any difference?
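To illustrate what reducing the page size changes, here is a minimal Python sketch. It does not reproduce the plugin's code or the GitHub API: `make_fake_api` and `iter_collaborators` are hypothetical names, the data is generated in memory, and the page sizes of 100 and 50 are assumptions chosen for illustration (the thread does not state the actual values). It only shows the general trade-off: a smaller page size returns the same rows through more, lighter requests.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: (cursor, page_size) -> (items, next_cursor).
# A next_cursor of None means there are no more pages.
FetchPage = Callable[[Optional[int], int], Tuple[List[str], Optional[int]]]


def make_fake_api(total: int) -> Tuple[FetchPage, List[int]]:
    """Build an in-memory stand-in for a paginated API (an assumption, not GitHub's)."""
    data = [f"user-{i}" for i in range(total)]
    calls: List[int] = []  # records the number of rows returned by each request

    def fetch_page(cursor: Optional[int], page_size: int) -> Tuple[List[str], Optional[int]]:
        start = cursor or 0
        end = min(start + page_size, total)
        calls.append(end - start)
        next_cursor = end if end < total else None
        return data[start:end], next_cursor

    return fetch_page, calls


def iter_collaborators(fetch_page: FetchPage, page_size: int) -> Iterator[str]:
    """Yield every row by following cursors until the fake API reports no next page."""
    cursor: Optional[int] = None
    while True:
        items, cursor = fetch_page(cursor, page_size)
        yield from items
        if cursor is None:
            break


if __name__ == "__main__":
    # 635 matches the COUNT reported earlier in this thread.
    for size in (100, 50):
        fetch, calls = make_fake_api(635)
        rows = list(iter_collaborators(fetch, size))
        print(f"page_size={size}: {len(rows)} rows in {len(calls)} requests")
```

Whether lighter requests avoid the secondary rate limit in practice depends on how the API weighs each request, which this simulation does not model.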

Thank you!

olafz commented 5 months ago

By the way, I've made an update to reduce the query's page size in the issue-413 branch. Could you test it out in the PR branch and share if it makes any difference?

It does! The query

> SELECT * FROM github_organization_collaborator WHERE organization = '...' and affiliation = 'OUTSIDE';

Now runs fine without LIMIT clause 🎉 Thanks for this fix 🙏