Closed olafz closed 5 months ago
Thank you, @olafz, for bringing this matter to our attention 👍. We'll investigate it further.
Hello, @olafz, upon reviewing our current plugin code, I noticed we already have a github_repository_collaborator
table. This should help us gather information on collaborators (OUTSIDE
, DIRECT
, and ALL
) for a repository. You can specify the type of collaborators you're interested in by using the affiliation option in the where clause, with ALL
being the default setting. For example: select * from github_repository_collaborator where repository_full_name = 'turbot/steampipe-plugin-aws' and affiliation = 'outside'
. When querying github_repository_collaborator
, providing the repository_full_name
is mandatory, while specifying the affiliation
parameter is optional (with possible values: direct
, outside
, and all
).
Please try this approach and let me know if it's beneficial.
Thank you!
Hi @ParthaI,
Thanks for your feedback. Your suggestion of using the github_repository_collaborator
does work, but I have an use case which makes using this table extremely inefficient.
I would like to create an overview of all external collaborators in a GitHub organization. If I were to use the github_repository_collaborator
table, I would have to create a query that iterates (join) over all repositories in the GitHub organization (since repository_full_name
is a mandatory parameter).
In our case - because of the size of the organization - that would mean going over thousands and thousands of repositories, where most of them don't even have an outside collaborator. It would be much more efficient if I were able to do it the other way around: first query for all outside collaborators and then go from there.
Thanks again for looking into this issue and I hope this creates some clarity. Also: if there is another way to achieve this efficiently, let me know :)
Hello, @olafz, I was trying to achieve the goal as per the comments provided in the in the issue description.
However, I am facing a few issues while achieving the goal. I found some articles and community discussion related to my issue. I will give it a try once more.
I constructed a query to get outside collaborators at the organization level. Could you please give it a try in Github GraphQL Explorer and share your feedback with us if the following GraphQL query works as expected?
query {
organization(login: "<YOUR ORG NAME>") {
url
login
repositories(first: 100) {
pageInfo {
endCursor
hasNextPage
}
nodes {
name
collaborators(affiliation: OUTSIDE) {
edges {
permission
}
nodes {
url
login
name
email
company
}
}
}
}
}
}
Thank You!
Hello @ParthaI,
Thanks for the follow-up. I tested the query you provided and it seems to do the job; it looks pretty good.
See below the example from the first response for one repository with external collaborators:
{
"data": {
"organization": {
"url": "https://github.com/<YOUR ORG NAME>",
"login": "<YOUR ORG NAME>",
"repositories": {
"pageInfo": {
"endCursor": "...",
"hasNextPage": true
},
"nodes": [
{
"name": "<REPOSITORY NAME>",
"collaborators": {
"edges": [
{
"permission": "READ"
},
{
"permission": "WRITE"
}
],
"nodes": [
{
"url": "https://github.com/<SOME OUTSIDE COLLABORATOR>",
"login": "<SOME OUTSIDE COLLABORATOR>",
"name": "...",
"email": "...",
"company": "..."
},
{
"url": "https://github.com/<SOME OTHER OUTSIDE COLLABORATOR>",
"login": "<SOME OTHER OUTSIDE COLLABORATOR>",
"name": "...",
"email": "...",
"company": null
}
]
}
},
...
Thank you, @olafz, for verifying that the GraphQL query works well in the GitHub GraphQL Explorer. I'll proceed with the required adjustments on the plugin side and inform you as soon as the modifications are complete.
I appreciate your cooperation!
Hello @olafz, I've introduced a new table called github_organization_collaborator
in the issue-404
branch.
Table Usage:
organization
value (the name of the organization) in the where
clause when querying this table.affiliation
value in the where clause.affiliation
is ALL, with possible values being OUTSIDE, DIRECT, or ALL.Unfortunately, I was unable to test the changes due to missing organization-level permissions. Your assistance in testing these changes on the PR branch and providing feedback would be greatly appreciated.
Testing Steps in the PR Branch:
git clone https://github.com/turbot/steampipe-plugin-github.git
cd steampipe-plugin-github
issue-404
branch: git checkout issue-404
make
Thank you!
Hi @ParthaI,
Awesome; thanks! I'm going to give it a try somewhere in the next couple of days.
Hello, @olafz, have you had the opportunity to try it out yet?
Hello @ParthaI,
Thank you for your patience. I've been testing this branch, but unfortunately it did not work. The following query:
> SELECT * FROM github_organization_collaborator WHERE organization = '...';
Consistently results in
Error: github: non-200 OK status code: 502 Bad Gateway body:
{
"data": null,
"errors":
[
{
"message":"Something went wrong while executing your query. This may be the result of a timeout, or it could be a GitHub bug. Please include `86E8:...:...:...:...` when reporting this issue."
}
]
}
The GraphQL results still work, so I suspect Steampipe to do something inefficient? I've contacted GitHub support; maybe they can help figuring out what the cause is.
I've gotten a response from GitHub:
Our logs indicate that you're hitting our timeout threshold when fetching the data. All API requests, both for the REST API and GraphQL API, have a 10 second limit on execution time. If that limit is reached for a request, the request is terminated and you get back that error.
One of the things I notice, is that in other GitHub related Steampipe queries, I see a Loading results
quickly after the query has been issued, increasing in number:
This goes on until there is an output. I don't see anything like that in the github_organization_collaborator
query. In an attempt to further debug the issue, I ran the earlier query, but this time with a limit:
> SELECT * FROM github_organization_collaborator WHERE organization = '...' LIMIT 1;
That gave a clue:
Error: github: unable to parse hydrate item {0xc0006529f0 ADMIN {{992 MDQ6VXNlcjk5Mg== Olaf van Zandwijk}
olafz 2008-02-26 21:22:23 +0000 UTC 2024-02-05 15:43:50 +0000 UTC https://github.com/olafz}}
as an User (SQLSTATE HV000)
Hi @olafz, apologies for the late reply, and I greatly appreciate your patience.
I've submitted a PR to address this issue. The table should now return the results correctly.
Thanks!
Hi @ParthaI,
The new version works better, but still not good :) The query now returns results as long as I limit the results to a relatively low amount:
> SELECT * FROM github_organization_collaborator WHERE organization = '...' LIMIT 50;
Without a limit, or with a larger limit (> 100), I still see the same behavior as before:
I still observe the progress for this query differently compared to other GitHub queries. They show a progress indicator:
Other queries:
This query just shows Executing query...
for a while and then fails:
> SELECT * FROM github_organization_collaborator WHERE organization = '...' LIMIT 100;
Error: github: non-200 OK status code: 502 Bad Gateway body: "..." (SQLSTATE HV000)
Hello @olafz, apologies for the snag!!
Could you please raise a separate issue as per the issue comment? I will take a look.
Please include the following details in the new issue body:
Are you aware of the rough number of collaborators within your organization? The issue might stem from the extensive datasets in the org. I'll attempt to replicate it locally by tweaking the plugin code.
Could you describe how the query SELECT * FROM github_organization_collaborator WHERE organization = '...' and affiliation = 'OUTSIDE'
behaves, both with and without the limit clause?
Thanks!
I've opened a new issue here: https://github.com/turbot/steampipe-plugin-github/issues/413
Thanks for the assistance!
Steampipe already supports querying regular members of an organization through the
github_organization_member
table. There is another way people can access an arganization's repositories: as outside collaborator.An outside collaborator is a person who is not a member of an organization, but has access to one or more of the organization's repositories
It would be very nice if we could query these as well using Steampipe.
References The API documentation to query outside collaborators for an organization is here: https://docs.github.com/en/rest/orgs/outside-collaborators?apiVersion=2022-11-28