turbot / steampipe-plugin-github

Use SQL to instantly query repositories, users, gists and more from GitHub. Open source CLI. No DB required.
https://hub.steampipe.io/plugins/turbot/github
Apache License 2.0
72 stars, 28 forks

Retrieve a specific file within a repository returns no record #444

Closed aminvielledebatAtBedrock closed 1 month ago

aminvielledebatAtBedrock commented 1 month ago

Describe the bug

The table github_repository_content returns no rows when repository_content_path points to a single file.

Steampipe version (steampipe -v)

v0.21.8

Plugin version (steampipe plugin list)

v0.42.0

To reproduce

select
*
from github.github_repository_content
where repository_full_name = 'Replay/service-bedrock-bff'
and repository_content_path = 'gitleaks.toml';

Expected behavior

One record with the gitleaks.toml file.

Additional context

With the following query, I get my record:

select
*
from github.github_repository_content
where repository_full_name = 'Replay/service-bedrock-bff';
aminvielledebatAtBedrock commented 1 month ago

The following patch works, but some file information is missing:

--- a/github/table_github_repository_content.go
+++ b/github/table_github_repository_content.go
@@ -106,6 +106,13 @@ func getFileContents(ctx context.Context, d *plugin.QueryData, h *plugin.Hydrate
                                                }
                                        }
                                } `graphql:"... on Tree"`
+                               Blob struct {
+                                       Oid            githubv4.String
+                                       AbbreviatedOid githubv4.String
+                                       Text           githubv4.String
+                                       IsBinary       githubv4.Boolean
+                                       CommitUrl      githubv4.String
+                               } `graphql:"... on Blob"`
                        } `graphql:"object(expression: $expression)"`
                } `graphql:"repository(owner: $owner, name: $repo)"`
        }
@@ -127,6 +134,18 @@ func getFileContents(ctx context.Context, d *plugin.QueryData, h *plugin.Hydrate
                return err
        }

+       if query.Repository.Object.Blob.Oid != "" {
+               data := query.Repository.Object.Blob
+               c := ContentInfo{
+                       Oid:            string(data.Oid),
+                       AbbreviatedOid: string(data.AbbreviatedOid),
+                       Content:        string(data.Text),
+                       IsBinary:       bool(data.IsBinary),
+                       CommitUrl:      string(data.CommitUrl),
+               }
+               d.StreamListItem(ctx, c)
+       }
+
        for _, data := range query.Repository.Object.Tree.Entries {
                if string(data.Type) != "tree" {
                        c := ContentInfo{
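
The control flow the patch adds can be tried in isolation. This is a minimal sketch, not the plugin's code: plain Go types stand in for the githubv4 types, the `object`, `blob`, `entry`, `fromBlob`, and `collect` names are hypothetical, and `ContentInfo` is cut down to the fields the patch fills in. It shows why a single file comes back without a name: the Blob fragment carries no name, path, size, or mode.

```go
package main

import "fmt"

// ContentInfo holds only the fields used in this sketch.
// Assumption: the plugin's real struct has more fields (path, size, mode, ...).
type ContentInfo struct {
	Oid            string
	AbbreviatedOid string
	Content        string
	IsBinary       bool
	CommitUrl      string
	Name           string // only known for tree entries
}

// blob, entry, and object are plain-Go stand-ins for the query struct.
type blob struct {
	Oid, AbbreviatedOid, Text, CommitUrl string
	IsBinary                             bool
}

type entry struct {
	Name, Type string
	Object     blob
}

type object struct {
	Entries []entry // filled when the expression resolves to a Tree
	Blob    blob    // filled when the expression resolves to a Blob
}

// fromBlob copies the blob fields into a row.
func fromBlob(name string, b blob) ContentInfo {
	return ContentInfo{
		Oid:            b.Oid,
		AbbreviatedOid: b.AbbreviatedOid,
		Content:        b.Text,
		IsBinary:       b.IsBinary,
		CommitUrl:      b.CommitUrl,
		Name:           name,
	}
}

// collect mirrors the patched flow: emit the blob if the expression
// resolved to a file, then emit every non-directory tree entry.
func collect(obj object) []ContentInfo {
	var out []ContentInfo
	if obj.Blob.Oid != "" {
		out = append(out, fromBlob("", obj.Blob))
	}
	for _, e := range obj.Entries {
		if e.Type != "tree" {
			out = append(out, fromBlob(e.Name, e.Object))
		}
	}
	return out
}

func main() {
	file := object{Blob: blob{Oid: "abc123", Text: "example content"}}
	for _, c := range collect(file) {
		fmt.Printf("oid=%s name=%q content=%q\n", c.Oid, c.Name, c.Content)
	}
}
```
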
aminvielledebatAtBedrock commented 1 month ago

The problem seems to come from the GraphQL query.

Returns nothing:

query($expression:String!$owner:String!$repo:String!){
  rateLimit{
    remaining,used,cost,limit,resetAt,nodeCount
  },
  repository(owner: $owner, name: $repo){
    object(expression: $expression){
      ... on Tree{
        oid,abbreviatedOid,entries{
          name,path,size,lineCount,mode,pathRaw,isGenerated,type,object{
          ... on Blob{
            oid,abbreviatedOid,text,isBinary,commitUrl}}}
       }

    }
  }
}

Returns my file content:

query($expression:String!$owner:String!$repo:String!){
  rateLimit{
    remaining,used,cost,limit,resetAt,nodeCount
  },
  repository(owner: $owner, name: $repo){
    object(expression: $expression){
      ... on Tree{
        oid,abbreviatedOid,entries{
          name,path,size,lineCount,mode,pathRaw,isGenerated,type,object{
          ... on Blob{
            oid,abbreviatedOid,text,isBinary,commitUrl}}}
       }
      ... on Blob {
            oid,abbreviatedOid,text,isBinary,commitUrl
      }
    }
  }
}

Maybe it's related to my GHE instance (version 3.12.4).

aminvielledebatAtBedrock commented 1 month ago

cc @ParthaI and @graza-io

ParthaI commented 1 month ago

Thanks @aminvielledebatAtBedrock, for the detailed information. I will take a look at it.

ParthaI commented 1 month ago

Hello @aminvielledebatAtBedrock,

I was looking into the issue you mentioned, and it seems the behavior is intended. The table provides the file content under a specified folder path. If no folder path is specified, it fetches all the content under the repository.

I agree that the suggestion you provided is working fine.

If you are looking for a particular file content, could you please try the following:

Could you please give these suggestions a try and let us know if they help?

Thanks!

aminvielledebatAtBedrock commented 1 month ago

Indeed it works, but your GraphQL query looks like this:

query($expression:String!$owner:String!$repo:String!){
  rateLimit{
    remaining,used,cost,limit,resetAt,nodeCount
  },
  repository(owner: $owner, name: $repo){
    object(expression: $expression){
      ... on Tree{
        oid,abbreviatedOid,entries{
          name,path,size,lineCount,mode,pathRaw,isGenerated,type,object{
          ... on Blob{
            oid,abbreviatedOid,text,isBinary,commitUrl}}}
       }

    }
  }
}

with expression equal to HEAD:. This means you fetch all the files at the root of a repository. If you want only one file across hundreds of repositories, you waste a lot of time.
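
To make the difference concrete, here is a minimal sketch of how the `$expression` value could be built, assuming it follows Git's `<rev>:<path>` syntax. `objectExpression` is a hypothetical helper, not a function from the plugin. An empty path gives the root tree; a file path targets a single blob.

```go
package main

import (
	"fmt"
	"strings"
)

// objectExpression builds a "<rev>:<path>" expression for the GraphQL
// object(expression:) argument. Hypothetical helper for illustration.
func objectExpression(rev, path string) string {
	// Leading and trailing slashes are dropped so "/docs/" and "docs" match.
	return rev + ":" + strings.Trim(path, "/")
}

func main() {
	fmt.Println(objectExpression("HEAD", ""))              // root tree
	fmt.Println(objectExpression("HEAD", "gitleaks.toml")) // single file
}
```
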

For the other table, it means you have to find the SHA before fetching your HEAD, and actually, I'm not able to decode my file content with the github_blob table :cry:

ParthaI commented 1 month ago

Hi @aminvielledebatAtBedrock,

with expression equal to HEAD:. This means you fetch all the files at the root of a repository. If you want only one file across hundreds of repositories, you waste a lot of time.

Hmm, makes sense. It is definitely time-consuming.

For the other table, it means you have to find the SHA before fetching your HEAD, and actually, I'm not able to decode my file content with the github_blob table 😢

Yes, we have to find the SHA first to query the github_blob table, which requires more input to query.

However, I have raised a PR to address the above discussion.

Usage overview:

I hope this PR will meet our requirements. Please give it a try and share your feedback.

Thanks!

aminvielledebatAtBedrock commented 1 month ago

Hey @ParthaI, and thanks for your PR. It seems to be perfect :)

A simple question: is it acceptable to use the column path for GET and keep repository_content_path for LIST?

Thank you so much

ParthaI commented 1 month ago

Hi @aminvielledebatAtBedrock,

A simple question: is it acceptable to use the column path for GET and keep repository_content_path for LIST?

I think so, as per the code changes in this PR, the updates align with our other plugin development standards.

If you have any further requirements, such as fetching several selected files at a time, please let us know.


We are moving the conversation here: The implementation does not look correct since the two columns mentioned in the comment below are not returned by the GetConfig.

Regarding the Steampipe plugin standards:

Need suggestions:

  1. Do you have any GraphQL query suggestions to get all the column values provided by the list, even though we updated the GraphQL query here?
  2. Keeping the suggestions in mind:

I did not find a relevant way to present the column values for IsGenerated and Mode with the suggestions provided or with the implementation made in the above-mentioned PR.

@misraved and I discussed this, and it does not seem optimal according to our table development standards that the List API call provides more details than the Get API call.

Do you have any suggestions (specifically a GitHub GraphQL query) that will return all the column values we have today?

Any suggestions would be highly appreciated!

CC @graza-io, @misraved

Thank you!
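
One possible direction for the question about IsGenerated and Mode, offered as an untested assumption rather than a confirmed fix: in the queries earlier in this thread, `mode` and `isGenerated` are fields of the tree entries, not of the Blob. A Get for a single file could therefore query only the parent directory's Tree and keep the one entry whose name matches. The sketch below shows only the path handling and filtering; `parentExpression`, `findEntry`, and `treeEntry` are hypothetical names, and the sample entries are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// treeEntry is a plain-Go stand-in for one item of entries{...}.
type treeEntry struct {
	Name        string
	Mode        int
	IsGenerated bool
}

// parentExpression returns the "<rev>:<dir>" expression for the directory
// that contains filePath, plus the file's base name.
func parentExpression(rev, filePath string) (expr, base string) {
	filePath = strings.Trim(filePath, "/")
	dir, base := "", filePath
	if i := strings.LastIndex(filePath, "/"); i >= 0 {
		dir, base = filePath[:i], filePath[i+1:]
	}
	return rev + ":" + dir, base
}

// findEntry picks the entry for the requested file out of the parent tree.
func findEntry(entries []treeEntry, base string) (treeEntry, bool) {
	for _, e := range entries {
		if e.Name == base {
			return e, true
		}
	}
	return treeEntry{}, false
}

func main() {
	expr, base := parentExpression("HEAD", "config/ci/gitleaks.toml")
	fmt.Println(expr, base)

	// Made-up entries, standing in for the parent tree's response.
	entries := []treeEntry{
		{Name: "README.md", Mode: 33188},
		{Name: "gitleaks.toml", Mode: 33188},
	}
	if e, ok := findEntry(entries, base); ok {
		fmt.Printf("name=%s mode=%d generated=%t\n", e.Name, e.Mode, e.IsGenerated)
	}
}
```

The trade-off is that this still lists one directory per file, which is cheaper than listing the repository root only when the file sits in a small subdirectory.
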