turbot / steampipe-plugin-github

Use SQL to instantly query repositories, users, gists and more from GitHub. Open source CLI. No DB required.
https://hub.steampipe.io/plugins/turbot/github
Apache License 2.0

Table github_workflow: failed to populate column 'pipeline': EOF (SQLSTATE HV000) #403

Closed gabrielsoltz closed 6 months ago

gabrielsoltz commented 7 months ago

Hi @ParthaI

After the last release, there is a new issue related to the changes it introduced; sorry that I missed this when testing your changes. The query I ran at the time probably didn't fetch enough repositories to hit the condition below.

Describe the bug

select * from github_workflow where repository_full_name in (select name_with_owner from github_my_repository)

Error: github: failed to populate column 'pipeline': EOF (SQLSTATE HV000)

ParthaI commented 6 months ago

Hello @gabrielsoltz, I'm sorry to hear you're experiencing problems.

I've been attempting to reproduce the issue you're facing. Despite several attempts, the query consistently returns the expected results.

> select count(*) from github_workflow where repository_full_name in (select name_with_owner from github_my_repository)
+-------+
| count |
+-------+
| 893   |
+-------+

Time: 175.6s. Rows fetched: 2. Hydrate calls: 2.

Could you please share the plugin log file (~/.steampipe/logs/plugin-2024-*.log), along with the content of the workflow file that's causing the issue? Please remove any sensitive information before sharing. This information will greatly help us investigate the problem further.

Alternatively, could you please verify whether any of your workflow files might have indentation issues?

I recall that during my tests for the previous PR, I ran the query (select * from github_workflow where repository_full_name in (select name_with_owner from github_my_repository)) against more than 70 repositories and 500 workflow files without encountering any issues. The query succeeded, and I included its output in the PR for reference.

Thanks!

gabrielsoltz commented 6 months ago

Hi @ParthaI, thank you for verifying this. Did you run your query with select * instead of select count(*)? This issue only occurs when one of the content columns (workflow_file_content, workflow_file_content_json, or pipeline) is queried. With count(*) it doesn't happen; you need to include one of those columns in the query.

Here are the logs:

2024-02-16 15:20:55.327 UTC [ERROR] steampipe-plugin-github.plugin: [ERROR] 170809685441: github_workflow.decodeFileContentToPipeline: Pipeline conversion error=EOF
2024-02-16 15:20:55.327 UTC [ERROR] steampipe-plugin-github.plugin: [ERROR] 170809685441: transform decodeFileContentToPipeline returned error EOF
2024-02-16 15:20:55.327 UTC [ERROR] steampipe-plugin-github.plugin: [ERROR] 170809685441: failed to populate column 'pipeline': EOF
2024-02-16 15:20:55.327 UTC [WARN]  steampipe-plugin-github.plugin: [WARN]  170809685441: QueryData StreamError failed to populate column 'pipeline': EOF (github-170809685441)
2024-02-16 15:20:55.327 UTC [WARN]  steampipe-plugin-github.plugin: [WARN]  170809685441: streamRows execution has failed: github-170809685441 - calling queryCache.AbortSet (github: failed to populate column 'pipeline': EOF)
2024-02-16 15:20:55.328 UTC [WARN]  steampipe-plugin-github.plugin: [WARN]  170809685441: QueryCache AbortSet - aborting request  with error github: failed to populate column 'pipeline': EOF (1 subscriber) (github-170809685441)
2024-02-16 15:20:55.328 UTC [WARN]  steampipe-plugin-github.plugin: [WARN]  170809685441: queryData.streamRows returned error: github: failed to populate column 'pipeline': EOF
2024-02-16 15:20:55.328 UTC [WARN]  steampipe-plugin-github.plugin: [WARN]  170809685441: executeForConnection github returned error github: failed to populate column 'pipeline': EOF, writing to CHAN
2024-02-16 15:20:55.328 UTC [WARN]  steampipe-plugin-github.plugin: [WARN]  170809685441: error channel received github: failed to populate column 'pipeline': EOF
2024-02-16 15:20:55.337 UTC [WARN]  steampipe-plugin-github.plugin: [WARN]  170809685441: readAndStreamAsync failed to read previous rows from cache: github: failed to populate column 'pipeline': EOF publisher github-170809685441 (github-170809685441)

gabrielsoltz commented 6 months ago

Hi @ParthaI, I was able to identify the repository that triggers the error using the following script:

for i in $(steampipe query "select name_with_owner from github_my_repository" --output csv); do echo "Repo: $i" && steampipe query "select name, path, state, workflow_file_content from github_workflow where repository_full_name = '$i';"; done

You were right: one repository has an invalid YAML workflow file, and that is what causes this problem.

I'll fix it on my side. As a suggestion, the error message could be improved for this case.

I appreciate your help!

ParthaI commented 6 months ago

@gabrielsoltz, based on our previous discussion, I hope the issue has been resolved, so I'm closing it for now. Please feel free to reopen it if you encounter any further issues.

Thanks!