risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.

ci(sqlsmith): create issue automatically on failure #11074

Open xxchan opened 1 year ago

xxchan commented 1 year ago

Is your feature request related to a problem? Please describe.

Failures of fuzz tests add attention overhead for developers (one step to check what the problem is, one to rerun CI), who may not care much about it.

Developers may simply ignore failure and rerun.

Describe the solution you'd like

Something like this. It could report the Buildkite URL and/or logs.

And maybe we can ~let it rerun automatically~ mark it as optional, and let CI pass even if it failed.
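
For illustration, a rough sketch of what such a Buildkite step could look like, assuming the agent has an authenticated `gh` CLI available; the script path and issue title are made up:

```yaml
steps:
  - label: "fuzz test (sqlsmith)"
    command: |
      # Run the fuzz test; on failure, open a tracking issue that links back
      # to this build (BUILDKITE_BUILD_URL is set by the agent), then still
      # fail the step so the result stays visible.
      if ! ./ci/scripts/run-sqlsmith.sh; then
        gh issue create \
          --repo risingwavelabs/risingwave \
          --title "sqlsmith fuzz test failed in CI" \
          --body "Build: ${BUILDKITE_BUILD_URL}"
        exit 1
      fi
```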

Describe alternatives you've considered

No response

Additional context

Maybe also deterministic simulation?

kwannoel commented 1 year ago

And maybe we can mark it as optional, and let CI pass even if it failed. Good suggestion.

I think I will split sqlsmith into a separate workflow, so it can still show up if it fails but isn't required to pass, similar to the main-cron workflow in a PR.

Soft-failing will just show it as succeeding on GitHub, so it's not a good approach.
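
For context, the option being discussed maps to Buildkite's soft-fail setting; a minimal sketch (hypothetical script path) showing why it ends up green on GitHub:

```yaml
steps:
  - label: "fuzz test (sqlsmith)"
    command: ./ci/scripts/run-sqlsmith.sh   # hypothetical script path
    # A non-zero exit becomes a "soft fail": the step shows as failed in the
    # Buildkite UI, but the build (and the commit status on GitHub) still passes.
    soft_fail: true
```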

In terms of creating issues, I'm thinking it might create duplicated issues, since some error messages have parameters. What could help instead is better error messages. I think sqlsmith logs can be parsed and further refined to make them easier to investigate / just copy-paste into a new issue. That requires improving error logs for 2 components:

  • Error logs for frontend
  • Error logs for random fuzz test
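
On the duplicate-issue concern, one possible mitigation (purely a sketch; the error.log artifact, the sed normalization, and the titles are assumptions) is to strip the parameterized parts of the message and search for an existing open issue before filing a new one:

```yaml
steps:
  - label: "report sqlsmith failure"
    command: |
      # Strip digits and quoted literals so messages that differ only in
      # parameters compare equal. error.log is a hypothetical artifact.
      normalized=$(head -n 1 error.log | sed -E "s/[0-9]+/N/g; s/'[^']*'/'_'/g")
      # Look for an open issue with the same normalized message.
      existing=$(gh issue list --repo risingwavelabs/risingwave --state open \
        --search "$normalized" --json number --jq '.[0].number // empty')
      if [ -n "$existing" ]; then
        gh issue comment "$existing" --body "Seen again: ${BUILDKITE_BUILD_URL}"
      else
        gh issue create --repo risingwavelabs/risingwave \
          --title "sqlsmith: $normalized" --body "Build: ${BUILDKITE_BUILD_URL}"
      fi
```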

xxchan commented 1 year ago

I don't think duplicate issues are a large problem. I actually wanted to propose decoupling random test failures from merging a PR. The errors should be reported somewhere else instead of to the developer of a PR. 🤔️


kwannoel commented 1 year ago

I don't think duplicate issues are a large problem. I actually wanted to propose decoupling random test failures from merging a PR. The errors should be reported somewhere else instead of to the developer of a PR. 🤔️

IMO sqlsmith may still catch some errors from PRs, especially those doing refactoring / performance tweaks.

The biggest issue, I guess, is random test failures blocking PRs. I have split it into a separate workflow, which won't be required for merging but still gives an indication if it fails.

xxchan commented 1 year ago

The biggest issue, I guess, is random test failures blocking PRs.

Indeed. It's also likely that failures are blindly ignored without an issue being reported (especially if the failure rate is high). That's why I suggested automatic reporting.

Only running it manually (with a label) is another solution. BTW, I guess it's unusual practice to run fuzz tests on PRs. Maybe the more common practice is to run it every day with many more rounds?

kwannoel commented 1 year ago

Indeed. It's also likely that failures are blindly ignored without an issue being reported (especially if the failure rate is high). That's why I suggested automatic reporting.

+1, also because reporting errors and extracting the failure reason from Buildkite logs is a chore.

Only running it manually (with a label) is another solution. BTW, I guess it's unusual practice to run fuzz tests on PRs.

My concern was that if we run it in main-cron, we lose context: instead of looking at the PR where it failed, we have to look through the PRs merged between main-cron runs. But after thinking about it further, even if fuzzing fails in a PR, the failure is often caused by unrelated errors because of the random nature of fuzzing, so we still have to triage other PRs regardless.

Maybe the more common practice is to run it every day with many more rounds?

Agree. We can run it in main-cron instead. We won't have that many duplicate issues, and it won't be too difficult to triage them if there are any.
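
Purely as an illustration of the "run daily with more rounds" idea, a scheduled job could look like this in GitHub Actions syntax; the actual main-cron pipeline may be configured elsewhere, and the script path and --rounds flag are made up:

```yaml
on:
  schedule:
    - cron: "0 2 * * *"   # once a day at 02:00 UTC
jobs:
  sqlsmith-nightly:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Run far more iterations than a per-PR job could afford.
      - run: ./ci/scripts/run-sqlsmith.sh --rounds 1024   # hypothetical flag
```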


stdrc commented 1 year ago

IMO sqlsmith may still catch some errors from PRs, especially those doing refactoring / performance tweaks.

+1 for this.

BTW, I guess it's unusual practice to run fuzz tests on PRs. Maybe the more common practice is to run it every day with many more rounds?

I guess part of the problem is that our unit tests, e2e tests, and other hand-written tests are not complete enough; too many corner cases are still not covered. So for now, random testing on each PR can complement that.

What about still running sqlsmith for each PR, requiring a manual check and rerun, and at the same time reporting all errors to a Slack channel? If the PR author thinks the failure is unrelated to the PR changes, they can just ignore it and rerun; I don't think that's too much of a bother.
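
A minimal sketch of the Slack idea, assuming an incoming-webhook URL is exposed to the job as a secret (the variable name and script path are placeholders): on failure, post the build link to the channel and still fail the step, so the author can decide whether to rerun.

```yaml
steps:
  - label: "fuzz test (sqlsmith)"
    command: |
      # On failure, notify the channel via a Slack incoming webhook, then fail
      # the step as usual. SQLSMITH_SLACK_WEBHOOK is a hypothetical secret.
      if ! ./ci/scripts/run-sqlsmith.sh; then
        curl -sS -X POST -H 'Content-type: application/json' \
          --data "{\"text\": \"sqlsmith failed: ${BUILDKITE_BUILD_URL}\"}" \
          "$SQLSMITH_SLACK_WEBHOOK"
        exit 1
      fi
```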

xxchan commented 1 year ago

BTW, I guess it's unusual practice to run fuzz tests on PRs. Maybe the more common practice is to run it every day with many more rounds?

I guess part of the problem is that our unit tests, e2e tests, and other hand-written tests are not complete enough; too many corner cases are still not covered. So for now, random testing on each PR can complement that.

I think the problem is its efficiency. IMO the correlation between a fuzz error and a bug in the PR is very low, due to the nature of fuzzing. (It might be a bigger problem if it were high...?)

(To make it effective, maybe some advanced techniques like coverage-guided fuzzing are needed)

the problem is that our unit tests, e2e tests, and other hand-written tests are not complete enough

Fuzzing cannot solve this problem, especially for new features / bugfixes. A better approach might be that, when reviewing a PR, if we feel it's not properly covered by tests, the reviewer should ask the author to test more edge cases.

github-actions[bot] commented 1 year ago

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.