jon-chuang opened 1 year ago
We want to implement coverage-guided fuzzing? 😲
@xxchan not sure. The first step is to measure how good our coverage is before determining if we need to improve the fuzz test generation.
The automatic improvement (coverage-guided) part is optional; tuning it manually is also an option. @kwannoel
Measurement is definitely the first step toward coverage-guided fuzzing, and we can define more useful metrics as we go along, especially if they help us catch actual bugs in the e2e setting.
A simple thing to do is to enable codecov: https://about.codecov.io/.
We already use it in our CI pipeline. We can upload a separate set of reports to codecov for sqlsmith (the deterministic test).
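A minimal sketch of what that upload step could look like, assuming the sqlsmith run already produces an lcov report and we use the Codecov bash uploader (the report path and the `sqlsmith` flag name are illustrative, not existing CI config):

```shell
# Sketch only: upload the sqlsmith run's lcov report under a dedicated
# Codecov flag so it is reported separately from unit-test coverage.
# The report path and the `sqlsmith` flag name are illustrative.
bash <(curl -s https://codecov.io/bash) -f sqlsmith-lcov.info -F sqlsmith
```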
How do we measure coverage in general now? We have had similar discussions before, and the final conclusion was that coverage is hard to calculate once e2e tests are taken into account: e2e tests are not covered by the usual Rust coverage tools.
> How do we measure coverage in general now? We have had similar discussions before, and the final conclusion was that coverage is hard to calculate once e2e tests are taken into account: e2e tests are not covered by the usual Rust coverage tools.
AFAIK, for unit tests we use codecov for reports and llvm-cov via nextest: https://nexte.st/book/test-coverage.html. E2E test coverage might be possible if we run tests with the all-in-one binary / deterministic simulation.
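For reference, the one-liner from the linked nextest docs that runs the tests under cargo-llvm-cov and produces an lcov report (which codecov can ingest):

```shell
# Run the test suite under cargo-llvm-cov (via nextest) and emit lcov output.
cargo llvm-cov nextest --lcov --output-path lcov.info
```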
I don't think anything other than the unit tests currently has its coverage measured.
> e2e tests are not covered by the usual Rust coverage tools.
I think we can still get coverage via the deterministic test. Additionally, we can use LLVM's source-based coverage instrumentation on the deterministic-test all-in-one binary to get more accurate coverage.
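A minimal sketch of that manual LLVM workflow, assuming a madsim/deterministic all-in-one binary; the binary name and its arguments below are illustrative:

```shell
# Sketch only: the binary name and args are illustrative. Needs the LLVM tools
# matching the rustc version (e.g. `rustup component add llvm-tools-preview`).
RUSTFLAGS="-C instrument-coverage" cargo build --release -p risingwave_simulation

# Each instrumented run writes a .profraw profile; %p expands to the PID.
LLVM_PROFILE_FILE="sqlsmith-%p.profraw" ./target/release/risingwave_simulation --sqlsmith

# Merge the raw profiles and report coverage for the instrumented binary.
llvm-profdata merge -sparse sqlsmith-*.profraw -o sqlsmith.profdata
llvm-cov report ./target/release/risingwave_simulation --instr-profile=sqlsmith.profdata
```

Alternatively, `cargo llvm-cov run` wraps the same build/run/merge steps and keeps the LLVM tool versions in sync with rustc.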
> the final conclusion was that coverage is hard to calculate
I didn't find any conclusion in https://github.com/risingwavelabs/risingwave/issues/32. It seems there were just no follow-ups. 🤪
Shall we close this?
Still needed eventually, I think. We can remove it from the milestone but leave it open for tracking.
Is your feature request related to a problem? Please describe.
I noticed that the generated plans often evaluate to very idiosyncratic queries, so I am worried about the actual coverage. I believe coverage of the frontend/optimizer side may be OK, but I am more worried about the e2e testing (the batch and stream parts).
As an anecdote, a bunch of errors were caught in the frontend due to const eval but were not caught in the batch/stream executors.
Describe the solution you'd like
I suggest several ideas to get a better understanding of the e2e coverage:
Describe alternatives you've considered
No response
Additional context
No response