risinglightdb / sqllogictest-rs

Sqllogictest (dialect with extensions) parser and runner in Rust.
Apache License 2.0
172 stars 45 forks source link

fix: trim string before rowsort #137

Closed wangrunji0408 closed 1 year ago

wangrunji0408 commented 1 year ago

Signed-off-by: Runji Wang wangrunji0408@163.com

If the engine returns a value with leading spaces, it will affect the result of rowsort, but won't be recognized in error message since we trim the string before reporting error but after the rowsort. This PR trims the string before rowsort to avoid this problem. Despite this, engines should not return a value with leading or tailing space.

xxchan commented 1 year ago

Hmmm I think this is wrong. First spaces should be respected. Trimming spaces affects correctness.

As for the problem you mentioned, I didn't get it. Can you give an example?

wangrunji0408 commented 1 year ago

Suppose the engine somehow gives a result: [["1"], [" 2"]] (casted from INT). The rows being compared after rowsort would be [[" 2", "1"]]. However, the error message would look like this:

 (-expected|+actual)
+ 2
  1
- 2

I know it looks weird, but it's possible. Any better solution for this problem? 🤔

First spaces should be respected. Trimming spaces affects correctness.

I'm afraid string values with leading or tailing spaces are not even supported for now. They are not representable in the script.

xxchan commented 1 year ago

I'm afraid string values with leading or tailing spaces are not even supported for now. They are not representable in the script.

They are actually supported. Note that validator is fn(actual: &[Vec<String>], expected: &[String]). expected is the whole line AS-IS. So if we use an agreed separator, e.g., \t, and split by \t in a custom validator, we can compare the spaces. (This is duckdb's approach) 😇 related issue https://github.com/risinglightdb/sqllogictest-rs/issues/109

xxchan commented 1 year ago

casted from INT

What does the "cast" mean? 🤔

And is your example a real case? IIRC the error message should contain original rows instead of normalized rows.

wangrunji0408 commented 1 year ago

What does the "cast" mean? 🤔

The value type is INT, which is then converted to string.

And is your example a real case?

The case was found by @TennyZhuang when developing the JDBC driver. You can find the original error message here.

BTW, @TennyZhuang Does this change fix your problem? 👀

xxchan commented 1 year ago

That failure is because of this:

image

There's only one row, so sorting does nothing for actual results...

xxchan commented 1 year ago

Let me fix it