trinodb / benchto

Framework for running macro benchmarks in a clustered environment
Apache License 2.0

Remove white spaces for query validation. #36

Closed: przemekak closed this 2 years ago

przemekak commented 2 years ago

This PR removes whitespace from the result set values that are read for comparison during query validation.
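The comparison step can be pictured with a minimal sketch. It assumes result rows are lists of values, with string columns read as Python strings. The helper names `normalize_value` and `rows_match` are hypothetical and are not benchto's actual API:

```python
import re

# Matches any run of whitespace characters (spaces, tabs, newlines).
_WHITESPACE = re.compile(r"\s+")


def normalize_value(value):
    """Hypothetical helper: remove all whitespace from string values.

    Non-string values (numbers, None, ...) are returned unchanged.
    """
    if isinstance(value, str):
        return _WHITESPACE.sub("", value)
    return value


def rows_match(actual, expected):
    """Hypothetical helper: compare two result sets row by row,
    ignoring whitespace inside string values."""
    if len(actual) != len(expected):
        return False
    for a_row, e_row in zip(actual, expected):
        if len(a_row) != len(e_row):
            return False
        for a, e in zip(a_row, e_row):
            if normalize_value(a) != normalize_value(e):
                return False
    return True


actual = [["pri   ", 1], ["able  anti", 2]]
expected = [["pri", 1], ["able anti", 2]]
print(rows_match(actual, expected))  # True
```

One trade-off of removing all whitespace, rather than only trimming the ends, is that values differing only in inner spacing (for example "a b" and "ab") compare as equal.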

There might be cases where a connector we want to benchmark doesn't support a data type from the official benchmark specification (e.g. TPC-H/TPC-DS). In that case some casting may be required, and the cast can introduce or remove whitespace.

In my case this is needed for benchmarking the Iceberg connector, which doesn't support the char(n) type, so I was forced to cast, e.g.: TRIM(CAST(web_company_name AS varchar(50)))

The TRIM is necessary because any query that uses such a column in a comparison, like WHERE web_company_name = 'abc', would fail to find any data: without TRIM, the cast above produces values padded with trailing spaces.

Originally I was going with just TRIM, but unfortunately some queries may use concat, which produces extra whitespace in the middle of column values.
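The two failure modes described above can be simulated in a few lines of Python. This is only an illustration, not Trino behavior: `pad_char` mimics a space-padded char(n) value, `sql_trim` approximates SQL TRIM by stripping leading and trailing spaces, and string `+` stands in for concat. All three helper names are hypothetical:

```python
def pad_char(value, n):
    """Simulate a char(n) value: right-pad with spaces to length n."""
    return value.ljust(n)


def sql_trim(value):
    """Approximate SQL TRIM: strip leading and trailing spaces only."""
    return value.strip(" ")


def remove_whitespace(value):
    """Remove every whitespace character, including inner ones."""
    return "".join(value.split())


# Case 1: a padded value never equals the unpadded literal 'abc'.
company = pad_char("abc", 50)
print(company == "abc")            # False
print(sql_trim(company) == "abc")  # True: TRIM fixes the single-column case

# Case 2: concatenating padded values leaves spaces in the middle,
# which TRIM cannot remove but full whitespace removal can.
combined = pad_char("abc", 10) + pad_char("def", 10)
print(sql_trim(combined) == "abcdef")           # False
print(remove_whitespace(combined) == "abcdef")  # True
```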

An alternative to this PR would be to rewrite all queries to apply TRIM to their output string columns.

przemekak commented 2 years ago

Closing this one, as we decided to keep two separate sets of validation data.