memsql / dbbench

Database Benchmark Tool
Apache License 2.0
155 stars 30 forks source link

Support queries which returns NULL value #6

Closed fjammes closed 6 years ago

fjammes commented 7 years ago

Queries which returns NULL values crash dbbench with error message below, when query-results-file is specified in configuration file :

dbbench error message:

2017/07/25 21:08:22 error for query SELECT MIN(ra), MAX(ra) FROM Object WHERE z_apFlux BETWEEN 1 and 2 in shared_scan_2_minmax_object_2: sql: Scan error on column index 0: unsupported Scan, storing driver.Value type <nil> into type *string

dbbench configuration file:

fjammes@ccosvms0070:~/src/qserv_testscale/S15/tests/dbbench (tickets/DM-10041 *$%=)$ cat queries/4_shared_scan_2.ini
# Five scans on Object. Startup is staggered."

[shared_scan_2_minmax_object_1]
query=SELECT MIN(ra), MAX(ra) FROM Object WHERE decl > 3
query-results-file=out/4_shared_scan_2_minmax_object_1.csv
count=1

[shared_scan_2_minmax_object_1_1]
query=SELECT MIN(ra), MAX(ra) FROM Object WHERE decl > 3
query-results-file=out/4_shared_scan_2_minmax_object_1_1.csv
start=5s
count=1

[shared_scan_2_count_object_1]
query=SELECT COUNT(*) FROM Object WHERE y_instFlux > u_instFlux 
query-results-file=out/4_shared_scan_2_count_object_1.csv
start=10s
count=1

[shared_scan_2_count_object_2]
query=SELECT COUNT(*) FROM Object WHERE y_instFlux > 5
query-results-file=out/4_shared_scan_2_count_object_2.csv
start=15s
count=1

[shared_scan_2_minmax_object_2]
query=SELECT MIN(ra), MAX(ra) FROM Object WHERE z_apFlux BETWEEN 1 and 2
query-results-file=out/4_shared_scan_2_minmax_object_2.csv
start=20s
count=1

Crash occurs in function outputRows() of memsql/dbbench/sql_database.go

Would it be possible to support this kind of queries please?

PeterFaiman commented 7 years ago

The column values are regular strings. Changing to sql.NullString would be easy, but what could be output for NULL values? Empty string and the string "NULL" are legitimate string values.

@fjammes you can work around by using coalesce in your queries, for example:

SELECT coalesce(MIN(ra), 0),
       coalesce(MAX(ra), 0)
FROM Object
WHERE z_apFlux BETWEEN 1 and 2

Perhaps we should detect the conversion error to nil and suggest using coalesce?

fjammes commented 7 years ago

Hi,

Thanks so much for your answer :-)

However, this solution does not look good to me. Indeed, I want to benchmark/test the original query, and not the COALESCE one. Indeed the COALESCE function is an additional feature on my database system (i.e. qserv), and involve different code, can have different performance and should have its related test.

Couldn't you output "NULL", as does standard MySQL client?

Regards,

fjammes commented 7 years ago

Furthermore, Describe table queries also crashes, and I think it is also because of the NULL stuff:

2017/07/26 21:54:48 error for query Describe Object in describe: sql: Scan error on column index 4: unsupported Scan, storing driver.Value type <nil> into type *string

We cannot use COALESCE here, isn't it?

fjammes commented 7 years ago

Furthermore, COALESCE cannot be used in SELECT * queries...

PeterFaiman commented 7 years ago

You do not need a query-result-file to benchmark the query. In fact, you want to avoid the overhead of writing to a file while benchmarking. The NULL issue only affects writing the column values to a CSV file, dbbench otherwise has no problem with them. You should be fine during a regular benchmark with no query-result-file.

Furthermore, if we coalesce NULL to "NULL" inside of dbbench, it's the exact same work being done. Probably the database is even faster at it. If you are outputting to a file, the performance of COALESCE does not matter because you have already added the extra overhead of writing the CSV file.

You are correct, you cannot use COALESCE on describe table. But I don't understand why you would benchmark describe table. If you are actually trying to benchmark it, don't use a query-result-file.

Same with SELECT *, no result file, no NULL problem.

fjammes commented 7 years ago

ok, looks reasonable. However result file allow me to double check Qserv results are correct and track that no regression/bug has been introduced, alongside performance monitoring. I suppose there is no other way than result file to dump query output?

Regards,