vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0
18.23k stars 2.07k forks source link

Extend OLAP workloads to bypass OLTP query timeouts for DML queries #11180

Open maxenglander opened 1 year ago

maxenglander commented 1 year ago

Description

Unlike SELECT queries, DML queries timeout the same way in OLTP and OLAP workloads. It would be great to have a way to bypass the OLTP timeout for DML queries without having to raise the OLTP query timeout.

One way to do this would be to extend OLAP workloads to bypass the OLTP query timeout for DML queries the same way that OLAP workloads currently bypass the OLTP query timeout for SELECT queries.

Details

Both of these OLTP queries timeout after the 30s OLTP query timeout elapses (as expected).

mysql> select sleep(35);
ERROR 1317 (70100): target: src.-.primary: vttablet: rpc error: code = Canceled desc = (errno 2013) due to context deadline exceeded, elapsed time: 30.000126889s, killing query ID 70 (CallerID: maxenglander)
mysql>
mysql> update data set text1 = text2 where sleep(35) = 0 limit 1;
ERROR 1317 (70100): target: src.-.primary: vttablet: rpc error: code = Canceled desc = (errno 2013) due to context deadline exceeded, elapsed time: 29.999351305s, killing query ID 116 (CallerID: maxenglander)

In OLAP, the SELECT query does not timeout (as expected), but the UPDATE query does timeout.

mysql> set workload=olap;
Query OK, 0 rows affected (0.00 sec)

mysql> select sleep(35);
+-----------+
| sleep(35) |
+-----------+
|         0 |
+-----------+
1 row in set (35.00 sec)

mysql> update data set text1 = text2 where sleep(35) = 0 limit 1;
ERROR 1317 (70100): target: src.-.primary: vttablet: rpc error: code = Canceled desc = (errno 2013) due to context deadline exceeded, elapsed time: 29.99971268s, killing query ID 223 (CallerID: maxenglander)

Whether or not it is expected for the UPDATE query to timeout in OLAP workloads is a question that would be great to answer in this issue.

Use cases

There are PlanetScale/Vitess users with long-running DML queries that would like to be able to bypass the OLTP timeout for those queries, but who would still like to leave the OLTP timeout at a low/default value.

deepthi commented 1 year ago

@maxenglander the Issue title and description are contradictory. I assume you mean "OLAP workloads to bypass OLTP query timeouts ..."

@vitessio/query-serving WDYT? Should we create a new flag for OLAP query timeout and use that for all (SELECT/UPDATE/INSERT)? Or just ignore the timeout like we do for SELECT?

maxenglander commented 1 year ago

@deepthi oops fixed thanks for catching.

systay commented 1 year ago

@harshit-gangal ☝️

harshit-gangal commented 1 year ago

We will have to check the workload on all the tabletserver API and set the timeout accordingly

deepthi commented 1 year ago

We will have to check the workload on all the tabletserver API and set the timeout accordingly

Didn't answer my question though.. Do we ignore the OLTP timeout or add a new flag for OLAP timeout?

harshit-gangal commented 1 year ago

OLAP is without a timeout, what I mean is that if workload = olap, set the timeout to zero, or no need to set timeout basically.

harshit-gangal commented 1 year ago

OLAP already has a transaction timeout today, Separate from OLTP transaction timeout. That should not be ignored, essentially DML should be executed within a timebound manner to avoid rows lock time.

deepthi commented 1 year ago

Documenting that the existing flag that we should be respecting for DML in OLAP mode is queryserver-config-olap-transaction-timeout.