vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0

VTTablet needs to be in sync with MySQL's timeouts #3592

Status: Open. sougou opened this issue 6 years ago.

sougou commented 6 years ago

MySQL has two timeouts that are relevant to vttablet: wait_timeout and net_write_timeout. On the vttablet side, the idle timeout (used to evict idle pooled connections) corresponds to wait_timeout, and there is no equivalent of net_write_timeout.

There are a few options:

  1. Allow these timeouts to be specified for vttablet on the command line. VTTablet would sanity-check these values against MySQL and print warnings if they are mismatched.
  2. Have VTTablet read these timeouts from MySQL and set its own to be slightly more conservative. The downside of this approach is that some of these values could be enforced by other external tools, so what VTTablet reads may not be canonical.
  3. Have VTTablet set these timeouts inside MySQL based on its input parameters. The downside of this approach is that VTTablet may not have the privileges to do so.

bbeaudreault commented 6 years ago

I personally like option 3. vttablet already needs a high level of privilege to do anything useful. Perhaps it could degrade gracefully for people who want to run it with fewer privileges.

Note that both of these can also be set as session variables. I think it's important to support that to some degree, if possible, because at HubSpot the people setting these timeouts have no access to the underlying processes. They would set them from their app, possibly on a per-app basis.

sjmudd commented 6 years ago

  1. Warnings are easy to miss, and you need to configure both MySQL and vttablet consistently.
  2. Might be problematic for long-running queries (OLAP vs. OLTP).
  3. Feels better. You should always have control of session parameters even if you can't change the global ones, and reading and adjusting the session settings doesn't need to happen immediately: it could be done asynchronously. If the values differ (significantly?), extra warnings might be useful, since the mismatch may not be intentional. Option 3 also works better if you're putting vttablet on top of existing infrastructure.

demmer commented 6 years ago

It seems to me that for vttablet's pooled connections there's no good reason for MySQL to close the connection at all, so option 3 makes the most sense to me, assuming that what we would end up doing is setting MySQL's idle timeout (wait_timeout) to something very high so that it effectively never kicks in.

As Simon suggested, we could do this asynchronously from the idle-connection sweeper if that makes more sense.

sjmudd commented 6 years ago

I get the impression that sending a MySQL ping on an idle connection would keep it active (I think that works), which would avoid connections being dropped unexpectedly. It doesn't look like the underlying problem of the client and server disagreeing on timeout values will be fixed in the upstream protocol, so we just need to make sure we avoid it. If a connection is idle, sending a periodic "idle ping" should be harmless.

sjmudd commented 6 years ago

A further thought, as this came up again today. If I run this at my mysql prompt, I get:

root@127.0.0.1 [(none)]> show global variables like '%timeout%';
+-----------------------------------+----------+
| Variable_name                     | Value    |
+-----------------------------------+----------+
| connect_timeout                   | 10       |
...
| interactive_timeout               | 28800    |
...
| net_read_timeout                  | 30       |
| net_write_timeout                 | 60       |
...
| wait_timeout                      | 28800    |
+-----------------------------------+----------+
20 rows in set (0.00 sec)

You'll get something back from Vitess if you send a similar query. I wonder whether extra Vitess variables should be shown, or whether the values reported should be the ones actually used by the vtgate process. In theory, knowing which backend mysqld you will end up talking to may be hard, especially if the query gets routed to a replica or rdonly tablet, as there may be many of them.

I can't remember now, but I think there's some sort of "interactive" flag passed in the connection. Does Vitess support or recognise this? (Maybe not relevant.)

For the other settings, it might be interesting to report the session/global values by providing those used by vtgate. (Also, does vtgate allow me to change these, and does it respect my session-level settings?)

Just a thought.

techwolf359 commented 4 years ago

Ran across this issue today. We raised our wait_timeout from 6 to 60 seconds to match the Vitess idle timeout and saw an immediate reduction in connections/sec. I would also support option 3 in this scenario: set the session wait_timeout to a configured value (maybe >= the Vitess idle timeout?).

aquarapid commented 3 years ago

I should note that a reading of the code (and some experimentation) reveals that simply having the vttablet idle timeouts < MySQL wait_timeout is not sufficient. The issue is that the current implementation scans the connection pools for idle connections to evict only once every idle_timeout/10. As a result, if the vttablet idle timeouts are near or above 90% of the MySQL wait_timeout, you are still likely to see (some) errors.