vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0
18.65k stars 2.1k forks source link

Bug Report: VStream does not switch tablets when lag exceeds discovery_low_replication_lag #15754

Closed twthorn closed 6 months ago

twthorn commented 6 months ago

Overview of the Issue

VStreams are not terminated/restarted to pick a new tablet when the lag of the tablet being used grows too large (higher than discovery_low_replication_lag). This can lead to high lag for the client and degrade performance.

When the lag grows high enough, and exceeds discovery_high_replication_lag_minimum_serving then the tablet stops serving and the vstream doesn't send any data. Then once it goes below that threshold, it will start sending data again that will be out of date/high lag.

So the fix may be two parts:

  1. Pick a new tablet if lag goes above above discovery_low_replication_lag
  2. Pick a new tablet if lag goes above discovery_high_replication_lag_minimum_serving

See logs - we can see the tablet stop serving, but there are no logs about a new vstream being created to rerun the tablet picker logic and get a healthy tablet.

Reproduction Steps

  1. Set up a VStream client, vtgate picks tablet A for the vstream
  2. Get tablet A to have high replication lag (e.g., trigger a backup) (lag should be higher than discovery_low_replication_lag)
  3. Observe the events streamed by the VStream client will be old/high lag, and that the VTGates doesn't switch to another tablet with low lag (lower than discovery_low_replication_lag)
  4. Increase lag further above high replication lag
  5. Observe no events are received (a new healthy tablet is not reselected)

Binary Version

v14 with some backports
https://github.com/slackhq/vitess/commits/slack-vitess-r14.0.5

vtgate config:
/vt/v14/bin/vtgate --topo_implementation consul --topo_global_server_address [...] --topo_global_root vitess/global --topo_consul_max_conns_per_host 250 --consul_auth_static_file [...] --cell us_east_1b --cells_to_watch us_east_1b,us_east_1c,us_east_1d,us_east_1e --discovery_low_replication_lag 30s --discovery_high_replication_lag_minimum_serving 1h --legacy_replication_lag_algorithm=false --min_number_serving_vttablets 2 --grpc_port 15999 --grpc_max_message_size 31457280 --max_memory_rows 1000000 --warn_memory_rows 300000 --mysql_server_port 15306 --mysql_server_socket_path /var/run/mysqld/mysqld.sock --mysql_auth_server_static_file [...] --normalize_queries --enable_buffer --buffer_size 10000 --transaction_mode MULTI --tablet_refresh_interval 120s --tablet_refresh_known_tablets=false --keyspaces_to_watch aux1,byuser,msgdata,mainuser,mainteam,vifl --topo_read_concurrency 1 --port 15000 --tablet_types_to_wait PRIMARY,REPLICA --gateway_initial_tablet_timeout 120s --gateway_route_replica_to_rdonly --service_map grpc-vtgateservice --logtostderr --srv_topo_cache_ttl 8760h --srv_topo_cache_refresh 5s --grpc_server_initial_window_size 1073741824 --grpc_server_initial_conn_window_size 1073741824 --grpc_keepalive_time 30s --grpc_keepalive_timeout 30s --mysql_server_read_timeout 60s --mysql_server_write_timeout 60s --mysql_server_query_timeout 60s --mysql-server-pool-conn-read-buffers --sql-max-length-errors 1024 --sql-max-length-ui 1024 --healthcheck_retry_delay 30s --querylog-format text --log_queries_to_file /mnt/murronbridge/murronbridge.pipe --querylog-filter-tag VTQUERYLOG=1 --tablet_url_template https://{{.GetHostNameLevel 0}}.tinyspeck.com --v 0 --grpc_auth_static_client_creds [...] --grpc_auth_static_password_file [...] --grpc_auth_mode static --normalize_queries --vtgate-config-terse-errors --redact-debug-ui-queries --structured-logging --schema_change_signal=false

Operating System and Environment details

PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Linux 6.5.0-1017-aws

x86_64

Log Fragments

{
  "level": "info",
  "ts": 1713321308.8583724,
  "caller": "logutil/vts_logger.go:44",
  "msg": "HealthCheckUpdate(Serving State): tablet: us_east_1b-167967559 (tablet-iad-prod-msgdata-ef-f0-zrn9 az_id=use1-az4 az=us-east-1b cell=us_east_1b os=22.04 mysqld_version=8.0) serving true => false for msgdata/ef-f0 (REPLICA) reason: healthCheck update"
}
{
  "level": "info",
  "ts": 1713322625.4965923,
  "caller": "logutil/vts_logger.go:44",
  "msg": "HealthCheckUpdate(Serving State): tablet: us_east_1b-167967559 (tablet-iad-prod-msgdata-ef-f0-zrn9 az_id=use1-az4 az=us-east-1b cell=us_east_1b os=22.04 mysqld_version=8.0) serving false => true for msgdata/ef-f0 (REPLICA) reason: healthCheck update"
}
mattlord commented 6 months ago

I'm able to repeat the problem this way using the local examples on main:

git checkout main && make build

pushd examples/local

# Apply this diff
diff --git a/examples/common/scripts/vttablet-up.sh b/examples/common/scripts/vttablet-up.sh
index daa40aee89..b572018188 100755
--- a/examples/common/scripts/vttablet-up.sh
+++ b/examples/common/scripts/vttablet-up.sh
@@ -54,6 +54,7 @@ vttablet \
  --service_map 'grpc-queryservice,grpc-tabletmanager,grpc-updatestream' \
  --pid_file $VTDATAROOT/$tablet_dir/vttablet.pid \
  --heartbeat_on_demand_duration=5s \
+ --enable_replication_reporter \
  > $VTDATAROOT/$tablet_dir/vttablet.out 2>&1 &

 # Block waiting for the tablet to be listening
diff --git a/examples/local/101_initial_cluster.sh b/examples/local/101_initial_cluster.sh
index 95b51f168c..d0a1f3a507 100755
--- a/examples/local/101_initial_cluster.sh
+++ b/examples/local/101_initial_cluster.sh
@@ -48,7 +48,7 @@ else
    # correct durability policy. Please see the comment above for
    # more context on using a custom sidecar database name in your
    # Vitess clusters.
-   vtctldclient CreateKeyspace --sidecar-db-name="${SIDECAR_DB_NAME}" --durability-policy=semi_sync commerce || fail "Failed to create and configure the commerce keyspace"
+   vtctldclient CreateKeyspace --sidecar-db-name="${SIDECAR_DB_NAME}" --durability-policy=none commerce || fail "Failed to create and configure the commerce keyspace"
 fi

 # start mysqlctls for keyspace commerce
diff --git a/examples/local/vstream_client.go b/examples/local/vstream_client.go
index 98d2129f89..30a611acb4 100644
--- a/examples/local/vstream_client.go
+++ b/examples/local/vstream_client.go
@@ -38,7 +38,7 @@ import (
 */
 func main() {
    ctx := context.Background()
-   streamCustomer := true
+   streamCustomer := false
    var vgtid *binlogdatapb.VGtid
    if streamCustomer {
        vgtid = &binlogdatapb.VGtid{
@@ -76,7 +76,7 @@ func main() {
        //MinimizeSkew:      false,
        //HeartbeatInterval: 60, //seconds
    }
-   reader, err := conn.VStream(ctx, topodatapb.TabletType_PRIMARY, vgtid, filter, flags)
+   reader, err := conn.VStream(ctx, topodatapb.TabletType_REPLICA, vgtid, filter, flags)
    for {
        e, err := reader.Recv()
        switch err {

# Set up the cluster
./101_initial_cluster.sh; mysql < ../common/insert_commerce_data.sql

# Stop vtorc so that it doesn't repair replication
../common/scripts/vtorc-down.sh

# Start the vstream
go run vstream_client.go

# In another Terminal window
mysql -e "insert into customer values (100, 'mlord1@planetscale.com')"

# Stop replication on the 1 replica tablet
replicauid=$(vtctldclient GetTablets --keyspace commerce --tablet-type replica | awk '{print $1}' | cut -d- -f2 | bc)
command mysql -u root --socket=${VTDATAROOT}/vt_0000000${replicauid}/mysql.sock --binary-as-hex=false vt_commerce -e "stop replica"
command mysql -u root --socket=${VTDATAROOT}/vt_0000000${replicauid}/mysql.sock --binary-as-hex=false vt_commerce -e "show replica status\G" | grep Running

# Wait beyond the default healthy replica threshold
sleep 40

curl -s localhost:15${replicauid}/debug/status_details

mysql -e "insert into customer values (200, 'mlord2@planetscale.com')"

grep vstream /opt/vtdataroot/tmp/vtgate.WARNING

The VStream never tries to reinitialize and select a new tablet. And it never receives the second INSERT because the replica it's connected to never receives it:

[type:BEGIN  keyspace:"commerce"  shard:"0" type:FIELD  field_event:{table_name:"commerce.customer"  fields:{name:"customer_id"  type:INT64  table:"customer"  org_table:"customer"  database:"vt_commerce"  org_name:"customer_id"  column_length:20  charset:63  flags:49667  column_type:"bigint"}  fields:{name:"email"  type:VARBINARY  table:"customer"  org_table:"customer"  database:"vt_commerce"  org_name:"email"  column_length:128  charset:63  flags:128  column_type:"varbinary(128)"}  keyspace:"commerce"  shard:"0"}  keyspace:"commerce"  shard:"0"]
[type:VGTID  vgtid:{shard_gtids:{keyspace:"commerce"  shard:"0"  gtid:"MySQL56/71551112-fdcd-11ee-9fe9-8fcc543c344e:1-43"}}  keyspace:"commerce"  shard:"0"]
[type:ROW  row_event:{table_name:"commerce.customer"  row_changes:{after:{lengths:1  lengths:16  values:"1alice@domain.com"}}  keyspace:"commerce"  shard:"0"}  keyspace:"commerce"  shard:"0" type:ROW  row_event:{table_name:"commerce.customer"  row_changes:{after:{lengths:1  lengths:14  values:"2bob@domain.com"}}  keyspace:"commerce"  shard:"0"}  keyspace:"commerce"  shard:"0" type:ROW  row_event:{table_name:"commerce.customer"  row_changes:{after:{lengths:1  lengths:18  values:"3charlie@domain.com"}}  keyspace:"commerce"  shard:"0"}  keyspace:"commerce"  shard:"0" type:ROW  row_event:{table_name:"commerce.customer"  row_changes:{after:{lengths:1  lengths:14  values:"4dan@domain.com"}}  keyspace:"commerce"  shard:"0"}  keyspace:"commerce"  shard:"0" type:ROW  row_event:{table_name:"commerce.customer"  row_changes:{after:{lengths:1  lengths:14  values:"5eve@domain.com"}}  keyspace:"commerce"  shard:"0"}  keyspace:"commerce"  shard:"0" type:VGTID  vgtid:{shard_gtids:{keyspace:"commerce"  shard:"0"  gtid:"MySQL56/71551112-fdcd-11ee-9fe9-8fcc543c344e:1-43"  table_p_ks:{table_name:"customer"  lastpk:{fields:{name:"customer_id"  type:INT64  charset:63  flags:49667}  rows:{lengths:1  values:"5"}}}}}  keyspace:"commerce"  shard:"0" type:COMMIT  keyspace:"commerce"  shard:"0"]
[type:BEGIN  keyspace:"commerce"  shard:"0" type:VGTID  vgtid:{shard_gtids:{keyspace:"commerce"  shard:"0"  gtid:"MySQL56/71551112-fdcd-11ee-9fe9-8fcc543c344e:1-43"}}  keyspace:"commerce"  shard:"0" type:COMMIT  keyspace:"commerce"  shard:"0"]
[type:COPY_COMPLETED  keyspace:"commerce"  shard:"0" type:COPY_COMPLETED]

[type:BEGIN  timestamp:1713476997  current_time:1713476997575921000  keyspace:"commerce"  shard:"0" type:FIELD  timestamp:1713476997  field_event:{table_name:"commerce.customer"  fields:{name:"customer_id"  type:INT64  table:"customer"  org_table:"customer"  database:"vt_commerce"  org_name:"customer_id"  column_length:20  charset:63  flags:49667  column_type:"bigint"}  fields:{name:"email"  type:VARBINARY  table:"customer"  org_table:"customer"  database:"vt_commerce"  org_name:"email"  column_length:128  charset:63  flags:128  column_type:"varbinary(128)"}  keyspace:"commerce"  shard:"0"}  current_time:1713476997577073000  keyspace:"commerce"  shard:"0" type:ROW  timestamp:1713476997  row_event:{table_name:"commerce.customer"  row_changes:{after:{lengths:3  lengths:22  values:"100mlord1@planetscale.com"}}  keyspace:"commerce"  shard:"0"  flags:1}  current_time:1713476997577249000  keyspace:"commerce"  shard:"0" type:VGTID  vgtid:{shard_gtids:{keyspace:"commerce"  shard:"0"  gtid:"MySQL56/71551112-fdcd-11ee-9fe9-8fcc543c344e:1-44"}}  keyspace:"commerce"  shard:"0" type:COMMIT  timestamp:1713476997  current_time:1713476997577257000  keyspace:"commerce"  shard:"0"]