vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0

Bug Report: Transaction throttler does not respect config parameters passed to it in CLI #12549

Open ejortegau opened 1 year ago

ejortegau commented 1 year ago

Overview of the Issue

The transaction throttler takes several config parameters. Specifically, the docs indicate the following:

The configuration of the transaction throttler as a text formatted throttlerdata.Configuration protocol buffer message (default "target_replication_lag_sec: 2\nmax_replication_lag_sec: 10\ninitial_rate: 100\nmax_increase: 1\nemergency_decrease: 0.5\nmin_duration_between_increases_sec: 40\nmax_duration_between_increases_sec: 62\nmin_duration_between_decreases_sec: 20\nspread_backlog_across_sec: 20\nage_bad_rate_after_sec: 180\nbad_rate_increase: 0.1\nmax_rate_approach_threshold: 0.9\n")

However, at the very least the initial rate is not being passed to the max replication lag module that is used by the transaction throttler. This can be seen by checking code calls as follows:

TabletServer's TxThrottler is created using txthrottler.NewTxThrottler(). That ends up instantiating a txThrottlerState, which in turn creates a Throttler via throttler.NewThrottler() and then throttler.newThrottler(). The latter instantiates a MaxReplicationLagModule and, while doing so, uses NewMaxReplicationLagModuleConfig(), which only clones the default configuration and overrides the MaxReplicationLagSec attribute. The other attributes are passed to the underlying MaxReplicationLagModule only later, by calling Throttler.UpdateConfiguration(). By then the new MaxReplicationLagModule instance has already been created and its rate set from the configuration it was constructed with, which still carried the default initial rate of 100. Therefore, the MaxReplicationLagModule always starts with an initial rate of 100 (the default) instead of the one passed by the CLI arguments.
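To make the ordering problem concrete, here is a minimal, self-contained Go sketch of the pattern described above. It is not the Vitess code: `Config`, `Module`, `newModule`, `updateConfiguration`, `buggyStart` and `fixedStart` are all hypothetical names invented for illustration, and `fixedStart` is only one possible remedy, not necessarily how the project would fix it.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for the throttler configuration.
// Only the two fields needed for the illustration are modeled.
type Config struct {
	InitialRate          int64
	MaxReplicationLagSec int64
}

// defaultConfig mirrors the documented defaults
// (initial_rate: 100, max_replication_lag_sec: 10).
func defaultConfig() Config {
	return Config{InitialRate: 100, MaxReplicationLagSec: 10}
}

// Module is a toy model of a rate-controlling module.
type Module struct {
	cfg  Config
	rate int64
}

// newModule seeds the current rate from whatever config it is given.
func newModule(cfg Config) *Module {
	return &Module{cfg: cfg, rate: cfg.InitialRate}
}

// updateConfiguration replaces the stored config but does not
// touch the rate that was seeded at construction time.
func (m *Module) updateConfiguration(cfg Config) {
	m.cfg = cfg
}

// buggyStart models the reported ordering: build the module from the
// defaults (overriding only the max lag), then apply the user config.
func buggyStart(user Config) *Module {
	cfg := defaultConfig()
	cfg.MaxReplicationLagSec = user.MaxReplicationLagSec
	m := newModule(cfg)
	m.updateConfiguration(user)
	return m
}

// fixedStart models one possible remedy (an assumption, not the actual
// fix): construct the module with the full user config from the start.
func fixedStart(user Config) *Module {
	return newModule(user)
}

func main() {
	user := Config{InitialRate: 1000, MaxReplicationLagSec: 80}
	fmt.Println("buggy start rate:", buggyStart(user).rate) // prints 100
	fmt.Println("fixed start rate:", fixedStart(user).rate) // prints 1000
}
```

In the buggy variant the stored config does end up with `InitialRate` 1000, but the rate in effect stays at 100 because nothing re-seeds it after construction.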

Reproduction Steps

Start vttablet with:

-enable-tx-throttler -tx-throttler-config target_replication_lag_sec: 10 max_replication_lag_sec: 80 initial_rate: 1000 max_increase: 1 emergency_decrease: 0.5 min_duration_between_increases_sec: 2 max_duration_between_increases_sec: 5 min_duration_between_decreases_sec: 1 spread_backlog_across_sec: 1 age_bad_rate_after_sec: 180 bad_rate_increase: 0.1 max_rate_approach_threshold: 0.9 -tx-throttler-healthcheck-cells <cells>

Run tail -f on the vttablet log and notice messages like this one:

I0303 06:57:54.345315   25610 max_replication_lag_module.go:378] rate was: not changed from: 100 to: 100

This shows that the rate in use is the one from the default config rather than the one passed in the CLI flag.
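When reproducing this, it can help to pull the rate values out of the log automatically instead of eyeballing them. The following Go sketch parses a line in the format shown above; the regular expression and the helper name `parseRateLine` are assumptions based only on that single sample line, so other message variants may need adjustments.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// rateLine matches the "rate was: <state> from: <n> to: <n>" message.
// The pattern is inferred from the one log line quoted in this report.
var rateLine = regexp.MustCompile(`rate was: (.+?) from: (\d+) to: (\d+)`)

// parseRateLine returns the "from" and "to" rates found in a log line.
// The boolean is false when the line does not contain a rate message.
func parseRateLine(line string) (int64, int64, bool) {
	m := rateLine.FindStringSubmatch(line)
	if m == nil {
		return 0, 0, false
	}
	from, errFrom := strconv.ParseInt(m[2], 10, 64)
	to, errTo := strconv.ParseInt(m[3], 10, 64)
	if errFrom != nil || errTo != nil {
		return 0, 0, false
	}
	return from, to, true
}

func main() {
	line := "I0303 06:57:54.345315   25610 max_replication_lag_module.go:378] rate was: not changed from: 100 to: 100"
	if from, to, ok := parseRateLine(line); ok {
		fmt.Printf("from=%d to=%d\n", from, to) // prints from=100 to=100
	}
}
```

With `initial_rate: 1000` configured, a first reported rate of 100 is the symptom described in this report.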

Binary Version

Version: 12.0.5 (Jenkins build 765) (Git revision 1c1fea83df branch 'HEAD') built on Fri Sep 30 16:10:04 UTC 2022 by root@8c60c8cf557f using go1.17.12 linux/amd64

Operating System and Environment details

NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
Linux 5.4.0-1096-aws
x86_64

Log Fragments

I0303 06:57:54.345315   25610 max_replication_lag_module.go:378] rate was: not changed from: 100 to: 100

ejortegau commented 1 year ago

Actually, the issue description text is factually incorrect - my bad, I had missed this when I filed it. The config seems to be passed here - though it seems to take a while to be used to increase the rate. I am going to close this.

ejortegau commented 1 year ago

Re-opening after realizing that one parameter - Initial Max Rate - is indeed not being applied. Bug description has been updated.