vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0

Bug Report: GetFullStatus results in E2BIG #15095

hq6 closed this issue 7 months ago

hq6 commented 7 months ago

Overview of the Issue

After starting the local example with ./101_initial_cluster.sh and waiting a few hours, calling GetFullStatus fails with the following error.

vtctldclient --server=localhost:15999 GetFullStatus "$PRIMARY_TABLET"
E0130 01:35:12.979772 3712567 main.go:56] rpc error: code = Unknown desc = TabletManager.FullStatus on zone1-0000000101 error: /usr/sbin/mysqld: fork/exec /usr/sbin/mysqld: argument list too long, output: : /usr/sbin/mysqld: fork/exec /usr/sbin/mysqld: argument list too long, output:

Reproduction Steps

  1. Run through the Vitess local install guide.
  2. Run ./101_initial_cluster.sh.
  3. Wait several hours (not exactly sure how long, because I waited overnight).
  4. Run the commands below:
    source ../common/env.sh
    vtctldclient --server=localhost:15999 GetFullStatus "$PRIMARY_TABLET"
  5. Observe error.

Binary Version

vttablet --version
vttablet version Version: 18.0.2 (Git revision d3012c188ea0cfc6837917fc6642ea23be9bb1ff branch 'HEAD') built on Wed Dec 20 14:27:31 UTC 2023 by runner@fv-az975-901 using go1.21.5 linux/amd64

Operating System and Environment details

cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

uname -sr
Linux 6.2.0-1017-aws

uname -m
x86_64

Log Fragments

If I strace the vttablet process, I see the following output:

strace: Process 3803474 attached 
[pid 3803474] execve("/usr/sbin/mysqld", ["/usr/sbin/mysqld", "--version"], 0xc000f72780 /* 37 vars */) = -1 E2BIG (Argument list too long)
[pid 3803474] +++ exited with 253 +++
[pid 3727238] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3803474, si_uid=1000, si_status=253, si_utime=0, si_stime=0} --- 

mattlord commented 7 months ago

Here's a test case:

git checkout main && make build

pushd examples/local

./101_initial_cluster.sh

while true; do
  vtctldclient GetFullStatus zone1-100
done

After some hours you should start to get the execve errors. It produced the error for me after about 4 hours:

❯ vtctldclient GetFullStatus zone1-100
E0131 01:17:34.237400    9068 main.go:56] rpc error: code = Unknown desc = TabletManager.FullStatus on zone1-0000000100: /usr/local/mysql/bin/mysqld: fork/exec /usr/local/mysql/bin/mysqld: argument list too long, output:

hq6 commented 7 months ago

Additionally, it seems that after running ./101_initial_cluster.sh, some process (I'm not sure which) triggers this execve call at regular intervals.

Here is a sample of the strace output of vttablet. These calls repeat continuously even when I have not issued any requests myself.


strace: Process 3851321 attached
[pid 3851321] execve("/usr/sbin/mysqld", ["/usr/sbin/mysqld", "--version"], ["SHELL=/bin/bash", "KEYSPACE=commerce", "NVM_INC=/home/ubuntu/.nvm/versions/node/v18.19.0/include/node", "TERM_PROGRAM_VERSION=3.2a", "TMUX=/tmp/tmux-1000/default,3525,0", "VTDATAROOT=/home/ubuntu/my-vitess-example/examples/local/vtdataroot", "LOGNAME=ubuntu", "XDG_SESSION_TYPE=tty", "MOTD_SHOWN=pam", "TABLET_UID=101", "HOME=/home/ubuntu", "LANG=C.UTF-8", "LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35"..., "SSH_CONNECTION=10.1.71.10 50476 10.170.116.99 22", "NVM_DIR=/home/ubuntu/.nvm", "LESSCLOSE=/usr/bin/lesspipe %s %s", "XDG_SESSION_CLASS=user", "TERM=screen", "LESSOPEN=| /usr/bin/lesspipe %s", "USER=ubuntu", "TMUX_PANE=%3", "SHLVL=3", "NVM_CD_FLAGS=", "CELL=zone1", "XDG_SESSION_ID=6", "LC_CTYPE=en_US.UTF-8", "XDG_RUNTIME_DIR=/run/user/1000", "SSH_CLIENT=10.1.71.10 49438 22", "XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop", 
"PATH=/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin"..., "DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus", "NVM_BIN=/home/ubuntu/.nvm/versions/node/v18.19.0/bin", "SSH_TTY=/dev/pts/0", "OLDPWD=/home/ubuntu/my-vitess-example/examples/common/scripts", "TERM_PROGRAM=tmux", "_=/usr/local/vitess/bin/vttablet", "PWD=/usr"]) = 0
[pid 3851321] +++ exited with 0 +++
strace: Process 3851324 attached
[pid 3851324] execve("/usr/sbin/mysqld", ["/usr/sbin/mysqld", "--version"], ["SHELL=/bin/bash", "KEYSPACE=commerce", "NVM_INC=/home/ubuntu/.nvm/versions/node/v18.19.0/include/node", "TERM_PROGRAM_VERSION=3.2a", "TMUX=/tmp/tmux-1000/default,3525,0", "VTDATAROOT=/home/ubuntu/my-vitess-example/examples/local/vtdataroot", "LOGNAME=ubuntu", "XDG_SESSION_TYPE=tty", "MOTD_SHOWN=pam", "TABLET_UID=101", "HOME=/home/ubuntu", "LANG=C.UTF-8", "LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35"..., "SSH_CONNECTION=10.1.71.10 50476 10.170.116.99 22", "NVM_DIR=/home/ubuntu/.nvm", "LESSCLOSE=/usr/bin/lesspipe %s %s", "XDG_SESSION_CLASS=user", "TERM=screen", "LESSOPEN=| /usr/bin/lesspipe %s", "USER=ubuntu", "TMUX_PANE=%3", "SHLVL=3", "NVM_CD_FLAGS=", "CELL=zone1", "XDG_SESSION_ID=6", "LC_CTYPE=en_US.UTF-8", "XDG_RUNTIME_DIR=/run/user/1000", "SSH_CLIENT=10.1.71.10 49438 22", "XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop", 
"PATH=/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin"..., "DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus", "NVM_BIN=/home/ubuntu/.nvm/versions/node/v18.19.0/bin", "SSH_TTY=/dev/pts/0", "OLDPWD=/home/ubuntu/my-vitess-example/examples/common/scripts", "TERM_PROGRAM=tmux", "_=/usr/local/vitess/bin/vttablet", "PWD=/usr"]) = 0

hq6 commented 7 months ago

It just occurred to me that this issue affects more than the execve call; it also causes unbounded memory growth in the vttablet process. I'm puzzled that nobody running vttablet in production has reported this memory growth.

Is there some production configuration that avoids this code path, such that this issue only happens with the local example configuration?