hq6 closed this 7 months ago
Here's a test case:
git checkout main && make build
pushd examples/local
./101_initial_cluster.sh
while true; do
vtctldclient GetFullStatus zone1-100
done
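Since the loop above runs forever, it can be convenient to stop at the first failing call and record how long that took. The following is only a sketch of that idea: `run_until_failure` is a hypothetical helper name, not part of Vitess, and it simply repeats whatever command it is given.

```shell
# Hypothetical helper: repeat a command until it first fails,
# then report how many calls succeeded and how long that took.
run_until_failure() {
  start=$(date +%s)
  count=0
  # "$@" is the command to repeat; its stdout is discarded, stderr is kept
  while "$@" >/dev/null; do
    count=$((count + 1))
  done
  end=$(date +%s)
  echo "failed after $count successful calls and $((end - start)) seconds"
}

# Example invocation against the local example cluster:
# run_until_failure vtctldclient GetFullStatus zone1-100
```

The error message from the failing call still goes to stderr, so the final `vtctldclient` output is preserved alongside the summary line.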
After some hours you should start to get the execve errors. It produced the error for me after ~4 hours:
❯ vtctldclient GetFullStatus zone1-100
E0131 01:17:34.237400 9068 main.go:56] rpc error: code = Unknown desc = TabletManager.FullStatus on zone1-0000000100: /usr/local/mysql/bin/mysqld: fork/exec /usr/local/mysql/bin/mysqld: argument list too long, output:
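For context, "argument list too long" is the text of the `E2BIG` error, which `execve` returns when the arguments plus environment are too large. On Linux there is also a per-string limit of 32 pages (128 KiB with 4 KiB pages), so a single oversized environment variable is enough to trigger it. The sketch below reproduces the error in isolation; `exec_with_env_bytes` and `BIG_VALUE` are made-up names, and the 200000-byte threshold assumes Linux with 4 KiB pages.

```shell
# Hypothetical helper: try to start a child process while passing one
# environment value of the requested size, and report the outcome.
exec_with_env_bytes() {
  # $1 = size in bytes of the environment value
  big=$(head -c "$1" /dev/zero | tr '\0' 'x')
  if BIG_VALUE="$big" /bin/sh -c ':' 2>/dev/null; then
    echo "exec succeeded with a $1-byte value"
  else
    echo "exec failed with a $1-byte value"
  fi
}

exec_with_env_bytes 1000
# On Linux with 4 KiB pages this is expected to fail with E2BIG:
exec_with_env_bytes 200000
```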
Additionally, it seems that after ./101_initial_cluster.sh, some process (not sure which) is causing this call to happen on a regular basis.
Here is a sample of the strace output of vttablet, which continues repeatedly even when I have not started any requests at all.
strace: Process 3851321 attached
[pid 3851321] execve("/usr/sbin/mysqld", ["/usr/sbin/mysqld", "--version"], ["SHELL=/bin/bash", "KEYSPACE=commerce", "NVM_INC=/home/ubuntu/.nvm/versions/node/v18.19.0/include/node", "TERM_PROGRAM_VERSION=3.2a", "TMUX=/tmp/tmux-1000/default,3525,0", "VTDATAROOT=/home/ubuntu/my-vitess-example/examples/local/vtdataroot", "LOGNAME=ubuntu", "XDG_SESSION_TYPE=tty", "MOTD_SHOWN=pam", "TABLET_UID=101", "HOME=/home/ubuntu", "LANG=C.UTF-8", "LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35"..., "SSH_CONNECTION=10.1.71.10 50476 10.170.116.99 22", "NVM_DIR=/home/ubuntu/.nvm", "LESSCLOSE=/usr/bin/lesspipe %s %s", "XDG_SESSION_CLASS=user", "TERM=screen", "LESSOPEN=| /usr/bin/lesspipe %s", "USER=ubuntu", "TMUX_PANE=%3", "SHLVL=3", "NVM_CD_FLAGS=", "CELL=zone1", "XDG_SESSION_ID=6", "LC_CTYPE=en_US.UTF-8", "XDG_RUNTIME_DIR=/run/user/1000", "SSH_CLIENT=10.1.71.10 49438 22", "XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop", 
"PATH=/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin"..., "DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus", "NVM_BIN=/home/ubuntu/.nvm/versions/node/v18.19.0/bin", "SSH_TTY=/dev/pts/0", "OLDPWD=/home/ubuntu/my-vitess-example/examples/common/scripts", "TERM_PROGRAM=tmux", "_=/usr/local/vitess/bin/vttablet", "PWD=/usr"]) = 0
[pid 3851321] +++ exited with 0 +++
strace: Process 3851324 attached
[pid 3851324] execve("/usr/sbin/mysqld", ["/usr/sbin/mysqld", "--version"], ["SHELL=/bin/bash", "KEYSPACE=commerce", "NVM_INC=/home/ubuntu/.nvm/versions/node/v18.19.0/include/node", "TERM_PROGRAM_VERSION=3.2a", "TMUX=/tmp/tmux-1000/default,3525,0", "VTDATAROOT=/home/ubuntu/my-vitess-example/examples/local/vtdataroot", "LOGNAME=ubuntu", "XDG_SESSION_TYPE=tty", "MOTD_SHOWN=pam", "TABLET_UID=101", "HOME=/home/ubuntu", "LANG=C.UTF-8", "LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35"..., "SSH_CONNECTION=10.1.71.10 50476 10.170.116.99 22", "NVM_DIR=/home/ubuntu/.nvm", "LESSCLOSE=/usr/bin/lesspipe %s %s", "XDG_SESSION_CLASS=user", "TERM=screen", "LESSOPEN=| /usr/bin/lesspipe %s", "USER=ubuntu", "TMUX_PANE=%3", "SHLVL=3", "NVM_CD_FLAGS=", "CELL=zone1", "XDG_SESSION_ID=6", "LC_CTYPE=en_US.UTF-8", "XDG_RUNTIME_DIR=/run/user/1000", "SSH_CLIENT=10.1.71.10 49438 22", "XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop", 
"PATH=/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin:/usr/sbin"..., "DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus", "NVM_BIN=/home/ubuntu/.nvm/versions/node/v18.19.0/bin", "SSH_TTY=/dev/pts/0", "OLDPWD=/home/ubuntu/my-vitess-example/examples/common/scripts", "TERM_PROGRAM=tmux", "_=/usr/local/vitess/bin/vttablet", "PWD=/usr"]) = 0
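The notable detail in this output is the `PATH` value, where `/usr/sbin` is repeated many times. A small helper can count the repetitions in a captured value; `count_path_entry` is a hypothetical name used only for illustration. Because strace truncates long strings (the trailing `...`), a count taken from the output above is a lower bound unless strace is run with a larger `-s` string size.

```shell
# Hypothetical helper: count how many times a directory appears
# as a complete entry in a PATH-style, colon-separated string.
count_path_entry() {
  # $1 = PATH-style string, $2 = directory to count
  # -x matches whole lines only, -F treats the directory as a literal string
  printf '%s\n' "$1" | tr ':' '\n' | grep -cxF -- "$2"
}

count_path_entry "/usr/sbin:/usr/sbin:/usr/bin" /usr/sbin
```

As a rough estimate, assuming each call adds one more `/usr/sbin:` entry (10 bytes) and the per-string limit is 131072 bytes (Linux with 4 KiB pages), `PATH` would hit the limit after about 13,000 additions.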
It just occurred to me that this issue does not just affect the execve call; it also causes unbounded memory growth in the vttablet process.
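One way to check the memory-growth suspicion is to sample the resident set size of the vttablet process over time and see whether it keeps climbing. This is a sketch only: `sample_rss` is a hypothetical helper, and it relies on `ps -o rss=`, which reports RSS in kilobytes on Linux.

```shell
# Hypothetical helper: print "<unix timestamp> <rss in KiB>" for a process
# a fixed number of times, pausing between samples.
sample_rss() {
  # $1 = pid, $2 = number of samples, $3 = seconds between samples
  i=0
  while [ "$i" -lt "$2" ]; do
    rss=$(ps -o rss= -p "$1" | tr -d ' ')
    # Stop if the process no longer exists
    [ -n "$rss" ] || return 1
    echo "$(date +%s) $rss"
    i=$((i + 1))
    if [ "$i" -lt "$2" ]; then
      sleep "$3"
    fi
  done
}

# Example: one sample per minute for four hours, given the vttablet pid
# sample_rss "$VTTABLET_PID" 240 60
```

`VTTABLET_PID` is a placeholder for however the pid is obtained on a given machine.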
I'm now confused about why the unbounded memory growth was never reported by people running vttablet in production.
Is there some production configuration that avoids this code path, such that this issue only happens with the local example configuration?
Overview of the Issue
After starting up the local example with ./101_initial_cluster.sh and then waiting for a few hours, I observe the following behavior when trying to call GetFullStatus.
Reproduction Steps
./101_initial_cluster.sh
Binary Version
Operating System and Environment details
Log Fragments
If I strace the vttablet process, I see the following output: