Open JonathanHuot opened 7 years ago
Please retry with latest stable release.
Hi, I have the same issue.
Version:
python3 -m pip list | grep uWSGI
uWSGI 2.0.15
I'm using the following to enable the uwsgi master fifo file:
master-fifo=/tmp/uwsgi__lx__hydra__amdde_dev__dwe__dwe.fifo
When sending the chain reload instruction:
echo c > ${file}
I can see in the logfiles that sometimes I get the error uwsgi_master_fifo()/mkfifo(): File exists [core/fifo.c line 110]
More complete log:
Mon Jul 29 10:13:20 2019 - chain reload starting...
Mon Jul 29 10:13:20 2019 - chain next victim is worker 1
Gracefully killing worker 1 (pid: 24937)...
uwsgi_master_fifo()/mkfifo(): File exists [core/fifo.c line 110]
07-29 10:13:20 - INFO > amdde_dev_dwe_dwe_worker_1: Stopping external process...
Mon Jul 29 10:13:20 2019 - uWSGI worker 2 screams: UAAAAAAH my master disconnected: i will kill myself !!!
Mon Jul 29 10:13:20 2019 - uWSGI worker 3 screams: UAAAAAAH my master disconnected: i will kill myself !!!
Mon Jul 29 10:13:20 2019 - uWSGI worker 4 screams: UAAAAAAH my master disconnected: i will kill myself !!!
07-29 10:13:20 - INFO > amdde_dev_dwe_dwe_worker_2: Stopping external process...
07-29 10:13:20 - INFO > amdde_dev_dwe_dwe_worker_4: Stopping external process...
07-29 10:13:20 - INFO > amdde_dev_dwe_dwe_worker_3: Stopping external process...
The main issue about this is that the master crashes brutally in the case of mkfifo(): File exists
, so the uwsgi workers are loosing its connection to the master and killing themselves. Then, the workers are going crazy and are running on 100% CPU usage. See #296 (and maybe others).
Can't uwsgi just use the existing fifo file, or can I inject a command to remove that file if needed? In my opinion uwsgi could just use the file, because it's up to the developers/users to provide a unique file to each uwsgi!
Please help me. @xrmx
It still happens with v2.0.18. @xrmx
We have encountered the same issue as reported by @labsolution. @xrmx is there are chance this issue will be handled in the next weeks/months. Can I somehow help to solve that issue?
I've opened #2091 to log an unlink()
error if it happens - maybe this can help to debug.
Scenarios that may cause uwsgi to fail removing/creating the fifo I can come up with, but it's widely guessing...
The stated echo c > ${file}
will create a normal when the fifo doesn't exist. If this is done as root or privileged user, there may be permission issues. Maybe script should guard the invocation by adding a check for the named pipe: test -p ${file} && echo c > ${file}
echo c > ${file}
isn't executed just once, but in a tight-ish loop or simultaneously from two independent scripts (two cron jobs running at the same time?).
Some interaction with systemd
(or other supervisor/scripts) that deletes/creates or changes permissions/owner of the directory/master-fifo. At least the initial report mentioned /var/run/nginx/uwsgi.masterfifo
which could be a systemd
managed RuntimeDirectory of nginx.
@labsolution @palmkevin - is there any chance you could you check your environment? E.g, are you using systemd, other supervisor, cleanup scripts running, etc? Could you post the state of the filesystem (permissions, owners, types) when/after the issue occurred?
Hello @awelzel ,
I installed the master version and did some tests.
I added the test command and it doesn't seem to help. However, it takes me 3-4 tries before I get the error. The first 1 and 2 times it works correctly. W or w/ the test command. Interestingly it's reproducable every 3-4 tries.
Log:
Mon Oct 28 08:45:07 2019 - chain reload starting... Mon Oct 28 08:45:07 2019 - chain next victim is worker 1 Gracefully killing worker 1 (pid: 1240)... 08:45:07 - INFO > amdde_dev_dwe_dwe_worker_1: Stopping external process... uwsgi_master_fifo()/mkfifo(): File exists [core/fifo.c line 112] Mon Oct 28 08:45:07 2019 - uWSGI worker 4 error: the master disconnected from this worker. Shutting down the worker. Mon Oct 28 08:45:07 2019 - uWSGI worker 2 error: the master disconnected from this worker. Shutting down the worker.
Note that I don't see your new unlink warning.
Version log:
Starting uWSGI 2.1-dev (64bit) on [Mon Oct 28 08:45:11 2019] compiled with version: 4.8.5 20150623 (Red Hat 4.8.5-36) on 28 October 2019 07:20:36 os: Linux-3.10.0-957.21.2.el7.x86_64 #1 SMP Tue May 28 09:26:43 UTC 2019 nodename: lsdev01 machine: x86_64 clock source: unix detected number of CPU cores: 8 current working directory: /home/lx
Fifo files after the error happened:
lsdev01(dwe/amdde_dev):/opt/ls/lx/release/dwe> ls -latr /tmp/uwsgilx*amdde_devdwedwe.fifo prw------- 1 lx lx 0 Oct 25 05:20 /tmp/uwsgilxsocket__amdde_devdwedwe.fifo prw------- 1 lx lx 0 Oct 28 08:51 /tmp/uwsgilxrestuiamdde_devdwedwe.fifo prw------- 1 lx lx 0 Oct 28 08:51 /tmp/uwsgilxhydraamdde_devdwedwe.fifo prw------- 1 lx lx 0 Oct 28 08:51 /tmp/uwsgilxrest__amdde_devdwe__dwe.fifo
I'm 99% sure that the echo is done only once for a file by me. Yes, we are in a loop but I execute the echo only once.
Script:
files=`ls /tmp/uwsgi__lx__*__${CUR_RELEASE}__*.fifo | grep "$1" | grep -v grep 2> /dev/null`
if [ "$?" -eq 0 ]; then
# Advantage: Workers are reloaded one after the other, so no downtime and issue LX-9607 shoud be fixed
echo "Using 'Chain reloading (lazy apps)' option to reload the processes..."
is_named_pipe_being_read(){ /bin/echo unlocked 1<>"$1" >"$1"; }
for file in ${files}; do
echo "Using master fifo file ${file}..."
if is_named_pipe_being_read ${file}; then
echo c > ${file} # c = chain-reload, w = worker-reload
if [ "$?" -eq 0 ]; then
echo " (done)"
else
echo " (error)"
exit 1;
fi
else
echo " (no readers)"
fi
done
fi
Execution log:
Using 'Chain reloading (lazy apps)' option to reload the processes... Using master fifo file /tmp/uwsgilxhydraamdde_devdwedwe.fifo... (done) Using master fifo file /tmp/uwsgilxrest__amdde_devdwedwe.fifo... (done) Using master fifo file /tmp/uwsgilxrestuiamdde_devdwedwe.fifo... (done) Using master fifo file /tmp/uwsgilxsocket__amdde_devdwedwe.fifo... (no readers)
Thank you so much for your support!
Hey @labsolution , thanks for the script, this helps to understand what's happening:
is_named_pipe_being_read(){ /bin/echo unlocked 1<>"$1" >"$1"; }
This writes "unlocked" into the master fifo and then right afterwards there is echo c > ${file}
. If there's a uwsgi
master listening, then the c
in unlocked
triggers the first chain reload. Then, echo c
attempts to trigger the second one right afterwards - see point 2. in my previous reply.
I was able to reproduce it locally as follows (However: I did observe the master.fifo as a regular file - I'm assuming you restart uwsgi quickly when you observe it dead and that cleaned up and re-created the master fifo).
# The body of the while loop is what your script basically does.
$ while true ; do (/bin/echo unlocked 1<>/tmp/master.fifo > /tmp/master.fifo) && echo "c" > /tmp/master.fifo ; sleep 0.3 ; done
$ ./uwsgi --master-fifo /tmp/master.fifo --http-socket :9090 --workers 1 --plugin ./python37_plugin.so --wsgi-file tests/hello.py
...
Mon Oct 28 10:06:11 2019 - chain reloading complete
Mon Oct 28 10:06:11 2019 - chain reload starting...
Mon Oct 28 10:06:11 2019 - chain next victim is worker 1
Gracefully killing worker 1 (pid: 9800)...
uwsgi_master_fifo()/mkfifo(): File exists [core/fifo.c line 112]
$ ls -lha /tmp/master.fifo
-rw-r--r-- 1 awelzel awelzel 2 Oct 28 10:07 /tmp/master.fifo
And that at least explains things for your.
Do you actually need the is_named_pipe_being_read()
? Maybe the following would suffice to avoid blocking fifo that isn't read for too long?
if [ -p ${file}] ; then
timeout 1 sh -c "echo c > ${file}"
fi
Probably protect the whole script with cronutils runlock or so, too https://github.com/google/cronutils
PS:
I'm 99% sure that the echo is done only once for a file by me.
There was a 1% chance it's done more than once after all, right? :-)
You are the man! Thank you so much!
New line:
test -p ${file} && timeout 3 bash -c "echo c > ${file}"
You found the 1% :) That's why we should never say 100% :)
Feel free to add things to the doc, so others don't run into the same issues.
@palmkevin
Hi, I have sometime errors when I'm sending a reload instruction to Master FIFO with uWSGI 1.9.18.2.
uWSGI configuration is :
Permissions looks good :
Command to send a reload (during logrotate) is :
timeout 60 bash -c "echo 'r' > /var/run/nginx/uwsgi.masterfifo"
Example of correct reload:
Example of failed reload:
If we look at
core/fifo.c:86
, we don't know if the unlink failed or the mkfifo, because we have none of the return codes are written into the logs. I think, at minimum, uWSGI should write return codes of system calls and system libraries.