winshining / nginx-http-flv-module

A media streaming server based on nginx-rtmp-module. In addtion to the features nginx-rtmp-module provides, HTTP-FLV, GOP cache, VHosts (one IP for multi domain names) and JSON style statistics are supported now.
BSD 2-Clause "Simplified" License
2.71k stars 567 forks source link

[bug] nginx reload causes lots of errors: 111: Connection refused and auto_push: connect failed in error log #252

Closed streamingsystems closed 7 months ago

streamingsystems commented 8 months ago

When you meet a bug, please open the issue including a title prefixed by '[bug]' and describe it as follows:

Hi Winshining,

We are seeing something that we wanted to bring to your attention.

We have nginx "Workers" running:

"Loop" a video or stream from an encoder to nginx flv module:

ffmpeg -stream_loop -1 -re -i low.mp4 -map 0 -c copy -f flv rtmp://127.0.0.1:1935/

Now while the stream is playing, reload nginx:

sytemctl reload nginx

When the reload is completed you will see the worker processes are trying to shut down but are "stuck" open and other workers have started:

nginx 101240 101239 1 Oct28 ? 00:10:52 nginx: worker process is shutting down nginx 101241 101239 0 Oct28 ? 00:02:51 nginx: worker process is shutting down nginx 101242 101239 0 Oct28 ? 00:02:49 nginx: worker process is shutting down nginx 101243 101239 0 Oct28 ? 00:02:55 nginx: worker process is shutting down nginx 293759 101239 0 07:58 ? 00:00:00 nginx: worker process nginx 293760 101239 0 07:58 ? 00:00:00 nginx: worker process nginx 293761 101239 0 07:58 ? 00:00:00 nginx: worker process nginx 293762 101239 0 07:58 ? 00:00:00 nginx: worker process

Then, start that exact same command again (to the same stream key or another key):

ffmpeg -stream_loop -1 -re -i low.mp4 -map 0 -c copy -f flv rtmp://127.0.0.1:1935/

After some time the error log will fill up with many messages like this:

2023/10/29 08:02:09 [error] 293759#293759: connect() to unix:/tmp/nginx-http-flv.2 failed (111: Connection refused) 2023/10/29 08:02:09 [error] 293759#293759: connect() to unix:/tmp/nginx-http-flv.3 failed (111: Connection refused) 2023/10/29 08:02:09 [error] 293759#293759: connect() to unix:/tmp/nginx-http-flv.0 failed (111: Connection refused) 2023/10/29 08:02:09 [error] 293759#293759: connect() to unix:/tmp/nginx-http-flv.1 failed (111: Connection refused) 2023/10/29 08:02:09 [error] 293759#293759: connect() to unix:/tmp/nginx-http-flv.2 failed (111: Connection refused) 2023/10/25 14:09:31 [error] 627627#627627: *397322 auto_push: connect failed: slot=17 pid=674541

Expected behavior (期望行为)

I am not sure what the expected behavior is exactly. I would think that somehow the open RTMP connections would nicely migrate to the new workers after a reload.

Actual behavior (实际行为)

It seems like the old workers are running and the new workers are running and there is many problems with sockets or something.

OS and Nginx version (操作系统和 Nginx 版本号)

CENTOS

[ss@origin-staging ~]$ nginx -V nginx version: nginx/1.24.0 built by gcc 11.3.1 20220421 (Red Hat 11.3.1-2) (GCC) built with OpenSSL 3.0.1 14 Dec 2021 (running with OpenSSL 3.0.7 1 Nov 2022) TLS SNI support enabled configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib64/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-openssl-opt=enable-ktls --with-cc-opt='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=x86-64-v2 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fPIC' --with-ld-opt='-Wl,-z,relro -Wl,-z,now -pie'

Configuration file (配置文件)

nginx.conf:

user nginx; worker_processes auto; worker_rlimit_nofile 32768; error_log /var/log/nginx/error.log; pid /run/nginx.pid;

Load dynamic modules. See /usr/share/doc/nginx/README.dynamic.

include /usr/share/nginx/modules/*.conf;

events { worker_connections 2048; use epoll; multi_accept on; }

rtmp conf:

rtmp_auto_push on;

rtmp {

out_queue           4096;
out_cork            8;
max_streams         256;
timeout             2s;
drop_idle_publisher 30s;
respawn off;

server {
    listen 1935;
    listen [::]:1935;

....

}

Steps to reproduce the behavior (复现问题步骤)

See above:

Error log if any (错误日志)

The log fills up with these errors:

2023/10/29 08:02:09 [error] 293759#293759: connect() to unix:/tmp/nginx-http-flv.2 failed (111: Connection refused) 2023/10/29 08:02:09 [error] 293759#293759: connect() to unix:/tmp/nginx-http-flv.3 failed (111: Connection refused) 2023/10/29 08:02:09 [error] 293759#293759: connect() to unix:/tmp/nginx-http-flv.0 failed (111: Connection refused) 2023/10/29 08:02:09 [error] 293759#293759: connect() to unix:/tmp/nginx-http-flv.1 failed (111: Connection refused) 2023/10/29 08:02:09 [error] 293759#293759: connect() to unix:/tmp/nginx-http-flv.2 failed (111: Connection refused) 2023/10/29 08:02:09 [error] 293759#293759: connect() to unix:/tmp/nginx-http-flv.3 failed (111: Connection refused) 2023/10/29 08:02:10 [error] 293759#293759: connect() to unix:/tmp/nginx-http-flv.0 failed (111: Connection refused) 2023/10/29 08:02:10 [error] 293759#293759: connect() to unix:/tmp/nginx-http-flv.1 failed (111: Connection refused)

And other errors like this:

2023/10/25 14:09:31 [error] 627627#627627: *397322 auto_push: connect failed: slot=17 pid=674541

streamingsystems commented 8 months ago

We had an update last night which caused nginx to reload and this morning the log was filled:

2023/10/30 06:50:11 [error] 1209079#1209079: connect() to unix:/tmp/nginx-http-flv.1 failed (111: Connection refused) 2023/10/30 06:50:11 [error] 1209079#1209079: connect() to unix:/tmp/nginx-http-flv.2 failed (111: Connection refused) 2023/10/30 06:50:11 [error] 1209079#1209079: connect() to unix:/tmp/nginx-http-flv.3 failed (111: Connection refused) 2023/10/30 06:50:11 [error] 1209079#1209079: connect() to unix:/tmp/nginx-http-flv.4 failed (111: Connection refused) 2023/10/30 06:50:11 [error] 1209079#1209079: connect() to unix:/tmp/nginx-http-flv.5 failed (111: Connection refused) 2023/10/30 06:50:11 [error] 1209079#1209079: connect() to unix:/tmp/nginx-http-flv.6 failed (111: Connection refused) 2023/10/30 06:50:11 [error] 1209079#1209079: connect() to unix:/tmp/nginx-http-flv.7 failed (111: Connection refused) 2023/10/30 06:50:11 [error] 1209079#1209079: connect() to unix:/tmp/nginx-http-flv.8 failed (111: Connection refused) 2023/10/30 06:50:11 [error] 1209079#1209079: connect() to unix:/tmp/nginx-http-flv.9 failed (111: Connection refused) 2023/10/30 06:50:11 [error] 1209079#1209079: connect() to unix:/tmp/nginx-http-flv.10 failed (111: Connection refused) 2023/10/30 06:50:11 [error] 1209079#1209079: connect() to unix:/tmp/nginx-http-flv.11 failed (111: Connection refused) 2023/10/30 06:50:11 [error] 1209079#1209079: connect() to unix:/tmp/nginx-http-flv.12 failed (111: Connection refused) 2023/10/30 06:50:11 [error] 1209079#1209079: connect() to unix:/tmp/nginx-http-flv.13 failed (111: Connection refused) 2023/10/30 06:50:11 [error] 1209079#1209079: connect() to unix:/tmp/nginx-http-flv.14 failed (111: Connection refused)

2023/10/30 06:50:11 [error] 1209079#1209079: *1131613 auto_push: connect failed: slot=1 pid=1209083 name='****', client: 127.0.0.1, server: 0.0.0.0:1935

streamingsystems commented 8 months ago

Hi,

Is there any way we can pay you to fix this issue, it's really affecting us in production, every time we reload nginx we see these errors and my guess is the workers are not working correctly.

streamingsystems commented 7 months ago

Hello, I was just checking to see if we can pay you money to fix this issue for us?

winshining commented 7 months ago

nginx 101240 101239 1 Oct28 ? 00:10:52 nginx: worker process is shutting down nginx 101241 101239 0 Oct28 ? 00:02:51 nginx: worker process is shutting down nginx 101242 101239 0 Oct28 ? 00:02:49 nginx: worker process is shutting down nginx 101243 101239 0 Oct28 ? 00:02:55 nginx: worker process is shutting down nginx 293759 101239 0 07:58 ? 00:00:00 nginx: worker process nginx 293760 101239 0 07:58 ? 00:00:00 nginx: worker process nginx 293761 101239 0 07:58 ? 00:00:00 nginx: worker process nginx 293762 101239 0 07:58 ? 00:00:00 nginx: worker process

RTMP connections are actually long live TCP connections, when Nginx is reload, the old worker processes (those show "nginx: woker process is shutting down") will not exit until all connections are finished, this is an inherent feature of Nginx itself.

2023/10/29 08:02:09 [error] 293759#293759: connect() to unix:/tmp/nginx-http-flv.2 failed (111: Connection refused) 2023/10/29 08:02:09 [error] 293759#293759: connect() to unix:/tmp/nginx-http-flv.3 failed (111: Connection refused) 2023/10/29 08:02:09 [error] 293759#293759: connect() to unix:/tmp/nginx-http-flv.0 failed (111: Connection refused) 2023/10/29 08:02:09 [error] 293759#293759: connect() to unix:/tmp/nginx-http-flv.1 failed (111: Connection refused) 2023/10/29 08:02:09 [error] 293759#293759: connect() to unix:/tmp/nginx-http-flv.2 failed (111: Connection refused) 2023/10/25 14:09:31 [error] 627627#627627: *397322 auto_push: connect failed: slot=17 pid=674541

As Nginx is reloading, the old worker processes do not immediately close old connections they have accepted but do not accept new connections either, including normal sockets and unix domain sockets.

Hello, I was just checking to see if we can pay you money to fix this issue for us? Thank you! I'm afraid that I can do nothing, as this is not a bug in this module.

BTW, I have ever donated successfully to Eric S. Raymond monthly for a year with Paypal but I have never received any donation. I don't know why it doesn't work

streamingsystems commented 7 months ago

Ok thanks for the information so this is not a bug or a problem just the way nginx works. We can address on our end then.

As for the donation, I did send the money and I spoke to PayPal and they confirmed it was sent. See attached image.

2023-11-17_08-58-54