kaltura / nginx-vod-module

Strange segfault on usual config (ts and m3u8 requests) #1471

Open jinsteros opened 1 year ago

jinsteros commented 1 year ago

Hello. I hope to find some help with my situation.

I have one caching SSD server and two identical storage servers, each with 4x10 TB HDDs, streaming HLS from small MP4 files (100-500 MB each). The website is very old; I streamed with the same config for many years without changing anything and without any problems. But after installing new storage servers with Debian 11 Bullseye, I found strange 502 errors in my access log for 1-5% of traffic (both ts and m3u8 requests). In the nginx error log on the caching server, these errors look like this:

[error] 447386#447386: *352454516 upstream prematurely closed connection while reading upstream, request: "GET /Finland/59769/59769_720p.mp4/seg-39-v1-a1.ts HTTP/1.1", upstream: "https://x.x.x.x:443/Finland/59769/59769_720p.mp4/seg-39-v1-a1.ts" [...]

Then on the storage servers (both of them) I found what I think is the reason:

Sep  7 02:07:35 Debian-1107-bullseye-amd64-base kernel: [3414572.886896] nginx[525322]: segfault at 560655656e00 ip 0000560655656e00 sp 00007fff7b495b68 error 15
Sep  7 02:07:35 Debian-1107-bullseye-amd64-base kernel: [3414572.889201] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 11 10 00 00 00 00 00 00 <f8> 7d 65 55 06 56 00 00 00 7e 65 55 06 56 00 00 c0 83 5d 55 06 56
Sep  7 02:08:16 Debian-1107-bullseye-amd64-base kernel: [3414614.413586] nginx[525321]: segfault at 56065594a000 ip 000056065594a000 sp 00007fff7b495b68 error 15
Sep  7 02:08:16 Debian-1107-bullseye-amd64-base kernel: [3414614.419755] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ba d5 97 f7 c8 89 cc 11 10 00 00 00 00 00 00 <fb> af 94 55 06 56 00 00 00 b0 94 55 06 56 00 00 00 da 63 55 06 56

There are many of these errors, one every 10-20 seconds, occurring randomly on ts or m3u8 requests, on both storage servers with the same configs.

Nginx is configured like this:

user www-data;
worker_processes  auto;
worker_rlimit_nofile  32768;
pid /run/nginx.pid;

worker_rlimit_core  3000M;
working_directory   /tmp/;

events {
    worker_connections  32768;
    use epoll;
}

http {
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 35;
    server_names_hash_max_size 1024;
    server_names_hash_bucket_size 512;
    types_hash_max_size 2048;
    server_tokens off;
    fastcgi_buffers 8 32k;
    fastcgi_buffer_size 64k;
    client_body_timeout 120s;
    client_max_body_size 2000m;

    client_header_timeout  3m;
    send_timeout     3m;
    connection_pool_size  256;
    client_header_buffer_size 4k;
    large_client_header_buffers 4 32k;
    request_pool_size  4k;
    postpone_output  1460;

...

server {
          listen 443 ssl;
          server_name ...;

          ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
          ssl_ciphers kEECDH+AES128:kEECDH:kEDH:-3DES:kRSA+AES128:kEDH+3DES:DES-CBC3-SHA:!RC4:!aNULL:!eNULL:!MD5:!EXPORT:!LOW:!SEED:!CAMELLIA:!IDEA:!PSK:!SRP:!SSLv2;
          ssl_prefer_server_ciphers on;
          ssl_certificate /etc/ssl/certs/nginx.crt;
          ssl_certificate_key /etc/ssl/private/nginx.key;
          ssl_dhparam /root/ssl/dhparam.pem;

        vod_mode local;
        vod_fallback_upstream_location /fallback;
        vod_last_modified 'Sun, 19 Nov 2000 08:52:00 GMT';
        vod_last_modified_types *;

        vod_metadata_cache metadata_cache 2512m;
        vod_response_cache response_cache 128m;

        gzip on;
        gzip_types application/vnd.apple.mpegurl;

        open_file_cache          max=1000 inactive=5m;
        open_file_cache_valid    2m;
        open_file_cache_min_uses 1;
        open_file_cache_errors   on;
        aio on;

        location / {

            location ~ \.(ts)$
            {
                root /var/www;
                vod hls;
                add_header Access-Control-Allow-Headers '*';
                add_header Access-Control-Expose-Headers 'Server,range,Content-Length,Content-Range';
                add_header Access-Control-Allow-Methods 'GET, HEAD, OPTIONS';
                add_header Access-Control-Allow-Origin '*';
                expires 30d;
            }
            location ~ \.(m3u8)$
            {
                root /var/www;
                vod hls;
                add_header Access-Control-Allow-Headers '*';
                add_header Access-Control-Expose-Headers 'Server,range,Content-Length,Content-Range';
                add_header Access-Control-Allow-Methods 'GET, HEAD, OPTIONS';
                add_header Access-Control-Allow-Origin '*';
                expires 30d;
            }
        }

}

First I updated everything, but that didn't solve the problem. So now I have:

nginx version: nginx/1.24.0 with the latest vod-module available here

nginx -V
nginx version: nginx/1.24.0
built by gcc 10.2.1 20210110 (Debian 10.2.1-6) 
built with OpenSSL 1.1.1n  15 Mar 2022
TLS SNI support enabled
configure arguments: 
--prefix=/etc/nginx 
--sbin-path=/usr/sbin/nginx 
--conf-path=/etc/nginx/nginx.conf 
--error-log-path=/var/log/nginx/error.log 
--http-log-path=/var/log/nginx/access.log 
--pid-path=/var/run/nginx.pid 
--lock-path=/var/run/nginx.lock 
--http-client-body-temp-path=/var/cache/nginx/client_temp 
--http-proxy-temp-path=/var/cache/nginx/proxy_temp 
--http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp 
--http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp 
--http-scgi-temp-path=/var/cache/nginx/scgi_temp 
--user=nginx 
--group=nginx 
--with-http_ssl_module 
--with-http_realip_module 
--with-http_addition_module 
--with-http_sub_module 
--with-http_dav_module 
--with-http_flv_module 
--with-http_mp4_module 
--with-http_gunzip_module 
--with-http_gzip_static_module 
--with-http_random_index_module 
--with-http_secure_link_module 
--with-http_stub_status_module 
--with-mail --with-mail_ssl_module 
--with-http_secure_link_module 
--with-file-aio --with-cc-opt='-g -O2 -Wp,-D_FORTIFY_SOURCE=2' 
--with-ld-opt=-Wl,
--as-needed 
--with-ipv6 
--with-threads 
--add-module=/root/nginx-vod-module-master 
--add-module=/root/nginx-secure-token-module-master 
--add-module=/root/nginx_limit_speed_module-master

Then I took a core dump and opened it in gdb:

gdb /usr/sbin/nginx /tmp/core 
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/nginx...

warning: Can't open file /dev/zero (deleted) during file-backed mapping note processing

warning: Can't open file /[aio] (deleted) during file-backed mapping note processing
[New LWP 532450]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nginx: worker process                           '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055fa169dee00 in ?? ()
(gdb) bt
#0  0x000055fa169dee00 in ?? ()
#1  0x000055fa14a50820 in ngx_http_run_posted_requests (c=0x7f015a0225b0) at src/http/ngx_http_request.c:2424
#2  0x000055fa14a52cec in ngx_http_process_request_headers (rev=rev@entry=0x7f0159d21250) at src/http/ngx_http_request.c:1519
#3  0x000055fa14a53296 in ngx_http_process_request_line (rev=0x7f0159d21250) at src/http/ngx_http_request.c:1163
#4  0x000055fa14a394a8 in ngx_epoll_process_events (cycle=<optimized out>, timer=<optimized out>, flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:901
#5  0x000055fa14a2fa70 in ngx_process_events_and_timers (cycle=cycle@entry=0x55fa16a046a0) at src/event/ngx_event.c:248
#6  0x000055fa14a372c8 in ngx_worker_process_cycle (cycle=cycle@entry=0x55fa16a046a0, data=data@entry=0x4) at src/os/unix/ngx_process_cycle.c:721
#7  0x000055fa14a35a79 in ngx_spawn_process (cycle=cycle@entry=0x55fa16a046a0, proc=0x55fa14a37250 <ngx_worker_process_cycle>, data=0x4, name=0x55fa14b19933 "worker process", 
    respawn=respawn@entry=4) at src/os/unix/ngx_process.c:199
#8  0x000055fa14a37f89 in ngx_reap_children (cycle=0x55fa16a046a0) at src/os/unix/ngx_process_cycle.c:598
#9  ngx_master_process_cycle (cycle=<optimized out>) at src/os/unix/ngx_process_cycle.c:174
#10 0x000055fa14a0f63e in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383
(gdb) 

I have no idea what this is or how to solve it. I would be very grateful for any tips.

jinsteros commented 1 year ago

OK, I've found the problem already.

  1. vod-module doesn't return 404 errors for missing mp4 files; the requests end in 502 instead
  2. The nginx health check for the upstream treats 502 as an upstream failure
  3. When someone requests missing files many times, the nginx health check takes both storage servers out of rotation for a few seconds
  4. As a result, requests for both missing and existing files fail during those few seconds

To solve it I have to disable the nginx health checks for the upstreams, because unfortunately vod-module only produces 502 errors in this case.
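For anyone hitting the same thing, here is a rough sketch of what disabling the checks can look like on the caching server. It assumes the "health check" is stock nginx's passive failure accounting (`max_fails` / `fail_timeout` on the `server` lines of an `upstream` block), which may not match the actual setup. The upstream name `storage` and the addresses are placeholders, not taken from my config.

```nginx
# Caching server, sketch only: "storage" and the 192.0.2.x addresses
# are placeholders for the two storage servers.
upstream storage {
    # max_fails=0 disables the accounting of failed attempts, so a burst
    # of failed requests no longer marks a server as unavailable for
    # fail_timeout (10 seconds by default).
    server 192.0.2.10:443 max_fails=0;
    server 192.0.2.11:443 max_fails=0;
}
```

The trade-off is that a storage server that is really down keeps receiving requests, so this is a workaround rather than a fix. Note that "upstream prematurely closed connection" counts as an `error`, and nginx always treats errors as unsuccessful attempts regardless of `proxy_next_upstream`, so changing only that directive would not help here.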

During these 502s on missing files, vod-module also produces the segfaults. As far as I can tell, all segfaults in the logs come only from requests for missing files. Example: /Finland/dir2/1642/1642,144,240,360,480,720,p.mp4.urlset/master.m3u8. If the target dir contains only the 144, 240, 360 and 480 mp4 files and no 720p.mp4, then requesting the 1642,144,240,360,480,720,p.mp4.urlset/master.m3u8 manifest causes a 502 plus a segfault in the system log in my case.

Returning 404 errors from vod-module would solve it, I think.