Backend pressure

ySp-chld commented 1 year ago

Following issues #1450 and #1334, I see that our storage is under pressure from small chunk reads.

After changing vod_initial_read_size to 16K, I can see some problematic behavior in the way the module reads the remote file.
If vod_initial_read_size is larger than the file header, it first asks for 16K:

GET /thr/storage/mp4-320kbps/02/01/0f/file1.mp4 HTTP/1.1
Host: origin.test
User-Agent: VLC/3.0.18 LibVLC/3.0.18
Range: bytes=0-16383
X-Forwarded-For: 10.x.x.60

HTTP/1.1 206 Partial Content
Cache-Control: s-maxage=604800
Content-Type: audio/mp4
Content-Range: bytes 0-16383/3029620
Last-Modified: Thu, 10 May 2018 16:36:55 GMT
Date: Tue, 18 Jul 2023 13:31:42 GMT
Content-Length: 16384

Then it issues a second request that re-downloads the same chunk:

GET /thr/storage/mp4-320kbps/02/01/0f/file1.mp4 HTTP/1.1
Host: origin.test
User-Agent: VLC/3.0.18 LibVLC/3.0.18
Cache-Control: no-cache
X-Forwarded-For: 10.x.x.60
Range: bytes=0-16383

HTTP/1.1 206 Partial Content
Cache-Control: s-maxage=604800
Content-Type: audio/mp4
Content-Range: bytes 0-16383/3029620
Last-Modified: Thu, 10 May 2018 16:36:55 GMT
Date: Tue, 18 Jul 2023 13:31:42 GMT
Content-Length: 16384

Then it finally starts streaming the file:

GET /thr/storage/mp4-320kbps/02/01/0f/file1.mp4 HTTP/1.1
Host: origin.test
User-Agent: VLC/3.0.18 LibVLC/3.0.18
Cache-Control: no-cache
X-Forwarded-For: 10.x.x.60
Range: bytes=13812-94616

HTTP/1.1 206 Partial Content
Cache-Control: s-maxage=604800
Content-Type: audio/mp4
Content-Range: bytes 13812-94616/3029620
Last-Modified: Thu, 10 May 2018 16:36:55 GMT
Date: Tue, 18 Jul 2023 13:31:42 GMT
Content-Length: 80805

It seems counterintuitive to download the file header twice before starting to stream, and this pattern matches the large number of small requests received on our storage. We are running the latest version of vod-module.

Thank you for your help.

erankor commented 1 year ago

I think your test is probably wrong: maybe multiple client requests are reaching the module, and that is why you see multiple reads of the mp4 header. For example, if you are not caching the video metadata, it will be read at least once for the manifest and once for the segments (and if you're not caching it, you should...). The reason I think the test is wrong is that there's an explicit condition for this case in the code - https://github.com/kaltura/nginx-vod-module/blob/master/vod/mp4/mp4_format.c#L194.

You can validate it by enabling debug logs, or maybe better, by adding something like ?req=$request_id to the upstream requests. This will let you link each upstream request to the specific nginx-vod-module request that triggered it. If you add such an argument for debugging, I would expect you to see 2 different request ids in the 2 upstream requests for range 0-16383, meaning that the module didn't read it more than once for a single request.
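
A minimal sketch of the request-id tagging described above, assuming the deployed module version supports the vod_upstream_extra_args directive and that the upstream proxy location forwards the query string unchanged. The location name /hls/ and the argument name req are illustrative, not taken from any real setup.

```nginx
# Debugging sketch: location name and argument name are illustrative.
location /hls/ {
    vod hls;

    # Append the id of the client-facing request to every upstream
    # request, so the origin sees e.g. /path/file.mp4?req=<32 hex chars>.
    # $request_id is a built-in nginx variable (nginx 1.11.0 and later).
    vod_upstream_extra_args "req=$request_id";
}
```

With this in place, two upstream requests for bytes=0-16383 carrying the same req value would point to a duplicate read within a single request, while different values would point to separate client requests (for example, one for the manifest and one for a segment).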

ySp-chld commented 1 year ago

EDIT: I can confirm that I'm the only one using this test server and that the behavior is reproducible, even with the metadata cache.

OK, I will try the request ID trick to see whether 2 different requests are emitted. However, these tests were run on a test platform where only my VLC client and I were active, so I don't see where a second request would come from. Also, here's our current (curated) nginx configuration:

user  nginx;
worker_processes  auto;
load_module modules/ngx_http_secure_token_filter_module.so;
load_module modules/ngx_http_vod_module.so;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
    multi_accept on;
    use epoll;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$http_x_forwarded_for $remote_addr - $remote_user [$time_local] "$request" '
                '$status $bytes_sent $request_time "$http_referer" "$http_user_agent" "-" - '
                '"$sent_http_x_kaltura" "$http_host" $pid $sent_http_x_kaltura_session - '
                '$request_length "$sent_http_content_range" "$sent_http_cache_control" $connection ';

    access_log  /var/log/nginx/access.log  main;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;

    keepalive_timeout 60;
    keepalive_requests 1000;
    client_header_timeout 20;
    client_body_timeout 20;
    reset_timedout_connection on;
    send_timeout 20;

    gzip on;
    gzip_types application/vnd.apple.mpegurl video/f4m application/dash+xml text/xml;
    upstream http.origin.test.domain.com {
        server http.origin.test.domain.com;
        keepalive 50;
    }

    vod_mode remote;
    vod_upstream_location /proxy_origin;

    vod_metadata_cache metadata_cache 512m;
    vod_mapping_cache mapping_cache 64m;
    vod_response_cache response_cache 64m;
    vod_performance_counters perf_counters;
    open_file_cache max=1000 inactive=5m;
    open_file_cache_valid 2m;
    open_file_cache_min_uses 1;
    open_file_cache_errors on;
    aio on;

    secure_token_ACDN $cdn_token {
        acl                     "https://url.domain.com$secure_token_baseuri*";
        end                     180m;
    }

    server {
        listen 80 default;
        server_name $hostname;
        secure_token $query_string;
        secure_token_types text/xml application/vnd.apple.mpegurl;

        location ~* ^/proxy_origin/(ahlsaes)/ {
                internal;
                rewrite ^/proxy_origin/(?:ahlsaes)/(.*)$ /thr/$1 break;
                proxy_pass http://http.origin.test.domain.com;
                proxy_http_version 1.1;
                proxy_set_header Host "http.origin.test.domain.com";
                proxy_set_header Connection "";
        }

        location /ahlsaes/ {
                secure_token $cdn_token&$query_string;
                vod hls;
                vod_base_url "";
                vod_hls_encryption_method aes-128;
                vod_secret_key "akey";
                vod_bootstrap_segment_durations 2000;
                vod_bootstrap_segment_durations 2000;
                vod_bootstrap_segment_durations 2000;
                vod_manifest_segment_durations_mode accurate;
                vod_segment_duration 4000;
                vod_initial_read_size 16K;

                add_header Access-Control-Allow-Headers "*";
                add_header Access-Control-Expose-Headers "Server,range,Content-Length,Content-Range";
                add_header Access-Control-Allow-Methods "GET, HEAD, OPTIONS";
                add_header Access-Control-Allow-Origin "*";
                add_header Cache-Control "no-store, no-cache, must-revalidate";
        }

        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
                root   html;
        }
    }
}

I'll let you know the request IDs once I've been able to run more tests with request_id enabled.

ySp-chld commented 1 year ago

Hello, allow me to insist: the issue is still present even though I'm the only one using our test server and the cache is enabled.
Thank you.

erankor commented 1 year ago

Did you run the test I suggested?

ySp-chld commented 1 year ago

Yes, but I no longer have the traces. I'll redo them, but in the first test I ran I was already the only one accessing those files, on a test environment deployed only for my use. I'll redo the test and include a tcpdump from my VLC client.

ySp-chld commented 1 year ago

OK, so I've done 2 new tests. The first uses an 8K initial read size; here are the requests from hls to the backend to download the mp4 file:

GET /url/02/01/0f/file.mp4 HTTP/1.1
Host: origin.domain.com
User-Agent: VLC/3.0.18 LibVLC/3.0.18
Range: bytes=0-8191
X-Forwarded-For: 10.9.8.60

GET /url/02/01/0f/file.mp4 HTTP/1.1
Host: origin.domain.com
User-Agent: VLC/3.0.18 LibVLC/3.0.18
Range: bytes=8192-13795
X-Forwarded-For: 10.9.8.60

GET /url/02/01/0f/file.mp4 HTTP/1.1
Host: origin.domain.com
User-Agent: VLC/3.0.18 LibVLC/3.0.18
Cache-Control: no-cache
X-Forwarded-For: 10.9.8.60
Range: bytes=0-8191

GET /url/02/01/0f/file.mp4 HTTP/1.1
Host: origin.domain.com
User-Agent: VLC/3.0.18 LibVLC/3.0.18
Cache-Control: no-cache
X-Forwarded-For: 10.9.8.60
Range: bytes=8192-13795

And here's the same with 16K:

GET /url/02/01/0f/file.mp4 HTTP/1.1
Host: origin.domain.com
User-Agent: VLC/3.0.18 LibVLC/3.0.18
Range: bytes=0-16383
X-Forwarded-For: 10.9.8.60

GET /url/02/01/0f/file.mp4 HTTP/1.1
Host: origin.domain.com
User-Agent: VLC/3.0.18 LibVLC/3.0.18
Cache-Control: no-cache
X-Forwarded-For: 10.9.8.60
Range: bytes=0-16383

GET /url/02/01/0f/file.mp4 HTTP/1.1
Host: origin.domain.com
User-Agent: VLC/3.0.18 LibVLC/3.0.18
Cache-Control: no-cache
X-Forwarded-For: 10.9.8.60
Range: bytes=13812-94616

Both tests were done after an nginx restart (no cache), and I'm the only one using this server in a test environment. In both cases the Kaltura module downloads the file header twice, inducing pressure on the backend storage.

Hope this information helps.

erankor commented 1 year ago

You still haven't added the request id param, as I previously requested, so it is not possible to see whether these upstream requests originated from multiple requests to nginx or not.

kaltura / nginx-vod-module

Backend pressure #1452