kaltura / nginx-vod-module

NGINX-based MP4 Repackager
GNU Affero General Public License v3.0

MSS performance: clear vs crypted content #491

Closed utu2016 closed 7 years ago

utu2016 commented 7 years ago

Hi, we're running some performance tests with MSS content. We have the same content in both clear and encrypted form, and we're using JMeter to inject the requests. When we run the test with encrypted content, the throughput is about 150/sec, but for clear content the throughput is about 50/sec. We have run many tests, changing the number of threads/users and worker processes, with and without the cache configuration parameters (vod_metadata_cache metadata_cache 512m; vod_response_cache response_cache 128m;), but nothing changes.

The request URLs are: manifest, video and audio fragments.

Could you explain why clear mode is so slow? How can we improve the throughput?

The nginx configuration file contains the following settings:

worker_processes auto;
events {
    worker_connections 4096;
    multi_accept on;
}
keepalive_timeout 30;
keepalive_requests 100000;
reset_timedout_connection on;
client_body_timeout 10;
send_timeout 2;

server {
    listen 80;
    server_name localhost;
    client_max_body_size 0;

    vod_performance_counters perf_counters;
    # vod caches
    vod_metadata_cache metadata_cache 512m;
    vod_response_cache response_cache 128m;

   location /mss/enc_req/ {
        if_modified_since off;
        vod mss;
        vod_drm_enabled on;
        vod_mode local;
        vod_mss_manifest_file_name_prefix Manifest;
        vod_segment_duration 5000;
        vod_manifest_segment_durations_mode accurate;
        vod_align_segments_to_key_frames on;
        alias $MY_FOLDER;
        vod_drm_clear_lead_segment_count 0;
        vod_drm_request_uri $hss_drm_uri_data;
        vod_drm_upstream_location /__child_request__/;
        expires 100d;
        add_header Last-Modified "";
        add_header Cache-Control max-age=604800;
        gzip on;
        gzip_types application/vnd.apple.mpegurl;
        etag off;
    }
   location /ssc/test/ {
        vod mss;
        vod_mode local;
        if_modified_since off;
        vod_mss_manifest_file_name_prefix Manifest;
        vod_segment_duration 5000;
        vod_manifest_segment_durations_mode accurate;
        vod_align_segments_to_key_frames on;
        alias $MY_FOLDER;
        expires 100d;
        add_header Last-Modified "";
        add_header Cache-Control max-age=604800;
        add_header Test true;
        gzip on;
        gzip_types application/vnd.apple.mpegurl;
        etag off;
    }
    location = /vod_status {
       vod_status;
    }
}

thanks.

utu2016 commented 7 years ago

Moreover, we see that if we remove the gzip configuration, we obtain the following throughput: CLEAR: 52.6, CRYPTED: 212. Why is there such a big difference for the encrypted content, and why not for the clear one? Thanks

erankor commented 7 years ago

Did you go over this - https://github.com/kaltura/nginx-vod-module/#performance-recommendations ? For example, I don't see any mention of aio in your configuration; aio is fundamental for getting reasonable performance in local/mapped modes.
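
As a rough sketch of that recommendation, aio could be enabled on the clear location from the configuration above. This assumes an nginx build with `--with-file-aio`; only the `aio` line is new, and the other directives are copied from the original post.

```nginx
location /ssc/test/ {
    vod mss;
    vod_mode local;
    alias $MY_FOLDER;

    # asynchronous file reads; assumes nginx was built with --with-file-aio
    aio on;
}
```

The same line would go into the encrypted location, so that both cases are measured under the same I/O settings.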

utu2016 commented 7 years ago

Hi, yes, we looked at the performance recommendations. No, we haven't added the aio module yet; we'll schedule adding it, but first we'd like to understand why encrypted mode is faster than clear mode.

Our expectation was that clear would be much faster than encrypted (because a GET request is made to obtain the key and the AES encryption algorithm is executed; neither action is present in clear mode).

Anyway, we're working in local mode. Thanks

erankor commented 7 years ago

The difference between clear and encrypted is expected to be very small, in new CPUs that support AES-NI. IIRC, when I benchmarked it, I managed to encrypt/decrypt 256MB/sec on a single core. Therefore, my guess is that the problem is in the way you are testing, e.g. if you first test clear, it can get all MP4 file data into the OS file system cache, and then when you test encrypted, it will appear to work much faster. You have to run the test multiple times (clear,encrypted,clear,encrypted,etc.) to get reliable results.

utu2016 commented 7 years ago

We agree with your expectation, and this is why we asked for your support. We ran the test many times with different combinations (clear, encrypted, clear, encrypted, etc.), stopping/starting nginx, recompiling it and clearing the cache:

Anyway, we're now setting up a test with aio enabled; we'll send you the results.

utu2016 commented 7 years ago

Test results with aio enabled: encrypted throughput is 190/210, clear throughput is 52. It seems that this change has no impact...

erankor commented 7 years ago

I ran a simple test with ab, I'm getting the same time with clear / encrypted (emss = encrypted, mss = clear) -

ab -n1000 -c150 'pa-front-vod-stg1/emss/p/2035982/sp/203598200/serveFlavor/entryId/0_nf0aybik/v/12/flavorId/0_2nh6ruyp/name/a.mp4.urlset/QualityLevels(398336)/Fragments(video=0)'
...
Document Length:        216842 bytes
...
Percentage of the requests served within a certain time (ms)
  50%     31
  66%    227
  75%    267
  80%    432
  90%    849
  95%   1164
  98%   1769
  99%   1962
 100%   2158 (longest request)

ab -n1000 -c150 'pa-front-vod-stg1/mss/p/2035982/sp/203598200/serveFlavor/entryId/0_nf0aybik/v/12/flavorId/0_2nh6ruyp/name/a.mp4.urlset/QualityLevels(398336)/Fragments(video=0)'
...
Document Length:        215067 bytes
...
Percentage of the requests served within a certain time (ms)
  50%     31
  66%    227
  75%    253
  80%    410
  90%    849
  95%   1373
  98%   1786
  99%   1972
 100%   2153 (longest request)

erankor commented 7 years ago

Btw, one significant difference between clear and encrypted is that in the default configuration encrypted has to process the whole chunk before it starts producing any output. This is because the mp4 metadata is not enough to understand the layout of an encrypted segment; the module has to read all the frames for that. If for some reason in your test you are measuring the time from first response byte to full response, this can explain why encrypted produces better results. If in your videos there's a single h264 NAL unit per frame (this is usually the case for progressive scan videos produced with x264) you can set vod_min_single_nalu_per_frame_segment to a value greater than 0. This will enable the module to figure out the layout of encrypted segments without having to read all frames, allowing it to return the output as it is being built (like in clear content). We are setting this parameter to '2', since the first segment has a NAL unit of the h264 copyright, and this assumption does not apply to it.
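
A minimal sketch of that setting, applied to the encrypted location from the original post. It assumes the videos really do carry a single NAL unit per frame after the first segment, which should be checked against the actual content before relying on it.

```nginx
location /mss/enc_req/ {
    vod mss;
    vod_drm_enabled on;

    # Value 2 leaves the first segment out of the single-NAL-unit assumption,
    # as described above (it carries the extra h264 copyright NAL unit).
    vod_min_single_nalu_per_frame_segment 2;
}
```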

utu2016 commented 7 years ago

Hi, the output of the test done with the ab tool follows. We use the same input .mp4 for both the clear and crypt tests. After setting vod_min_single_nalu_per_frame_segment 2; we see that 'Requests per second' decreased a little. But our problem is to increase the clear-mode throughput.

[root@CentOS-Jemeter-Linux9 ~]# ab -n1000 -c150 'IP_SERVER/CRYPT/20161129172021_7252R_137739961/20161129172021_7252R_137739961.ism/QualityLevels(300000)/Fragments(video=0)'
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking IP_SERVER (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests

Server Software:        nginx
Server Hostname:        IP_SERVER
Server Port:            80

Document Path:          CRYPT/20161129172021_7252R_137739961/20161129172021_7252R_137739961.ism/QualityLevels(300000)/Fragments(video=0)
Document Length:        85415 bytes

Concurrency Level:      150
Time taken for tests:   4.153 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      85780062 bytes
HTML transferred:       85516536 bytes
Requests per second:    240.81 [#/sec] (mean)
Time per request:       622.899 [ms] (mean)
Time per request:       4.153 [ms] (mean, across all concurrent requests)
Transfer rate:          20172.53 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   4.4      1      16
Processing:   197  603 101.9    621     840
Waiting:      195  531  96.0    530     753
Total:        213  605 102.0    623     856

Percentage of the requests served within a certain time (ms)
  50%    623
  66%    646
  75%    660
  80%    685
  90%    720
  95%    762
  98%    844
  99%    848
 100%    856 (longest request)

[root@CentOS-Jemeter-Linux9 ~]# ab -n1000 -c150 'IP_SERVER/CLEAR/PROVA_R_ENC_3061/PROVA_R_ENC_3061.ism/QualityLevels(300000)/Fragments(video=0)'
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking IP_SERVER (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests

Server Software:        nginx
Server Hostname:        IP_SERVER
Server Port:            80

Document Path:          CLEAR/PROVA_R_ENC_3061/PROVA_R_ENC_3061.ism/QualityLevels(300000)/Fragments(video=0)
Document Length:        82453 bytes

Concurrency Level:      150
Time taken for tests:   15.989 seconds
Complete requests:      1000
Failed requests:        1
   (Connect: 0, Receive: 0, Length: 0, Exceptions: 1)
Write errors:           0
Total transferred:      83041856 bytes
HTML transferred:       82764656 bytes
Requests per second:    62.54 [#/sec] (mean)
Time per request:       2398.358 [ms] (mean)
Time per request:       15.989 [ms] (mean, across all concurrent requests)
Transfer rate:          5071.94 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    3   2.9      2      12
Processing:   740 2283 636.8   2458    4022
Waiting:       53 1498 540.2   1547    2496
Total:        744 2287 636.8   2459    4023

Percentage of the requests served within a certain time (ms)
  50%   2459
  66%   2565
  75%   2685
  80%   2707
  90%   2732
  95%   3144
  98%   3456
  99%   3712
 100%   4023 (longest request)

erankor commented 7 years ago

I see you have the performance counters enabled, did you check what takes most of the time in the clear requests? (Note that you can reset the counters by hitting the vod_status location with ?reset=1)

utu2016 commented 7 years ago

Hi, here are the vod_status results after resetting it:
Clear data: vod_status_clear_kaltura.zip
Crypted data: vod_status_crypt_kaltura.zip
Looking at the two results, I see that in clear mode there's a value set, while in crypt mode it is all '0'. Our storage is not local, but this applies to all kinds of content (clear and crypt). Your help is appreciated. Thanks

erankor commented 7 years ago

Looking at the xmls, it seems that open_file and async_read are significantly worse in the clear test, the read is more than 10x what it is in the encrypted test (900s vs 80s), open is x2 but it's only 1.5s in total. Btw, the read metric is a bit problematic since it's async, so if the nginx process can't handle the io completion because it's blocked on something, it will still be accounted as 'read'. A couple of other performance optimizations I can think of (don't know if they will make a difference) -

  1. increase worker_aio_requests to 1024 - the fact the clear test had value under 'read' may indicate nginx ran out of aio contexts (however this only took 7 millis in total, so may not make any difference...)
  2. enable asynchronous file open - add to nginx.conf - thread_pool open_file_pool threads=32; and: vod_open_file_thread_pool open_file_pool;
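
Put together, the two suggestions above might look like the fragment below. It is a sketch rather than a complete nginx.conf: the pool name open_file_pool is arbitrary, thread_pool belongs at the top level of the file (outside http), and it assumes an nginx build with thread support.

```nginx
events {
    worker_connections 4096;
    # 1. allow more outstanding aio operations per worker
    worker_aio_requests 1024;
}

# 2. thread pool used for asynchronous file open (main context)
thread_pool open_file_pool threads=32;

http {
    server {
        listen 80;
        # hand file opens to the pool declared above
        vod_open_file_thread_pool open_file_pool;
    }
}
```
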
utu2016 commented 7 years ago

Test-1: same result (maybe a little worse). Test-2: I'm not able to configure the thread directives (I'm using nginx 1.10.0). Could you help me? Running nginx with these settings I get: nginx: [emerg] unknown directive "thread_pool" in /usr/etc/nginx/nginx.conf. I tried putting it in different contexts but nothing changes.

Anyway, it seems that the problem is related to the continuous/blocking read on the content file in clear mode, which doesn't happen in crypt mode because of the delay for getting the DRM data. Is that right?

erankor commented 7 years ago

when you compile nginx, you need to pass --with-threads to configure

utu2016 commented 7 years ago

Here are the results with the change you suggested:

CLEAR: ab_clear_kaltura_thread.txt vod_status_clear_kaltura_thread.zip

CRYPT: ab_crypt_kaltura_thread.txt vod_status_crypt_kaltura_thread.zip

Crypt doesn't change; clear mode sped up a little... Any other ideas? Thanks

erankor commented 7 years ago

You can try to look at some performance counters on the server while running the test - CPU utilization, disk IO, network etc. maybe it will give some hint on where the problem is. Another possibility is to run nginx with a profiler, maybe it will show nginx is blocked on something.

utu2016 commented 7 years ago

Hi, we're continuing the tests to track down our performance problem. We're now using local storage but we obtain the same results. We're using version 1.10 of the module. When you ran the test with ab, which module version did you use? Could you send us your nginx.conf file? Maybe we forgot some setting. Thanks

utu2016 commented 7 years ago

Hi Erankor, on this topic, we still have the problem described above. Could you send us your configuration file so we can do more checks? When you ran the test with ab, which module version did you use? Thanks

erankor commented 7 years ago

I can't share my full configuration, but below is a narrowed down version of it containing the relevant parts. I've just verified this configuration produces very similar results to what I pasted before. I'm using latest master of nginx-vod-module, running on Ubuntu 14.

user  www-data;
worker_processes  auto;

error_log  /var/log/nginx/error_log;

pid     /var/run/nginx.pid;

events {
    worker_connections  4096;
    worker_aio_requests 1024;
    multi_accept on;
    use epoll;
}

thread_pool open_file_pool threads=32;

http {
    upstream kalapi {
        server api-server;
    }

    upstream udrm {
        server drm-server;
    }

    include    mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
        '$status $bytes_sent $request_time "$http_referer" "$http_user_agent" '
        '"$sent_http_x_kaltura" "$http_host" $pid $sent_http_x_kaltura_session - '
        '$request_length "$sent_http_content_range" "$http_x_forwarded_for" '
        '"$http_x_forwarded_server" "$http_x_forwarded_host" "$sent_http_cache_control" - '
        '$connection $request_id ';

    access_log /var/log/nginx/access_log.gz main gzip flush=5m;

    server {
        listen     80;
        server_name  vod;

        # common nginx performance settings
        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        keepalive_timeout 60;
        keepalive_requests 1000;
        client_header_timeout 20;
        client_body_timeout 20;
        reset_timedout_connection on;
        send_timeout 20;

        # debug headers
        requestid on;
        more_set_headers 'X-Vod-Me: $hostname';
        more_set_headers 'X-Vod-Session: $request_id';

        # manifest compression
        gzip  on;
        gzip_types application/vnd.apple.mpegurl video/f4m application/dash+xml text/xml;

        # common file caching / aio
        open_file_cache max=8192 inactive=5m;
        open_file_cache_valid 2m;
        open_file_cache_min_uses 1;
        open_file_cache_errors on;
        aio on;

        # common vod settings
        vod_mode mapped;
        vod_open_file_thread_pool open_file_pool;
        vod_max_metadata_size 256m;
        vod_ignore_edit_list on;
        vod_last_modified 'Sun, 19 Nov 2000 08:52:00 GMT';
        vod_output_buffer_pool 64k 128;
        vod_last_modified_types *;
        vod_expires 100d;
        vod_upstream_extra_args "pathOnly=1&clientTag=vod:$hostname-$request_id";
        vod_response_cache response_cache 128m;
        vod_upstream_location /kalapi_proxy;
        vod_max_mapping_response_size 4k;
        vod_min_single_nalu_per_frame_segment 2;

        # vod drm settings
        vod_drm_clear_lead_segment_count 1;
        vod_drm_upstream_location /udrm_proxy;
        vod_drm_request_uri "/system/ovp$vod_suburi";

        # shared memory zones
        vod_metadata_cache metadata_cache 4096m;
        vod_mapping_cache mapping_cache 64m;
        vod_drm_info_cache drm_cache 64m;
        vod_performance_counters perf_counters;

        # redirect server error pages to the static page /50x.html
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }

        # internal location for vod subrequests
        location ^~ /kalapi_proxy/ {
            internal;
            proxy_pass http://kalapi/;
            proxy_set_header Host $http_host;
        }

        location ^~ /udrm_proxy/ {
            internal;
            proxy_pass http://udrm/;
            proxy_set_header Host $http_host;
        }

        # static files (crossdomain.xml, robots.txt etc.)
        location / {
            root   /opt/nginx-vod-module-saas/static;

            more_set_headers 'Access-Control-Allow-Origin: *';
            expires 90d;
        }

        # serve flavor MSS regular
        location ~ ^/mss/p/\d+/(sp/\d+/)?serveFlavor/ {
            vod mss;
            vod_segment_duration 4000;
            vod_align_segments_to_key_frames on;
            vod_manifest_segment_durations_mode accurate;

            more_set_headers 'Access-Control-Allow-Headers: Origin,Range,Accept-Encoding,Referer,Cache-Control';
            more_set_headers 'Access-Control-Expose-Headers: Server,Content-Length,Content-Range,Date';
            more_set_headers 'Access-Control-Allow-Methods: GET, HEAD, OPTIONS';
            more_set_headers 'Access-Control-Allow-Origin: *';
        }

        # serve flavor EMSS regular
        location ~ ^/emss/p/\d+/(sp/\d+/)?serveFlavor/ {
            vod mss;
            vod_segment_duration 4000;
            vod_align_segments_to_key_frames on;
            vod_manifest_segment_durations_mode accurate;

            vod_drm_enabled on;
            vod_drm_request_uri "/system/ovp/sharedkey/true/$vod_suburi";

            more_set_headers 'Access-Control-Allow-Headers: Origin,Range,Accept-Encoding,Referer,Cache-Control';
            more_set_headers 'Access-Control-Expose-Headers: Server,Content-Length,Content-Range,Date';
            more_set_headers 'Access-Control-Allow-Methods: GET, HEAD, OPTIONS';
            more_set_headers 'Access-Control-Allow-Origin: *';
        }
    }
}

erankor commented 7 years ago

@utu2016, did you try it out? can we close this issue?

utu2016 commented 7 years ago

Hi Erankor, yes, we're trying it, but the problem persists. We now have HLS and HSS, each with both encrypted and clear content, and only HSS clear has worse throughput.

We can close the issue even if the problem is still present.