videojs / videojs-contrib-hls

HLS library for video.js
http://videojs.github.io/videojs-contrib-hls/

Video stutters when the next downloaded segment is being decrypted #218

Closed rajkosto closed 9 years ago

rajkosto commented 9 years ago

Non-encrypted HLS livestream test: http://sshnuke.net/teststream.php?03ej74H0LvDFBefzDCY4rPbtWYYbgdIHIGFA9t0s%2FcGhI%3D
Encrypted HLS livestream test: http://sshnuke.net/teststream.php?9E81QtnIG1LuyLhKsOq0RMYwb%2FOi739KRh8%2BO6FNX34epYcZXfUzuQEPVTw4zt%2BU0j

Both streams use the same delivery method and the same .m3u8 and .ts content, except that the encrypted one has the key header in its playlist and its segments are encrypted with a constant key served as a normal file.

The best way to notice the problem is to view both in Chrome, open Developer Tools -> Rendering tab at the bottom, and check 'Show paint rectangles'. Then watch the green tint on the black borders around the stream. On the encrypted stream the tint fades out (indicating that no updates are happening) while the AES decrypt method is running on a newly downloaded segment, which you can confirm by CPU profiling in Chrome. The non-encrypted stream stays rock-solid except when switching qualities.

The CPU is definitely not being overloaded, and the bitrate is only around 2.3 Mbps at the highest quality (the stuttering can actually be seen on all qualities). The problem is that the decrypt method is called in a blocking fashion, which prevents the video from being updated at the same time. Other players, such as flashls-based ones, do not have this issue.
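A minimal sketch of why a blocking call stalls updates: anything queued on the main thread, such as a timer (and, in a browser, the next paint), has to wait until the synchronous work finishes. `busyWork` and `measureTimerLag` are hypothetical helpers, and the busy loop is only a stand-in for a synchronous decrypt of a whole segment, not the library's decrypter.

```javascript
// Spin synchronously for roughly `ms` milliseconds. Stand-in for a
// synchronous decrypt of a whole segment (hypothetical helper).
function busyWork(ms) {
  const end = Date.now() + ms;
  let iterations = 0;
  while (Date.now() < end) {
    iterations++;
  }
  return iterations;
}

// Queue a 0 ms timer, run `work` synchronously, and report how late the
// timer actually fired. The lateness approximates how long the page could
// do nothing else, such as paint a new video frame.
function measureTimerLag(work) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), 0);
    work(); // blocks the event loop until it returns
  });
}

measureTimerLag(() => busyWork(100)).then((lag) => {
  console.log(`timer fired ${lag} ms late`);
});
```

Run in Node or a browser console, the reported lag is at least as long as the synchronous work, no matter how fast the CPU is.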

dmlap commented 9 years ago

Thank you very much for the test streams! We'll take a look.

dmlap commented 9 years ago

I'm looking into this now. My hypothesis was that swapping byte order before and after decrypting added tons of overhead and took up the bulk of the time. My initial profiling in Chrome indicates a few things:

  1. Decryption is at least 10x as expensive as transmuxing right now
  2. Byte-order swapping, CBC, and pkcs7 do not play a significant role in decryption time
  3. AES.decrypt() takes up 2/5 of decryption time in my test runs
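The three pieces that point 2 found to be cheap can each be sketched in a few lines, which may help show why: each is a single linear pass with a handful of integer operations per byte or word, while the AES rounds do far more work per block. This is not the contrib-hls decrypter; `swapWord`, `cbcDecrypt`, `pkcs7Unpad`, and the injected `decryptBlock` callback are hypothetical names, and the AES block cipher itself is left out.

```javascript
// Sketch of the "cheap" pieces around the AES block cipher.
// Names are hypothetical; the AES rounds are injected as `decryptBlock`.

// Swap the byte order of one 32-bit word (0x11223344 -> 0x44332211), e.g. to
// turn a little-endian Int32Array read into the big-endian words that
// word-oriented AES implementations typically use.
function swapWord(word) {
  return (
    (word << 24) |
    ((word & 0xff00) << 8) |
    ((word & 0xff0000) >> 8) |
    (word >>> 24)
  );
}

// CBC mode: each plaintext block is the block-decrypted ciphertext XORed
// with the previous ciphertext block (the IV for the first block).
// `decryptBlock` takes and returns a 16-byte Uint8Array.
function cbcDecrypt(ciphertext, iv, decryptBlock) {
  if (ciphertext.length % 16 !== 0) {
    throw new Error('ciphertext length must be a multiple of 16');
  }
  const out = new Uint8Array(ciphertext.length);
  let prev = iv;
  for (let off = 0; off < ciphertext.length; off += 16) {
    const block = ciphertext.subarray(off, off + 16);
    const plain = decryptBlock(block);
    for (let i = 0; i < 16; i++) {
      out[off + i] = plain[i] ^ prev[i];
    }
    prev = block;
  }
  return out;
}

// PKCS#7: the last byte N says how many bytes of value N were appended.
function pkcs7Unpad(data) {
  const pad = data[data.length - 1];
  if (!(pad >= 1 && pad <= 16 && pad <= data.length)) {
    throw new Error('invalid PKCS#7 padding');
  }
  for (let i = data.length - pad; i < data.length; i++) {
    if (data[i] !== pad) throw new Error('invalid PKCS#7 padding');
  }
  return data.subarray(0, data.length - pad);
}
```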

Together, these seem to invalidate my initial hunch. Assuming AES.decrypt() is the culprit, I'm going to explore a couple of options:

Still investigating.

dmlap commented 9 years ago

I ended up building V8 and using its debugging console to try out a bunch of crazy ideas. For future reference:

  1. Inlining AES.decrypt() and hoisting loop invariants (the key, table references, etc.) can shave about 5% off decryption time
  2. DataView.prototype.getUint32 (and cousins) is significantly slower than standard TypedArrays with array accessors
  3. Using Uint32Arrays forced a regular check on the signedness of the inputs and caused frequent function deoptimization. Switching to Int32Arrays halved the runtime.
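Points 2 and 3 can be illustrated with a rough sketch of two ways to turn segment bytes into big-endian 32-bit words for the cipher: through a DataView, and through an Int32Array view plus a manual byte swap. The function names are hypothetical, and relative speeds depend on the engine version; the findings above were for a bleeding-edge V8 at the time.

```javascript
// Sketch: two ways to read segment bytes as big-endian 32-bit words.
// Function names are hypothetical.

// Detect host byte order once, outside any hot loop.
const HOST_IS_LITTLE_ENDIAN =
  new Uint8Array(new Uint16Array([1]).buffer)[0] === 1;

// (a) DataView: explicit about endianness, but reported above to be much
// slower than plain typed-array indexing in the V8 build that was tested.
function wordsViaDataView(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const n = bytes.byteLength >> 2;
  const words = new Int32Array(n);
  for (let i = 0; i < n; i++) {
    words[i] = view.getInt32(i * 4, false); // false = big-endian read
  }
  return words;
}

// (b) Int32Array view plus a manual byte swap. Requires a 4-byte-aligned
// byteOffset (the Int32Array constructor throws a RangeError otherwise).
// Int32Array rather than Uint32Array keeps every element in signed 32-bit
// range, the change point 3 reports halved the runtime.
function wordsViaInt32Array(bytes) {
  const n = bytes.byteLength >> 2; // loop bound hoisted into a local
  const src = new Int32Array(bytes.buffer, bytes.byteOffset, n);
  const words = new Int32Array(n);
  for (let i = 0; i < n; i++) {
    const w = src[i];
    words[i] = HOST_IS_LITTLE_ENDIAN
      ? (w << 24) | ((w & 0xff00) << 8) | ((w & 0xff0000) >> 8) | (w >>> 24)
      : w; // big-endian host: the native read is already big-endian
  }
  return words;
}
```

Both functions return the same Int32Array for aligned input; (b) only uses plain indexed loads and integer bit operations in its loop.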

A typical profile run looked like this:

Statistical profiling result from v8.log, (3056 ticks, 0 unaccounted, 0 excluded).

 [Shared libraries]:
   ticks  total  nonlib   name
     51    1.7%          /usr/lib/system/libsystem_platform.dylib
      1    0.0%          /usr/lib/system/libsystem_c.dylib

 [JavaScript]:
   ticks  total  nonlib   name
   1985   65.0%   66.1%  LazyCompile: *AES.decrypt src/decrypter-d8.js:209227:20
    220    7.2%    7.3%  LazyCompile: *decrypt src/decrypter-d8.js:209274:19
      7    0.2%    0.2%  LazyCompile: ~Uint8ArrayConstructByArrayLike native typedarray.js:62:40
      6    0.2%    0.2%  KeyedLoadIC: A keyed load IC from the snapshot
      3    0.1%    0.1%  Stub: StoreFastElementStub
      1    0.0%    0.0%  Stub: VectorRawLoadStub
      1    0.0%    0.0%  Stub: StoreFastElementStub {1}
      1    0.0%    0.0%  Stub: LoadFastElementStub

 [C++]:
   ticks  total  nonlib   name
    665   21.8%   22.1%  start
     45    1.5%    1.5%  ___thread_selfusage
     32    1.0%    1.1%  ___libplatform_init
     18    0.6%    0.6%  _mach_msg_destroy
      5    0.2%    0.2%  __os_once
      3    0.1%    0.1%  ___mkfifo_extended
      2    0.1%    0.1%  __simple_getenv
      2    0.1%    0.1%  ___chmod_extended
      1    0.0%    0.0%  _vm_read
      1    0.0%    0.0%  _malloc_zone_malloc
      1    0.0%    0.0%  _malloc_zone_from_ptr
      1    0.0%    0.0%  _inet_pton
      1    0.0%    0.0%  _free
      1    0.0%    0.0%  _create_scalable_zone
      1    0.0%    0.0%  __simple_asl_send
      1    0.0%    0.0%  ___cxa_free_exception

 [Summary]:
   ticks  total  nonlib   name
   2224   72.8%   74.0%  JavaScript
    780   25.5%   26.0%  C++
     23    0.8%    0.8%  GC
     52    1.7%          Shared libraries

 [C++ entry points]:
   ticks    cpp   total   name
     84  100.0%    2.7%  TOTAL

 [Bottom up (heavy) profile]:
  Note: percentage shows a share of a particular caller in the total
  amount of its parent calls.
  Callers occupying less than 2.0% are not shown.

   ticks parent  name
   1985   65.0%  LazyCompile: *AES.decrypt src/decrypter-d8.js:209227:20
   1984   99.9%    LazyCompile: *decrypt src/decrypter-d8.js:209274:19
   1984  100.0%      Function: ~Hls src/decrypter-d8.js:38:10
   1984  100.0%        Script: ~src/decrypter-d8.js

    665   21.8%  start
     19    2.9%    start
      4   21.1%      LazyCompile: ~Uint8ArrayConstructByArrayLike native typedarray.js:62:40
      4  100.0%        LazyCompile: ~Uint8Array native typedarray.js:74:31
      4  100.0%          Function: ~Hls src/decrypter-d8.js:38:10
      4  100.0%            Script: ~src/decrypter-d8.js
      3   15.8%      LazyCompile: ~decrypt src/decrypter-d8.js:209274:19
      3  100.0%        Function: ~Hls src/decrypter-d8.js:38:10
      3  100.0%          Script: ~src/decrypter-d8.js
      3   15.8%      LazyCompile: *decrypt src/decrypter-d8.js:209274:19
      3  100.0%        Function: ~Hls src/decrypter-d8.js:38:10
      3  100.0%          Script: ~src/decrypter-d8.js
      2   10.5%      LazyCompile: ~HarmonyToStringExtendSymbolPrototype native harmony-tostring.js:3:46
      2  100.0%        Script: ~native harmony-tostring.js
      2   10.5%      LazyCompile: *AES src/decrypter-d8.js:209118:16
      2  100.0%        LazyCompile: *decrypt src/decrypter-d8.js:209274:19
      2  100.0%          Function: ~Hls src/decrypter-d8.js:38:10
      2  100.0%            Script: ~src/decrypter-d8.js
      2   10.5%      Function: ~Hls src/decrypter-d8.js:38:10
      2  100.0%        Script: ~src/decrypter-d8.js
      1    5.3%      Script: ~native harmony-tostring.js
      1    5.3%      LazyCompile: ~subarray native typedarray.js:113:28
      1  100.0%        LazyCompile: ~<anonymous> src/decrypter-d8.js:209378:12
      1  100.0%          LazyCompile: *decrypt src/decrypter-d8.js:209274:19
      1  100.0%            Function: ~Hls src/decrypter-d8.js:38:10
      1    5.3%      LazyCompile: ~Uint8ArrayConstructByLength native typedarray.js:48:37
      1  100.0%        LazyCompile: ~Uint8Array native typedarray.js:74:31
      1  100.0%          LazyCompile: ~decrypt src/decrypter-d8.js:209274:19
      1  100.0%            Function: ~Hls src/decrypter-d8.js:38:10

    220    7.2%  LazyCompile: *decrypt src/decrypter-d8.js:209274:19
    220  100.0%    Function: ~Hls src/decrypter-d8.js:38:10
    220  100.0%      Script: ~src/decrypter-d8.js

All of these results were Chrome-only, on a bleeding-edge build. I haven't checked how much of an effect these optimizations have on other browsers, since it doesn't seem feasible to reduce the runtime by another order of magnitude without giving up IE10 support. At this point, asynchronous decryption seems like the only realistic solution. Web workers come with restrictions in IE10 that are difficult for us, so I'll be looking into decrypting incrementally, with regular setTimeouts along the way to yield to the browser.
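The incremental approach can be sketched roughly as follows. This is not the code from #251: `decryptIncrementally`, its `chunkBytes` parameter, and the injected `decryptBlock` callback are hypothetical, and the AES block cipher is left out. The detail that matters for CBC is that the last ciphertext block of one chunk becomes the chaining value for the next, so splitting the work changes when it runs but not the result.

```javascript
// Sketch: CBC decryption split into chunks, yielding to the event loop with
// setTimeout between chunks so the page can keep painting. Names are
// hypothetical; `decryptBlock` stands in for a raw AES block decryption
// (16-byte Uint8Array in, 16-byte Uint8Array out).
const BLOCK_BYTES = 16;

function decryptIncrementally(ciphertext, iv, decryptBlock, chunkBytes = 32 * 1024) {
  if (ciphertext.length % BLOCK_BYTES !== 0 || chunkBytes <= 0 || chunkBytes % BLOCK_BYTES !== 0) {
    return Promise.reject(new Error('lengths must be positive multiples of 16 bytes'));
  }
  const out = new Uint8Array(ciphertext.length);
  let prev = iv; // CBC chaining value, carried across chunk boundaries
  let offset = 0;

  return new Promise((resolve, reject) => {
    function step() {
      try {
        const end = Math.min(offset + chunkBytes, ciphertext.length);
        for (; offset < end; offset += BLOCK_BYTES) {
          const block = ciphertext.subarray(offset, offset + BLOCK_BYTES);
          const plain = decryptBlock(block);
          for (let i = 0; i < BLOCK_BYTES; i++) {
            out[offset + i] = plain[i] ^ prev[i];
          }
          prev = block;
        }
        if (offset < ciphertext.length) {
          setTimeout(step, 0); // yield, then resume with the next chunk
        } else {
          resolve(out);
        }
      } catch (err) {
        reject(err);
      }
    }
    step();
  });
}
```

Smaller chunks give the browser more chances to paint between steps at the cost of more timer overhead; browsers also clamp deeply nested setTimeout calls to a minimum of about 4 ms, so very small chunks can stretch total decryption time noticeably. The sketch uses modern syntax (let/const, arrow functions, Promise) for brevity; an IE10-compatible version would need var and plain callbacks instead.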

dmlap commented 9 years ago

I chunked up the decryption work in #251. If a performance bottleneck remains, I think it's now in segment transmuxing rather than in decryption, and it was only noticeable at all for me when I tested 8 Mbps streams. Closing this one, but please feel free to comment if you're still seeing issues anywhere.