Closed rajkosto closed 9 years ago
Thank you very much for the test streams! We'll take a look.
I'm looking into this now. My hypothesis was that swapping byte order before and after decrypting added a lot of overhead and accounted for the bulk of the time. My initial profiling in Chrome indicates a couple of things:
All of which seem to invalidate my initial hunch. Assuming AES.decrypt() is the culprit, I'm going to explore a couple of options:
Still investigating.
I ended up building v8 and using the debugging console to try out a bunch of crazy ideas. For future reference:
DataView.prototype.getUint32 (and cousins) is significantly slower than standard TypedArrays with array accessors.
A typical profile run looked like this:
Statistical profiling result from v8.log, (3056 ticks, 0 unaccounted, 0 excluded).
[Shared libraries]:
ticks total nonlib name
51 1.7% /usr/lib/system/libsystem_platform.dylib
1 0.0% /usr/lib/system/libsystem_c.dylib
[JavaScript]:
ticks total nonlib name
1985 65.0% 66.1% LazyCompile: *AES.decrypt src/decrypter-d8.js:209227:20
220 7.2% 7.3% LazyCompile: *decrypt src/decrypter-d8.js:209274:19
7 0.2% 0.2% LazyCompile: ~Uint8ArrayConstructByArrayLike native typedarray.js:62:40
6 0.2% 0.2% KeyedLoadIC: A keyed load IC from the snapshot
3 0.1% 0.1% Stub: StoreFastElementStub
1 0.0% 0.0% Stub: VectorRawLoadStub
1 0.0% 0.0% Stub: StoreFastElementStub {1}
1 0.0% 0.0% Stub: LoadFastElementStub
[C++]:
ticks total nonlib name
665 21.8% 22.1% start
45 1.5% 1.5% ___thread_selfusage
32 1.0% 1.1% ___libplatform_init
18 0.6% 0.6% _mach_msg_destroy
5 0.2% 0.2% __os_once
3 0.1% 0.1% ___mkfifo_extended
2 0.1% 0.1% __simple_getenv
2 0.1% 0.1% ___chmod_extended
1 0.0% 0.0% _vm_read
1 0.0% 0.0% _malloc_zone_malloc
1 0.0% 0.0% _malloc_zone_from_ptr
1 0.0% 0.0% _inet_pton
1 0.0% 0.0% _free
1 0.0% 0.0% _create_scalable_zone
1 0.0% 0.0% __simple_asl_send
1 0.0% 0.0% ___cxa_free_exception
[Summary]:
ticks total nonlib name
2224 72.8% 74.0% JavaScript
780 25.5% 26.0% C++
23 0.8% 0.8% GC
52 1.7% Shared libraries
[C++ entry points]:
ticks cpp total name
84 100.0% 2.7% TOTAL
[Bottom up (heavy) profile]:
Note: percentage shows a share of a particular caller in the total
amount of its parent calls.
Callers occupying less than 2.0% are not shown.
   ticks  parent  name
   1985   65.0%  LazyCompile: *AES.decrypt src/decrypter-d8.js:209227:20
   1984   99.9%    LazyCompile: *decrypt src/decrypter-d8.js:209274:19
   1984  100.0%      Function: ~Hls src/decrypter-d8.js:38:10
   1984  100.0%        Script: ~src/decrypter-d8.js
    665   21.8%  start
     19    2.9%    start
      4   21.1%      LazyCompile: ~Uint8ArrayConstructByArrayLike native typedarray.js:62:40
      4  100.0%        LazyCompile: ~Uint8Array native typedarray.js:74:31
      4  100.0%          Function: ~Hls src/decrypter-d8.js:38:10
      4  100.0%            Script: ~src/decrypter-d8.js
      3   15.8%      LazyCompile: ~decrypt src/decrypter-d8.js:209274:19
      3  100.0%        Function: ~Hls src/decrypter-d8.js:38:10
      3  100.0%          Script: ~src/decrypter-d8.js
      3   15.8%      LazyCompile: *decrypt src/decrypter-d8.js:209274:19
      3  100.0%        Function: ~Hls src/decrypter-d8.js:38:10
      3  100.0%          Script: ~src/decrypter-d8.js
      2   10.5%      LazyCompile: ~HarmonyToStringExtendSymbolPrototype native harmony-tostring.js:3:46
      2  100.0%        Script: ~native harmony-tostring.js
      2   10.5%      LazyCompile: *AES src/decrypter-d8.js:209118:16
      2  100.0%        LazyCompile: *decrypt src/decrypter-d8.js:209274:19
      2  100.0%          Function: ~Hls src/decrypter-d8.js:38:10
      2  100.0%            Script: ~src/decrypter-d8.js
      2   10.5%      Function: ~Hls src/decrypter-d8.js:38:10
      2  100.0%        Script: ~src/decrypter-d8.js
      1    5.3%      Script: ~native harmony-tostring.js
      1    5.3%      LazyCompile: ~subarray native typedarray.js:113:28
      1  100.0%        LazyCompile: ~<anonymous> src/decrypter-d8.js:209378:12
      1  100.0%          LazyCompile: *decrypt src/decrypter-d8.js:209274:19
      1  100.0%            Function: ~Hls src/decrypter-d8.js:38:10
      1    5.3%      LazyCompile: ~Uint8ArrayConstructByLength native typedarray.js:48:37
      1  100.0%        LazyCompile: ~Uint8Array native typedarray.js:74:31
      1  100.0%          LazyCompile: ~decrypt src/decrypter-d8.js:209274:19
      1  100.0%            Function: ~Hls src/decrypter-d8.js:38:10
    220    7.2%  LazyCompile: *decrypt src/decrypter-d8.js:209274:19
    220  100.0%    Function: ~Hls src/decrypter-d8.js:38:10
    220  100.0%      Script: ~src/decrypter-d8.js
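For anyone wanting to reproduce the DataView observation noted before the profile, here is a minimal sketch of the two access styles being compared. The function names are hypothetical and are not taken from the project source. It only shows that both paths yield the same big-endian words; it does not measure anything, and any speed difference would need to be profiled in the engine you care about.

```javascript
// Hypothetical helpers for illustration only; not the project's code.

// Style 1: read big-endian 32-bit words one at a time through a DataView.
function readWordsWithDataView(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const words = new Uint32Array(bytes.byteLength >>> 2);
  for (let i = 0; i < words.length; i++) {
    words[i] = view.getUint32(i * 4, false); // false = big-endian
  }
  return words;
}

// Reverse the byte order of a single 32-bit word.
function swap32(word) {
  return ((word << 24) |
          ((word & 0xff00) << 8) |
          ((word >>> 8) & 0xff00) |
          (word >>> 24)) >>> 0;
}

// Style 2: read the same words through a Uint32Array with plain index
// access, swapping bytes by hand when the platform is little-endian.
function readWordsWithTypedArray(bytes) {
  // Copy whole words only, so the word view starts at an aligned offset.
  const aligned = bytes.slice(0, bytes.byteLength & ~3);
  const native = new Uint32Array(aligned.buffer);
  const littleEndian = new Uint8Array(new Uint32Array([1]).buffer)[0] === 1;
  const words = new Uint32Array(native.length);
  for (let i = 0; i < native.length; i++) {
    words[i] = littleEndian ? swap32(native[i]) : native[i];
  }
  return words;
}
```

Both functions take a Uint8Array and return a Uint32Array, so they can be dropped into the same benchmark loop.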
All of these results are from Chrome only, running a bleeding-edge build. I have not checked how much these optimizations help in other browsers, since it doesn't seem feasible to cut the runtime by another order of magnitude without giving up IE10 support. At this point, asynchronous decryption seems like the only realistic solution. Web workers come with difficult restrictions for us in IE10, so I'll be looking into decrypting incrementally, with regular setTimeouts along the way.
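A rough sketch of the incremental idea, under stated assumptions: `processInChunks`, `xorChunk`, and the injectable `schedule` parameter are hypothetical names for illustration and are not the code that eventually landed. The XOR transform is only a stand-in for the real decryption step. A real AES-CBC version would also need chunk sizes that are multiples of 16 bytes and would have to carry the last ciphertext block of each chunk forward as the IV for the next one.

```javascript
// Hypothetical sketch: process a large Uint8Array in slices, yielding to the
// event loop between slices so rendering and other callbacks get a turn.
function processInChunks(input, chunkSize, processChunk, done, schedule) {
  // `schedule` is injectable (an assumption made for testability);
  // by default each step is deferred with setTimeout.
  schedule = schedule || function (fn) { setTimeout(fn, 0); };
  const output = new Uint8Array(input.length);
  let offset = 0;

  function step() {
    const end = Math.min(offset + chunkSize, input.length);
    // subarray returns views onto the same buffers, so nothing is copied.
    processChunk(input.subarray(offset, end), output.subarray(offset, end));
    offset = end;
    if (offset < input.length) {
      schedule(step); // more work left: yield, then continue
    } else {
      done(output);   // all chunks processed
    }
  }

  schedule(step);
}

// Placeholder transform (XOR with a constant). This is NOT AES; it only
// stands in for the per-chunk decryption work.
function xorChunk(src, dst) {
  for (let i = 0; i < src.length; i++) {
    dst[i] = src[i] ^ 0x5a;
  }
}
```

A call might look like `processInChunks(encryptedBytes, 32 * 1024, xorChunk, function (out) { /* hand off to the next stage */ });`. Smaller chunks keep each blocking slice short at the cost of more timer round trips, so the chunk size is something to tune rather than a fixed answer.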
Chunked up the work of decryption in #251. If there is a remaining performance bottleneck, I think it's in the segment transmuxing rather than the decryption, and it was only noticeable for me once I got to testing 8 Mbps streams. Closing this one, but please feel free to comment if you're still seeing issues anywhere.
Non-encrypted HLS livestream test: http://sshnuke.net/teststream.php?03ej74H0LvDFBefzDCY4rPbtWYYbgdIHIGFA9t0s%2FcGhI%3D
Encrypted HLS livestream test: http://sshnuke.net/teststream.php?9E81QtnIG1LuyLhKsOq0RMYwb%2FOi739KRh8%2BO6FNX34epYcZXfUzuQEPVTw4zt%2BU0j
Same delivery method, same .m3u8 and .ts file content (other than one having the key header and the other being encrypted with a constant key in a normal file).
The best way to notice this is to view the streams in Chrome, go into Developer Tools -> Rendering tab at the bottom, and check 'Show paint rectangles'. Then watch the green tint on the black borders around the stream. On the encrypted stream the tint fades out (indicating no updates are happening) while the AES decrypt method is being called on a newly downloaded segment file, which you can confirm by CPU profiling in Chrome. The non-encrypted one will be rock-solid except when switching qualities.
The CPU is definitely not being overloaded, and the bitrate of the stream is only around 2.3 Mbps for the highest quality (the stuttering can actually be seen on all qualities). The problem is that the decrypt method is called in a blocking fashion, which prevents the video from being updated at the same time. Other players, such as flashls-based ones, do not have any issue with this.