ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

[ADN] impossible to extract subtitles #12724

Open Ririx02 opened 7 years ago

Ririx02 commented 7 years ago

youtube-dl --all-subs -v http://animedigitalnetwork.fr/video/boruto-naruto-next-generations/7937-episode-2-le-fils-du-hokage
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--all-subs', '-v', 'http://animedigitalnetwork.fr/video/boruto-naruto-next-generations/7937-episode-2-le-fils-du-hokage']
[debug] Encodings: locale cp1252, fs mbcs, out cp850, pref cp1252
[debug] youtube-dl version 2017.04.11
[debug] Python version 3.4.4 - Windows-10-10.0.14393
[debug] exe versions: ffmpeg N-80912-gce466d0, ffprobe N-80912-gce466d0, rtmpdump 2.3
[debug] Proxy map: {}
[ADN] 7937: Downloading webpage
[ADN] 7937: Downloading JSON metadata
[ADN] 7937: Downloading m3u8 information
[ADN] 7937: Downloading JSON metadata
[ADN] 7937: Downloading m3u8 information
[ADN] 7937: Downloading webpage
Traceback (most recent call last):
  File "__main__.py", line 19, in <module>
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpsqdjcx2e\build\youtube_dl\__init__.py", line 464, in main
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpsqdjcx2e\build\youtube_dl\__init__.py", line 454, in _real_main
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpsqdjcx2e\build\youtube_dl\YoutubeDL.py", line 1890, in download
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpsqdjcx2e\build\youtube_dl\YoutubeDL.py", line 761, in extract_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpsqdjcx2e\build\youtube_dl\extractor\common.py", line 429, in extract
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpsqdjcx2e\build\youtube_dl\extractor\adn.py", line 133, in _real_extract
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpsqdjcx2e\build\youtube_dl\extractor\common.py", line 2390, in extract_subtitles
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpsqdjcx2e\build\youtube_dl\extractor\adn.py", line 53, in _get_subtitles
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpsqdjcx2e\build\youtube_dl\extractor\common.py", line 672, in _parse_json
  File "C:\Python\Python34\lib\json\__init__.py", line 312, in loads
TypeError: the JSON object must be str, not 'bytes'

Description of your issue, suggested solution and other information

Subtitle extraction is impossible: the command above ends with the TypeError shown in the log.

remitamine commented 7 years ago

the error in this issue can be fixed simply by decoding the decrypted subtitles before parsing them. however, the real problem is that the site changes the decryption key frequently, so this won't be properly fixed until there is a js interpreter that can handle the key construction js code.
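
A minimal sketch of the decoding step mentioned above, assuming Python 3 and assuming the decrypted payload is PKCS#7-padded UTF-8 JSON (the padding strip mirrors the `dec_subtitles[:-compat_ord(dec_subtitles[-1])].decode()` line that appears in a later traceback in this thread). The function name and the sample payload are made up for illustration and are not part of youtube-dl:

```python
import json


def parse_decrypted_subtitles(decrypted):
    """Strip the padding, decode bytes to str, then parse the JSON."""
    # Assumption: PKCS#7 padding, where the last byte holds the pad length.
    pad_len = decrypted[-1]
    text = decrypted[:-pad_len].decode('utf-8')
    # Passing str (not bytes) avoids the TypeError seen on Python 3.4.
    return json.loads(text)


# Hypothetical payload, padded by hand to a 16-byte block boundary.
sample = json.dumps({'vostf': []}).encode('utf-8')
pad = 16 - len(sample) % 16
padded = sample + bytes([pad]) * pad

result = parse_decrypted_subtitles(padded)
print(result)
```

Decoding explicitly keeps the code working on Python versions whose `json.loads` rejects bytes.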

leonekmi commented 7 years ago

@remitamine, can you explain how to get the encryption key? I see that you regularly change the key in the extractor. JS and I don't get along...

remitamine commented 7 years ago

the key can be found in http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js. for example, at the moment this is the code that constructs the key:

function(){var a=function(){var a,b=[a=114874,a+=-1521,a+=-19814,a+=-75638,a+=45570,a+=46993,a+=-66124,a+=28122];b[3]=[b[4],b[4]=b[3]][0],b[5]=[b[2],b[2]=b[5]][0],b[2]=44262*b[2]%(2<<16),b[2]=6159*b[2]%(2<<16),b[5]=b[6]^b[2],b[3]=[b[0],b[0]=b[3]][0],b[0]=b[7]^b[0],b[5]=4906*b[5]%(2<<16),b[5]=b[7]^b[0],acopifakofuwil(b.map(function(a){return("0000"+a.toString(16)).substr(-4)}).join(""))};a(),a=null}()

what i did was copy the body of the a function, replace the name of the function written before b.map (acopifakofuwil in this case) with console.log, and then execute the code in the browser console. the executed code would be:

var a, b = [a = 114874, a += -1521, a += -19814, a += -75638, a += 45570, a += 46993, a += -66124, a += 28122];
        b[3] = [b[4], b[4] = b[3]][0], b[5] = [b[2], b[2] = b[5]][0], b[2] = 44262 * b[2] % (2 << 16), b[2] = 6159 * b[2] % (2 << 16), b[5] = b[6] ^ b[2], b[3] = [b[0], b[0] = b[3]][0], b[0] = b[7] ^ b[0], b[5] = 4906 * b[5] % (2 << 16), b[5] = b[7] ^ b[0], console.log(b.map(function(a) {
            return ("0000" + a.toString(16)).substr(-4)
        }).join(""))

the result is: ece1bac92300c0ba45edf7efad341b0e. then prefix every two characters with \x and replace the key in the extractor:
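
The same arithmetic can also be reproduced without a browser. The following is a hand-written Python port of the JavaScript snippet above, offered as a sketch rather than as anything youtube-dl ships; the helper names are made up. It only covers this particular version of the site's code, which changes frequently:

```python
def swap(values, i, j):
    # JS idiom b[i] = [b[j], b[j] = b[i]][0] swaps two elements.
    values[i], values[j] = values[j], values[i]


def build_key_hex():
    # b = [a = 114874, a += -1521, ...]: a running sum of the deltas.
    deltas = [114874, -1521, -19814, -75638, 45570, 46993, -66124, 28122]
    b, a = [], 0
    for delta in deltas:
        a += delta
        b.append(a)

    mod = 2 << 16
    swap(b, 3, 4)
    swap(b, 5, 2)
    b[2] = 44262 * b[2] % mod
    b[2] = 6159 * b[2] % mod
    b[5] = b[6] ^ b[2]
    swap(b, 3, 0)
    b[0] = b[7] ^ b[0]
    b[5] = 4906 * b[5] % mod
    b[5] = b[7] ^ b[0]

    # ("0000" + a.toString(16)).substr(-4): last four hex digits of each value.
    return ''.join(('0000' + format(x, 'x'))[-4:] for x in b)


key_hex = build_key_hex()
print(key_hex)  # ece1bac92300c0ba45edf7efad341b0e
```

The port works here because every intermediate value stays a small non-negative integer, so Python's `*`, `%` and `^` behave like their JavaScript counterparts.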

diff --git a/youtube_dl/extractor/adn.py b/youtube_dl/extractor/adn.py
index 66caf6a81..50cfdcdee 100644
--- a/youtube_dl/extractor/adn.py
+++ b/youtube_dl/extractor/adn.py
@@ -45,7 +45,7 @@ class ADNIE(InfoExtractor):
         # http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
         dec_subtitles = intlist_to_bytes(aes_cbc_decrypt(
             bytes_to_intlist(base64.b64decode(enc_subtitles[24:])),
-            bytes_to_intlist(b'\nd\xaf\xd2J\xd0\xfc\xe1\xfc\xdf\xb61\xe8\xe1\xf0\xcc'),
+            bytes_to_intlist(b'\xec\xe1\xba\xc9\x23\x00\xc0\xba\x45\xed\xf7\xef\xad\x34\x1b\x0e'),
             bytes_to_intlist(base64.b64decode(enc_subtitles[:24]))
         ))
         subtitles_json = self._parse_json(
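
The "prefix every two characters with \x" step is simply a hex-to-bytes conversion. A small sketch using the standard library (the variable names are illustrative):

```python
import binascii

key_hex = 'ece1bac92300c0ba45edf7efad341b0e'

# The 16 raw key bytes, equivalent to the literal used in the diff.
key_bytes = binascii.unhexlify(key_hex)

# The same key written as \xNN escapes, ready to paste into a bytes literal.
escaped = ''.join('\\x' + key_hex[i:i + 2] for i in range(0, len(key_hex), 2))
print(escaped)
```

Using `binascii.unhexlify` avoids editing the escape sequences by hand each time the key changes.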
leonekmi commented 7 years ago

Isn't it possible to execute this automatically (via an external server, for example)?

remitamine commented 7 years ago

Isn't it possible to execute this automatically (via an external server, for example)?

as i said before:

so this won't be fixed until there is a js interpreter that can handle the key construction js code.

it's possible with a js interpreter.

Ririx02 commented 7 years ago

I do not really know much about js interpreters, but maybe one of these would be appropriate? Https://github.com/amol-/dukpy Https://github.com/NeilFraser/JS-Interpreter

remitamine commented 7 years ago

Https://github.com/amol-/dukpy

dukpy is currently not production ready and might actually crash your program as it is mostly implemented in C.

Https://github.com/NeilFraser/JS-Interpreter

that is a javascript project; we are using python in this project.

the decryption code has been changed again (they apply more obfuscation, but it is still simple to deobfuscate). the other change that needs to be applied in the code is changing the user agent (they banned the user agent used by youtube-dl).

diff --git a/youtube_dl/extractor/adn.py b/youtube_dl/extractor/adn.py
index 66caf6a81..09e46cc34 100644
--- a/youtube_dl/extractor/adn.py
+++ b/youtube_dl/extractor/adn.py
@@ -38,14 +38,16 @@ class ADNIE(InfoExtractor):

         enc_subtitles = self._download_webpage(
             'http://animedigitalnetwork.fr/' + sub_path,
-            video_id, fatal=False)
+            video_id, headers={
+                'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0'
+            }, fatal=False)
         if not enc_subtitles:
             return None

         # http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
         dec_subtitles = intlist_to_bytes(aes_cbc_decrypt(
             bytes_to_intlist(base64.b64decode(enc_subtitles[24:])),
-            bytes_to_intlist(b'\nd\xaf\xd2J\xd0\xfc\xe1\xfc\xdf\xb61\xe8\xe1\xf0\xcc'),
+            bytes_to_intlist(b'\xba\x11\x86\x24\x55\xa6\x40\xf8\x50\xb8\xb0\xe7\x46\x4d\x90\x13'),
             bytes_to_intlist(base64.b64decode(enc_subtitles[:24]))
         ))
         subtitles_json = self._parse_json(
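
Outside of youtube-dl, the same user agent override can be sketched with the standard library. This only builds the request object and does not contact the site; the subtitle path is a hypothetical placeholder, since the real `sub_path` comes from the player configuration:

```python
import urllib.request

BASE_URL = 'http://animedigitalnetwork.fr/'
# The browser user agent used in the diff above.
BROWSER_UA = 'Mozilla/5.0 (X11; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0'


def build_subtitle_request(sub_path):
    """Return a Request that carries a browser User-Agent header."""
    return urllib.request.Request(
        BASE_URL + sub_path, headers={'User-Agent': BROWSER_UA})


# Hypothetical path, for illustration only.
request = build_subtitle_request('hypothetical/subtitles/path')
print(request.full_url, request.get_header('User-agent'))
```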
Ririx02 commented 7 years ago

Otherwise, could we create an option where we could enter the key manually? Example: youtube-dl -key 15541681654654 .....

leonekmi commented 7 years ago

Ah yeah, and the key could easily be obtained via an external service in JS (on a GH Page, for example). I propose --sub-decryption-key (or -subdk) for the name of the argument

leonekmi commented 7 years ago

@remitamine, at which line exactly does the new obfuscation system start (after unminifying)? From L21434?

We could also write a notice explaining how to get this decryption key easily, so that it can then be passed to -subdk.

leonekmi commented 7 years ago

Hey, update:

leonekmi@leonekmi-MS-7693:~$ youtube-dl -v --print-traffic http://animedigitalnetwork.fr/video/my-hero-academia-saison-2/7945-episode-3-quelle-belle-alterite
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', '--print-traffic', 'http://animedigitalnetwork.fr/video/my-hero-academia-saison-2/7945-episode-3-quelle-belle-alterite']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.05.26
[debug] Python version 3.6.1 - Linux-4.10.0-21-generic-x86_64-with-Ubuntu-17.04-zesty
[debug] exe versions: ffmpeg 3.2.4-1build2, ffprobe 3.2.4-1build2
[debug] Proxy map: {}
[ADN] 7945: Downloading webpage
send: b'GET /video/my-hero-academia-saison-2/7945-episode-3-quelle-belle-alterite HTTP/1.1\r\nHost: animedigitalnetwork.fr\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/47.0 (Chrome)\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Encoding: gzip, deflate\r\nAccept-Language: en-us,en;q=0.5\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
ERROR: No video formats found; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/youtube_dl/YoutubeDL.py", line 760, in extract_info
    ie_result = ie.extract(url)
  File "/usr/local/lib/python3.6/dist-packages/youtube_dl/extractor/common.py", line 433, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/lib/python3.6/dist-packages/youtube_dl/extractor/adn.py", line 125, in _real_extract
    self._sort_formats(formats)
  File "/usr/local/lib/python3.6/dist-packages/youtube_dl/extractor/common.py", line 1056, in _sort_formats
    raise ExtractorError('No video formats found')
youtube_dl.utils.ExtractorError: No video formats found; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

header: Server header: Date header: Content-Type header: Keep-Alive header: Vary header: P3P header: X-Logged-In header: Content-Encoding header: Cache-Control header: X-Varnish header: Age header: X-Cache header: X-Varnish-IP header: Content-Length header: Connection header: Accept-Ranges
remitamine commented 7 years ago

@remitamine, at which line exactly does the new obfuscation system start (after unminifying)? From L21434?

i will point directly to the obfuscated part related to the decryption key. the key is split into two parts. the first part is stored in _0x2002('0xf8') in this part of the code (the real value is 1be0296138942400):

        '\x70\x72\x65\x70\x61\x72\x65\x53\x75\x62\x74\x69\x74\x6c\x65\x73': function(_0x4425ef) {
            var _0x21b8fa = _0x4425ef[_0x2002('0xf7')](0x0, 0x18);
            var _0x5a1269 = _0x4425ef['\x73\x75\x62\x73\x74\x72\x69\x6e\x67'](0x18);
            try {
                var _0x3597bf = _0xe329f8[_0x2002('0xc8')][_0x2002('0xa6')](_0x5a1269, _0xe329f8[_0x2002('0x6f')][_0x2002('0x70')][_0x2002('0x78')](_0x2002('0xf8') + _0x2f9bc6), {
                    '\x69\x76': _0xe329f8['\x65\x6e\x63'][_0x2002('0x8e')][_0x2002('0x78')](_0x21b8fa)
                });
                _0x3597bf = _0x3597bf[_0x2002('0x62')](_0xe329f8[_0x2002('0x6f')][_0x2002('0x76')]);
                _0x1cfa06[this[_0x2002('0xd1')]] = JSON[_0x2002('0x78')](_0x3597bf) || {};
            } catch (_0x263178) {
                this['\x74\x72\x69\x67\x67\x65\x72']('\x61\x64\x6e\x2e\x65\x72\x72\x6f\x72');
            }
            this[_0x2002('0xf9')]();
        },

the second part can be found in this code (the real value is 12bdc580accebeb0):

        var _0x2c722f = [_0x2c9d03 = 0x1127e, _0x2c9d03 = _0x2c9d03 + -0x9d8e, _0x2c9d03 = _0x2c9d03 + -0x4a4b, _0x2c9d03 = _0x2c9d03 + 0x12000];
        _0x2c722f[0x2] = _0x2c722f[0x1] * _0x2c722f[0x2];
        _0x2c722f[0x2] = [_0x2c722f[0x3], _0x2c722f[0x3] = _0x2c722f[0x2]][0x0];
        _0x2c722f[0x1] = _0x2c722f[0x1] * 0xc0c8 % (0x2 << 0x10);
        _0x2c722f[0x2] = _0x2c722f[0x3] * _0x2c722f[0x2];
        _0x2c722f[0x1] = _0x2c722f[0x1] * 0x79cd % (0x2 << 0x10);
        _0x2c722f[0x2] = _0x2c722f[0x0] ^ _0x2c722f[0x3];
        for (var _0x44da56 = 0x8; _0x44da56 < 0xe; _0x44da56++) {
            _0x2c722f[0x0] += _0x44da56;
        }
        _0x0bdd31(_0x2c722f[_0x2002('0x176')](function(_0x541fc3) {
            return (_0x2002('0x177') + _0x541fc3[_0x2002('0x62')](0x10))['\x73\x75\x62\x73\x74\x72'](-0x4);
        })[_0x2002('0x71')](''));
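
As a cross-check, the second-part computation above can be ported to Python by hand. This is a sketch with made-up names, valid only for this version of the site's code. The padding string hidden behind `_0x2002('0x177')` is not visible in the snippet; `'0000'` is assumed here, and it does not affect the result because every value already has at least four hex digits. The full key is the first part followed by the second part, matching `_0x2002('0xf8') + _0x2f9bc6` in prepareSubtitles:

```python
def second_key_part():
    mod = 0x2 << 0x10
    # [x = 0x1127e, x = x + -0x9d8e, ...]: a running sum of the deltas.
    a = 0x1127e
    b = [a]
    for delta in (-0x9d8e, -0x4a4b, 0x12000):
        a += delta
        b.append(a)

    b[2] = b[1] * b[2]
    b[2], b[3] = b[3], b[2]  # the [b[3], b[3] = b[2]][0] idiom is a swap
    b[1] = b[1] * 0xc0c8 % mod
    b[2] = b[3] * b[2]
    b[1] = b[1] * 0x79cd % mod
    b[2] = b[0] ^ b[3]
    for i in range(0x8, 0xe):
        b[0] += i

    # Keep the last four hex digits of each value ('0000' prefix is assumed).
    return ''.join(('0000' + format(x, 'x'))[-4:] for x in b)


FIRST_PART = '1be0296138942400'  # value quoted above for _0x2002('0xf8')
second_part = second_key_part()
aes_key = bytes.fromhex(FIRST_PART + second_part)
print(second_part)  # 12bdc580accebeb0
```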

I propose --sub-decryption-key (or -subdk) for the name of the argument

options should be generic; normally we don't add options that are specific to one extractor. if the user knows how to get the encryption key, they can just edit the key directly in the source code.

leonekmi commented 7 years ago

oh, we could use --video-password instead. it would be a little weird, but it is an option if we cannot create a new argument.

hutigh commented 7 years ago

Hi, I still get this issue. Any idea? ERROR: No video formats found; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

remitamine commented 7 years ago

Hi, I still get this issue. Any idea? ERROR: No video formats found;

it will be fixed in the next version.

leonekmi commented 7 years ago

Ow...

[leonekmi@leonekmi-PC ~]$ youtube-dl --write-sub http://animedigitalnetwork.fr/video/chunibyou_demo_koi_ga_shitai/2702-episode-1-rencontre-avec-le-jao-shingan-l-il-de-verite-du-roi-du-mal
[ADN] 2702: Downloading webpage
[ADN] 2702: Downloading JSON metadata
[ADN] 2702: Downloading JSON metadata
[ADN] 2702: Downloading m3u8 information
[ADN] 2702: Downloading JSON metadata
[ADN] 2702: Downloading m3u8 information
[ADN] 2702: Downloading webpage
Traceback (most recent call last):
  File "/usr/bin/youtube-dl", line 11, in <module>
    load_entry_point('youtube-dl==2017.6.25', 'console_scripts', 'youtube-dl')()
  File "/usr/lib/python3.6/site-packages/youtube_dl/__init__.py", line 465, in main
    _real_main(argv)
  File "/usr/lib/python3.6/site-packages/youtube_dl/__init__.py", line 455, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/lib/python3.6/site-packages/youtube_dl/YoutubeDL.py", line 1927, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "/usr/lib/python3.6/site-packages/youtube_dl/YoutubeDL.py", line 762, in extract_info
    ie_result = ie.extract(url)
  File "/usr/lib/python3.6/site-packages/youtube_dl/extractor/common.py", line 433, in extract
    ie_result = self._real_extract(url)
  File "/usr/lib/python3.6/site-packages/youtube_dl/extractor/adn.py", line 144, in _real_extract
    'subtitles': self.extract_subtitles(player_config.get('subtitles'), video_id),
  File "/usr/lib/python3.6/site-packages/youtube_dl/extractor/common.py", line 2467, in extract_subtitles
    return self._get_subtitles(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/youtube_dl/extractor/adn.py", line 56, in _get_subtitles
    dec_subtitles[:-compat_ord(dec_subtitles[-1])].decode(),
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xca in position 0: invalid continuation byte
Ririx02 commented 7 years ago

Regarding https://github.com/PiotrDabkowski/Js2Py: would this be a possible option?

siddht4 commented 7 years ago

i feel like @testlog0 has not completely understood what the problem is. The issue here is that the site creates a new key for the subtitles, similar to an issue. The site uses the date, time and key to obfuscate the code so that users cannot understand it. Think of it this way: if you have a valuable file, you would do whatever possible to prevent anyone from copying it, and the subtitles are protected in that way. In order to break it you need to deobfuscate the code; it can then be parsed by the necessary function and you will get your subtitles. This can sometimes be done directly, and if the results are positive the code can be pushed here. @remitamine please give one more example so that i can prepare the regex function meanwhile.

remitamine commented 7 years ago

@remitamine please give one more example so that i can prepare the regex function meanwhile.

i will post the unminified code, as it is the one that needs to be matched. the first part is stored in _0x2342('0xf9') in this part of the code:

'\x70\x72\x65\x70\x61\x72\x65\x53\x75\x62\x74\x69\x74\x6c\x65\x73':function(_0xe6cfbd){var _0x9dea74=_0xe6cfbd[_0x2342('0xf8')](0x0,0x18);var _0x4334c4=_0xe6cfbd[_0x2342('0xf8')](0x18);try{var _0x4a1554=_0xa9e524[_0x2342('0xcf')][_0x2342('0xab')](_0x4334c4,_0xa9e524['\x65\x6e\x63']['\x48\x65\x78'][_0x2342('0x7f')](_0x2342('0xf9')+_0x49e891),{'\x69\x76':_0xa9e524[_0x2342('0x75')]['\x42\x61\x73\x65\x36\x34'][_0x2342('0x7f')](_0x9dea74)});_0x4a1554=_0x4a1554['\x74\x6f\x53\x74\x72\x69\x6e\x67'](_0xa9e524[_0x2342('0x75')]['\x55\x74\x66\x38']);_0x1317c5[this['\x74\x72\x61\x63\x6b\x49\x6e\x64\x65\x78']]=JSON[_0x2342('0x7f')](_0x4a1554)||{};}catch(_0x22dfdc){this['\x74\x72\x69\x67\x67\x65\x72'](_0x2342('0xec'));}this[_0x2342('0xfa')]();}

the second part can be found in:

(function(){var _0x146fe4=function(){var _0x553262;var _0x387bcd=[_0x553262=0x10270,_0x553262=_0x553262+0xd0ba,_0x553262=_0x553262+-0x15405,_0x553262=_0x553262+0x7ad9];_0x387bcd[0x1]=_0x387bcd[0x1]*0x2751%(0x2<<0x10);_0x387bcd[0x3]=[_0x387bcd[0x2],_0x387bcd[0x2]=_0x387bcd[0x3]][0x0];for(var _0x5e4edc=0x4;_0x5e4edc<0xc;_0x5e4edc++){_0x387bcd[0x3]+=_0x5e4edc;}_0x387bcd[0x1]=[_0x387bcd[0x0],_0x387bcd[0x0]=_0x387bcd[0x1]][0x0];for(var _0x5e4edc=0x2;_0x5e4edc<0x13;_0x5e4edc++){_0x387bcd[0x2]+=_0x5e4edc;}_0x387bcd[0x1]=_0x387bcd[0x3]^_0x387bcd[0x1];_0x387bcd[0x2]=_0x387bcd[0x1]*_0x387bcd[0x2];_0x9ebf57(_0x387bcd[_0x2342('0x172')](function(_0x57d859){return(_0x2342('0x173')+_0x57d859[_0x2342('0x69')](0x10))[_0x2342('0x78')](-0x4);})[_0x2342('0x77')](''));};_0x146fe4();_0x146fe4={};}());
siddht4 commented 7 years ago

@remitamine thanks for the 2nd example. i found out that they use the date as a GET value, followed by something similar to a version id, example :
I think they are splitting it at _ (underscore), so it is 1. "date":"2017-07-07", 2. "version":"1.2.2". comparing the two shows an obvious size difference.

siddht4 commented 7 years ago

they use a minified version and then append the obfuscated code to it. i am just not able to understand which part to use. @remitamine please guide me, as i am simply using parts of the code. part of the code is at https://gist.github.com/siddht1/ed9837e6d2af205b4ccdb25459ba20e4

remitamine commented 7 years ago

first part: https://gist.github.com/siddht1/ed9837e6d2af205b4ccdb25459ba20e4#file-adn-part-code-L1079-L1092
second part: https://gist.github.com/siddht1/ed9837e6d2af205b4ccdb25459ba20e4#file-adn-part-code-L1586-L1606

siddht4 commented 7 years ago

@remitamine '\x70\x72\x65\x70\x61\x72\x65\x53\x75\x62\x74\x69\x74\x6c\x65\x73' actually means that

remitamine commented 7 years ago

prepareSubtitles

siddht4 commented 7 years ago

@remitamine i know that '\x70\x72\x65\x70\x61\x72\x65\x53\x75\x62\x74\x69\x74\x6c\x65\x73' means prepareSubtitles, '\x73\x75\x62\x73\x74\x72\x69\x6e\x67' means substring and '\x70\x61\x72\x73\x65' means parse. i am just searching for where it gets invoked
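
These escaped names can be decoded mechanically instead of one by one. A small sketch (the helper name is made up) that replaces every `\xNN` escape in a piece of source text with the character it stands for:

```python
import re


def decode_hex_escapes(text):
    r"""Replace each literal \xNN escape with the corresponding character."""
    return re.sub(
        r'\\x([0-9a-fA-F]{2})',
        lambda match: chr(int(match.group(1), 16)),
        text)


# Raw strings keep the backslashes literal, as they appear in the js file.
samples = [
    r'\x70\x72\x65\x70\x61\x72\x65\x53\x75\x62\x74\x69\x74\x6c\x65\x73',
    r'\x73\x75\x62\x73\x74\x72\x69\x6e\x67',
    r'\x70\x61\x72\x73\x65',
]
decoded = [decode_hex_escapes(sample) for sample in samples]
print(decoded)  # ['prepareSubtitles', 'substring', 'parse']
```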

remitamine commented 7 years ago

loadSubtitles

siddht4 commented 7 years ago

@remitamine

loadSubtitles where is it ?

remitamine commented 7 years ago

loadSubtitles where is it ?

https://gist.github.com/siddht1/ed9837e6d2af205b4ccdb25459ba20e4#file-adn-part-code-L1060

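@siddht1 if you're looking for a better understanding of the code, i think the first thing that you have to do is to deobfuscate the code:

replace the hex and unicode escaped strings.
replace the calls to the functions using the array from the first line in your gist (you need to get a number from the code and shift/rotate the array negatively by this amount, then use the index (in hexadecimal) passed to the function calls to get the real string).

this will give you a clear source code that will let you understand the flow.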
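
The string lookup that the obfuscator relies on (rotate a string array by a fixed amount, then index it with a hexadecimal argument) can be sketched on a toy array. The array contents, the shift value and the function names below are invented for illustration; the real ones have to be read from the site's script:

```python
from collections import deque


def make_lookup(strings, shift_hex):
    """Build a lookup function over the array rotated left by shift_hex."""
    table = deque(strings)
    # A negative argument rotates to the left.
    table.rotate(-int(shift_hex, 16))

    def lookup(index_hex):
        # The obfuscated code passes indexes as hex strings such as '0xf8'.
        return table[int(index_hex, 16)]

    return lookup


# Toy data: four strings stored in scrambled order with a shift of 2.
lookup = make_lookup(
    ['parse', 'toString', 'prepareSubtitles', 'substring'], '0x2')
print(lookup('0x0'), lookup('0x1'), lookup('0x2'), lookup('0x3'))
```

Once such a lookup exists, every call in the script can be replaced by the string it returns, which is what makes the flow readable.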

siddht4 commented 7 years ago

@siddht4 if you're looking for a better understanding of the code, i think the first thing that you have to do is to deobfuscate the code:

replace the hex and unicode escaped strings.
replace the calls to the functions using the array from the first line in your gist (you need to get a number from the code and shift/rotate the array negatively by this amount, then use the index (in hexadecimal) passed to the function calls to get the real string).

this will give you a clear source code that will let you understand the flow.

That is pretty much what i was doing, but as you already know, developers obfuscate the code so that it is next to impossible to get it back into a readable format (next to impossible, not impossible). My Mozilla deobfuscator doesn't seem to work, so i have been using document.write() to see what each part means. It is lengthy work, hence the chit-chat with @remitamine. If @remitamine has a partial deobfuscation of the code, please share it. I will keep a complete copy of the js for each day so that i can verify that there is indeed a function working with the GET values of date and version

remitamine commented 7 years ago

That is pretty much what i was doing, but as you already know, developers obfuscate the code so that it is next to impossible to get it back into a readable format (next to impossible, not impossible).

it's possible to convert the code into a readable format.

If @remitamine has a partial deobfuscation of the code, please share it.

i previously wrote a script to automatically get a fresh deobfuscated version of the js code; however, i can't access it now because i can't access the HDD.

siddht4 commented 7 years ago

it's possible to convert the code into a readable format.

That's the thing i was trying to do till now.

lastly i would simply

diff --git a/youtube_dl/extractor/adn.py b/youtube_dl/extractor/adn.py
index 66caf6a81..50cfdcdee 100644
--- a/youtube_dl/extractor/adn.py
+++ b/youtube_dl/extractor/adn.py
@@ -45,7 +45,7 @@ class ADNIE(InfoExtractor):
         # http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
         dec_subtitles = intlist_to_bytes(aes_cbc_decrypt(
             bytes_to_intlist(base64.b64decode(enc_subtitles[24:])),
-            bytes_to_intlist(b'\nd\xaf\xd2J\xd0\xfc\xe1\xfc\xdf\xb61\xe8\xe1\xf0\xcc'),
+            bytes_to_intlist(b'\xec\xe1\xba\xc9\x23\x00\xc0\xba\x45\xed\xf7\xef\xad\x34\x1b\x0e'),
             bytes_to_intlist(base64.b64decode(enc_subtitles[:24]))
         ))
         subtitles_json = self._parse_json(
i.e. fix the code in the appropriate place, and i would add the date_whatever tag
siddht4 commented 7 years ago

@remitamine please can we use https://gist.github.com/siddht4/ed9837e6d2af205b4ccdb25459ba20e4 to comment regarding the code

remitamine commented 7 years ago

@remitamine please can we use https://gist.github.com/siddht1/ed9837e6d2af205b4ccdb25459ba20e4 to comment regarding the code

yes, as most likely the people who subscribe to this issue only want to get updates when there is a fix to the extractor.

madrasile commented 6 years ago

I've looked deeper into this subject and into the code from the website. As @siddht4 said, you can deobfuscate the code and understand what it is doing in the background. Once you understand it, you should be able to write a small (but boring) script to compute the AES key dynamically. Trust me, just take enough of your time and it will be possible to get what you want (as in real life).

EDIT: I know they are following this thread (from what I read), so I will not give more information about how to do that.

remitamine commented 6 years ago

@siddht4 as i have lost any hope of restoring the content of the HDD, this is a rewrite of the script that i had before. just run it and it will create an adn.js file with more readable javascript. the script depends on jsbeautifier.

import os
import re
import urllib.request
from collections import deque

import jsbeautifier  # third-party dependency

with urllib.request.urlopen('http://www.animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js') as adn_min:
    opts = jsbeautifier.default_options()
    opts.eol = os.linesep
    # beautify the minified player script and turn \xNN escapes into plain characters
    js = jsbeautifier.beautify(adn_min.read().decode(), opts).encode().decode('unicode_escape')
    # find the string array, the amount it is rotated by and the name of the lookup function
    repl_array, shift, repl_func = re.search(r"(?s)var\s+(_0x[0-9a-f]+)\s*=\s*\[\s*(.+?)\s*\].+?}\(\1\s*,\s*(0x[0-9a-f]+)\)\);\s+var\s*(_0x[0-9a-f]+)", js).group(2, 3, 4)
    # drop the surrounding quotes from every array entry
    repl_array = deque(x[1:-1] for x in repl_array.split(', '))
    # rotate the array left by the shift so that the hexadecimal indexes in the code line up
    repl_array.rotate(-int(shift, 16))
    # replace every lookup call with the quoted string it stands for
    js = re.sub(r"%s\('(0x[0-9a-f]+)'\)" % repl_func, lambda mobj: "'%s'" % repl_array[int(mobj.group(1), 16)], js)
    with open('adn.js', 'wb') as adn_unmin:
        adn_unmin.write(js.encode())
siddht4 commented 6 years ago

@remitamine that's neat code.

ghost commented 6 years ago

If someone is looking for the new key:

var _0x164cdf = ''

var _0xf34196 = function() {
    _0x164cdf = '77d4f03a06c9a053' + arguments[0x0];
};

var _0x58a852 = function() {
    var _0x2ce3eb, _0x1912e3 = [_0x2ce3eb = 0xcae0, _0x2ce3eb += 0xec7, _0x2ce3eb += 0x477d, _0x2ce3eb += -0xcfcc];
    _0x1912e3[0x1] = [_0x1912e3[0x3], _0x1912e3[0x3] = _0x1912e3[0x1]][0x0], _0x1912e3[0x1] = [_0x1912e3[0x2], _0x1912e3[0x2] = _0x1912e3[0x1]][0x0], _0x1912e3[0x1] = [_0x1912e3[0x0], _0x1912e3[0x0] = _0x1912e3[0x1]][0x0], _0x1912e3[0x2] = 0x31ec * _0x1912e3[0x2] % (0x2 << 0x10), _0x1912e3[0x0] = _0x1912e3[0x2] ^ _0x1912e3[0x0], _0x1912e3[0x0] = _0x1912e3[0x0] * _0x1912e3[0x3], _0x1912e3[0x2] = _0x1912e3[0x3] * _0x1912e3[0x2], _0x1912e3[0x0] = _0x1912e3[0x3] * _0x1912e3[0x2], _0xf34196(_0x1912e3['map'](function(_0x29eb0c) {
        return ('6593' + _0x29eb0c['toString'](0x10))['substr'](-0x4);
    })['join'](''));
};

_0x58a852()

console.log(_0x164cdf)
kitsu-man commented 6 years ago

@persi-persu is your code still working?

ghost commented 6 years ago

@kitsu-man No, ADN has made changes on that side.

remitamine commented 6 years ago

new script for deobfuscation

import base64
import os
import re
import urllib.request
from collections import deque

import jsbeautifier
from jsbeautifier.unpackers import UNPACKERS

for unpacker in UNPACKERS:
    if 'javascriptobfuscator' in unpacker.__name__:
        def unpack(code):
            matches = re.search(r"(?s)((?:var\s+)?(_0x[0-9a-f]+)\s*=\s*\[\s*(.+?)\s*\].+?}\(\2\s*,\s*(0x[0-9a-f]+)\)\);\s*)(?:var\s+)?(_0x[0-9a-f]+)", code)
            if matches:
                repl_array, shift, repl_func = matches.group(3, 4, 5)
                repl_array = deque(base64.b64decode(x[1:-1].encode().decode('unicode_escape')).decode().replace(r"'", r"\'") for x in repl_array.split(','))
                repl_array.rotate(-int(shift, 16))
                code = code[len(matches.group(1)):]
                code = re.sub(r"%s\('(0x[0-9a-f]+)'\)" % repl_func, lambda mobj: "'%s'" % repl_array[int(mobj.group(1), 16)], code)
            return code
        unpacker.unpack = unpack
        break

with urllib.request.urlopen('http://www.animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js') as adn_min:
    opts = jsbeautifier.default_options()
    opts.eol = os.linesep
    opts.unescape_strings = True
    code = jsbeautifier.beautify(adn_min.read().decode(), opts)
    with open('adn.js', 'wb') as adn_unmin:
        adn_unmin.write(code.encode())

fix for video downloading (requires PyCryptodome or PyCrypto):

diff --git a/youtube_dl/extractor/adn.py b/youtube_dl/extractor/adn.py
index 041c61aff..eb7acc430 100644
--- a/youtube_dl/extractor/adn.py
+++ b/youtube_dl/extractor/adn.py
@@ -1,8 +1,11 @@
 # coding: utf-8
 from __future__ import unicode_literals

+import base64
 import json
 import os
+import random
+import string

 from .common import InfoExtractor
 from ..aes import aes_cbc_decrypt
@@ -20,6 +23,9 @@ from ..utils import (
     urljoin,
 )

+from Crypto.Cipher import PKCS1_v1_5
+from Crypto.PublicKey import RSA
+

 class ADNIE(InfoExtractor):
     IE_DESC = 'Anime Digital Network'
@@ -112,8 +118,19 @@ class ADNIE(InfoExtractor):
         error = None
         if not links:
             links_url = player_config.get('linksurl') or options['videoUrl']
-            links_data = self._download_json(urljoin(
-                self._BASE_URL, links_url), video_id)
+            token = options['token']
+            public_key = self._download_webpage(urljoin(
+                self._BASE_URL, options['public_key']), video_id)
+            self._K = ''.join([random.choice(string.hexdigits) for _ in range(16)])
+            authorization = base64.b64encode(PKCS1_v1_5.new(RSA.importKey(public_key)).encrypt(json.dumps({
+                'k': self._K,
+                'e': 60,
+                't': token,
+            }).encode())).decode()
+            links_data = self._download_json(
+                urljoin(self._BASE_URL, links_url), video_id, headers={
+                    'Authorization': 'Bearer ' + authorization,
+                })
             links = links_data.get('links') or {}
             metas = metas or links_data.get('meta') or {}
             sub_path = sub_path or links_data.get('subtitles')
ghost commented 6 years ago

@remitamine Do you have any idea how to fix the subtitles?

remitamine commented 6 years ago

I will post an update if I continue working on it (a token has to be passed to the subtitles request, and the randomly generated hex value needs to be concatenated with the second part calculated in the adn.js script).

remitamine commented 6 years ago

here is the final fix:

diff --git a/youtube_dl/extractor/adn.py b/youtube_dl/extractor/adn.py
index 041c61aff..fe1c41065 100644
--- a/youtube_dl/extractor/adn.py
+++ b/youtube_dl/extractor/adn.py
@@ -1,8 +1,11 @@
 # coding: utf-8
 from __future__ import unicode_literals

+import base64
+import binascii
 import json
 import os
+import random

 from .common import InfoExtractor
 from ..aes import aes_cbc_decrypt
@@ -20,6 +23,9 @@ from ..utils import (
     urljoin,
 )

+from Crypto.Cipher import PKCS1_v1_5
+from Crypto.PublicKey import RSA
+

 class ADNIE(InfoExtractor):
     IE_DESC = 'Anime Digital Network'
@@ -42,16 +48,14 @@ class ADNIE(InfoExtractor):

         enc_subtitles = self._download_webpage(
             urljoin(self._BASE_URL, sub_path),
-            video_id, fatal=False, headers={
-                'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0',
-            })
+            video_id, fatal=False)
         if not enc_subtitles:
             return None

         # http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
         dec_subtitles = intlist_to_bytes(aes_cbc_decrypt(
             bytes_to_intlist(compat_b64decode(enc_subtitles[24:])),
-            bytes_to_intlist(b'\xc8\x6e\x06\xbc\xbe\xc6\x49\xf5\x88\x0d\xc8\x47\xc4\x27\x0c\x60'),
+            bytes_to_intlist(binascii.unhexlify(self._K + '9032ad7083106400')),
             bytes_to_intlist(compat_b64decode(enc_subtitles[:24]))
         ))
         subtitles_json = self._parse_json(
@@ -112,11 +116,22 @@ class ADNIE(InfoExtractor):
         error = None
         if not links:
             links_url = player_config.get('linksurl') or options['videoUrl']
-            links_data = self._download_json(urljoin(
-                self._BASE_URL, links_url), video_id)
+            token = options['token']
+            public_key = self._download_webpage(urljoin(
+                self._BASE_URL, options['public_key']), video_id)
+            self._K = ''.join([random.choice('0123456789abcdef') for _ in range(16)])
+            authorization = base64.b64encode(PKCS1_v1_5.new(RSA.importKey(public_key)).encrypt(json.dumps({
+                'k': self._K,
+                'e': 60,
+                't': token,
+            }).encode())).decode()
+            links_data = self._download_json(
+                urljoin(self._BASE_URL, links_url), video_id, headers={
+                    'Authorization': 'Bearer ' + authorization,
+                })
             links = links_data.get('links') or {}
             metas = metas or links_data.get('meta') or {}
-            sub_path = sub_path or links_data.get('subtitles')
+            sub_path = (sub_path or links_data.get('subtitles')) + '&token=' + token
             error = links_data.get('error')
         title = metas.get('title') or video_info['title']

Note that the second part of the key will probably change soon, so it will need to be updated as explained before.
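
To make the key layout in the patch concrete, here is a minimal standard-library sketch (not part of the patch). It assumes, as the diff suggests, that the AES key is the 16 random lowercase hex characters sent as `k` followed by a 16-character hex constant taken from the site script, and that the first 24 base64 characters of the subtitle response hold the IV. The helper names are hypothetical, and the constant is simply the value from the diff, which is expected to go stale.

```python
import base64
import binascii
import random

LOWER_HEX = '0123456789abcdef'
# Second half of the key as it appears in the diff above; assumed to change over time.
KEY_SUFFIX = '9032ad7083106400'


def make_key_prefix(rng=random):
    """Return 16 random lowercase hex characters (the value sent as 'k')."""
    return ''.join(rng.choice(LOWER_HEX) for _ in range(16))


def build_aes_key(prefix, suffix=KEY_SUFFIX):
    """Concatenate both hex halves and decode them into a 16-byte AES key."""
    key = binascii.unhexlify(prefix + suffix)
    if len(key) != 16:
        raise ValueError('expected a 16-byte key, got %d bytes' % len(key))
    return key


def split_subtitle_blob(enc_subtitles):
    """Split the response into (iv, ciphertext), mirroring the [:24] / [24:] slices."""
    iv = base64.b64decode(enc_subtitles[:24])
    ciphertext = base64.b64decode(enc_subtitles[24:])
    return iv, ciphertext
```

The 24-character slice works because 16 bytes always encode to 24 base64 characters, so the IV can be cut off without a separator.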

ghost commented 6 years ago

It does not seem to be a problem but:

string.hexdigits == '0123456789abcdefABCDEF' != '0123456789abcdef'

remitamine commented 6 years ago

It does not seem to be a problem but:

It should have been only lowercase hex digits (0123456789abcdef), as used in the website's JS code.

I pushed a version upstream that does not require PyCrypto.
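
The difference between the two alphabets is easy to check. In this small illustrative sketch, `string.hexdigits` also contains `ABCDEF`, so sampling from it can put uppercase letters into the value sent as `k`, while `binascii.unhexlify` decodes either case to the same bytes. That would be consistent with the local decryption key being unaffected; how the server treats uppercase input is not established in this thread. `same_key_bytes` is a hypothetical helper.

```python
import binascii
import string

LOWER_HEX = '0123456789abcdef'

# string.hexdigits carries both cases; its first 16 characters are the lowercase set.
assert string.hexdigits == LOWER_HEX + 'ABCDEF'
assert string.hexdigits[:16] == LOWER_HEX


def same_key_bytes(hex_a, hex_b):
    """Return True if two hex strings decode to identical bytes."""
    return binascii.unhexlify(hex_a) == binascii.unhexlify(hex_b)
```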

ghost commented 6 years ago

As the key changes daily, it might be nice to use an option like --video-password:

key = self._downloader.params.get('videopassword')
if key is None:
    raise ExtractorError('These subtitles are encrypted by a key, use the --video-password option', expected=True)
...
    bytes_to_intlist(binascii.unhexlify(self._K + key)),
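
Since the value passed via `--video-password` would go straight into `binascii.unhexlify`, checking it first could give a clearer error than a decoding failure. A minimal sketch, assuming the user supplies exactly the 16-hex-character second half of the key; `normalize_key_suffix` is a hypothetical helper, not part of youtube-dl:

```python
import re


def normalize_key_suffix(value):
    """Validate a user-supplied key half and return it lowercased."""
    if value is None:
        raise ValueError('no key given; pass it with --video-password')
    value = value.strip().lower()
    # Exactly 16 hex characters decode to the 8 bytes that complete the AES key.
    if not re.match(r'^[0-9a-f]{16}$', value):
        raise ValueError('key must be exactly 16 hexadecimal characters')
    return value
```

In the extractor, the `ValueError` would presumably be turned into an `ExtractorError` with `expected=True`, as in the snippet above.
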
hardcpp commented 6 years ago

Hello, can someone help me? I tried to find today's key, but I can't find it :/

ghost commented 6 years ago

@hardcpp You can use this script (based on the deobfuscation script of remitamine):

key.py

```python
#!/usr/bin/env python3
import base64
import os
import re
import urllib.request
from collections import deque

import jsbeautifier
from jsbeautifier.unpackers import UNPACKERS

for unpacker in UNPACKERS:
    if 'javascriptobfuscator' in unpacker.__name__:
        def unpack(code):
            matches = re.search(r"(?s)((?:var\s+)?(_0x[0-9a-f]+)\s*=\s*\[\s*(.+?)\s*\].+?}\(\2\s*,\s*(0x[0-9a-f]+)\)\);\s*)(?:var\s+)?(_0x[0-9a-f]+)", code)
            if matches:
                repl_array, shift, repl_func = matches.group(3, 4, 5)
                repl_array = deque(base64.b64decode(x[1:-1].encode().decode('unicode_escape')).decode().replace(r"'", r"\'") for x in repl_array.split(','))
                repl_array.rotate(-int(shift, 16))
                code = code[len(matches.group(1)):]
                code = re.sub(r"%s\('(0x[0-9a-f]+)'\)" % repl_func, lambda mobj: "'%s'" % repl_array[int(mobj.group(1), 16)], code)
            return code
        unpacker.unpack = unpack
        break

h = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
with urllib.request.urlopen(urllib.request.Request('http://www.animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js', headers=h)) as adn_min:
    opts = jsbeautifier.default_options()
    opts.eol = os.linesep
    opts.unescape_strings = True
    code = jsbeautifier.beautify(adn_min.read().decode(), opts)

m = re.search(r'(?s)((?:for\s*\(\s*)?var\s+(_0x[0-9a-f]{5,6})\s*,\s*(_0x[0-9a-f]{5,6})\s*=\s*\[\s*\2\s*=\s*0x[0-9a-f]+\s*,\s*\2\s*\+=\s*-?0x[0-9a-f]+\s*,\s*\2\s*\+=\s*-?0x[0-9a-f]+\s*,\s*\2\s*\+=\s*-?0x[0-9a-f]+].*?(_0x[0-9a-f]{5,6})\s*\(\s*\3\s*\[\s*\'map\'\s*]\s*\(\s*function\s*\(\s*(_0x[0-9a-f]{5,6})\s*\)\s*{\s*return\s*\(\s*\'\d{4}\'\s*\+\s*\5\s*\[\s*\'toString\'\s*]\s*\(\s*0x10\s*\)\s*\)\s*\[\s*\'substr\'\s*]\s*\(\s*-\s*0x4\s*\)\s*;\s*}\s*\)\s*\[\s*\'join\'\s*]\s*\(\s*\'\'\s*\)\s*\)\s*;)', code)
try:
    os.remove('key.html')
except:
    pass
if m:
    with open('key.html', 'w') as f:
        f.write(
            ''''''
        )
```

skid9000 commented 6 years ago

Even with the token it doesn't work :/

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--username', 'PRIVATE', '--password', 'PRIVATE', '-
-list-sub', '--video-password', 'PRIVATE', 'https://animedigitalnetwork.fr/video
/shimoseka/6637-episode-1-a-qui-servent-les-bonnes-moeurs-et-l-ordre-moral', '-v
']
[debug] Encodings: locale cp1252, fs mbcs, out cp850, pref cp1252
[debug] youtube-dl version 2018.08.04
[debug] Python version 3.4.4 (CPython) - Windows-2012Server-6.2.9200-SP0
[debug] exe versions: ffmpeg N-86848-g03a9e6f, ffprobe N-86848-g03a9e6f
[debug] Proxy map: {}
[ADN] 6637: Downloading webpage
[ADN] 6637: Downloading JSON metadata
[ADN] 6637: Downloading JSON metadata
[ADN] 6637: Downloading m3u8 information
[ADN] 6637: Downloading JSON metadata
[ADN] 6637: Downloading m3u8 information
[ADN] 6637: Downloading webpage
Traceback (most recent call last):
  File "__main__.py", line 19, in <module>
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpckoq891b\build\youtube_dl\__init__.py", line 472, in main
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpckoq891b\build\youtube_dl\__init__.py", line 462, in _real_main
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpckoq891b\build\youtube_dl\YoutubeDL.py", line 2001, in download
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpckoq891b\build\youtube_dl\YoutubeDL.py", line 792, in extract_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpckoq891b\build\youtube_dl\extractor\common.py", line 502, in extract
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpckoq891b\build\youtube_dl\extractor\adn.py", line 170, in _real_extract
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpckoq891b\build\youtube_dl\extractor\common.py", line 2767, in extract_subtitles
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpckoq891b\build\youtube_dl\extractor\adn.py", line 63, in _get_subtitles
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8a in position 0: invalid start byte

E:\youtube-dl>
ghost commented 5 years ago

Is anyone working on a fix for this?