ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

[ADN] How can I fix the subtitle support for the ADN extractor? #22035

Open cchudant opened 5 years ago

cchudant commented 5 years ago


Question

Hello! I want to fix the subtitle support in adn.py. The current situation is explained in #12724. Right now, a key is hardcoded in the code:

bytes_to_intlist(binascii.unhexlify(self._K + '4b8ef13ec1872730')),
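For readers unfamiliar with these helpers: the line above concatenates two hex strings (the `self._K` attribute plus a hardcoded suffix), decodes them to raw bytes, and turns those bytes into a list of integers, presumably for youtube-dl's pure-Python AES helpers, which operate on integer lists. Below is a minimal standalone sketch of that composition. `HYPOTHETICAL_K` is a placeholder, not the real value of `self._K`, and `bytes_to_intlist` here is a local stand-in for the youtube-dl utility of the same name.

```python
import binascii


def bytes_to_intlist(bs):
    # Local stand-in for youtube_dl.utils.bytes_to_intlist:
    # convert a bytes object into a list of ints in the range 0-255.
    return list(bytearray(bs))


def build_key(first_half_hex, second_half_hex):
    # Join the two hex halves, decode them to raw bytes, then to a list of ints.
    return bytes_to_intlist(binascii.unhexlify(first_half_hex + second_half_hex))


HYPOTHETICAL_K = '0123456789abcdef'  # placeholder, NOT the real self._K
DAILY_PART = '4b8ef13ec1872730'      # the hardcoded suffix quoted above

key = build_key(HYPOTHETICAL_K, DAILY_PART)
print(len(key))  # 16 bytes with this placeholder, i.e. a 128-bit key
```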

This key changes every day, so people have been modifying the extractor code in their installations to get subtitle extraction working, and they have to change the key again every day. This is obviously not user-friendly.

youtube-dl should automatically get the key as part of the subtitle retrieval process, and that is what I want to implement. I know I need a JS interpreter: the remote JS file that holds the key changes every day, and you can't easily extract the key with a regex or something similar, because it doesn't appear in the code. It is the result of an obfuscated JS computation.

Should I use PhantomJS or another interpreter to do it?

I've found that you can get the key by executing

videojs.players["adn-video-js"].onChromecastCustomData().key

when an episode page is loaded. That's kinda hacky but it works :p
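A minimal sketch of how that expression could be evaluated in a headless browser. The Python helper below only generates a PhantomJS script; it does not run it. The script uses PhantomJS's `webpage` module (`page.open`, `page.evaluate`) and `phantom.exit`. The fixed 5-second delay before evaluating is an assumption, since the video.js player may not be initialized the moment the page's load callback fires. `build_phantom_script` is a hypothetical name.

```python
import json

KEY_EXPRESSION = 'videojs.players["adn-video-js"].onChromecastCustomData().key'

PHANTOM_TEMPLATE = """var page = require('webpage').create();
page.open(%(url)s, function (status) {
    if (status !== 'success') {
        console.log('ERROR: page failed to load');
        phantom.exit(1);
        return;
    }
    // Assumption: wait a bit so the video.js player has time to initialize.
    setTimeout(function () {
        // page.evaluate runs the function inside the loaded page.
        var key = page.evaluate(function () {
            return %(expr)s;
        });
        console.log(key);
        phantom.exit(0);
    }, %(delay_ms)d);
});
"""


def build_phantom_script(url, expr=KEY_EXPRESSION, delay_ms=5000):
    # json.dumps produces a correctly quoted and escaped JavaScript string literal.
    return PHANTOM_TEMPLATE % {'url': json.dumps(url), 'expr': expr, 'delay_ms': delay_ms}
```

The generated script would then be saved to a file and passed to the `phantomjs` executable, which prints the evaluated value on stdout.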

There are many other ways to do it (like proxying the CryptoJS object before the DOM load event), but I found nothing easier than that, because the variable is deeply nested in obfuscated code.

More importantly, this code relies on calls to videojs (one of the libraries ADN uses), so I think it won't break for a while.

Thank you!

ghost commented 5 years ago

Why do you wanna add something that's already supported by youtube-dl? You just need to type --write-sub --sub-lang "fr" or --write-sub --sub-lang "vf" to get them.
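For anyone embedding youtube-dl from Python rather than using the command line, the `--write-sub` and `--sub-lang` flags correspond to the `writesubtitles` and `subtitleslangs` options of `YoutubeDL`. Here is a minimal sketch. The helper names `subtitle_options` and `download_subtitles` are hypothetical, and `skip_download` is added only so that the subtitles are fetched without the video.

```python
def subtitle_options(langs, skip_video=True):
    # Python-API equivalent of: youtube-dl --write-sub --sub-lang <langs>
    return {
        'writesubtitles': True,
        'subtitleslangs': list(langs),
        'skip_download': skip_video,  # only fetch subtitles, not the video itself
    }


def download_subtitles(url, langs=('fr',)):
    # Requires youtube-dl to be installed; imported here so the helper above has no dependency.
    import youtube_dl
    with youtube_dl.YoutubeDL(subtitle_options(langs)) as ydl:
        ydl.download([url])
```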

I should have read the whole issue before posting a comment. Your method of getting the key is quite nice, but could it be integrated into youtube-dl in any way? The last time someone asked for the key to be changed automatically every day, it was a categorical refusal.

cchudant commented 5 years ago

@Asusagawa The key changes automatically every day. You could use a regex, but the obfuscated JS code changes every day too, so what we really need here is a JS interpreter to execute the obfuscated JavaScript that generates the key, and read the key from it.

I mentioned PhantomJS because the Openload extractor uses it. It is a scriptable headless browser, meaning a browser with no interface that you can control from Python (if you are familiar with Puppeteer, it is essentially the same idea). This way, you can execute JavaScript just as a normal user's browser would.
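To illustrate what "controlling it from Python" can look like, here is a minimal standalone sketch (Python 3 only). It writes a PhantomJS script to a temporary file, runs the `phantomjs` executable on it with `subprocess`, and returns its stdout. It is not the wrapper from openload.py; `run_phantomjs` is a hypothetical name. Returning `None` when the executable is missing is a design choice that lets callers fall back gracefully.

```python
import os
import shutil
import subprocess
import tempfile


def run_phantomjs(script_source, executable='phantomjs', timeout=60):
    # Return None if PhantomJS is not installed, so callers can fall back.
    exe_path = shutil.which(executable)
    if exe_path is None:
        return None
    fd, script_path = tempfile.mkstemp(suffix='.js')
    try:
        with os.fdopen(fd, 'w') as f:
            f.write(script_source)
        # PhantomJS runs the script; console.log() output goes to stdout.
        # check=True raises CalledProcessError on a non-zero exit code.
        result = subprocess.run(
            [exe_path, script_path],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
            timeout=timeout, check=True)
        return result.stdout.decode('utf-8', 'replace').strip()
    finally:
        os.remove(script_path)
```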

I don't really know exactly how PhantomJS is used in the openload extractor, but I see it as a pretty good solution for this problem.

The last time someone asked for the key to be changed automatically every day, it was a categorical refusal

Could you please show me where you read this?

Thank you!

ghost commented 5 years ago

The key changes automatically every day. You could use a regex, but the obfuscated JS code changes every day too, so what we really need here is a JS interpreter to execute the obfuscated JavaScript that generates the key, and read the key from it.

Yeah, I know, I've got a modified extractor to do that.

I mentioned PhantomJS because the Openload extractor uses it. It is a scriptable headless browser, meaning a browser with no interface that you can control from Python (if you are familiar with Puppeteer, it is essentially the same idea). This way, you can execute JavaScript just as a normal user's browser would.

I don't really know exactly how PhantomJS is used in the openload extractor, but I see it as a pretty good solution for this problem.

That would be a great idea if it's accepted.

Could you please show me where you read this?

I can't find it anymore, but I'm pretty sure I read it somewhere.

cchudant commented 5 years ago

Yeah, I know, I've got a modified extractor to do that.

Do you mean it supports key changes?

ghost commented 5 years ago

Do you mean it supports key changes?

Yes, it automatically changes the key in the adn.py file when needed.

cchudant commented 5 years ago

So you basically have an external program/script that changes the adn.py file? That's a strange but interesting way to solve the problem. What I meant was to have the extractor itself get the key without modifying any file.

cchudant commented 5 years ago

I edited the issue to make it clearer.

@remitamine Sorry for the mention, but I think you might be able to help me here. I don't want to start implementing something if you have a better alternative or would prefer another interpreter. Thank you!

remitamine commented 5 years ago

It's fine to use PhantomJS (using the common code from openload.py), if it's available, to automatically get the second part of the key.
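"If it's available" suggests falling back to the hardcoded value when PhantomJS is missing or fails. Below is a minimal sketch of that decision logic. It assumes, purely for illustration, that the fetched value is 16 hex characters like the hardcoded `'4b8ef13ec1872730'`; the thread does not confirm the format. `resolve_second_half` and its `fetch` callable are hypothetical. In practice, `fetch` would run PhantomJS and return its stdout, or `None` if PhantomJS is unavailable.

```python
import re

HARDCODED_SECOND_HALF = '4b8ef13ec1872730'  # the value quoted in the issue; goes stale daily
# Assumption: the key half is exactly 16 hex characters.
_KEY_HALF_RE = re.compile(r'\b([0-9a-fA-F]{16})\b')


def resolve_second_half(fetch, fallback=HARDCODED_SECOND_HALF):
    # fetch() returns raw PhantomJS output, or None when PhantomJS is unavailable.
    try:
        output = fetch()
    except Exception:
        output = None  # treat any PhantomJS failure like "not available"
    if output:
        match = _KEY_HALF_RE.search(output)
        if match:
            return match.group(1).lower()
    # Fall back to the hardcoded value, which may be outdated.
    return fallback
```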

cchudant commented 5 years ago

Thank you!! I'll post my questions here if I have any during implementation. I'll do a PR ASAP!

cchudant commented 5 years ago

The PR is here: #22150!

PR3D4T0R8778 commented 4 years ago

What is this?

https://subtitle1.animedigitalnetwork.fr/4HlcBFY5ow7uAP%2BWFXEmgEw7QDsSAdc1FfuaYDwpBJxSaL9Eb5EvlLyobemEb1u5pFMXwYJ9yZwaapxcY_THGNFo7nXq_WC%2BJkwyBKQyjHDRnmBa0j5_EGKlnB5s2aBqHk03Cd2n0n6QPwjgrp4OLIS_8ysXMqk35fXPmCUWUw0%3D-lXQf11w84g4UNpu4SOwTdQ%3D%3D.json

and:

{links: {vostf: {,…}}, video: {id: "10210",…},…}
links: {vostf: {,…}}
video: {id: "10210",…}
meta: {title: "BORUTO - NARUTO NEXT GENERATIONS - Épisode 137",…}
savedCurrentTime: "468"
subtitles: "//animedigitalnetwork.fr/loadbalance/subtitles/4HlcBFY5ow7uAP%2BWFXEmgEw7QDsSAdc1FfuaYDwpBJxSaL9Eb5EvlLyobemEb1u5pFMXwYJ9yZwaapxcY_THGNFo7nXq_WC%2BJkwyBKQyjHDRnmBa0j5_EGKlnB5s2aBqHk03Cd2n0n6QPwjgrp4OLIS_8ysXMqk35fXPmCUWUw0%3D-lXQf11w84g4UNpu4SOwTdQ%3D%3D.json"
previousVideoUrl: "/index.php?option=com_vodvideo&task=player.videos&format=raw&fulllinks=1&free=1&video=10209"
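The `subtitles` value in that player response is a protocol-relative URL (it starts with `//`). Here is a minimal sketch that pulls it out and makes it absolute. It assumes HTTPS and a response already parsed into a Python dict. The sample keeps only a few of the fields shown above, with the long token replaced by a hypothetical `TOKEN`, and `subtitles_url` is a hypothetical helper.

```python
def subtitles_url(player_config, scheme='https'):
    # Return the absolute subtitles URL from a parsed player config, or None if absent.
    url = player_config.get('subtitles')
    if not url:
        return None
    if url.startswith('//'):
        # Protocol-relative URL: prepend a scheme (assumed to be HTTPS here).
        url = '%s:%s' % (scheme, url)
    return url


sample_config = {
    'video': {'id': '10210'},
    'savedCurrentTime': '468',
    'subtitles': '//animedigitalnetwork.fr/loadbalance/subtitles/TOKEN.json',
}
print(subtitles_url(sample_config))
# https://animedigitalnetwork.fr/loadbalance/subtitles/TOKEN.json
```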

gamersalpha commented 4 years ago

Hello there, I think my problem is related to this:

I'm trying to get subtitles from an ADN video.

I tried it twice. The first attempt, with Python 2:


stagiaire@srv-web-01:~$ youtube-dl -v --write-sub https://animedigitalnetwork.fr/video/naruto/4189-episode-1-et-voici-naruto-uzumaki
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'--write-sub', u'https://animedigitalnetwork.fr/video/naruto/4189-episode-1-et-voici-naruto-uzumaki']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2020.03.24
[debug] Python version 2.7.17 (CPython) - Linux-4.15.0-96-generic-x86_64-with-Ubuntu-18.04-bionic
[debug] exe versions: ffmpeg 3.4.6, ffprobe 3.4.6
[debug] Proxy map: {}
[ADN] 4189: Downloading webpage
[ADN] 4189: Downloading player config JSON metadata
[ADN] 4189: Downloading links JSON metadata
[ADN] 4189: Downloading vostf mobile JSON metadata
[ADN] 4189: Downloading m3u8 information
[ADN] 4189: Downloading vostf sd JSON metadata
[ADN] 4189: Downloading m3u8 information
[ADN] 4189: Downloading subtitles location
[ADN] 4189: Downloading subtitles data
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/bin/youtube-dl/__main__.py", line 19, in <module>
  File "/usr/local/bin/youtube-dl/youtube_dl/__init__.py", line 474, in main
  File "/usr/local/bin/youtube-dl/youtube_dl/__init__.py", line 464, in _real_main
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2019, in download
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 797, in extract_info
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 530, in extract
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/adn.py", line 204, in _real_extract
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 2892, in extract_subtitles
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/adn.py", line 80, in _get_subtitles
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf4 in position 0: ordinal not in range(128)

And the second, with Python 3:

stagiaire@web-02:/usr/local/bin$ youtube-dl -v --write-sub https://animedigitalnetwork.fr/video/naruto/4189-episode-1-et-voici-naruto-uzumaki
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', '--write-sub', 'https://animedigitalnetwork.fr/video/naruto/4189-episode-1-et-voici-naruto-uzumaki']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2020.03.24
[debug] Python version 3.8.2 (CPython) - Linux-5.4.0-26-generic-x86_64-with-glibc2.29
[debug] exe versions: none
[debug] Proxy map: {}
[ADN] 4189: Downloading webpage
[ADN] 4189: Downloading player config JSON metadata
[ADN] 4189: Downloading links JSON metadata
[ADN] 4189: Downloading vostf mobile JSON metadata
[ADN] 4189: Downloading m3u8 information
[ADN] 4189: Downloading vostf sd JSON metadata
[ADN] 4189: Downloading m3u8 information
[ADN] 4189: Downloading subtitles location
[ADN] 4189: Downloading subtitles data
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/bin/youtube-dl/__main__.py", line 19, in <module>
  File "/usr/local/bin/youtube-dl/youtube_dl/__init__.py", line 474, in main
  File "/usr/local/bin/youtube-dl/youtube_dl/__init__.py", line 464, in _real_main
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2018, in download
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 797, in extract_info
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 530, in extract
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/adn.py", line 204, in _real_extract
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 2892, in extract_subtitles
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/adn.py", line 80, in _get_subtitles
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 2: invalid start byte

I get essentially the same error with both versions: a UnicodeDecodeError raised in _get_subtitles (adn.py, line 80).

Do you know how to fix this?

Thanks!
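Both tracebacks fail in `_get_subtitles` (adn.py, line 80) right after "Downloading subtitles data". The messages differ in a way consistent with a bytes decode that has no explicit codec, which uses ASCII on Python 2 and UTF-8 on Python 3. Given the rest of this thread, one plausible reading is that the subtitles were decrypted with an outdated daily key, producing random bytes instead of JSON text. Below is a minimal sketch of a check that would turn this into a clearer error. It assumes the decrypted payload uses PKCS#7 padding, which is an assumption and not confirmed here. `unpad_and_decode` is a hypothetical helper, not youtube-dl code.

```python
def unpad_and_decode(decrypted):
    # Assumption: the decrypted payload ends with PKCS#7 padding, where the last
    # byte gives the number of padding bytes (1-16 for AES).
    if not decrypted:
        raise ValueError('empty subtitle payload')
    pad = bytearray(decrypted)[-1]
    if not 1 <= pad <= 16 or decrypted[-pad:] != bytes(bytearray([pad] * pad)):
        raise ValueError('bad padding: the subtitle key is probably outdated')
    try:
        # Decode explicitly as UTF-8 instead of relying on the interpreter default.
        return decrypted[:-pad].decode('utf-8')
    except UnicodeDecodeError:
        raise ValueError('decrypted subtitles are not UTF-8: the subtitle key is probably outdated')
```

With this sketch, a payload such as `b'[{\xb4'` followed by valid padding reproduces the "can't decode byte 0xb4 in position 2" condition from the Python 3 log, and it surfaces as a readable ValueError instead.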

PierreC93 commented 3 years ago

I want to fix this too :(