wiiaboo closed this issue 2 years ago.
I propose two solutions for this:
@remitamine Both are flawed, as is the current approach. A reasonable solution would be customizable extraction behavior (in particular for Crunchyroll subtitle decryption) that could be used by the subtitles extractor or even by a postprocessor.
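A minimal sketch of the deferred approach, assuming the extractor can return one loader per language instead of already-decrypted data. The subtitle writer (or a postprocessor) then decrypts only the tracks it keeps. `LazySubtitle`, `fetch_and_decrypt`, `list_subtitle_tracks` and `materialize` are hypothetical names, not youtube-dl APIs, and the network fetch is simulated.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class LazySubtitle:
    """A subtitle track whose content is produced only on first access."""
    lang: str
    loader: Callable[[], str]
    _data: Optional[str] = field(default=None, init=False, repr=False)

    def data(self) -> str:
        # Run the expensive download + decrypt step at most once per track.
        if self._data is None:
            self._data = self.loader()
        return self._data


fetch_log: List[str] = []  # records which tracks were really fetched


def fetch_and_decrypt(lang: str) -> str:
    # Hypothetical stand-in for one HTTP request plus subtitle decryption.
    fetch_log.append(lang)
    return f"subtitle text for {lang}"


def list_subtitle_tracks(langs: List[str]) -> Dict[str, LazySubtitle]:
    # The extractor only lists the tracks; nothing is downloaded here.
    return {lang: LazySubtitle(lang, lambda lang=lang: fetch_and_decrypt(lang))
            for lang in langs}


def materialize(subs: Dict[str, LazySubtitle], wanted: List[str]) -> Dict[str, str]:
    # The consumer (subtitle writer or postprocessor) forces only what it needs.
    return {lang: subs[lang].data() for lang in wanted if lang in subs}


if __name__ == "__main__":
    subs = list_subtitle_tracks(["enUS", "esLA", "frFR", "deDE"])
    print(materialize(subs, ["enUS"]))  # {'enUS': 'subtitle text for enUS'}
    print(fetch_log)                    # ['enUS']
```

The design choice here is that listing languages stays cheap. Only the tracks a user actually asks for pay the download and decryption cost.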
I had a similar problem while writing #6144. I ended up solving it with a few kludges to plug the downloader infrastructure into subtitle downloading (commit acbc6d38660092e90c4ab36110b30355d26c4363), but I'm not particularly proud of it.
This seems to be an issue not just with subtitles but with resolutions too. At least on my connection, each "media info" page takes half a dozen seconds to download, even if I request only one resolution.
I'm having a similar problem: youtube-dl doesn't download the requested subtitles. Take a look at this example run:
$ youtube-dl --verbose --sub-lang "en,es" http://www.ted.com/talks/lang/es/john_hodgman_s_brief_digression
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'--restrict-filenames', u'--retries', u'50', u'--continue', u'--verbose', u'--sub-lang', u'en,es', u'http://www.ted.com/talks/lang/es/john_hodgman_s_brief_digression']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2015.08.09
[debug] Git HEAD: 9f3da13
[debug] Python version 2.7.6 - Linux-3.13.0-57-generic-x86_64-with-Ubuntu-14.04-trusty
[debug] exe versions: avconv 9.18-6, avprobe 9.18-6
[debug] Proxy map: {}
[ted] john_hodgman_s_brief_digression: Downloading webpage
[ted] john_hodgman_s_brief_digression: Extracting information
[ted] john_hodgman_s_brief_digression: Downloading m3u8 information
WARNING: Your copy of avconv is outdated and unable to properly mux separate video and audio files, youtube-dl will download single file media. Update avconv to version 10-0 or newer to fix this.
[debug] Invoking downloader on u'http://download.ted.com/talks/JohnHodgman_2008-480p.mp4?apikey=489b859150fc58263f17110eeb44ed5fba4a3b22'
[download] Resuming download at byte 1865239
[download] Destination: John_Hodgman_-_Una_breve_digresi_n_sobre_asuntos_del_tiempo_perdido-374.mp4
[download] 2.9% of 110.46MiB at 98.36KiB/s ETA 18:36^C
ERROR: Interrupted by user
$
But if I list the subtitles, they appear:
$ youtube-dl --verbose --list-subs http://www.ted.com/talks/lang/es/john_hodgman_s_brief_digression
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'--restrict-filenames', u'--retries', u'50', u'--continue', u'--verbose', u'--list-subs', u'http://www.ted.com/talks/lang/es/john_hodgman_s_brief_digression']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2015.08.09
[debug] Git HEAD: 9f3da13
[debug] Python version 2.7.6 - Linux-3.13.0-57-generic-x86_64-with-Ubuntu-14.04-trusty
[debug] exe versions: avconv 9.18-6, avprobe 9.18-6
[debug] Proxy map: {}
[ted] john_hodgman_s_brief_digression: Downloading webpage
[ted] john_hodgman_s_brief_digression: Extracting information
[ted] john_hodgman_s_brief_digression: Downloading m3u8 information
Available subtitles for 374:
Language formats
el srt, ted
en srt, ted
it srt, ted
ar srt, ted
pt-br srt, ted
cs srt, ted
es srt, ted
ru srt, ted
nl srt, ted
pt srt, ted
zh-tw srt, ted
tr srt, ted
zh-cn srt, ted
ro srt, ted
pl srt, ted
fr srt, ted
bg srt, ted
hr srt, ted
de srt, ted
hu srt, ted
ja srt, ted
he srt, ted
sr srt, ted
ko srt, ted
sv srt, ted
$
Thanks!
You need --write-sub in addition to --sub-lang. --sub-lang just selects the ones to download. --all-subs doesn't need --write-sub.
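The option rules described above can be summarised as a small selection function. This is a sketch of the behaviour as stated in this thread, not youtube-dl's source. The fallback when --write-sub is given without --sub-lang (English if available, else the first listed language) is my reading of youtube-dl's `YoutubeDL.process_subtitles` and may differ between versions; `select_subtitle_langs` is a hypothetical name.

```python
from typing import Iterable, List, Optional


def select_subtitle_langs(available: Iterable[str],
                          write_sub: bool = False,
                          all_subs: bool = False,
                          sub_langs: Optional[List[str]] = None) -> List[str]:
    """Return the subtitle languages that would be written to disk.

    available: language codes as printed by --list-subs
    write_sub: --write-sub was given
    all_subs:  --all-subs was given (no --write-sub needed)
    sub_langs: the --sub-lang value split on commas, e.g. ["en", "es"]
    """
    available = list(available)
    if all_subs:
        return available
    if not write_sub:
        # --sub-lang alone only selects languages; nothing gets written.
        return []
    if sub_langs:
        # Requested languages the video doesn't have are skipped.
        return [lang for lang in sub_langs if lang in available]
    # Assumed fallback: English if present, else the first track (may vary by version).
    return ["en"] if "en" in available else available[:1]


if __name__ == "__main__":
    ted = ["el", "en", "it", "es", "fr"]  # a few codes from the --list-subs output above
    print(select_subtitle_langs(ted, sub_langs=["en", "es"]))                  # []
    print(select_subtitle_langs(ted, write_sub=True, sub_langs=["en", "es"]))  # ['en', 'es']
```

The first call mirrors the failing run above (--sub-lang "en,es" without --write-sub). The second shows the fix suggested in this reply.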
@wiiaboo Thanks a lot! It worked! I don't think it should be necessary to add that option, though; it doesn't make sense to me :)
Another way to associate the language names with their codes is to read the page's language selector. Example:
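import re
languages = {name: code for (code, name) in re.findall(r';([a-z]{2}[A-Z]{2})[^ ]+ data-language="([^"]+)', webpage)}  # webpage is the page HTML; maps each data-language name to its code (e.g. enUS)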
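The one-liner above can be exercised on its own against sample markup. The HTML below is hypothetical, written only so that the regex has something to match; Crunchyroll's real language selector may look different. `re.findall` returns `(code, name)` tuples because the pattern has two groups, so swapping them gives name-to-code and `dict()` on the raw tuples gives code-to-name.

```python
import re
from typing import Dict

# Regex from the comment above: group 1 is a locale code such as "enUS",
# group 2 is the human-readable name in the data-language attribute.
LANG_RE = re.compile(r';([a-z]{2}[A-Z]{2})[^ ]+ data-language="([^"]+)')

# Hypothetical markup; the real Crunchyroll language selector may differ.
SAMPLE_PAGE = '''
<ul class="language-menu">
  <li><a href="#" onclick="setLang(this);enUS" data-language="English (US)">English</a></li>
  <li><a href="#" onclick="setLang(this);esLA" data-language="Spanish (Latin America)">Spanish</a></li>
  <li><a href="#" onclick="setLang(this);frFR" data-language="French (France)">French</a></li>
</ul>
'''


def name_to_code(webpage: str) -> Dict[str, str]:
    """Map each language name to its code, like the one-liner in the thread."""
    return {name: code for code, name in LANG_RE.findall(webpage)}


def code_to_name(webpage: str) -> Dict[str, str]:
    """Inverse mapping, handy for labelling --sub-lang choices."""
    return dict(LANG_RE.findall(webpage))


if __name__ == "__main__":
    print(name_to_code(SAMPLE_PAGE)["English (US)"])  # enUS
    print(code_to_name(SAMPLE_PAGE)["frFR"])          # French (France)
```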
Is there any way for _get_subtitles or _extract_subtitles to know which languages were requested?
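One possible answer, sketched under assumptions: in youtube-dl an extractor reaches the `YoutubeDL` instance through `self._downloader`, whose `params` dict carries `subtitleslangs` (from --sub-lang) and `allsubtitles` (from --all-subs). An extractor's `_get_subtitles` could read those and fetch only the matching tracks, which is what Solution 1 below asks for. The stub classes stand in for youtube-dl so the example runs by itself; `FakeDownloader`, `SubtitleAwareExtractor`, `AVAILABLE` and `_fetch_track` are hypothetical.

```python
from typing import Dict, List


class FakeDownloader:
    """Stand-in for youtube-dl's YoutubeDL; only the params dict is modelled."""

    def __init__(self, params: dict):
        self.params = params


class SubtitleAwareExtractor:
    """Hypothetical extractor that fetches only the requested subtitle tracks."""

    AVAILABLE = ["enUS", "esLA", "frFR", "deDE"]  # pretend track list

    def __init__(self, downloader: FakeDownloader):
        self._downloader = downloader
        self.fetched: List[str] = []  # which tracks were actually fetched

    def _fetch_track(self, lang: str) -> str:
        # Placeholder for the slow download + decryption of one track.
        self.fetched.append(lang)
        return f"decrypted {lang}"

    def _requested_langs(self) -> List[str]:
        params = self._downloader.params
        if params.get("allsubtitles"):
            return list(self.AVAILABLE)
        # No --sub-lang given: keep the current behaviour and fetch everything.
        return params.get("subtitleslangs") or list(self.AVAILABLE)

    def _get_subtitles(self) -> Dict[str, str]:
        # The availability check runs before _fetch_track, so unknown codes cost nothing.
        return {lang: self._fetch_track(lang)
                for lang in self._requested_langs()
                if lang in self.AVAILABLE}


if __name__ == "__main__":
    ie = SubtitleAwareExtractor(FakeDownloader({"writesubtitles": True,
                                                "subtitleslangs": ["enUS"]}))
    print(ie._get_subtitles())  # {'enUS': 'decrypted enUS'}
    print(ie.fetched)           # ['enUS']
```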
Problem
Using --sub-lang to request one or two subtitles from Crunchyroll doesn't extract only the requested subtitles; it extracts all of them, leading to long delays before the stream starts, whether you use --all-subs or just --sub-lang enUS. On sites where the subtitles simply point to a URL, extraction seems faster, so this is probably more of a problem for sites like Crunchyroll, where the full subtitles are extracted.
Solution 1
At least for sites like Crunchyroll, just extract the requested languages.
Solution 2
Add an option that just extracts the requested languages?
I should also mention that this is mostly useful when you want to stream the resulting URL, e.g. through mpv. When you're using youtube-dl directly to download the video, the time spent extracting the subtitles is probably not an issue.