yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader

Unable to download multiple subtitles of same language #946

Open lfer94 opened 2 years ago

lfer94 commented 2 years ago

Verbose log

yt-dlp -v --list-subs https://cmaf.lln.latam.hbomaxcdn.com/videos/GYPGKMQjoDkVLBQEAAAAo/1/1b5ad5/1_single_J8sExA_1080hi.mpd
[debug] Command-line config: ['-v', '--list-subs', 'https://cmaf.lln.latam.hbomaxcdn.com/videos/GYPGKMQjoDkVLBQEAAAAo/1/1b5ad5/1_single_J8sExA_1080hi.mpd']
[debug] Encodings: locale cp1252, fs utf-8, out utf-8, pref cp1252
[debug] yt-dlp version 2021.09.02 (exe)
[debug] Python version 3.8.10 (CPython 64bit) - Windows-8.1-6.3.9600-SP0
[debug] exe versions: ffmpeg 2021-06-23-git-947122f111-full_build-www.gyan.dev, ffprobe 2021-06-23-git-947122f111full_build-www.gyan.dev
[debug] Optional libraries: mutagen, pycryptodome, sqlite, websockets
[debug] Proxy map: {}
[debug] [generic] Extracting URL: https://cmaf.lln.latam.hbomaxcdn.com/videos/GYPGKMQjoDkVLBQEAAAAo/1/1b5ad5/1_single_J8sExA_1080hi.mpd
[generic] 1_single_J8sExA_1080hi: Requesting header
WARNING: [generic] Falling back on generic information extractor.
[generic] 1_single_J8sExA_1080hi: Downloading webpage
[generic] 1_single_J8sExA_1080hi: Extracting information
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, vcodec:vp9.2(10), acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
Available subtitles for 1_single_J8sExA_1080hi:
Language Formats
en-US    vtt
es-419   vtt, vtt
pt-BR    vtt, vtt

Description

Hi. I have been trying to download a WebVTT subtitle from H*O M*X. When I run the command posted in the verbose log, yt-dlp shows that there are two available subtitles in Spanish (complete and forced), but it only downloads one.

ffmpeg also shows that every subtitle has its own ID, but it can't download any of them because of... you know.

    Side data:
      unknown side data type 24 (842 bytes)
  Stream #0:30(en-US): Subtitle: webvtt
    Metadata:
      id              : t2
  Stream #0:31(es-419): Subtitle: webvtt
    Metadata:
      id              : t8
  Stream #0:32(es-419): Subtitle: webvtt
    Metadata:
      id              : t4
  Stream #0:33(pt-BR): Subtitle: webvtt
    Metadata:
      id              : t0
  Stream #0:34(pt-BR): Subtitle: webvtt
    Metadata:
      id              : t6

I've used the following commands, but the result is always the same:

yt-dlp --skip-download --allow-unplayable-formats --write-subs --sub-langs es-419 --sub-format vtt "https://cmaf.lln.latam.hbomaxcdn.com/videos/GYPGKMQjoDkVLBQEAAAAo/1/1b5ad5/1_single_J8sExA_1080hi.mpd"

yt-dlp --skip-download --allow-unplayable-formats --write-subs --sub-langs all --sub-format vtt "https://cmaf.lln.latam.hbomaxcdn.com/videos/GYPGKMQjoDkVLBQEAAAAo/1/1b5ad5/1_single_J8sExA_1080hi.mpd"

I need to download the second one. Any suggestion?

pukkandan commented 2 years ago

This is currently not possible with yt-dlp directly. The only way you can do this right now is to download the infojson with --write-infojson, remove the es-419 sub you don't want, and then load it back with --load-info
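
The manual edit described above can also be scripted. Below is a minimal Python sketch, assuming the info JSON keeps subtitles as a mapping from language code to a list of track entries (the same layout the --list-subs output reflects). The function names keep_subtitle and rewrite_info_json are made up for illustration and are not part of yt-dlp, and which index is the full track and which is the forced one has to be checked by hand.

```python
import json
from pathlib import Path


def keep_subtitle(info: dict, lang: str, index: int) -> dict:
    """Return a copy of info in which `lang` keeps only the track at `index`.

    Assumes info["subtitles"] maps a language code to a list of track dicts.
    """
    subtitles = dict(info.get("subtitles", {}))
    tracks = subtitles.get(lang, [])
    if not 0 <= index < len(tracks):
        raise IndexError(f"{lang} has {len(tracks)} track(s), no index {index}")
    subtitles[lang] = [tracks[index]]
    # Other languages and all other top-level fields are left untouched.
    return {**info, "subtitles": subtitles}


def rewrite_info_json(path: str, lang: str, index: int) -> None:
    """Apply keep_subtitle to an info JSON file in place (path is a placeholder)."""
    p = Path(path)
    info = json.loads(p.read_text(encoding="utf-8"))
    p.write_text(json.dumps(keep_subtitle(info, lang, index)), encoding="utf-8")
```

For example, rewrite_info_json("video.info.json", "es-419", 1) would keep only the second es-419 entry; the edited file can then be loaded back into yt-dlp together with --write-subs.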

chrizilla commented 2 years ago

@lfer94 said: Any suggestion?

Maybe this?

--write-all-thumbnails handles it this way:

videotitle.1.jpg
videotitle.2.jpg
videotitle.3.jpg
etc.

Couldn't it be handled the same way? So if two subs have the same language code, both would be downloaded like this:

videotitle.1.en.vtt
videotitle.2.en.vtt
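
To make the proposed naming concrete, here is a rough Python sketch of how such file names could be generated. This is only an illustration of the suggestion, not how yt-dlp behaves. The function name subtitle_filenames is hypothetical, and leaving languages that occur only once unnumbered is an assumption on top of the proposal.

```python
from collections import Counter
from typing import List, Tuple


def subtitle_filenames(title: str, tracks: List[Tuple[str, str]]) -> List[str]:
    """Build one file name per (lang, ext) track.

    Languages that occur more than once get a 1-based counter before the
    language code; languages that occur once keep the plain name (assumption).
    """
    totals = Counter(lang for lang, _ in tracks)
    seen: Counter = Counter()
    names = []
    for lang, ext in tracks:
        seen[lang] += 1
        if totals[lang] > 1:
            names.append(f"{title}.{seen[lang]}.{lang}.{ext}")
        else:
            names.append(f"{title}.{lang}.{ext}")
    return names
```

With two en tracks and one de track, this yields videotitle.1.en.vtt, videotitle.2.en.vtt and videotitle.de.vtt.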

desseim commented 1 year ago

@pukkandan said: This is currently not possible with yt-dlp directly. The only way you can do this right now is to download the infojson with --write-infojson, remove the es-419 sub you don't want, and then load it back with --load-info

This is a nice suggestion, thanks.

I personally filter the JSON file with

jq '.subtitles |= ([ path(.[][]) as $p | {"key": $p | join("_"), "value": [getpath($p)]} ] | from_entries)'

or, equivalently, with

jq '.subtitles |= . as $s | reduce path(.[][]) as $p ({}; . + { ($p | join("_")): [ $s | getpath($p) ] })'

This should convert a list of subtitles in the same language, like this one:

Language Formats
en       vtt, vtt, mp4
de       vtt, vtt

to separate subtitle entries for each format, like this:

Language Formats
en_0      mp4
en_1      vtt
en_2      vtt
de_0      vtt
de_1      vtt

allowing them to be easily downloaded separately or all at once.

If you have to do this often, this can be handier than editing the file manually, at least until the feature gets implemented.
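
For anyone without jq, the same transformation can be sketched in Python. This is a minimal equivalent under the same assumption as the jq filters, namely that the subtitles field maps each language code to a list of entries. The function name split_subtitles is made up for illustration.

```python
def split_subtitles(info: dict) -> dict:
    """Give every subtitle entry its own key of the form "<lang>_<index>".

    Mirrors the jq filters above: each entry of info["subtitles"][lang]
    becomes a one-element list stored under "<lang>_<index>".
    """
    flat = {}
    for lang, tracks in info.get("subtitles", {}).items():
        for index, track in enumerate(tracks):
            flat[f"{lang}_{index}"] = [track]
    # All other top-level fields are carried over unchanged.
    return {**info, "subtitles": flat}
```

After writing the result back to the JSON file, each entry can be selected by its new key (for example en_1) when the file is loaded back.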