ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

HiDive - fails to download correct subtitle file or detect multiple subtitle files #23830

Open orbitalflower opened 4 years ago

orbitalflower commented 4 years ago

Verbose log

$ youtube-dl -v --write-sub --all-subs -f worst "https://www.hidive.com/stream/food-wars/s01e001"
[debug] System config: []
[debug] User config: [u'--no-call-home']
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'--write-sub', u'--all-subs', u'-f', u'worst', u'https://www.hidive.com/stream/food-wars/s01e001']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2020.01.24
[debug] Python version 2.7.17 (CPython) - Linux-4.15.0-74-generic-x86_64-with-Ubuntu-18.04-bionic
[debug] exe versions: ffmpeg 3.4.6, ffprobe 3.4.6, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[HiDive] food-wars/s01e001: Downloading JSON metadata
[HiDive] food-wars/s01e001: Downloading m3u8 information
[HiDive] food-wars/s01e001: Downloading m3u8 information
[HiDive] food-wars/s01e001: Downloading m3u8 information
[info] Writing video subtitles to: food-wars_s01e001-food-wars_s01e001.en.vtt
[debug] Invoking downloader on u'https://www.hidive.com/manifest/child/44e859ea5112f40a78407113c08a1dd9a0fd6096/FDW_s01e001_tv_or_na_ja_xx_HLS360p_16x9_00_v.m3u8'
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 151
[download] Destination: food-wars_s01e001-food-wars_s01e001.mp4
[download] 100% of 145.63MiB in 03:31
[debug] ffmpeg command line: ffprobe -show_streams 'file:food-wars_s01e001-food-wars_s01e001.mp4'
[ffmpeg] Fixing malformed AAC bitstream in "food-wars_s01e001-food-wars_s01e001.mp4"
[debug] ffmpeg command line: ffmpeg -y -loglevel 'repeat+info' -i 'file:food-wars_s01e001-food-wars_s01e001.mp4' -c copy -f mp4 '-bsf:a' aac_adtstoasc 'file:food-wars_s01e001-food-wars_s01e001.temp.mp4'

Description

I'm trying to download a video from HiDive with the Japanese audio and English subtitles. However, it only downloads a limited subtitle file intended to accompany the English dub audio. This .vtt file contains Caption lines (for translating on-screen text and song lyrics) but not Subtitle lines (for translating spoken text).

Even using --all-subs, it only detects and downloads this Captions subtitle file. The full Subtitle file definitely exists, and appears in the browser requests. For comparison, it can be found at https://www.hidive.com/caption/vtt/d40c35ce045ef78d47fe727b3690ad27b4d1f8a9/FDW_s01e001_tv_hv_or_ja_en_v02.vtt.

The output of --list-subs only shows one entry for English. If you look at the example URL, there are not only multiple English subtitles but other foreign-language versions which the youtube-dl extractor doesn't detect. Example:

$ youtube-dl --list-subs "https://www.hidive.com/stream/food-wars/s01e001"
[HiDive] food-wars/s01e001: Downloading JSON metadata
[HiDive] food-wars/s01e001: Downloading m3u8 information
[HiDive] food-wars/s01e001: Downloading m3u8 information
[HiDive] food-wars/s01e001: Downloading m3u8 information
Available subtitles for food-wars/s01e001:
Language formats
en       vtt, vtt, vtt

My guess is that "vtt, vtt, vtt" means it's interpreting three separate English-language .vtt subtitle files as if they were separate formats of the same file (a problem, since you can't select between them using --sub-lang), and it's failing to detect the Portuguese and Spanish subs entirely.
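To illustrate the guess above: youtube-dl extractors return subtitles as a dict that maps a language code to a list of entries, each carrying an `ext` and a `url`. The sketch below is not the HiDive extractor's code. The URLs are placeholders, the language assigned to each track is an assumption, and the composite key is a hypothetical workaround. It only shows how three different files keyed by language alone end up in a single row, and how a more specific key would keep them apart.

```python
# Hypothetical tracks: (track id, language, URL). The ids follow the pattern
# reported in this thread; the URLs are placeholders, not real endpoints, and
# the language column is an assumption made for illustration.
TRACKS = [
    ("tv_hv_ja_en", "en", "https://example.invalid/subs_full.vtt"),
    ("tv_hv_en_en", "en", "https://example.invalid/subs_dub.vtt"),
    ("tv_hv_en_xx", "en", "https://example.invalid/captions.vtt"),
]


def group_by_language(tracks):
    """Key by language only: different files look like formats of one track."""
    subtitles = {}
    for _track_id, lang, url in tracks:
        subtitles.setdefault(lang, []).append({"ext": "vtt", "url": url})
    return subtitles


def group_by_track(tracks):
    """Key by language plus track id so each file gets its own entry."""
    subtitles = {}
    for track_id, lang, url in tracks:
        key = "%s-%s" % (lang, track_id)  # hypothetical composite key
        subtitles.setdefault(key, []).append({"ext": "vtt", "url": url})
    return subtitles


def format_listing(subtitles):
    """Roughly mimic the --list-subs table: one row per key."""
    return [
        "%s %s" % (key, ", ".join(entry["ext"] for entry in entries))
        for key, entries in sorted(subtitles.items())
    ]


if __name__ == "__main__":
    print("\n".join(format_listing(group_by_language(TRACKS))))
    print("\n".join(format_listing(group_by_track(TRACKS))))
```

With the language-only grouping, the listing collapses to one "en" row with three vtt entries, which matches the symptom in the log. With the composite key, each file gets its own row.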

Sophira commented 4 years ago

I'm also having the same issues regarding downloading subtitle files, trying to get the Kase-san and Morning Glories OVA: https://www.hidive.com/stream/kase-san-and-morning-glories/2018060901 . It seems to download a subtitle file for a dubbed (English-language) version (my guess, because it doesn't include any dialogue but does include written text), and there's no way to get a subtitle file for the subbed (Japanese-language) version.

darkhelmet2016 commented 4 years ago

That video has two separate subtitle files: one for on-screen text translation and one for the Japanese audio. The file for the English dub is listed under "Eng Caps", while the subs for the Japanese audio are listed under "English subs". The program would need to be fixed to detect them.

darkhelmet2016 commented 4 years ago

One thing I should mention, though: if you are looking to mux the file, you can only combine mp4 video with srt subs, and mkv video with srt or ass subs, which means you will need to add a step to convert the subs from vtt. I should also mention that HiDive has updated their API recently, and the HiDive extractor here is over 2 years old.
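For the conversion step mentioned above, youtube-dl's own `--convert-subs srt` option is probably the simplest route. As a rough illustration of what the conversion involves, here is a minimal sketch that handles only plain cues; it is not what youtube-dl or ffmpeg does internally, and it leaves inline tags such as `<i>` untouched.

```python
import re

# Matches a WebVTT cue timing line, with or without the hours field.
# Anything after the end timestamp (cue settings) is not captured.
TIMING = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2})\.(\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2})\.(\d{3})"
)


def _srt_time(clock, millis):
    """Return an SRT timestamp (HH:MM:SS,mmm), adding the hours if missing."""
    if clock.count(":") == 1:
        clock = "00:" + clock
    return "%s,%s" % (clock, millis)


def vtt_to_srt(vtt_text):
    """Convert simple WebVTT cues to SRT.

    Blocks without a timing line (the WEBVTT header, NOTE, STYLE) are skipped.
    """
    blocks = re.split(r"\n\s*\n", vtt_text.replace("\r\n", "\n").strip())
    cues = []
    for block in blocks:
        lines = block.split("\n")
        # A cue may carry an identifier line before its timing line.
        for index, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                start = _srt_time(match.group(1), match.group(2))
                end = _srt_time(match.group(3), match.group(4))
                text = "\n".join(lines[index + 1:])
                cues.append("%d\n%s --> %s\n%s" % (len(cues) + 1, start, end, text))
                break
    return "\n\n".join(cues) + "\n"
```

Usage would be along the lines of reading the downloaded .vtt file, passing its text to `vtt_to_srt`, and writing the result to an .srt file before muxing.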

orbitalflower commented 4 years ago

Note that there are a limited number of series that you can access without needing an account. The episode in my example, https://www.hidive.com/stream/food-wars/s01e001, is currently one episode which can be accessed without a login, and can be used to test this.

The page https://www.hidive.com/free-episodes will list others. Anything marked "dubbed" has multiple subtitle files and can therefore be used to test this bug.

darkhelmet2016 commented 4 years ago

As for the Food Wars problem, youtube-dl's extractor needs to be able to grab the following subs: id="tv_hv_ja_en", id="tv_br_ja_en", id="tv_br_ja_pt", id="tv_br_ja_sp", id="tv_hv_en_xx", and id="tv_hv_en_en". Other shows may require additional detection.

darkhelmet2016 commented 4 years ago

The problem with the HiDive extractor is that it needs to know whether the video is home video (hv) or broadcast (br), and then needs to detect whether the subs are English (en), Japanese (ja), Portuguese (pt) or Latin American Spanish (sp). The https://www.hidive.com/tv page gives a list of the audio and subtitle formats that will need supporting. It will need sub support for English, Latin American Spanish, Portuguese, Arabic, German, European Spanish, French, Italian, and Russian subs.
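A minimal sketch of the detection described above, assuming the ids always have the form tv_<source>_<audio>_<subtitle>. That field order is inferred from the ids quoted in this thread and has not been checked against HiDive's API. The names `SubtitleTrack`, `parse_track_id` and `describe` are hypothetical, and codes without a known meaning (such as xx) are passed through unchanged.

```python
from collections import namedtuple

SubtitleTrack = namedtuple("SubtitleTrack", "source audio subtitle")

# Meanings taken from this thread; any code not listed is passed through as-is.
SOURCES = {"hv": "home video", "br": "broadcast"}
LANGUAGES = {
    "en": "English",
    "ja": "Japanese",
    "pt": "Portuguese",
    "sp": "Latin American Spanish",
}


def parse_track_id(track_id):
    """Split an id such as 'tv_hv_ja_en' into source, audio and subtitle codes."""
    parts = track_id.split("_")
    if len(parts) != 4 or parts[0] != "tv":
        raise ValueError("unexpected track id: %r" % track_id)
    _prefix, source, audio, subtitle = parts
    return SubtitleTrack(source, audio, subtitle)


def describe(track):
    """Return a readable label, falling back to the raw code when unknown."""
    return "%s, %s audio, %s subtitles" % (
        SOURCES.get(track.source, track.source),
        LANGUAGES.get(track.audio, track.audio),
        LANGUAGES.get(track.subtitle, track.subtitle),
    )


if __name__ == "__main__":
    for track_id in ("tv_hv_ja_en", "tv_br_ja_pt", "tv_hv_en_xx"):
        print(track_id, "->", describe(parse_track_id(track_id)))
```

Supporting the other languages listed on the /tv page would mean extending the `LANGUAGES` table once their codes are known.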