ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

[Discovery] TLC.com subtitle download returns HTTP error 503 #20722

Open BlohoJo opened 5 years ago

BlohoJo commented 5 years ago

The video downloads fine using a cookies.txt export. The subtitle (.scc) download, however, fails with an HTTP error (the log below shows 403 Forbidden).

E:\USER\temp_ytdl>youtube-dl.exe --all-subs --cookies cookies.txt --verbose "htt
ps://www.tlc.com/tv-shows/my-600-lb-life-where-are-they-now/full-episodes/milla-
charity-2"
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--all-subs', '--cookies', 'cookies.txt', '--verbose
', 'https://www.tlc.com/tv-shows/my-600-lb-life-where-are-they-now/full-episodes
/milla-charity-2']
[debug] Encodings: locale cp1252, fs mbcs, out cp437, pref cp1252
[debug] youtube-dl version 2019.04.17
[debug] Python version 3.4.4 (CPython) - Windows-2008ServerR2-6.1.7601-SP1
[debug] exe versions: ffmpeg N-89595-g40d4b13228, ffprobe N-89595-g40d4b13228, r
tmpdump 2.4-20151223-gfa8646d-OpenSSL_1.0.2n-x86_64-static
[debug] Proxy map: {}
[Discovery] milla-charity-2: Downloading webpage
[Discovery] milla-charity-2: Downloading JSON metadata
[Discovery] milla-charity-2: Downloading m3u8 information
[debug] Default format spec: bestvideo+bestaudio/best
[info] Writing video subtitles to: Milla & Charity-5c9c286cd1b3ee536126091c.en.s
cc
WARNING: Unable to download subtitle for "en": Unable to download webpage: HTTP
Error 403: Forbidden (caused by HTTPError()); please report this issue on https:
//yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -
U  to update. Be sure to call youtube-dl with the --verbose flag and include its
 complete output.
[debug] Invoking downloader on 'https://content-ause2.uplynk.com/fc9bcdc99058411
e8e3bb5484c78cb80/i.m3u8?tc=1&exp=1555736198&rn=218716099&ct=a&cid=fc9bcdc990584
11e8e3bb5484c78cb80&ad.pingf=3&pp2ip=0&ad.cping=1&ad=fw&rays=cdefghiba&v=2&ad.cu
stomer_id=&ad.nw=&ad.prof=&ad.csid=&ad.vip=147.135.22.19&ap.use=0&sig=08c0cdc32e
bf71b2163f25a971874133fb8a015d30c07ab8778ec12a2eaf4247&pbs=922c621bcb0f465db13d9
7eeea9bf5e0'
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 1268
[download] Destination: Milla & Charity-5c9c286cd1b3ee536126091c.mp4
[download] 100% of 3.09GiB in 05:07
[debug] ffmpeg command line: ffprobe -show_streams "file:Milla & Charity-5c9c286
cd1b3ee536126091c.mp4"
[ffmpeg] Fixing malformed AAC bitstream in "Milla & Charity-5c9c286cd1b3ee536126
091c.mp4"
[debug] ffmpeg command line: ffmpeg -y -loglevel "repeat+info" -i "file:Milla &
Charity-5c9c286cd1b3ee536126091c.mp4" -c copy -f mp4 "-bsf:a" aac_adtstoasc "fil
e:Milla & Charity-5c9c286cd1b3ee536126091c.temp.mp4"
...
<end of log>
dnlzzxz commented 5 years ago

Hello friend, even though you can't see them in common players, the subtitles are already embedded in the video.

To extract the subtitles to a separate file (.srt, for example), you'll need to download and install ccextractor from this link:

https://github.com/CCExtractor/ccextractor/releases/download/v0.87/ccextractor_0.87_windows_installer.exe

After installing it, you'll be able to use it on the command line. In case you're not sure how to do that, follow these instructions:

1 - create a new text file called ccextractor and paste the following code into it:

for %%A IN (*.mp4) DO ccextractorwin -out=srt -bom -latin1 "%%A"

and save it.

Where it says "*.mp4", substitute the extension of the videos you want to extract subtitles from, and where it says "-out=srt", substitute the subtitle format you want. Or leave it as it is, since it should work out of the box.

2 - now you have a file called ccextractor.txt, which you must rename to ccextractor.bat.

3 - now you have a batch file that will extract the subtitles from every file with the .mp4 extension in its folder.

4 - for this to work, put all the videos you want to extract subtitles from in a folder, put ccextractor.bat in that same folder, and double-click it. You should see a command-line window showing the extraction progress as a percentage. Just wait!

ps: the subtitle file will be the name of the video file + .srt
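
For anyone who would rather script this than use a batch file, the same loop can be sketched in Python. This is only a rough equivalent of the batch file above, not anything shipped with ccextractor: the function names build_commands and extract_all are made up for illustration, the flags are copied unchanged from the batch line, and it assumes the ccextractorwin executable is on PATH.

```python
import subprocess
from pathlib import Path


def build_commands(folder, ext=".mp4", out_format="srt", exe="ccextractorwin"):
    """Return one ccextractor argument list per matching video in `folder`.

    `exe` is assumed to be on PATH; the flags mirror the batch file.
    """
    commands = []
    for video in sorted(Path(folder).glob("*" + ext)):
        # Same flags as the batch line: -out=<format> -bom -latin1 "<file>"
        commands.append([exe, "-out=" + out_format, "-bom", "-latin1", str(video)])
    return commands


def extract_all(folder, **kwargs):
    """Run ccextractor once per video and stop at the first failure."""
    for cmd in build_commands(folder, **kwargs):
        subprocess.run(cmd, check=True)
```

Calling extract_all(".") from the folder that holds the videos does roughly what double-clicking the .bat file does. Keeping build_commands separate lets you print the commands first and check them before anything runs.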

Hope that helps you.

dnlzzxz commented 5 years ago

don't forget to close the issue!

BlohoJo commented 5 years ago

Thanks very much for the info.

The issue should remain open (unless it's a duplicate; I couldn't find one in a search, but I may be missing it), because ytdl returns an error instead of something like a message indicating ssa/ass embedded subtitles. In fact, I believe ytdl should be able to extract them using ffmpeg -f lavfi -i movie="Movie.mp4"[out+subcc] -map 0:1 "Movie.ass". I just tried it and it works, but it's extremely slow.
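
As a rough illustration of how that ffmpeg call could be driven from a script, here is a minimal Python sketch that only builds the argument list for the command quoted above. It is not youtube-dl code; build_subcc_command is a hypothetical name. Because the filename is embedded in an ffmpeg filtergraph string, where several characters need escaping, this sketch conservatively refuses any name containing characters other than letters, digits, dot, dash, underscore and forward slash instead of trying to escape them.

```python
import re
import subprocess

# Conservative whitelist (an assumption of this sketch): anything else might
# need filtergraph escaping, which is deliberately not attempted here.
_SAFE_NAME = re.compile(r"^[A-Za-z0-9._/-]+$")


def build_subcc_command(video, output, ffmpeg="ffmpeg"):
    """Build the argument list for the lavfi/subcc command quoted above."""
    if not _SAFE_NAME.match(video):
        raise ValueError("filename may need filtergraph escaping: %r" % video)
    return [
        ffmpeg,
        "-f", "lavfi",
        "-i", "movie=%s[out+subcc]" % video,  # no shell, so no extra quotes
        "-map", "0:1",
        output,
    ]


def extract_subs(video, output):
    """Run the command; needs ffmpeg on PATH and may be very slow."""
    subprocess.run(build_subcc_command(video, output), check=True)
```

For a file like the one in the log, which has spaces and an ampersand in its name, the simplest route with this sketch is to rename or copy it to something plain such as Movie.mp4 first.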

dnlzzxz commented 5 years ago

Alright then. Well, there are many ways of extracting the subtitles from the video file; like I said, the ccextractor one is the easiest and fastest I've found.

The YOUTUBE_DL developers are amazing people, and they provide as much support as they can, considering that it's an open-source project and they don't profit from it. So, save them the trouble.

ghost commented 5 years ago

Exactly the same problem on all Discovery sites. It happens with recent videos, from Feb 2019 onward. Old videos still work. Is there a way to extract the subs without having to download the entire video? I use SVP4 and MPV to play video directly with ytdl.

ghost commented 5 years ago

I'm using SVP4 too. It's very neat for watching without downloading.

I used the CCExtractor GUI and it worked, but the big disadvantage is that you need to download the whole file, which takes an hour and a half (1h 30m), exactly as if you were watching the video. If you download just 1 min and cancel, you only get subtitles up to 1 min.

The problem is that access to the file "https://dsc-dp-encoding.s3.amazonaws.com/longform/2019-06/05/187627.001.01.001-275017.scc" is denied.

I think only the browser has access to it somehow.

Could youtube-dl be improved to emulate the browser when trying to access the scc file?

And is there a workaround that doesn't require downloading the entire file, like extracting the subtitles from cache or memory?

cbussa commented 4 years ago

TL;DR - Video embedded subs: the warning is confusing, given that "--list-subs" shows an available entry that "--write-sub" can't promptly fetch.

Someone might consider changing the youtube-dl error "Unable to download subtitle HTTP Error 403" to something like "Unable to fetch independent subtitle file; check to see if subtitles are already embedded within the video".

Hi all. Minor update per above on a TLC video. --- https://www.tlc.com/tv-shows/taken-at-birth/full-episodes/taken-at-birth-sneak-peek

The Youtube-dl program fetches the video just fine -- I didn't realize (or forgot) that it'd do more than just YT, too!

Playing said video with my local programs (VLC, PotPlayer, and (UGH) WMP) works fine.

But! None of these seem to display the embedded subtitles, although I use these programs on EVERYthing and see subtitles, sometimes with only internal and sometimes with external subtitles (usually SRT).

Installing ccextractor, mentioned above, successfully extracts the subtitles to an external file, and suddenly all of the video players show the subtitles as well.

If the subs are interleaved with the video, then you'll HAVE to get the video, of course.

I'm only slightly confused about why my display programs, which always seem to work, won't display the known-existing pre-embedded subs, when for years they've handled every other case just fine.

cookieguru commented 4 years ago

See https://github.com/ytdl-org/youtube-dl/issues/19184#issuecomment-487697032 for the structure that Discovery is returning for video stream info. Besides returning the obvious stream URL, it also includes an array of subs. The scc format always returns a 403. Sometimes there are vtt or other formats for subs as well. However Discovery has taken to embedding the subs in the video itself--and the embedded subs are shown in the web player.

Changing the 403 message to something else would be a red herring. The message as-is is indicative of the error that's actually occurring. Rather, the logic should be updated to skip the scc subs (see here)
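
To make the "skip the scc subs" idea concrete, here is a minimal Python sketch. It is not the actual Discovery extractor code, and drop_formats is a hypothetical helper name. It assumes the subtitles are held in the shape youtube-dl uses in its info dict: a mapping from language tag to a list of dicts, each with an "ext" and a "url". The example URLs are placeholders.

```python
def drop_formats(subtitles, skip_exts=("scc",)):
    """Return a copy of `subtitles` without entries whose ext is skipped.

    `subtitles` maps a language tag to a list of {"ext": ..., "url": ...}
    dicts. Languages left with no entries are dropped entirely, so nothing
    would be listed that cannot be fetched.
    """
    filtered = {}
    for lang, formats in subtitles.items():
        kept = [f for f in formats if f.get("ext") not in skip_exts]
        if kept:
            filtered[lang] = kept
    return filtered


# Placeholder data, only to show the shape of the input.
example = {
    "en": [
        {"ext": "scc", "url": "https://example.invalid/subs.scc"},
        {"ext": "vtt", "url": "https://example.invalid/subs.vtt"},
    ],
}
print(drop_formats(example))
```

With a filter like this applied before the subtitles are returned, an episode that only offers scc would show no downloadable subtitles at all rather than an entry that always fails with 403.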

bobbintb commented 4 years ago

Same issue with FoodNetwork, which I believe is part of the same company.

cookieguru commented 4 years ago

@bobbintb The Food Network is indeed part of Discovery, but the previous post about the subs being embedded still stands.