ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Facebook Closed Captions not included as subtitles #32489

Open Kumole opened 1 year ago

Kumole commented 1 year ago

Description

Facebook videos with closed captions do not have those captions reported as subtitles: neither "youtube-dl --list-subs" nor "youtube-dl --write-sub" finds them. Example video with closed captions for deaf people: https://fb.watch/m7ChEhVZPt/

dirkf commented 1 year ago

On the positive side, FB extraction is actually working. yt-dlp also fails to find the CC as subtitles.

Is FB serving actual text, or are there different versions of the video, one/some with burned-in subtitles?

dirkf commented 1 year ago

OK, I had a look.

There is an og:locale element in the page that supplies a default language. In the video JSON, there is a captions_url that one can associate with that language. There is also a video_available_captions_locales list, in which each element is one combination of language, subtitle_type and URL.

After a bit of tweaking, this happened:

$ python -m youtube_dl --list-sub 'https://fb.watch/m7ChEhVZPt/'
[generic] m7ChEhVZPt: Requesting header
[redirect] Following redirect to https://www.facebook.com/CCaptions/videos/410568766182023/
[facebook] 410568766182023: Downloading webpage
[download] Downloading playlist: 410568766182023
[facebook] playlist 410568766182023: Collected 1 video ids (downloading 1 of them)
[download] Downloading video 1 of 1
Available automatic captions for 410568766182023:
Language formats
en-US    srt
410568766182023 has no subtitles
[download] Finished downloading playlist: 410568766182023
$

The URL of those auto-generated subtitles was https://scontent.flhr10-2.fna.fbcdn.net/v/t39.2093-6/51501946_410570236181876_6099645393375592448_n.srt?_nc_cat=101&ccb=1-7&_nc_sid=8d539b&_nc_ohc=BqCVL4SVRo4AX9YiNpF&_nc_ht=scontent.flhr10-2.fna&oh=00_AfDlM7XF7AqsJ-i3cpY0FISX2xLjKgw9vMhMD8s9-ffE7A&oe=64CD04B6.
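
For illustration only, the mapping described above (og:locale as the default language, captions_url, and the video_available_captions_locales list) could be sketched roughly as follows. This is not the actual extractor change. The per-element key names (locale, subtitle_type, captions_url), the 'asr' marker for auto-generated captions, the conversion of en_US to en-US, and the choice to file the top-level captions_url under regular subtitles are all assumptions made for the example.

```python
import posixpath
import re
from urllib.parse import urlparse


def ext_from_url(url, default='srt'):
    """Guess the subtitle format from the URL path, ignoring the query string."""
    ext = posixpath.splitext(urlparse(url).path)[1].lstrip('.')
    return ext or default


def collect_captions(webpage, video_data, auto_types=('asr',)):
    """Return (subtitles, automatic_captions) dicts keyed by language.

    The element keys and the auto_types values are hypothetical; only
    captions_url and video_available_captions_locales come from the thread.
    """
    # Default language from the page's og:locale meta element.
    # Assumes the property attribute precedes the content attribute.
    m = re.search(
        r'<meta[^>]+property=["\']og:locale["\'][^>]+content=["\']([^"\']+)',
        webpage)
    default_lang = m.group(1).replace('_', '-') if m else 'und'

    subtitles, automatic_captions = {}, {}
    seen = set()  # avoid listing the same URL twice

    def add(lang, sub_type, url):
        if not url or url in seen:
            return
        seen.add(url)
        target = automatic_captions if sub_type in auto_types else subtitles
        target.setdefault(lang, []).append(
            {'url': url, 'ext': ext_from_url(url)})

    # One entry per (language, subtitle_type, URL) combination.
    for item in video_data.get('video_available_captions_locales') or []:
        lang = (item.get('locale') or default_lang).replace('_', '-')
        add(lang, item.get('subtitle_type'), item.get('captions_url'))

    # Top-level captions_url, associated with the page's default language.
    # Treated as a regular subtitle here because no type is known for it.
    add(default_lang, None, video_data.get('captions_url'))
    return subtitles, automatic_captions
```

The two returned dicts follow the general shape youtube-dl uses for its subtitles and automatic_captions fields (language mapped to a list of entries with url and ext), so a result like the "en-US srt" listing above would correspond to a single entry under automatic_captions.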