ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Issue on PBS #30205

Open gahooten opened 2 years ago

gahooten commented 2 years ago

No valid format found. https://www.pbs.org/wgbh/nova/video/nova-universe-revealed-milky-way/

dirkf commented 2 years ago

Please follow the Broken site support template.

Also, "This video is currently not available."

dirkf commented 2 years ago

Thanks, you answered one of the questions that OP would have been asked in the template.

The text is what the PBS player pages ('http://player.pbs.org/%s/nova-universe-revealed-milky-way' % (p, ) for p in ('widget/partnerplayer', 'portalplayer')) are telling the extractor, when run in the UK with a US IP in X-Forwarded-For.
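For reference, the generator expression above expands to two concrete player page URLs. A minimal sketch (the function name is made up for illustration; only the URL pattern and the two prefixes come from the comment):

```python
# Slug taken from the URL in the original report.
SLUG = "nova-universe-revealed-milky-way"

# The two player page prefixes named in the comment above.
PLAYER_PREFIXES = ("widget/partnerplayer", "portalplayer")


def player_page_urls(slug):
    """Return the candidate PBS player page URLs for a show slug."""
    return ["http://player.pbs.org/%s/%s" % (prefix, slug)
            for prefix in PLAYER_PREFIXES]


if __name__ == "__main__":
    for url in player_page_urls(SLUG):
        print(url)
```

Comparing these against the player page seen in a browser trace is a quick way to check whether the extractor is looking in the right place.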

If your browser trace shows a different player page, that could explain why the extractor isn't finding the show.

hansworzt commented 2 years ago

Also, "This video is currently not available."

One way to find the PBS media URL is to use a VPN with a US server (if you're outside the US), reload the page while watching the Network tab in Firefox, look for the .m3u8 addresses, and change the ending to the wanted resolution. The Full HD address for the video thus becomes:

https://ga.video.cdn.pbs.org/videos/nova/dafd0447-47b1-4c12-b293-2f6a6a0d5114/2000262126/hd-16x9-mezzanine-1080p/nova4816_r-hls-16x9-1080p-1080p-6500k.m3u8

Other possible URL endings for the lower resolutions are '-16x9-1080p-432p-1100k.m3u8', '-16x9-1080p-540p-2000k.m3u8', '-16x9-1080p-720p-3000k.m3u8', and '-16x9-1080p-720p-4500k.m3u8'. The audio address is:

https://ga.video.cdn.pbs.org/videos/nova/dafd0447-47b1-4c12-b293-2f6a6a0d5114/2000262126/hd-16x9-mezzanine-1080p/nova4816_r-hls-16x9-1080pAudio%20Selector%201.m3u8

Both can then be downloaded with yt-dlp without the use of a VPN, and finally muxed (audio + video, into MKV) for offline viewing. It's even possible to download the subtitles and add them to the MKV file.
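The ending swap described above can be sketched in a few lines of Python. This assumes the variant endings always start at the last "-16x9-1080p" in the URL, which holds for the addresses quoted here but may not hold for other PBS assets; the helper name is hypothetical.

```python
FULL_HD_URL = (
    "https://ga.video.cdn.pbs.org/videos/nova/"
    "dafd0447-47b1-4c12-b293-2f6a6a0d5114/2000262126/"
    "hd-16x9-mezzanine-1080p/nova4816_r-hls-16x9-1080p-1080p-6500k.m3u8"
)

# Assumption: every variant ending begins with this marker.
MARKER = "-16x9-1080p"

# Endings quoted in the comment above.
VIDEO_ENDINGS = (
    "-16x9-1080p-432p-1100k.m3u8",
    "-16x9-1080p-540p-2000k.m3u8",
    "-16x9-1080p-720p-3000k.m3u8",
    "-16x9-1080p-720p-4500k.m3u8",
    "-16x9-1080p-1080p-6500k.m3u8",
)
AUDIO_ENDING = "-16x9-1080pAudio%20Selector%201.m3u8"


def swap_ending(url, ending):
    """Replace everything from the last MARKER onwards with `ending`."""
    idx = url.rfind(MARKER)
    if idx == -1:
        raise ValueError("URL does not contain %r" % MARKER)
    return url[:idx] + ending


if __name__ == "__main__":
    for ending in VIDEO_ENDINGS + (AUDIO_ENDING,):
        print(swap_ending(FULL_HD_URL, ending))
```

Not every ending is guaranteed to exist for every show, so each candidate still has to be checked against the server.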

hansworzt commented 2 years ago

This is the general approach I use to get at the goodies:

PBS/Nova - Universe Revealed - Milky Way: https://www.pbs.org/video/nova-universe-revealed-milky-way-4io957/ The page is georestricted and behind a paywall. However, both restrictions can easily be circumvented ;-)

Step 1: Look at the source code of the page, searching for the keyword 'viralplayer' : https://player.pbs.org/viralplayer/3060574923/

Step 2: Use a free VPN plugin in the browser, and select a US server. Refresh the page https://www.pbs.org/video/nova-universe-revealed-milky-way-4io957/ whilst in VPN mode.

Step 3: Hit the F12 key in Mozilla Firefox and look at the contents of the Network tab (similar options exist in other browsers). Among the requests is the cover image: https://image.pbs.org/video-assets/TG0K04g-asset-mezzanine-16x9-I27x8P4.jpg?crop=448x250&format=webp Changing the crop parameter, as in https://image.pbs.org/video-assets/TG0K04g-asset-mezzanine-16x9-I27x8P4.jpg?crop=1280x720&format=webp, gives you the cover image in the desired quality and format.

Step 4: https://ga.video.cdn.pbs.org/captions/nova/dafd0447-47b1-4c12-b293-2f6a6a0d5114/captions/LyVOJn_caption.vtt gives you the captions in VTT format. Transform to SRT: ffmpeg.exe -i LyVOJn_caption.vtt captions.srt

Step 5: https://ga.video.cdn.pbs.org/videos/nova/dafd0447-47b1-4c12-b293-2f6a6a0d5114/2000262126/hd-16x9-mezzanine-1080p/nova4816_r-hls-16x9-1080p-432p-1100k.m3u8 https://ga.video.cdn.pbs.org/videos/nova/dafd0447-47b1-4c12-b293-2f6a6a0d5114/2000262126/hd-16x9-mezzanine-1080p/nova4816_r-hls-16x9-1080pAudio%20Selector%201.m3u8 The first URL is the video only, the second URL is the audio only.

Step 6: Getting the video and audio streams no longer requires a VPN.

Step 7: Download the video as 1280/720 px. Change the last part in the video URL to the desired quality (if available):

Step 8: Download the audio:
yt-dlp.exe -f 0 "https://ga.video.cdn.pbs.org/videos/nova/dafd0447-47b1-4c12-b293-2f6a6a0d5114/2000262126/hd-16x9-mezzanine-1080p/nova4816_r-hls-16x9-1080pAudio Selector 1.m3u8" -o audio.mp4
The URL must be quoted because the %20 sequences have been replaced by spaces.

Step 9: Extract the video and audio components:
ffmpeg.exe -i video.mp4 -vcodec copy -an -sn video.h264
ffmpeg.exe -i audio.mp4 -c:a copy -vn -sn audio.aac

Step 10: Finally, multiplex both streams into "movie.mp4", losslessly:
ffmpeg.exe -loglevel 0 -i video.h264 -i audio.aac -c:v copy -c:a copy movie.mp4

Step 11: Delete the temporary files that are no longer needed.
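The steps above can be collected into a small script. This is only a sketch: it builds the argument lists without running them, it assumes ffmpeg and yt-dlp are on PATH under those names, and the function names are invented for illustration. The video download command of Step 7 is not given in the thread, so it is left out and video.mp4 is assumed to exist already.

```python
import shlex
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: tool names on PATH (the steps use ffmpeg.exe / yt-dlp.exe).
FFMPEG = "ffmpeg"
YTDLP = "yt-dlp"


def with_crop(image_url, width, height):
    """Step 3: return image_url with its crop query parameter replaced."""
    parts = urlsplit(image_url)
    query = dict(parse_qsl(parts.query))
    query["crop"] = "%dx%d" % (width, height)
    return urlunsplit(parts._replace(query=urlencode(query)))


def build_commands(audio_url, captions_vtt):
    """Return argument lists for Steps 4 and 8 to 10, in order.

    Step 7 is omitted because its command is not shown in the thread.
    """
    return [
        # Step 4: convert the VTT captions to SRT.
        [FFMPEG, "-i", captions_vtt, "captions.srt"],
        # Step 8: download the audio-only stream.
        [YTDLP, "-f", "0", audio_url, "-o", "audio.mp4"],
        # Step 9: extract the elementary video and audio streams.
        [FFMPEG, "-i", "video.mp4", "-vcodec", "copy", "-an", "-sn",
         "video.h264"],
        [FFMPEG, "-i", "audio.mp4", "-c:a", "copy", "-vn", "-sn",
         "audio.aac"],
        # Step 10: multiplex both streams without re-encoding.
        [FFMPEG, "-loglevel", "0", "-i", "video.h264", "-i", "audio.aac",
         "-c:v", "copy", "-c:a", "copy", "movie.mp4"],
    ]


if __name__ == "__main__":
    audio = (
        "https://ga.video.cdn.pbs.org/videos/nova/"
        "dafd0447-47b1-4c12-b293-2f6a6a0d5114/2000262126/"
        "hd-16x9-mezzanine-1080p/"
        "nova4816_r-hls-16x9-1080pAudio%20Selector%201.m3u8"
    )
    for cmd in build_commands(audio, "LyVOJn_caption.vtt"):
        # Quote each argument so the printed line is safe to paste.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing each command as a list (for example to subprocess.run) sidesteps the shell quoting issue mentioned in Step 8, since the URL is then a single argument whether it contains %20 or spaces.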

dirkf commented 2 years ago

Does yt-dl (or possibly yt-dlp) not fetch this show correctly using a VPN?

If the VPN is accessed through a proxy, --geo-verification-proxy ... should allow you to use the VPN for extraction but not for downloading.
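As a rough illustration, the command line for that could be assembled like this. The proxy address below is purely hypothetical, and the helper name is made up; only the --geo-verification-proxy option itself comes from the comment.

```python
def extraction_command(page_url, proxy_url, program="youtube-dl"):
    """Build a command that uses proxy_url only for geo verification."""
    return [program, "--geo-verification-proxy", proxy_url, page_url]


if __name__ == "__main__":
    cmd = extraction_command(
        "https://www.pbs.org/video/nova-universe-revealed-milky-way-4io957/",
        # Hypothetical local proxy that tunnels through the VPN.
        "socks5://127.0.0.1:1080",
    )
    print(" ".join(cmd))
```

The media itself would then be fetched directly, which matches the observation that the stream URLs do not need the VPN.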

dirkf commented 2 years ago

The second page format has a gigantic __NEXT_DATA__ hydration JSON (including the transcript of the show). The props.pageProps.video.data.episodes member is a list of episodes. The item whose .episode.slug starts with the slug from the URL (nova-universe-revealed-milky-way) is the target episode. The item in its .assets with .object_type == 'full_length' is the video for the episode, and the .slug from that item ('nova-universe-revealed-milky-way-4io957') can be used to make the generic PBS URL. Also, .player_code gives the IFRAME embedding code, including the src attribute '//player.pbs.org/partnerplayer/wwGgFRSNeKGrsgjdYh6efQ==/?topbar=false&end=0&endscreen=true&start=0&autoplay=false', which could be downloaded (subject to geo-restriction) and processed with the _extract_video_data() method of PBSIE.
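A standalone sketch of that lookup, outside the extractor framework, might look as follows. Several things here are assumptions rather than facts from the page: that the JSON sits in a script element with id __NEXT_DATA__ (the usual Next.js layout), that .assets may hang off either the list item or its .episode member (the description leaves this open, so both are tried), and all function names.

```python
import json
import re


def extract_next_data(webpage):
    """Return the parsed __NEXT_DATA__ JSON, or None if it is absent."""
    m = re.search(
        r'<script[^>]+id=["\']__NEXT_DATA__["\'][^>]*>(.*?)</script>',
        webpage, re.DOTALL)
    return json.loads(m.group(1)) if m else None


def find_full_length_asset(next_data, url_slug):
    """Find the 'full_length' asset of the episode matching url_slug."""
    episodes = next_data["props"]["pageProps"]["video"]["data"]["episodes"]
    for item in episodes:
        episode = item.get("episode") or {}
        if not (episode.get("slug") or "").startswith(url_slug):
            continue
        # Assumption: assets may live on the episode or on the list item.
        assets = episode.get("assets") or item.get("assets") or []
        for asset in assets:
            if asset.get("object_type") == "full_length":
                return asset
    return None


def generic_pbs_url(asset):
    """Build the generic PBS video URL from the asset slug."""
    return "https://www.pbs.org/video/%s/" % asset["slug"]


def iframe_src(asset):
    """Pull the src attribute out of the IFRAME embedding code."""
    m = re.search(r'src=["\']([^"\']+)', asset.get("player_code") or "")
    if not m:
        return None
    src = m.group(1)
    # Protocol-relative URLs need a scheme before they can be fetched.
    return "https:" + src if src.startswith("//") else src
```

Inside youtube-dl the same logic would use the extractor helpers instead of bare re and json, and the resulting player URL would be handed to the existing PBS code path.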

Maybe this can be extended to other .../station/series/... shows?