ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

[ThePlatform] Ads in m3u8 file causes incomplete download #25319

Closed by rredford6 4 years ago

rredford6 commented 4 years ago

Verbose log

I'm using a provider that doesn't have an MSO login. Extraction fails when I pass --cookies instead:

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', 'https://www.nbc.com/the-profit/video/an-inside-look-kota-longboards/4117074', '--cookies', 'D:\\path\\to\\cookies.txt']
[debug] Encodings: locale cp1252, fs mbcs, out cp437, pref cp1252
[debug] youtube-dl version 2020.05.08
[debug] Python version 3.4.4 (CPython) - Windows-10-10.0.17763
[debug] exe versions: ffmpeg N-91481-gb8c4d2b2ed, ffprobe N-91481-gb8c4d2b2ed
[debug] Proxy map: {}
[NBC] 4117074: Downloading JSON metadata
ERROR: This video is only available for users of participating TV providers. Use --ap-mso to specify Adobe Pass Multiple-system operator Identifier and --ap-username and --ap-password or --netrc to provide account credentials.
Traceback (most recent call last):
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmp4z7swgz7\build\youtube_dl\YoutubeDL.py", line 797, in extract_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmp4z7swgz7\build\youtube_dl\extractor\common.py", line 530, in extract
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmp4z7swgz7\build\youtube_dl\extractor\nbc.py", line 142, in _real_extract
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmp4z7swgz7\build\youtube_dl\extractor\adobepass.py", line 1414, in _extract_mvpd_auth
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmp4z7swgz7\build\youtube_dl\extractor\adobepass.py", line 1379, in raise_mvpd_required
youtube_dl.utils.ExtractorError: This video is only available for users of participating TV providers. Use --ap-mso to specify Adobe Pass Multiple-system operator Identifier and --ap-username and --ap-password or --netrc to provide account credentials.

Since this is just a page with an embedded ThePlatform player iframe, document.getElementById('player').src reveals the URL of the actual player instance. Passing that URL to youtube-dl directly gives the real log:

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', '--cookies', 'D:\\path\\to\\cookies.txt', 'https://player.theplatform.com/p/HNK2IC/uW4uIUm_KHR6/select/media/guid/snip/4117074?mute=false&ec=f&mParticleId=-4600823416715392143&brand=CNBC&show=The%20Profit&episodeTitle=An%20Inside%20Look%3A%20KOTA%20Longboards&MVPDid=Hulu&params=_fw_ae%3De5e7480e1a090738eeb5c4e2bb47de52%26policy%3Dsnip%26fallbackSiteSectionId%3D9244655%26siteSectionId%3Doneapp_desktop_computer_web_ondemand%26manifest%3Dm3u%26switch%3DHLSOriginSecure%26_fw_h_referer%3Dwww.nbc.com%26schema%3D2.0%26auth%3D%253CsignatureInfo%253Esnip%253CsignatureInfo%253E%253CauthToken%253E%253CsessionGUID%253Esnip%253C%252FsessionGUID%253E%253CrequestorID%253Enbcentertainment%253C%252FrequestorID%253E%253CresourceID%253E%253C!%255BCDATA%255B%253Crss%2520version%253D%25222.0%2522%2520xmlns%253Amedia%253D%2522http%253A%252F%252Fsearch.yahoo.com%252Fmrss%252F%2522%253E%253Cchannel%253E%253Ctitle%253Ecnbc%253C%252Ftitle%253E%253Citem%253E%253Ctitle%253E%253C!%255BCDATA%255BS1%2520E22%2520%257C%252011%252F12%252F19%255D%255D%255D%255D%253E%253E%253C!%255BCDATA%255B%253C%252Ftitle%253E%253Cguid%2520isPermaLink%253D%2522false%2522%253E4117074%253C%252Fguid%253E%253Cmedia%253Arating%2520scheme%253D%2522urn%253Av-chip%2522%253ETV-PG%253C%252Fmedia%253Arating%253E%253C%252Fitem%253E%253C%252Fchannel%253E%253C%252Frss%253E%255D%255D%253E%253C%252FresourceID%253E%253Cttl%253E420000%253C%252Fttl%253E%253CissueTime%253E2020-05-18%252023%253A48%253A13%2520-0700%253C%252FissueTime%253E%253CmvpdId%253EHulu%253C%252FmvpdId%253E%253C%252FauthToken%253E#playerurl=https%3A//www.nbc.com/the-profit/video/an-inside-look-kota-longboards/4117074']
[debug] Encodings: locale cp1252, fs mbcs, out cp437, pref cp1252
[debug] youtube-dl version 2020.05.08
[debug] Python version 3.4.4 (CPython) - Windows-10-10.0.17763
[debug] exe versions: ffmpeg N-91481-gb8c4d2b2ed, ffprobe N-91481-gb8c4d2b2ed
[debug] Proxy map: {}
[ThePlatform] 4117074: Downloading webpage
[ThePlatform] 4117074: Downloading SMIL data
[ThePlatform] 4117074: Downloading m3u8 information
[ThePlatform] 4117074: Downloading JSON metadata
[debug] Default format spec: bestvideo+bestaudio/best
[debug] Invoking downloader on 'https://west.manifest.na.theplatform.com/m/HNK2IC/snip/6.m3u8?sid=snip&policy=snip&date=1589870971918&ip=snip&schema=1.1&cid=snip&host=tvecnbchls.nbcuni.com&manifest=M3U&switch=HLSOriginSecure&_fw_ae=snip&_fw_h_referer=www.nbc.com&siteSectionId=oneapp_desktop_computer_web_ondemand&fallbackSiteSectionId=9244655&player=One+App+-+PDK+6+NBC.com+Instance+of%3A+rational-player-production&sig=snip'
[download] Destination: An Inside Look - KOTA Longboards-4117074.mp4
[debug] ffmpeg command line: ffmpeg -y -loglevel verbose -headers "Cookie: ssuid=snip
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Accept-Encoding: gzip, deflate
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.18 Safari/537.36
" -i "https://west.manifest.na.theplatform.com/m/HNK2IC/snip/6.m3u8?sid=snip&policy=snip&date=1589870971918&ip=snip&schema=1.1&cid=snip&host=tvecnbchls.nbcuni.com&manifest=M3U&switch=HLSOriginSecure&_fw_ae=snip&_fw_h_referer=www.nbc.com&siteSectionId=oneapp_desktop_computer_web_ondemand&fallbackSiteSectionId=9244655&player=One+App+-+PDK+6+NBC.com+Instance+of%3A+rational-player-production&sig=snip" -c copy -f mp4 "-bsf:a" aac_adtstoasc "file:An Inside Look - KOTA Longboards-4117074.mp4.part"
ffmpeg version N-91481-gb8c4d2b2ed Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 7.3.1 (GCC) 20180710
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-bzlib --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth
  libavutil      56. 18.102 / 56. 18.102
  libavcodec     58. 21.105 / 58. 21.105
  libavformat    58. 17.101 / 58. 17.101
  libavdevice    58.  4.101 / 58.  4.101
  libavfilter     7. 26.100 /  7. 26.100
  libswscale      5.  2.100 /  5.  2.100
  libswresample   3.  2.100 /  3.  2.100
  libpostproc    55.  2.100 / 55.  2.100
[hls,applehttp @ 000001f42cb8af00] HLS request for url 'https://redirect.manifest.theplatform.com/r/HNK2IC/snip?sid=snip&policy=snip&date=1589870972234&ip=snip&schema=1.0&cid=snip&aid=snip&dur=2594000&sig=snip', offset 0, playlist 0
[hls,applehttp @ 000001f42cb8af00] Opening 'https://redirect.manifest.theplatform.com/r/HNK2IC/snip?sid=snip&policy=snip&date=1589870972234&ip=snip&schema=1.0&cid=snip&aid=snip&dur=2594000&sig=snip' for reading
[hls,applehttp @ 000001f42cb8af00] HLS request for url 'https://tvecnbchls.nbcuni.com/tve-adstitch/1956/20/02/14/4117074/snip-2.ts', offset 0, playlist 0
[hls,applehttp @ 000001f42cb8af00] Opening 'https://tvecnbchls.nbcuni.com/tve-adstitch/1956/20/02/14/4117074/snip-2.ts' for reading
[h264 @ 000001f42d588d80] Reinit context to 1920x1088, pix_fmt: yuv420p
Input #0, hls,applehttp, from 'https://west.manifest.na.theplatform.com/m/HNK2IC/LuB15OjZZxlw,WHBrcv5kva_1,Mj4QesgIJ8_c,Gi4Le1F0buS9,IU16ScI_dB_T,YDsC6qk9LY87,kzuZzOsPiBhY/6.m3u8?sid=snip&policy=snip&date=1589870971918&ip=snip&schema=1.1&cid=snip&host=tvecnbchls.nbcuni.com&manifest=M3U&switch=HLSOriginSecure&_fw_ae=e5e7480e1a090738eeb5c4e2bb47de52&_fw_h_referer=www.nbc.com&siteSectionId=oneapp_desktop_computer_web_ondemand&fallbackSiteSectionId=9244655&player=One+App+-+PDK+6+NBC.com+Instance+of%3A+rational-player-production&sig=snip':
  Duration: 00:47:36.33, start: 1.400000, bitrate: N/A
  Program 0
    Metadata:
      variant_bitrate : 0
    Stream #0:0: Audio: aac (LC) ([15][0][0][0] / 0x000F), 44100 Hz, stereo, fltp
    Metadata:
      variant_bitrate : 0
    Stream #0:1: Video: h264 (High), 1 reference frame ([27][0][0][0] / 0x001B), yuv420p(left), 1920x1080 (1920x1088), Closed Captions, 23.98 fps, 23.98 tbr, 90k tbn, 47.95 tbc
    Metadata:
      variant_bitrate : 0
Output #0, mp4, to 'file:An Inside Look - KOTA Longboards-4117074.mp4.part':
  Metadata:
    encoder         : Lavf58.17.101
    Stream #0:0: Video: h264 (High), 1 reference frame (avc1 / 0x31637661), yuv420p(left), 1920x1080 (0x0), q=2-31, 23.98 fps, 23.98 tbr, 90k tbn, 90k tbc
    Metadata:
      variant_bitrate : 0
    Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp
    Metadata:
      variant_bitrate : 0
Stream mapping:
  Stream #0:1 -> #0:0 (copy)
  Stream #0:0 -> #0:1 (copy)
Press [q] to stop, [?] for help
[hls,applehttp @ 000001f42cb8af00] HLS request for url 'https://tvecnbchls.nbcuni.com/tve-adstitch/1956/20/02/14/4117074/snip-3.ts', offset 0, playlist 0
[https @ 000001f42cb8f200] Opening 'https://tvecnbchls.nbcuni.com/tve-adstitch/1956/20/02/14/4117074/snip-3.ts' for reading
[hls,applehttp @ 000001f42cb8af00] HLS request for url 'https://tvecnbchls.nbcuni.com/tve-adstitch/1956/20/02/14/4117074/snip-4.ts', offset 0, playlist 0
[https @ 000001f42d0de9c0] Opening 'https://tvecnbchls.nbcuni.com/tve-adstitch/1956/20/02/14/4117074/snip-4.ts' for reading
[hls,applehttp @ 000001f42cb8af00] HLS request for url 'https://redirect.manifest.theplatform.com/r/HNK2IC/snip?sid=snip&policy=snip&date=1589870972234&ip=snip&schema=1.0&cid=snip&aid=snip&dur=2594000&sig=snip', offset 0, playlist 0
[https @ 000001f42d195080] Cannot reuse HTTP connection for different host: tvecnbchls.nbcuni.com:-1 != redirect.manifest.theplatform.com:-1
[AVIOContext @ 000001f42cc2b500] Statistics: 7954468 bytes read, 0 seeks
[hls,applehttp @ 000001f42cb8af00] keepalive request failed for 'https://redirect.manifest.theplatform.com/r/HNK2IC/snip?sid=snip&policy=snip&date=1589870972234&ip=snip&schema=1.0&cid=snip&aid=snip&dur=2594000&sig=snip', retrying with new connection: Invalid argument
[hls,applehttp @ 000001f42cb8af00] Opening 'https://redirect.manifest.theplatform.com/r/HNK2IC/snip?sid=snip&policy=snip&date=1589870972234&ip=snip&schema=1.0&cid=snip&aid=snip&dur=2594000&sig=snip' for reading
^C

ffmpeg downloads about 8 minutes' worth of video (it should be just over 40). When the file is played back in a video player, only one or two ads are seen.

Notice the "HLS request for url 'https://tvecnbchls.nbcuni.com/tve-adstitch..." lines in the log; these segments should not be downloaded.

The workaround from https://github.com/ytdl-org/youtube-dl/issues/22693#issuecomment-541477289 does not work.
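
To make it easier to see which hosts and paths ffmpeg actually requests, here is a minimal sketch that tallies the "HLS request for url" lines from a verbose ffmpeg log. The function name `summarize_hls_requests` is hypothetical, and the sample log is a shortened copy of the lines above with a placeholder context address. It only counts requests; it does not decide which ones are ads.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the "HLS request for url '...'" lines that appear in the verbose log above.
HLS_REQUEST = re.compile(r"HLS request for url '([^']+)'")


def summarize_hls_requests(log_text):
    """Count HLS requests per (host, first path component)."""
    counts = Counter()
    for url in HLS_REQUEST.findall(log_text):
        parts = urlsplit(url)
        first_dir = parts.path.strip("/").split("/")[0]
        counts[(parts.netloc, first_dir)] += 1
    return counts


# Shortened sample taken from the log above; "0x1" is a placeholder address.
sample_log = """\
[hls,applehttp @ 0x1] HLS request for url 'https://redirect.manifest.theplatform.com/r/HNK2IC/snip?sid=snip', offset 0, playlist 0
[hls,applehttp @ 0x1] HLS request for url 'https://tvecnbchls.nbcuni.com/tve-adstitch/1956/20/02/14/4117074/snip-2.ts', offset 0, playlist 0
[hls,applehttp @ 0x1] HLS request for url 'https://tvecnbchls.nbcuni.com/tve-adstitch/1956/20/02/14/4117074/snip-3.ts', offset 0, playlist 0
"""

for (host, first_dir), n in sorted(summarize_hls_requests(sample_log).items()):
    print(host, first_dir, n)
```

Running this over the full log would show how many requests go to the tve-adstitch path versus the redirect manifest, which may help narrow down where the download stalls.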

dstftw commented 4 years ago

Report this to ffmpeg.

rredford6 commented 4 years ago

@dstftw this isn't an ffmpeg bug. I believe there are three things going on:

  1. youtube-dl is telling ffmpeg to download ads. As you can see from the output above, I did not pass --include-ads. This is not an ffmpeg bug.
  2. It appears ThePlatform is splitting the program into multiple m3u8 files; the first one contains an ad, the program, and another ad. It looks like "the program" here is only the original content that airs before the first commercial break, so I would expect the next m3u8 to contain the next segment of the program and another ad.
  3. The 9-second video is probably due to the fact that my media player doesn't realize that this is three different clips joined together. ffmpeg is effectively concatenating what is being piped into it: a 480p ad, 1080p content, and a 720p ad. One video stream cannot have multiple resolutions, so my media player stops when the 480p stream ends, even though the video stream continues.
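
To illustrate point 3, here is a minimal sketch of how a stitched media playlist could be split into separate runs of segments. It assumes the server marks each splice between ad and content with the standard HLS tag #EXT-X-DISCONTINUITY, which I have not confirmed for ThePlatform's playlists. The function name `split_at_discontinuities`, the playlist text, and the segment names are all made up for illustration; this is not how youtube-dl currently handles it.

```python
def split_at_discontinuities(playlist_text):
    """Group the segment URIs of an HLS media playlist into runs,
    starting a new run at every #EXT-X-DISCONTINUITY tag."""
    runs = [[]]
    for line in playlist_text.splitlines():
        line = line.strip()
        if line == "#EXT-X-DISCONTINUITY":
            # Start a new run only if the current one already has segments.
            if runs[-1]:
                runs.append([])
        elif line and not line.startswith("#"):
            # Any non-empty, non-tag line is a segment URI.
            runs[-1].append(line)
    return [run for run in runs if run]


# Hypothetical playlist: one ad, two content segments, then another ad.
sample_playlist = """\
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXTINF:6.0,
ad-a-1.ts
#EXT-X-DISCONTINUITY
#EXTINF:6.0,
content-1.ts
#EXTINF:6.0,
content-2.ts
#EXT-X-DISCONTINUITY
#EXTINF:6.0,
ad-b-1.ts
#EXT-X-ENDLIST
"""

for index, run in enumerate(split_at_discontinuities(sample_playlist)):
    print(index, run)
```

If the splices are marked this way, each run could be written to its own file (or the ad runs skipped), so that no single output stream mixes 480p, 1080p, and 720p video.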