ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/

Youtube: Only downloading last 2 hours of 3 hour video. (with hack as a workaround) #26330

Open bittaurus opened 3 years ago

bittaurus commented 3 years ago

Checklist

Verbose log

$ youtube-dl --verbose https://www.youtube.com/watch?v=7q2E_dMf-PA -f 160+139 -o test.mp4
[debug] System config: ['--prefer-free-formats']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', 'https://www.youtube.com/watch?v=7q2E_dMf-PA', '-f', '160+139', '-o', 'test.mp4']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2020.05.29
[debug] Python version 3.8.5 (CPython) - Linux-5.7.12-200.fc32.x86_64-x86_64-with-glibc2.2.5
[debug] exe versions: ffmpeg 4.2.4, ffprobe 4.2.4
[debug] Proxy map: {}
[youtube] 7q2E_dMf-PA: Downloading webpage
[youtube] 7q2E_dMf-PA: Downloading m3u8 information
[youtube] 7q2E_dMf-PA: Downloading MPD manifest
[debug] Invoking downloader on 'https://manifest.googlevideo.com/api/manifest/dash/expire/1597463882/ei/6Qg3X-PLPOTpiwTDgZGYCw/ip/[REDACTED]/id/7q2E_dMf-PA.0/source/yt_live_broadcast/requiressl/yes/as/fmp4_audio_clear%2Cwebm_audio_clear%2Cwebm2_audio_clear%2Cfmp4_sd_hd_clear%2Cwebm2_sd_hd_clear/force_finished/1/vprv/1/keepalive/yes/fexp/23883098/beids/23886217/itag/0/playlist_type/DVR/sparams/expire%2Cei%2Cip%2Cid%2Csource%2Crequiressl%2Cas%2Cforce_finished%2Cvprv%2Citag%2Cplaylist_type/sig/AOq0QJ8wRQIhAIYKJYy_vhL6FN2sN8JmwtZUzthfIEC1uyLBsWLFRF1-AiBSi_MHyF2FME9Q96eTwBDWUYjupTJnp_MoVJcxOVW74Q%3D%3D'
[dashsegments] Total fragments: 3600
[download] Destination: test.f160.mp4
[download] 100% of 104.01MiB in 29:48
[debug] Invoking downloader on 'https://manifest.googlevideo.com/api/manifest/dash/expire/1597463882/ei/6Qg3X-PLPOTpiwTDgZGYCw/ip/[REDACTED]/id/7q2E_dMf-PA.0/source/yt_live_broadcast/requiressl/yes/as/fmp4_audio_clear%2Cwebm_audio_clear%2Cwebm2_audio_clear%2Cfmp4_sd_hd_clear%2Cwebm2_sd_hd_clear/force_finished/1/vprv/1/keepalive/yes/fexp/23883098/beids/23886217/itag/0/playlist_type/DVR/sparams/expire%2Cei%2Cip%2Cid%2Csource%2Crequiressl%2Cas%2Cforce_finished%2Cvprv%2Citag%2Cplaylist_type/sig/AOq0QJ8wRQIhAIYKJYy_vhL6FN2sN8JmwtZUzthfIEC1uyLBsWLFRF1-AiBSi_MHyF2FME9Q96eTwBDWUYjupTJnp_MoVJcxOVW74Q%3D%3D'
[dashsegments] Total fragments: 3600
[download] Destination: test.mp4.f139
[download] 100% of 64.95MiB in 30:09
[ffmpeg] Merging formats into "test.mp4"
[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i file:test.f160.mp4 -i file:test.mp4.f139 -c copy -map 0:v:0 -map 1:a:0 file:test.temp.mp4
Deleting original file test.f160.mp4 (pass -k to keep)
Deleting original file test.mp4.f139 (pass -k to keep)

Description

On that page, the youtube player in firefox shows the video is 3:01:49 in length, and it plays fully at that length. While youtube-dl downloads and exits cleanly, it only downloads the last 2:00:00 of it. This happens with both the dash and hls manifests I've tried.

$ ffmpeg -i test.mp4 -f null -
...
frame=216000 fps=8600 q=-0.0 Lsize=N/A time=02:00:00.00 bitrate=N/A speed= 287x
...

Is there a workaround or proper method to grab the full video which is available to play on the site? This is likely related to issue #26290.

Edit: the output is from an older youtube-dl, but the results are the same on my boox with v. 2020.07.28

Edit 2: streamlink also gets only 2 hours from hls. How does the youtube player do it?!

Edit 3: Well, I'm not sure what the youtube player is doing, but I was able to hack the missing fragments into place...

First, I wrote out the info.json for the url. Then I edited the json and found the first fragment in the stream:

      "fragments": [
        {
          "path": "sq/1855/lmt/1597197498473364",
          "duration": 2
        },

Then I generated all the fragments from 0 to 1854:

$ for F in {0..1854} ; do echo -e "\t{\n\t  \"path\": \"sq/${F}\",\n\t  \"duration\": 2\n\t}," ; done > missing.txt
$ head -n 10 missing.txt
    {
      "path": "sq/0",
      "duration": 2
    },
    {
      "path": "sq/1",
      "duration": 2
    },
    {
      "path": "sq/2",

I then inserted these missing fragments before fragment 1855 in the json and saved it. Then I was able to run youtube-dl against this .json with:

youtube-dl  -f 139 --load-info-json test.info.json -o test.m4a

And got the full length stream saved, as the fragments are on the server, just not enumerated in the manifest.

It would seriously surprise me if this is what the web player is doing to get the full stream.

Maybe the youtube plugin can check whether the fragments start at 0, and if not, check whether fragment 0 is on the server and compensate for the incomplete manifest as a workaround?
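For illustration, a hedged sketch of that check against an info.json, assuming the fragments list and the fragment_base_url field that youtube-dl writes for DASH-segment formats (the helper name and file name are made up):

import json
import urllib.error
import urllib.request

def first_fragment_missing(fmt):
    # True if the manifest starts later than sq/0 but sq/0 is still
    # retrievable from the server.
    frags = fmt.get('fragments') or []
    if not frags or frags[0]['path'] == 'sq/0' or frags[0]['path'].startswith('sq/0/'):
        return False
    # fragment_base_url normally ends in '/'; adjust if yours doesn't.
    req = urllib.request.Request(fmt['fragment_base_url'] + 'sq/0', method='HEAD')
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False

with open('test.info.json') as f:
    info = json.load(f)
for fmt in info['formats']:
    if 'fragments' in fmt and first_fragment_missing(fmt):
        print('format %s: manifest incomplete, but fragment 0 exists' % fmt['format_id'])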

Fruktlimpa commented 3 years ago

Youtube is definitely up to something new, as this phenomenon is something I haven't seen before; it started about 5-6 days ago. As most of you know, livestreams used to display only the last 2 hours, even on the webpage, for some amount of time until they got processed into a VOD. Now it seems the videos are instantly viewable on the webpage from start to finish, with some features missing (chat replay). However, youtube-dl still only grabs the last 2 hours regardless.

I might be repeating what has already been said, but since the "processing into a VOD" could differ from person to person depending on your region, I wanted to throw out that I'm experiencing the exact same thing, as likely everybody is.

your-diary commented 3 years ago

Same issue here.

$ command youtube-dl --version
2020.07.28

$ command youtube-dl --verbose 'https://www.youtube.com/watch?v=Mm0KCzYpMhQ'
[debug] System config: []
[debug] User config: ['--ignore-errors', '--no-mtime', '--console-title']
[debug] Custom config: []
[debug] Command-line args: ['--verbose', 'https://www.youtube.com/watch?v=Mm0KCzYpMhQ']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2020.07.28
[debug] Python version 3.8.3 (CPython) - Linux-5.7.9-arch1-1-x86_64-with-glibc2.2.5
[debug] exe versions: ffmpeg 4.3.1, ffprobe 4.3.1, rtmpdump 2.4
[debug] Proxy map: {}
[youtube] Mm0KCzYpMhQ: Downloading webpage
[youtube] Mm0KCzYpMhQ: Downloading m3u8 information
[youtube] Mm0KCzYpMhQ: Downloading MPD manifest
[debug] Default format spec: bestvideo+bestaudio/best
[debug] Invoking downloader on 'https://manifest.googlevideo.com/api/manifest/dash/expire/1597700011/ei/S6M6X_zOHpOcs8IPi7SF-Ak/ip/50.999.999.85/id/Mm0KCzYpMhQ.1/source/yt_live_broadcast/requiressl/yes/tx/23908007/txs/23908006%2C23908007/hfr/all/as/fmp4_audio_clear%2Cwebm_audio_clear%2Cwebm2_audio_clear%2Cfmp4_sd_hd_clear%2Cwebm2_sd_hd_clear/force_finished/1/vprv/1/keepalive/yes/fexp/23883098/itag/0/playlist_type/DVR/sparams/expire%2Cei%2Cip%2Cid%2Csource%2Crequiressl%2Ctx%2Ctxs%2Chfr%2Cas%2Cforce_finished%2Cvprv%2Citag%2Cplaylist_type/sig/AOq0QJ8wRQIgaBMCXhRtrGe2SeAHys3agvoV10DHyPqygiCOG-_PfjcCIQCBWI3JoZSHDc-Eu5xkAU2Xi_Jpll4aYzVP1m4HEE0g3A%3D%3D'
[dashsegments] Total fragments: 3600
[download] Destination: 【APEX耐久】ダイアモンドになるまで終わらないラストバトル!!【湊あくあ】-Mm0KCzYpMhQ.f299.mp4
[download] 100% of 3.01GiB in 10:12
[debug] Invoking downloader on 'https://manifest.googlevideo.com/api/manifest/dash/expire/1597700011/ei/S6M6X_zOHpOcs8IPi7SF-Ak/ip/50.999.999.85/id/Mm0KCzYpMhQ.1/source/yt_live_broadcast/requiressl/yes/tx/23908007/txs/23908006%2C23908007/hfr/all/as/fmp4_audio_clear%2Cwebm_audio_clear%2Cwebm2_audio_clear%2Cfmp4_sd_hd_clear%2Cwebm2_sd_hd_clear/force_finished/1/vprv/1/keepalive/yes/fexp/23883098/itag/0/playlist_type/DVR/sparams/expire%2Cei%2Cip%2Cid%2Csource%2Crequiressl%2Ctx%2Ctxs%2Chfr%2Cas%2Cforce_finished%2Cvprv%2Citag%2Cplaylist_type/sig/AOq0QJ8wRQIgaBMCXhRtrGe2SeAHys3agvoV10DHyPqygiCOG-_PfjcCIQCBWI3JoZSHDc-Eu5xkAU2Xi_Jpll4aYzVP1m4HEE0g3A%3D%3D'
[dashsegments] Total fragments: 3600
[download] Destination: 【APEX耐久】ダイアモンドになるまで終わらないラストバトル!!【湊あくあ】-Mm0KCzYpMhQ.f140.m4a
[download] 100% of 151.43MiB in 06:39
[ffmpeg] Merging formats into "【APEX耐久】ダイアモンドになるまで終わらないラストバトル!!【湊あくあ】-Mm0KCzYpMhQ.mp4"
[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i 'file:【APEX耐久】ダイアモンドになるまで終わらないラストバトル!!【湊あくあ】-Mm0KCzYpMhQ.f299.mp4' -i 'file:【APEX耐久】ダイアモンドになるまで終わらないラストバトル!!【湊あくあ】-Mm0KCzYpMhQ.f140.m4a' -c copy -map 0:v:0 -map 1:a:0 'file:【APEX耐久】ダイアモンドになるまで終わらないラストバトル!!【湊あくあ】-Mm0KCzYpMhQ.temp.mp4'
Deleting original file 【APEX耐久】ダイアモンドになるまで終わらないラストバトル!!【湊あくあ】-Mm0KCzYpMhQ.f299.mp4 (pass -k to keep)
Deleting original file 【APEX耐久】ダイアモンドになるまで終わらないラストバトル!!【湊あくあ】-Mm0KCzYpMhQ.f140.m4a (pass -k to keep)

The result was exactly the last 2 hours of the 6:54:07 video. Using Firefox on Linux or the YouTube app for Android, the video plays without any problems, though Chat Replay hasn't been made available yet.

Cyame commented 3 years ago

Same problem. I actually spent the whole evening downloading the video repeatedly, and found that it's the audio being limited to a 2-hour duration that causes the problem. Regardless of bitrate, both 139 and 140 have a duration of 2:00:00, while the video is exactly 3:17:51 in length.

jmolinski commented 3 years ago

Same problem, but it seems to disappear within about 2 days after the live stream has finished (please keep this in mind when trying to reproduce the issue).

InternationalYamAgain commented 3 years ago

OP of the linked issue here (on a new account, don't worry about it, I'm just dumb...). Yes, they are basically the same issue. Or rather, your issue is more current, because YouTube has literally changed how they handle livestream recordings in the past week. When I posted my issue, livestream recordings were clipped on the YouTube site, just like what youtube-dl is giving, until processing finished. Now the full video is available to browser users, but youtube-dl is still somehow fetching the clipped one. So I agree this now looks like a genuine bug.

(As an aside, this is not the first time the full recording has been available to regular desktop browser users. There was a period of about a week a month or two ago where I could consistently get the full video, but only in chrome, not firefox. That was possibly a trial run for what we're seeing now. After that period it went back to the 2 hour clipped recording that has been standard for years.)

The work-around you posted, editing the json to include the missing fragments, is very neat, but without trying a bunch of cases I'm not sure how robust it is. If you would rather just filter these videos and wait for processing to occur (for people with relatively automated workflows), I've had success using filesize after duration broke (see the sketch below). Specifically, the filesize shows as null or NA for fragmented videos but gives the real size after processing. This might break some random old video somewhere though, so use it with caution.
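A minimal sketch of that filesize check, assuming an info.json written with youtube-dl --skip-download --write-info-json and the filesize/formats fields youtube-dl normally emits:

import json

def looks_processed(info_json_path):
    # Unprocessed fragmented streams report filesize as null/absent;
    # processed videos report a real byte count.
    with open(info_json_path) as f:
        info = json.load(f)
    return all(fmt.get('filesize') for fmt in info['formats'])

if not looks_processed('video.info.json'):
    print('still fragmented/unprocessed; try again later')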

As a word of warning for anyone else this comes up for: I've run into some very weird cases while experimenting with youtube-dl on unprocessed livestreams over 2 hours in the past week. In one case, the video and audio were both downloaded normally (well, slowly, since downloading fragmented files is slow, but with no errors or anything), but when they were combined I was left with just a 15-second-long video from somewhere in the middle, and the audio was gibberish. I also saw a case where, when I called youtube-dl, there were literally 0 formats available, as if it saw there was a video but couldn't get any data at all; subsequent calls couldn't reproduce that result.

bittaurus commented 3 years ago

I went to grab this livestream 5 hours after it was finished, https://www.youtube.com/watch?v=VMfaFRPGvvM. It downloaded complete with the dash_manifest using youtube-dl.

Interested to see if any changes occurred in the manifest, I wrote out the info.json and inspected it.

The manifest was still in fragments form, but instead of 2 second fragments, it listed 5 second fragments.

This is a change on youtube's end, and might be youtube's way of fixing this problem for us. The previous manifests were showing 2 second fragments with a limit of 3600 fragments (2 hours) before losing the starting ones.

With 5 second fragments, 3600 fragments would be 5 hours, making the bug only show up on livestreams over 5 hours in length?

[dashsegments] Total fragments: 2510
[download] Destination: VMfaFRPGvvM.f135.mp4
[download] 100% of 149.46MiB in 13:20

At 5 second fragments, that's the full file length.
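For reference, a quick sanity check of the fragment arithmetic (a throwaway snippet, nothing youtube-dl specific):

def window(fragments, seconds_per_fragment):
    # Clipped window = fragment count * fragment duration.
    total = fragments * seconds_per_fragment
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return '%d:%02d:%02d' % (h, m, s)

print(window(3600, 2))  # 2:00:00 -> the old 2-hour cap
print(window(3600, 5))  # 5:00:00 -> the same cap with 5 second fragments
print(window(2510, 5))  # 3:29:10 -> this download, i.e. the full stream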

As an aside, the youtube page source makes references to html5_manifest stuff, which I know nothing about. I wonder if the player is using an html5 video streaming mechanism while the dash and hls manifests are there for backward compatibility with older browsers.

your-diary commented 3 years ago

@bittaurus

making the bug only show up on livestreams over 5 hours in length?

No. Here is a live archive which finished streaming just minutes ago: https://www.youtube.com/watch?v=Esl7kGD5FdE. Although the web player has no problem with it, youtube-dl gives the last 2 hours of 3:26:08 using the dash manifest. It occurs even when I use the hls manifest:

url=$(curl --silent 'https://www.youtube.com/watch?v=Esl7kGD5FdE' | grep -o 'https:\\/\\/[^"]\+googlevideo.com\\/[^"]\+' | grep 'hls_variant' | sed 's/\\//g')
ffmpeg -i "${url}" out.mkv #=> The same result. The problem is not specific to youtube-dl.
InternationalYamAgain commented 3 years ago

@bittaurus

This behavior existed before; in fact the 2 hour duration is new, and was added when YouTube made several improvements to their livestream platform, like increasing the supported max resolution and adding low latency options. Depending on the settings for the livestream, the fragments can be 1s, 2s, 5s, and maybe other amounts. I have seen the clipped version of the video be a maximum of 2 hours, 4 hours, 5 hours, or maybe more (and it can be less than these if the livestream goes offline and back online again; only the final X hours of the stream, as it was shown in realtime, will be available, so you may see 1:58:30 clipped recordings if there were connection issues, for example). The settings that matter in determining this are (most importantly) the latency setting, the (source) resolution, the framerate, and possibly the bitrate. Higher settings lead to shorter durations, but I do not know exactly what formula YouTube is using. It is not just a fixed number of fragments; you will find 2 hour videos with 3600 fragments or 7200 (2s vs 1s), and a 4 hour clip with 5s fragments would be 2880 fragments, which I'm pretty sure I've seen.

I have no way of knowing for sure, but it is likely the stream you were looking at was not using low latency settings, and while it is 1080p, it is only 30fps. So being able to get 4 or 5 hours of the video rather than 2 (which was enough for the full video in this case) is not a surprise. That is what you would have seen in a browser as well up until about a week ago.

bittaurus commented 3 years ago

Here's a quick bash script to insert the fragments into a .info.json file. It assumes the same number of fragments is missing from every fragmented stream (which is true for every case I've seen so far). It relies on gnu seq and gnu sed.

#!/bin/bash
#
# fixfragx [file.info.json]

# Bail out unless the argument is an existing, non-empty file.
[ ! -s "${1}" ] && echo -e "Usage: fixfragx [filename.info.json]\n\nInserts missing beginning youtube fragments into a youtube-dl --write-info-json file." && exit

# Sequence number of the first fragment listed in the json.
FRAGS=$(sed -e 's/^.*"fragments": \[{"path": "sq\///' -e 's/[^0-9].*$//' "${1}")
[ "${FRAGS}" == "" ] || [ "${FRAGS}" == "0" ] && echo "no missing fragments found." && exit

((--FRAGS))
echo "missing fragments {0..${FRAGS}} found."

# Build entries for fragments 0..FRAGS and splice them in before the first listed one.
INSERT=$(for F in $(seq 0 ${FRAGS}) ; do echo -n "{\"path\": \"sq/${F}/\"}, " ; done)
sed -i 's_"fragments": \[_"fragments": \['"${INSERT}"'_g' "${1}"

[ "$?" == "0" ] && echo "missing fragments have been inserted." && exit
echo "error with insertion of missing fragments."
exit

It also doesn't determine or insert the fragment duration, but it seems to do well enough without it for the full stream. Good luck.

MinkiNTN commented 3 years ago

While we're on the topic of "inserting missing fragments" scripts, I cooked this up using Python. It's not my strong suit, but it should insert the missing fragments into the file.

import os
import sys
import json
import re
import argparse

def pad():
    with open(args.input) as json_file:
        try:
            data = json.load(json_file)
        except ValueError as err:
            print('Cannot decode this JSON file. Error:', err)
            sys.exit()

        print('Loaded JSON for video ID:', data['id'])
        print('Title:', data['fulltitle'])
        print('Uploader:', data['uploader'])
        print('-----------------------------------------------------------------')
        for format in data['formats']:
            print('Fixing format ID:', format['format_id'])
            # Get the first path ID from the first fragment
            firstFragment = format['fragments'][0]
            firstFragmentID = (re.search('sq/(.*)/lmt/', firstFragment['path'])).group(1)
            # Add the missing fragments
            for id in range(0, int(firstFragmentID)):
                newFragment = {'duration': 2.0, 'path': 'sq/%d' % (id)} 
                format['fragments'].insert(id, newFragment)
        print('-----------------------------------------------------------------')
        print('Writing result to', args.output)
        with open(args.output, 'w') as outfile:
            json.dump(data, outfile)

    json_file.close()

parser = argparse.ArgumentParser(description='Checking YouTube JSON for missing stream fragments, and pad them in.')
parser.add_argument('-i', '--input', type=str, help='path to the input JSON file.', required=True)
parser.add_argument('-o', '--output', type=str, default='output.json', help='path to the output JSON file.')
args = parser.parse_args()

if os.path.exists(args.output):
    print('WARNING:', args.output, 'already exist.')
    overwrite = input('Would you like to overwrite? [y/N] ') or 'n'
    if overwrite.lower() == 'y':
        pad()
    else:
        print('User chose not to overwrite, exiting')
        sys.exit()

I wrote this before the update with 5s fragments, so it would need to be updated for that.
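One way it could be updated, sketched under the assumption that the info.json structure is the same as in the script above: read the duration off the first real fragment instead of hardcoding 2.0, so 2 s and 5 s manifests are both handled (pad_format is a hypothetical helper taking one entry of data['formats']):

import re

def pad_format(fmt):
    # Reuse whatever duration the first listed fragment reports
    # (2 s, 5 s, ...) for the padded-in fragments.
    fragments = fmt['fragments']
    frag_duration = fragments[0].get('duration', 2.0)
    first_id = int(re.search(r'sq/(\d+)', fragments[0]['path']).group(1))
    for i in range(first_id):
        fragments.insert(i, {'duration': frag_duration, 'path': 'sq/%d' % i})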

InternationalYamAgain commented 3 years ago

Several further complications I have encountered which I figured should be documented somewhere.

  1. With YouTube's new pipeline, the handling of when a stream goes offline and back online seems to be different. Previously, it would be stored as a single m3u8 playlist with missing fragments in the middle. This meant that in the case of a brief outage, you could still get the last 2 hours realtime of the stream before processing, which usually meant nearly 2 hours of actual footage. Now, with the new system, each time the livestream goes offline and back online, a separate m3u8 playlist is formed, beginning again with fragment 0. I don't know how long the livestream has to go down before this happens but not more than about 1m. There does not seem to be any way to access the fragments for previous part(s) of the stream if you did not save the MPD data before the stream went down. During the processing period, only the final set of fragments will be available, both to the browser and to youtube-dl. This means that you can still find incomplete livestream recordings even on a browser. You can spot when this happens in the browser by looking at the reported duration of the full stream and comparing with the playback duration. After processing, the full video is again available. This means that even if fragment 0 is present, you may miss some portion of the video, both with youtube-dl and with a browser. It is difficult to find examples of this "in the wild" since usually the stream losing connection is an undesirable thing while streaming. The easiest way to test the case further would probably be to set up a dummy stream.

  2. I have started running into cases of videos which are apparently processed but remain fragmented (DASH). The chat replay is available and the other telltale signs also point to this, but it remains as m3u8. Interestingly the cases I have looked at had 5s video fragments and 10s audio fragments, but based on the settings I am fairly sure it was initially streamed with 1s or 2s fragments. In some cases non-DASH versions were available as well but not at all resolutions. I waited a couple days and no further processing occurred. In these cases it is safe to download the video with youtube-dl without worrying about missing part of it.

Unfortunately it still seems there is no completely foolproof automated method for ensuring that the full video will be downloaded or for filtering incomplete videos. Simply filtering based on fragmentation is not viable because of issue 2, and filesize and duration also won't work for the same reason. Checking if fragment 0 is present (and if not adding it and the remaining ones) is not good enough because of issue 1.

ThePirate42 commented 3 years ago

Here is a batch script to generate the missing fragments (you first have to replace the first fragment number, the last fragment number and the duration):

@echo off
rem Appends one JSON fragment entry per segment to fragments.txt.
rem Replace 6222 with the last missing fragment number, and 2 with the duration.
for /L %%g in (0,1,6222) do (
echo:        {>>fragments.txt
echo:          "path": "sq/%%g",>>fragments.txt
rem The caret keeps the 2 from being parsed as a stderr redirect.
echo:          "duration": ^2>>fragments.txt
echo:        },>>fragments.txt
)

Bagunda commented 3 years ago

First, I wrote out the info.json for the url. Then I edited the json and found the first fragment in the stream:

      "fragments": [
        {
          "path": "sq/1855/lmt/1597197498473364",
          "duration": 2
        },

Then I generated all the fragments from 0 to 1854:

$ for F in {0..1854} ; do echo -e "\t{\n\t  \"path\": \"sq/${F}\",\n\t  \"duration\": 2\n\t}," ; done > missing.txt
$ head -n 10 missing.txt
  {
    "path": "sq/0",
    "duration": 2
  },
  {
    "path": "sq/1",
    "duration": 2
  },
  {
    "path": "sq/2",

I then inserted these missing fragments before fragment 1855 in the json and saved it. Then I was able to run youtube-dl against this .json with:

Where did you get this data? How did you write out the info.json? How did you find the first fragment?

Tribalwarrior commented 3 years ago

I did the same as the workaround above, but got only audio. How can I get the full stream with video?

mmis1000 commented 3 years ago

It seems YouTube in the browser currently gets the video metadata from a completely different endpoint (most likely embedded directly in the page).

The desktop YouTube site itself does not use the dash manifest or the hls manifest at all.

The page only contains the base url and the duration of each segment. It uses JavaScript to compute the segment index from the current video time, and signs it with the same method used to sign the download urls in the manifest.

That is probably why youtube is able to play the whole vod in the browser normally even though the dash manifest it returns is incomplete.
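To illustrate the idea (this is not YouTube's actual player code, just a sketch of the described behavior with made-up names):

def segment_url(base_url, position_seconds, segment_duration=2.0):
    # With only a base url and a fixed segment duration, the segment for
    # any playback position can be computed directly - no manifest needed.
    index = int(position_seconds // segment_duration)
    return '%s/sq/%d' % (base_url.rstrip('/'), index)

# Seeking to 2:30:00 in a 2 second fragment stream lands on segment 4500,
# even if the manifest only enumerates segments 1855 and up.
print(segment_url('https://example.googlevideo.com/videoplayback', 9000))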

on1razor commented 3 years ago

Issue is still present =(

ehoogeveen-medweb commented 3 years ago

Based on the workaround above, I wrote a little Node script to add the missing fragments for each format: addMissingFragments.zip

To use:

  1. Install a recent version of Node.js
  2. Extract addMissingFragments.js somewhere and open a command window in that location
  3. Run the following commands, substituting the youtube video URL for [youtube-url]:
    youtube-dl --output video --skip-download --write-info-json [youtube-url]
    node addMissingFragments video.info.json
    youtube-dl --load-info-json video.info.json --format "bestvideo[protocol=http_dash_segments]+bestaudio"

    I also included a little batch file that does the above for you. If you want to download a specific format or add other arguments that affect the output file, add them to the 3rd line (the one that loads the modified json). I've included a format selection that avoids incomplete m3u8 manifests (which this script can't fix).

Note: This script is very simple and wouldn't be able to handle changes to the fragment path structure.

Edit: Fixed argv.length check, thanks ddscentral.

Edit 2: Sometimes fragments use ranges instead of indices. Not much can be done about those, so skip those formats. Also try to select the most common duration if there are multiple (though it's ultimately a guess based on the last 2 hours).

Edit 3: Added a little batch script to make using this easier in practice. It's still pretty limited (you can only pass the video URL to it) but it's enough for me.

Edit 2022-03-03: Added some exceptions to skip incompatible formats.

ddscentral commented 3 years ago

@ehoogeveen-medweb Thanks, I was able to download a complete 7+ hour MPD livestream by adding the missing fragments using your script. There's a small bug with argv.length check, should be "< 3" instead of "< 2".

on1razor commented 3 years ago

Many thanks to @ehoogeveen-medweb for the script to add missing segments. I wrote a simple .bat file to automate the download process: put the addMissingFragments.js file next to the batch file, run MissingFragments-dl, and paste the link.

MissingFragments-dl.zip

glubsy commented 3 years ago

Thanks for this report. I faced the same issue recently with this live stream. I would not have noticed the problem if I had not checked the resulting file generated by youtube-dl.

The logs show a very low number of total segments: e.g. [dashsegments] Total fragments: 7159 (which should be around 31000), which matches what the MPD manifest gives as first available segment (probably yt:earliestMediaSequence="24611" among other things).

It would be nice if youtube-dl could warn the user, and perhaps allow forcing the download from segment number 0 by implementing the hacks mentioned above.
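A minimal sketch of such a warning, assuming the yt:earliestMediaSequence attribute quoted above sits on the MPD root element:

import xml.etree.ElementTree as ET

def warn_if_clipped(mpd_xml):
    root = ET.fromstring(mpd_xml)
    # ElementTree expands the yt: prefix to its namespace URI, so scan
    # attribute names instead of hardcoding it.
    for name, value in root.attrib.items():
        if name.endswith('earliestMediaSequence') and int(value) > 0:
            print('WARNING: manifest starts at segment %s, not 0 - '
                  'the beginning of the stream will be missing' % value)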

The shell script @bittaurus posted doesn't work with a very long list of missing segments; sed (or was it the shell?) was not happy:

missing fragments {0..24610} found.
./fix_missing_segments.sh: line 14: /usr/bin/sed: Argument list too long
error with insertion of missing fragments.

The Python script from @MinkiNTN needed a small fix:

import os
import sys
import json
import re
import argparse

def pad():
    with open(args.input) as json_file:
        try:
            data = json.load(json_file)
        except ValueError as err:
            print('Cannot decode this JSON file. Error:', err)
            sys.exit()

        print('Loaded JSON for video ID:', data['id'])
        print('Title:', data['fulltitle'])
        print('Uploader:', data['uploader'])
        print('-----------------------------------------------------------------')
        for format in data['formats']:
            print('Fixing format ID:', format['format_id'])
            # Get the first path ID from the first fragment
            try:
                firstFragment = format['fragments'][0]
            except (KeyError, IndexError):
                continue
            firstFragmentID = (re.search('sq/(.*)/lmt/', firstFragment['path'])).group(1)
            # Add the missing fragments
            for id in range(0, int(firstFragmentID)):
                newFragment = {'duration': 2.0, 'path': 'sq/%d' % (id)} 
                format['fragments'].insert(id, newFragment)
        print('-----------------------------------------------------------------')
        print('Writing result to', args.output)
        with open(args.output, 'w') as outfile:
            json.dump(data, outfile)

    json_file.close()

parser = argparse.ArgumentParser(description='Checking YouTube JSON for missing stream fragments, and pad them in.')
parser.add_argument('-i', '--input', type=str, help='path to the input JSON file.', required=True)
parser.add_argument('-o', '--output', type=str, default='output.json', help='path to the output JSON file.')
args = parser.parse_args()

if os.path.exists(args.output):
    print('WARNING:', args.output, 'already exist.')
    overwrite = input('Would you like to overwrite? [y/N] ') or 'n'
    if overwrite.lower() == 'y':
        pad()
    else:
        print('User chose not to overwrite, exiting')
        sys.exit()
else:
    pad()

That video is probably still being processed by YT at the time of writing, so I'll wait for YT to fix itself on this one, but that is not ideal for "emergency archivists" (i.e. when streams suddenly get deleted by their author without warning).

mmis1000 commented 3 years ago

I guess computing the segment numbers and inserting the IDs yourself isn't technically a "workaround"?

Because youtube actually never generates a full manifest for it. Even the youtube web page itself computes the segment links by itself.

If you watch a vod on youtube before it is fully processed, the debug information actually says it is operating in a "manifest-less" mode.

fuomag9 commented 2 years ago

youtube-dl --load-info-json video.info.json --format "bestvideo[protocol=http_dash_segments]+bestaudio"

Thank you, this worked!

DeffoDan commented 1 year ago

Edit 2: Sometimes fragments use ranges instead of indices. Not much can be done about those, so skip those formats. Also try to select the most common duration if there are multiple.

I tried this command a few times and it doesn't seem to download audio. There's no metadata for audio bitrate or anything; it just isn't there. Is there a way I can specify an audio format with this command? Downloading them separately isn't working for me right now either.

dirkf commented 1 year ago

Same problem, but it seems to disappear within about 2 days after the live stream has finished (please keep this in mind when trying to reproduce the issue).

That seems very straightforward. But otherwise we need to back-port https://github.com/yt-dlp/yt-dlp/pull/5091.