yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader

[Twitch] continue downloading live VOD #6491

Open Cqoicebordel opened 1 year ago

Cqoicebordel commented 1 year ago

Region

No response

Example URLs

https://www.twitch.tv/

Provide a description that is worded well enough to be understood

I'm asking for a feature to keep downloading live VODs on Twitch:
When a streamer is live and has VODs enabled, you can access the VOD of the live stream currently being recorded (→ you get a specific URL for the VOD, not just the streamer URL, like https://www.twitch.tv/videos/[0-9]*).
But if you use that specific URL to download with yt-dlp, the download stops at the point the video had reached when you started the command (→ if you launched the command at 18:00, it downloads the stream up to 18:00), even if the streamer is still live under the same VOD URL.

What I'm asking for is a way to tell yt-dlp to download the blocks it hasn't already downloaded until the streamer goes offline.
Something like

do {
    downloadM3U;
    compareToAlreadyDownloadedBlocks;
    downloadMissingBlocks;
    sleep(10);
} while (isLive)

(not forgetting to repeat it a few times after the stream goes offline, to get all the blocks)
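
The "compare to already downloaded blocks" step of the loop above could look roughly like the following Python sketch. It only relies on the HLS convention that non-empty playlist lines not starting with `#` are segment URIs. The function names `segment_uris` and `missing_segments` are made up for illustration and are not part of yt-dlp.

```python
def segment_uris(playlist_text):
    """Return the media segment URIs listed in an HLS media playlist, in order."""
    uris = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Non-empty lines that do not start with '#' are segment URIs;
        # lines starting with '#' are tags or comments.
        if line and not line.startswith("#"):
            uris.append(line)
    return uris


def missing_segments(playlist_text, downloaded):
    """Return playlist URIs that are not yet in the `downloaded` set."""
    return [uri for uri in segment_uris(playlist_text) if uri not in downloaded]


playlist = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
0.ts
#EXTINF:10.0,
1.ts
#EXTINF:10.0,
2.ts
"""
print(missing_segments(playlist, {"0.ts", "1.ts"}))  # ['2.ts']
```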

Use case: sometimes a streamer who has VODs enabled inadvertently displays something Twitch doesn't like (NSFW content, for example) and decides to disable the VOD for that particular video at the end of the stream. In that case, I start downloading the VOD ~15 min before the end to maximize how much of the stream I can get, but I miss the last 15 min.
The feature I'm asking for would solve that issue.

CM55555 commented 1 year ago

I'd like to see this as a built-in feature too, as a Twitch analogue to --live-from-start on YouTube.

In the meantime, there's a workaround using the native HLS downloader: mimic the temporary files it uses for resumability. (If you manually cancel a download halfway through and restart it, it progresses to the endpoint of the VOD's current state.) Doing this requires disabling ffmpeg postprocessing (--fixup "never") and writing the total fragment count, as {"downloader": {"current_fragment": {"index": 123}}}, to a .mp4.ytdl file in the output directory to mark progress. (The .mp4.ytdl file is normally deleted once yt-dlp thinks it's done.)
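
To make the marker-file idea concrete, here is a minimal Python sketch that writes and reads such a file. The JSON layout is copied from the comment above. The `<output file>.ytdl` naming is inferred from its mention of a `.mp4.ytdl` file, and the helper names `write_progress_marker` and `read_progress_marker` are hypothetical, so check this against what yt-dlp actually produces before relying on it.

```python
import json
from pathlib import Path


def write_progress_marker(output_file, fragment_index):
    """Write a resume marker next to `output_file` (e.g. 'video.mp4').

    Assumption: the marker lives at '<output_file>.ytdl' and uses the
    JSON layout quoted in the thread.
    """
    marker = Path(str(output_file) + ".ytdl")
    state = {"downloader": {"current_fragment": {"index": fragment_index}}}
    marker.write_text(json.dumps(state), encoding="utf-8")
    return marker


def read_progress_marker(output_file):
    """Return the stored fragment index, or None if no marker exists."""
    marker = Path(str(output_file) + ".ytdl")
    if not marker.exists():
        return None
    state = json.loads(marker.read_text(encoding="utf-8"))
    return state["downloader"]["current_fragment"]["index"]
```

With these helpers, `write_progress_marker("video.mp4", 123)` would create `video.mp4.ytdl`, and rerunning the download with `--fixup "never"` is then expected to continue from that fragment, as described above.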

I'm just outlining the general approach since my implementation is pretty kludgy/unreadable. Here's a related thread on the subreddit if you're interested.

DmitryScaletta commented 10 months ago

Add --live-from-start support for twitch

Motivation

On Twitch, VODs are sometimes available during a live stream.

For example, if the channel URL is https://www.twitch.tv/pestily, the VOD can be available here: https://www.twitch.tv/videos/2018654253

So I think it should be possible to use --live-from-start for both channel and VOD links:

yt-dlp https://www.twitch.tv/pestily --live-from-start
yt-dlp https://www.twitch.tv/videos/2018654253 --live-from-start

This code checks whether the VOD belongs to a broadcast that is currently live:

https://github.com/yt-dlp/yt-dlp/blob/85a2d07c1f82c2082b568963d1c32ad3fc848f61/yt_dlp/extractor/twitch.py#L472-L478

If previewThumbnailURL is this string, it means the channel is currently live

https://vod-secure.twitch.tv/_404/404_processing_90x60.png
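
As a rough Python sketch of that check: the placeholder URL is the one quoted above, while the function name `vod_is_live` and the exact-match comparison are assumptions made for illustration. The linked yt-dlp code may match the thumbnail differently.

```python
# Placeholder thumbnail quoted in the thread for a VOD that is still being recorded.
PROCESSING_THUMBNAIL = "https://vod-secure.twitch.tv/_404/404_processing_90x60.png"


def vod_is_live(preview_thumbnail_url):
    """Return True if the VOD's previewThumbnailURL is the 'processing' placeholder.

    Assumption: an exact string comparison is enough for illustration.
    """
    return preview_thumbnail_url == PROCESSING_THUMBNAIL
```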

One problem

VODs do not update instantly; there is some delay (I think about 3-5 minutes). For example, you won't be able to see the last 3 minutes of the broadcast in the VOD immediately.

So the gql VideoMetadata request reports that the broadcast has ended while the VOD is not yet complete. I tested it. The API returns a generated previewThumbnailURL right after the end of the stream, but the VOD is not complete immediately after the broadcast.

Getting link to live VOD from the channel link

We need channel_id.

POST https://gql.twitch.tv/gql

Request data:
```json
[
  {
    "operationName": "FFZ_BroadcastID",
    "variables": { "id": "25604128" },
    "extensions": {
      "persistedQuery": {
        "version": 1,
        "sha256Hash": "cc89dfe8fcfe71235313b05b34799eaa519d162ebf85faf0c51d17c274614f0f"
      }
    }
  }
]
```

Response if the VOD exists:
```json
[
  {
    "data": {
      "user": {
        "id": "25604128",
        "stream": {
          "id": "40333637573",
          "archiveVideo": { "id": "2018832740", "__typename": "Video" },
          "__typename": "Stream"
        },
        "__typename": "User"
      }
    },
    "extensions": {
      "durationMilliseconds": 60,
      "operationName": "FFZ_BroadcastID",
      "requestID": "01HK032ZA4WX7WS23YH39BXCV1"
    }
  }
]
```

Response if the VOD does not exist:
```json
[
  {
    "data": {
      "user": {
        "id": "79294007",
        "stream": {
          "id": "43309178299",
          "archiveVideo": null,
          "__typename": "Stream"
        },
        "__typename": "User"
      }
    },
    "extensions": {
      "durationMilliseconds": 63,
      "operationName": "FFZ_BroadcastID",
      "requestID": "01HK045NC3CZZXEGP604F71HH9"
    }
  }
]
```
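
A small Python sketch of building that request body and reading the VOD id out of the response. It does not send the HTTP request, because the required request headers are not shown in this thread. The function names are hypothetical, and treating a missing or null "stream" as "no VOD" is an assumption.

```python
# Persisted-query hash copied from the request shown above.
FFZ_BROADCAST_ID_HASH = "cc89dfe8fcfe71235313b05b34799eaa519d162ebf85faf0c51d17c274614f0f"


def broadcast_id_query(channel_id):
    """Build the JSON body for the FFZ_BroadcastID persisted query."""
    return [{
        "operationName": "FFZ_BroadcastID",
        "variables": {"id": str(channel_id)},
        "extensions": {
            "persistedQuery": {"version": 1, "sha256Hash": FFZ_BROADCAST_ID_HASH},
        },
    }]


def archive_video_id(response):
    """Return the live VOD id from a parsed response, or None if there is none.

    Assumption: 'user', 'stream' or 'archiveVideo' may each be null or absent.
    """
    data = response[0].get("data") or {}
    user = data.get("user") or {}
    stream = user.get("stream") or {}
    video = stream.get("archiveVideo") or {}
    return video.get("id")
```

If `archive_video_id` returns an id such as "2018832740", the VOD link would be https://www.twitch.tv/videos/2018832740, following the URL pattern used earlier in this comment.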

How it should work

If it's a channel link, try to get a VOD link first.
while (is_live)
  get VOD info
  download all new segments
  sleep for a minimal time if there are no new segments
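
The loop above could be sketched in Python like this. The three callables (`fetch_segments`, `download_segment`, `is_live`) are placeholders for whatever the extractor and downloader would provide; none of them are existing yt-dlp APIs. The live state is sampled before fetching, so one final pass still runs after the stream has ended.

```python
import time


def follow_vod(fetch_segments, download_segment, is_live,
               poll_interval=10.0, sleep=time.sleep):
    """Download new VOD segments until the stream is no longer live.

    fetch_segments() -> list of segment ids currently in the VOD playlist
    download_segment(seg) -> downloads a single segment
    is_live() -> True while the broadcast is still running
    Returns the number of segments downloaded.
    """
    done = set()
    while True:
        # Sample the live state first: segments published just before the
        # stream ended are then still picked up by this last iteration.
        live = is_live()
        new = [seg for seg in fetch_segments() if seg not in done]
        for seg in new:
            download_segment(seg)
            done.add(seg)
        if not live:
            break
        if not new:
            # Nothing new yet: wait before polling the playlist again.
            sleep(poll_interval)
    return len(done)
```

Because the VOD may lag behind the broadcast, a real implementation would probably repeat the final pass a few times before stopping.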

I want to try to implement this functionality myself. Is it currently possible to do such things in yt-dlp?

pukkandan commented 10 months ago

This can be done, but not easily. See --live-from-start implementation of youtube

DmitryScaletta commented 10 months ago

This can be done, but not easily

Currently, downloaders can't get information from extractors directly. But for this feature (at least for Twitch) we need to know whether the stream is still live after downloading the initial fragments. Maybe it can be solved by passing a function like get_is_live from extractors to downloaders. I tried to understand the code but gave up.

Instead, I wrote my own script for Twitch from scratch in Node.js: https://github.com/DmitryScaletta/twitch-dlp It requires only Node.js and can be run without installing the package itself:

# Download a VOD from start using channel link, continue until stream ends
npx twitch-dlp https://www.twitch.tv/xqc --live-from-start

# Download a VOD. If it's live, continue until stream ends
npx twitch-dlp https://www.twitch.tv/videos/2022789761

It waits at least 5 minutes after the end of the stream to check for new fragments before merging them with ffmpeg: it compares stream start date + stream duration + 5 minutes with Date.now(). Works fine for me so far.
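
The waiting rule described above, written as a small Python sketch. This is a re-statement of the comparison, not code taken from twitch-dlp; the name `safe_to_merge` and its parameters are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Grace period from the comment above: wait at least 5 minutes after the end.
GRACE_PERIOD = timedelta(minutes=5)


def safe_to_merge(started_at, duration_seconds, now=None):
    """Return True once start + duration + 5 minutes lies in the past.

    started_at must be a timezone-aware datetime; `now` defaults to the
    current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    stream_end = started_at + timedelta(seconds=duration_seconds)
    return now >= stream_end + GRACE_PERIOD
```

For a stream that started at 18:00 UTC and ran for one hour, this returns False at 19:04 and True from 19:05 onwards.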

Cqoicebordel commented 10 months ago

Just a quick reminder that my original request was not "live from start" but rather "continue while live". There is a subtle difference, as we can only provide a VOD link (not a streamer link) with this option, and thus I feel like yt-dlp has everything it needs: instead of post-processing at the end of the download, it tries to download again and checks whether new fragments are available. If there are, it downloads them and loops. If not, it waits a few minutes and tries again, then post-processes. The .ytdl file contains all the info needed: it holds the index of the last downloaded fragment.

So again, I feel like it would be somewhat easy to implement. Am I wrong?

DmitryScaletta commented 10 months ago

There is a subtle difference, as we can only provide a VOD link (not a streamer link) with this option

A VOD link (if exists) can be obtained from a channel link as I mentioned here: https://github.com/yt-dlp/yt-dlp/issues/6491#issuecomment-1872979028

my original request was not "live from start" but rather "continue while live".

I think it's clearer to say "download from start AND continue while live".

P.S. It's exactly how I implemented it in my [twitch-dlp](https://github.com/DmitryScaletta/twitch-dlp) script. You can pass a streamer link with `--live-from-start` or a VOD link and it will download it from start and continue until stream ends.

Cqoicebordel commented 10 months ago

But in case of a VOD, it's the default behavior of yt-dlp to download from the start, so the option "live from the start" is kinda weird.
But in any case, it's just a squabble about vocabulary which won't help build the feature ;)

pukkandan commented 10 months ago

Whether the user has to pass the switch or not, @DmitryScaletta's point is that the internal implementation needed is similar to the --live-from-start feature.

superbonaci commented 3 months ago

@DmitryScaletta

I tested it. The API returns a generated previewThumbnailURL right after the end of the stream, but the VOD is not complete immediately after the broadcast.

I've been testing when a live broadcast becomes a VOD once the live stream ends. I downloaded the best-quality playlist just a few seconds after a test stream ended, and the same playlist 15 minutes later was exactly the same: no new .ts segments added after 1241.ts. I also started downloading the VOD with TwitchDownloaderCLI about 5 or 10 seconds after the live stream ended, and the output.ts file has the same duration as the VOD, to within a few tenths of a second. And checking again about 30 minutes later, not a single frame is missing; they appear to have exactly the same number of frames.

So from what I've seen, the VOD is updated really fast as soon as the stream ends, but maybe that's only for big channels, or the delay happened in the past and no longer does, or it only affects channels with no subscribers at all and low priority. Maybe if you stream as unlisted the VOD is not updated immediately when the live stream ends, but I couldn't check. Can anyone test this?

Maybe the issue happens with qualities other than source, since the video has to be transcoded by the server and not the user, but I didn't test.

If you can provide specific channels where the VOD at some quality is not immediately reflected in the playlist as soon as the live stream ends, that would be great.

I confirm that twitch-dlp does the job: it downloads a VOD from the start while it's live and then keeps downloading it like a VOD until it ends. I hope it always does so accurately, without corruption.

Maybe the yt-dlp developers want to keep live separate from VOD, and that's fine. The live ID is different from the VOD ID. I already asked at https://github.com/streamlink/streamlink/issues/6090, but they told me that doing it requires combining 2 playlists, and they don't want to do that:

Segments which were removed from the live playlist segment window are only available in the VOD playlist, You can't simply merge different HLS playlists.