lay295 / TwitchDownloader

Twitch VOD/Clip Downloader - Chat Download/Render/Replay
MIT License

Download subtitles from videos that have embedded subtitles, with an option to render them into the video #948

Open MrDummyNL opened 8 months ago

MrDummyNL commented 8 months ago

More streamers are using subtitles that are sent along with the video. This is already possible, and I use it too, as do some of my friends. That is why it would be a shame if the subtitle data could not be stored in an SRT file.

That is why I am asking you to add a function to read the subtitle data from the TS video (a logical step, because you can only read it once you have the full TS video) and save it. Optionally, you could render the subtitles together with the video. So you have 2 options:

I know tools exist to bake subtitles into a video, so it should not be too hard. I might save some interesting VODs, but the subtitle data would be lost. That is kind of sad.

Can you make it possible? Thanks in advance.

superbonaci commented 8 months ago

The subtitles are not in the TS video, no idea where you got that from. It is a different matter if the subs are encoded into the video picture inside the stream, but that's up to the streamer using OBS or other software.

Baking subs into a video requires more knowledge than it looks like; that's usually a whole different matter.

If you mean extracting the chat from the video, you need some OCR, but that's still alpha software.

ScrubN commented 8 months ago

Someone already requested including subtitles in #750, and I made some very minor progress on it.

The subtitles are not in the TS video

The subtitles are stored in the TS video chunks. My initial implementation used FFmpeg to extract the subtitles, which is extremely slow (~1.6s per 10 seconds of video). I felt this was unacceptable performance, so I wanted to make a custom solution that is significantly faster; however, other bug reports took priority at the time.
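
For reference, one known way to pull embedded EIA-608/CEA-708 captions out of a single segment with FFmpeg is the `lavfi` input device with a `movie` source and the `+subcc` label suffix. The sketch below only builds that command line. It is not necessarily the command used in the implementation mentioned above, and the function name and file names are made up for illustration. It assumes the segment path contains no characters that are special in FFmpeg filtergraph syntax.

```python
# Characters that would need escaping inside an FFmpeg filtergraph string.
# This sketch refuses such paths instead of trying to escape them.
_FILTERGRAPH_SPECIALS = set("\\':,;[]=")


def build_cc_extract_command(segment: str, srt_out: str) -> list[str]:
    """Build an ffmpeg command that writes a segment's closed captions to SRT.

    `segment` and `srt_out` are hypothetical file names chosen by the caller.
    """
    if any(ch in _FILTERGRAPH_SPECIALS for ch in segment):
        raise ValueError("segment path needs filtergraph escaping: " + segment)
    return [
        "ffmpeg", "-y",
        "-f", "lavfi",
        # "+subcc" asks lavfi for an extra stream carrying the caption packets.
        "-i", f"movie={segment}[out0+subcc]",
        "-map", "0:s",  # keep only the subtitle stream
        srt_out,
    ]


print(" ".join(build_cc_extract_command("0.ts", "0.srt")))
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. Because the video has to be decoded to reach the captions, this route is expected to be slow, which matches the timing quoted above.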

superbonaci commented 8 months ago

Do you mean the only-subscribers VODs chats are stored inside the TS parts, with the emojis, colours, gifs and everything?

ScrubN commented 8 months ago

Do you mean the only-subscribers VODs chats are stored inside the TS parts, with the emojis, colours, gifs and everything?

[Image: subtitles]

superbonaci commented 8 months ago

So do you mean the transcription of the streamer's voice? Basically what YouTube does?

ScrubN commented 8 months ago

The subtitles are generated by the streamer and embedded inside the live broadcast during the stream. These subtitles are also available in the VOD after the stream has ended. https://help.twitch.tv/s/article/guide-to-closed-captions?language=en_US

superbonaci commented 8 months ago

I get it, but that's a whole different thing from the users' chat. It is the voice of the streamer put into text.

MrDummyNL commented 8 months ago

There is a free plugin for OBS (it's also the ONLY such plugin) that sends text data inside the video stream to Twitch. The Twitch player supports CC inside the video and will show it.

There are other CC solutions, but they are not inside the video; they are sent over a second, separate text stream. Those CC are produced by a browser that is open on the streamer's side. They will not show up in the VOD (the recording you watch back later), so they won't work here: they cannot be watched back later because Twitch doesn't save the separate text stream.

But as @ScrubN says, it's inside the TS video, so it can be scanned to first generate an SRT file based on the timestamps. Then, as an option, it can either be encoded into the MP4 video with FFmpeg OR kept as a separate SRT file.

It's not impossible; some video players even support it. We need to write correct code to extract the text data from the TS video. It's something I miss: I make backups of my videos, and my videos have CC inside the TS stream. I know they exist.
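
Once caption text and timestamps have been recovered, writing the SRT file itself is the easy part. A minimal sketch, assuming the cues are already available as (start seconds, end seconds, text) tuples; the function names are hypothetical and not part of TwitchDownloader:

```python
from typing import Iterable, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, caption text)


def format_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form used by SRT."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues: Iterable[Cue]) -> str:
    """Render cues as SRT text: numbered blocks separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


print(to_srt([(0.0, 1.25, "Hello"), (1.25, 3.0, "world")]))
```

The resulting file can stay next to the video as a separate track or be fed to a later muxing or burn-in step.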

Edit: the plugin is this one: https://github.com/ratwithacompiler/OBS-captions-plugin

superbonaci commented 8 months ago

Do you mean the only-subscribers VODs chats are stored inside the TS parts, with the emojis, colours, gifs and everything?

So the video https://www.twitch.tv/videos/1980035805 has embedded subs in the TS? I'll check it; I didn't know that was possible.

superbonaci commented 8 months ago

@MrDummyNL which software can be used to extract the subs from the TS?

ScrubN commented 8 months ago

@MrDummyNL which software can be used to extract the subs from the TS?

See this earlier comment https://github.com/lay295/TwitchDownloader/issues/948#issuecomment-1904604563. I still want to implement custom caption extraction (if possible) because FFmpeg is so slow.
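
To put the figure of about 1.6 s per 10 s of video into perspective, here is the arithmetic for a full VOD, assuming the cost scales linearly with duration (an assumption; the thread only gives the per-10-second figure):

```python
def estimated_extraction_seconds(vod_seconds: float, cost_per_10s: float = 1.6) -> float:
    """Estimate caption extraction time, assuming linear scaling with duration."""
    return vod_seconds / 10 * cost_per_10s


three_hours = 3 * 60 * 60
minutes = estimated_extraction_seconds(three_hours) / 60
print(f"about {minutes:.0f} minutes for a 3 hour VOD")
```

Under that assumption, a 3 hour VOD would need roughly 29 minutes of extraction time on top of the download itself.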

MrDummyNL commented 8 months ago

Even if it's slow, it's always welcome. I am happy to extract anything. I will search around for other solutions and let you know soon; otherwise we have @ScrubN's option.

All my latest Twitch streams (on the MrDummy_NL channel, 2024 videos) have CC inside the video, so you can test on mine. My friend MarukaKou has also been using CC in video lately, so you can use his VODs for testing too.

Update: there is a tool on GitHub: https://github.com/kanongil/telxcc and it's also used in other tools like CCExtractor. It seems you can use it. The editor https://www.nikse.dk/subtitleedit seems to be able to read CC from TS files. Python script: https://pypi.org/project/ts-cc-extractor/

It would be nice to link to some third-party programs and run them when the TS file is complete, or to use the GitHub code and of course credit its creator.

superbonaci commented 8 months ago

@MrDummyNL did you check whether any of these actually work with the Twitch sample? Some of these programs are 10 years old.

superbonaci commented 8 months ago

None of them work, I've reported the issue to all of them.

ScrubN commented 8 months ago

Update: there is a tool on GitHub: https://github.com/kanongil/telxcc and it's also used in other tools like CCExtractor. It seems you can use it. The editor https://www.nikse.dk/subtitleedit seems to be able to read CC from TS files. Python script: https://pypi.org/project/ts-cc-extractor/

telxcc is written in C and provides no prebuilt binaries, and I really don't want to deal with makefiles or linking on Linux. subtitleedit is licensed under the GPL, so I cannot use their source code as reference without relicensing TwitchDownloader as GPL. ts-cc-extractor is licensed under BSD 2-Clause, so I can reference their source code provided I include a copy of the license with TwitchDownloader.

Also @superbonaci, you cannot extract the subtitles from the concatenated TS file because it is produced by concatenating the raw bytes from all of the parts together. It's honestly a miracle to me that FFmpeg can read the concatenated file because of how f-ed the metadata keys probably are.
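
To illustrate what "concatenating the raw bytes" means, here is a minimal sketch. It is not TwitchDownloader's actual code; the function name and file names are assumptions for illustration. Nothing is remuxed: each part is appended byte for byte, so whatever headers and timestamps each part carries end up in the output unchanged.

```python
import shutil
from typing import Iterable


def concat_raw(parts: Iterable[str], output: str) -> None:
    """Append every part file to `output` byte for byte, in the given order."""
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Plain byte copy: no parsing, no timestamp or header fix-ups.
                shutil.copyfileobj(src, out)
```

A call such as `concat_raw(["0.ts", "1.ts", "2.ts"], "merged.ts")` produces a file whose size is exactly the sum of the parts, which is why any tool reading it has to cope with per-segment data repeating mid-file.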

superbonaci commented 8 months ago

@ScrubN HandBrake is able to detect the Twitch subtitles as Closed Caption CC608, and embed them in the video (if you wish).

Here's the video: https://www.twitch.tv/videos/1923916260

[Image: handbrake_twitch]

If you save it as MKV and choose not to burn the subtitles into the video, you can turn them on or off as a subtitle track in VLC:

[Image: handbrake_mkv]

What I don't know is whether HandBrake uses some external command to do it or whether it is all built in, but it works great.

As I said the other commands don't work: telxcc, subtitleedit, ts-cc-extractor.

superbonaci commented 8 months ago

@ScrubN here are the bugs I've reported:

ScrubN commented 8 months ago

@ScrubN HandBrake is able to detect the Twitch subtitles as Closed Caption CC608, and embed them in the video (if you wish).

Oh strange. I was only able to detect the subtitles from the individual parts with both FFmpeg and MPV. It seems that VLC also detects the subtitles from the merged video though. Again, this is probably a result of their parsers being overly forgiving to non-standard/corrupted data.

superbonaci commented 8 months ago

@ScrubN HandBrake is able to detect the Twitch subtitles as Closed Caption CC608, and embed them in the video (if you wish).

Oh strange. I was only able to detect the subtitles from the individual parts with both FFmpeg and MPV. It seems that VLC also detects the subtitles from the merged video though. Again, this is probably a result of their parsers being overly forgiving to non-standard/corrupted data.

I'm not sure there is actually any corrupt data, because Video DownloadHelper downloads the same m2ts file as the one merged by TwitchDownloader, so it must be correct.

Yes, VLC shows several subtitle tracks, but only one works (or there is only one). I'll have to report the issue to VLC and see what they say.
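
One way to look for damage at the segment boundaries, rather than inferring it from which players cope, is to walk the 188-byte MPEG-TS packets and check each PID's 4-bit continuity counter. The sketch below is a simplified check under stated assumptions: packets are exactly 188 bytes with no leading junk, a single repeated counter is tolerated as a duplicate packet, and the discontinuity indicator in the adaptation field is ignored. A non-zero result is only a hint, not proof that the file is corrupt.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
NULL_PID = 0x1FFF


def count_continuity_errors(data: bytes) -> int:
    """Count continuity-counter jumps per PID in a buffer of TS packets."""
    last_counter = {}
    errors = 0
    for offset in range(0, len(data) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = data[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            raise ValueError(f"lost sync at byte {offset}")
        pid = ((packet[1] & 0x1F) << 8) | packet[2]
        has_payload = bool(packet[3] & 0x10)  # counter only advances with payload
        counter = packet[3] & 0x0F
        if pid == NULL_PID or not has_payload:
            continue
        previous = last_counter.get(pid)
        expected = (previous, (previous + 1) & 0x0F) if previous is not None else None
        if expected is not None and counter not in expected:
            errors += 1
        last_counter[pid] = counter
    return errors
```

Running it on the merged file, for example `count_continuity_errors(open("merged.ts", "rb").read())` with a hypothetical file name, and comparing against the individual parts would show whether the jumps, if any, line up with the points where parts were joined.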

MrDummyNL commented 7 months ago

Any progress on extracting the CC from TS videos?

ScrubN commented 7 months ago

Any progress on extracting the CC from TS videos?

Sorry, I have been taking a break from TD to work on a private project with another developer. I have cleaned up a subtitle scanner I had written some time ago and committed it to a draft PR for transparency. Hopefully it should not take too long to finish and get into a working state.

MrDummyNL commented 7 months ago

That is great news! Thank you for making it possible soon!

ScrubN commented 3 months ago

I'm sorry for the wait. I was having a bit of a hard time when I learned that by changing how we concatenate the downloaded parts, the subtitles can be naturally preserved without any extra work. The only issue is that I need to rewrite how trimming is handled, so it may take a little while.

More good news though, this alternative approach will make video finalization MUCH faster and possibly also fix some other issues.

ScrubN commented 3 months ago

Good news:

Bad news:

ScrubN commented 3 months ago

I found a flaw in the new concat method. It only works with whole segments. Trimming the start/end segments with FFmpeg completely corrupts all middle segments.

I found a solution, but it might cause the new finalization to not fix the other issues I mentioned.

ScrubN commented 3 months ago

I'm really annoyed right now. The ffmpeg command I was using that was extracting the subtitles is no longer working. Handbrake does recognize the subtitles, but annoyingly it only lets me burn in the subtitles, not export them. I might actually need to write a custom subtitles parser and I'm not very happy about it.
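
For anyone curious what such a parser has to deal with: HandBrake reporting CC608 suggests the captions travel as ATSC A/53 `cc_data` inside the video stream's user-data (SEI) messages, although that is an assumption here and has not been verified against a Twitch segment. The sketch below decodes only the innermost layer, starting at the `GA94` identifier. Locating the SEI messages in the TS packets and turning the 608 byte pairs into timed text are both left out. All names are hypothetical.

```python
from typing import List, Tuple


def has_odd_parity(byte: int) -> bool:
    """EIA-608 bytes carry 7 data bits plus an odd-parity bit."""
    return bin(byte).count("1") % 2 == 1


def parse_cc_data(payload: bytes) -> List[Tuple[int, int, int]]:
    """Return (cc_type, byte1, byte2) for each valid caption triplet.

    `payload` is assumed to start at the "GA94" user identifier.
    cc_type 0 and 1 are the two 608 fields; their parity bit is stripped.
    """
    if len(payload) < 7 or payload[:4] != b"GA94" or payload[4] != 0x03:
        return []
    flags = payload[5]
    if not flags & 0x40:  # process_cc_data_flag not set
        return []
    cc_count = flags & 0x1F
    pairs = []
    offset = 7  # index 6 is the em_data byte
    for _ in range(cc_count):
        triplet = payload[offset:offset + 3]
        if len(triplet) < 3:
            break
        offset += 3
        header, b1, b2 = triplet
        if not header & 0x04:  # cc_valid not set
            continue
        cc_type = header & 0x03
        if cc_type in (0, 1):
            if not (has_odd_parity(b1) and has_odd_parity(b2)):
                continue
            b1 &= 0x7F
            b2 &= 0x7F
        pairs.append((cc_type, b1, b2))
    return pairs
```

The byte pairs that come out still have to be run through the 608 state machine (control codes, roll-up and pop-on modes) before they become readable lines, which is where most of the work in a custom parser would go.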

ScrubN commented 3 months ago

@MrDummyNL you said you currently extract the subtitles from the download cache. What tool(s) do you use to do that?