Use ffmpeg scene detection to improve chunked encoding

tkozybski commented 3 years ago

When frames are split evenly between chunks, the chunks will often start or end in the middle of a scene, lowering compression efficiency and/or quality. I propose adding functionality to detect scene changes and split the chunks at those points. See here or here on how to do this.
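A rough sketch of the idea, not StaxRip's actual implementation: start from the even split points and move each one to the nearest detected scene cut. The function names `snap_boundaries` and `chunk_ranges` are hypothetical, and the list of scene-cut frame numbers is assumed to come from some detector.

```python
from bisect import bisect_left


def snap_boundaries(total_frames, chunk_count, scene_cuts):
    """Return chunk start frames, moving each even split point to the nearest scene cut."""
    # Keep only cuts that fall strictly inside the clip.
    cuts = sorted({c for c in scene_cuts if 0 < c < total_frames})
    starts = [0]
    for i in range(1, chunk_count):
        ideal = total_frames * i // chunk_count  # even split point
        if cuts:
            pos = bisect_left(cuts, ideal)
            # Candidates: the cut just before and the cut at/after the ideal point.
            neighbours = cuts[max(pos - 1, 0):pos + 1]
            candidate = min(neighbours, key=lambda c: abs(c - ideal))
        else:
            candidate = ideal  # no scene info: fall back to the even split
        if candidate > starts[-1]:  # skip duplicates so chunks never become empty
            starts.append(candidate)
    return starts


def chunk_ranges(total_frames, starts):
    """Turn chunk start frames into inclusive (first_frame, last_frame) pairs."""
    ends = starts[1:] + [total_frames]
    return [(s, e - 1) for s, e in zip(starts, ends)]


starts = snap_boundaries(1000, 4, [228, 360, 480, 516, 564, 684])
print(chunk_ranges(1000, starts))
```

When two split points snap to the same cut, the sketch simply produces fewer chunks than requested rather than cutting mid-scene.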

For aomenc, the first-pass stats file could be parsed to get the keyframes with 100% accuracy, which in theory improves quality and parallelism at the same time (by encoding in chunks instead of using the multi-threading options). Av1an does that.

tkozybski commented 3 years ago

Wrong label...

JJKylee commented 3 years ago

This is very similar to what I've been pondering these days. I was thinking about writing a StaxRip embedded PowerShell script that generates an I-frame index list. I found an obvious downside to using ffprobe for this purpose (it takes too long), so I turned to the DGIndexNV index file, dgi, instead to get the desired result. See here.

But using the scene option in ffmpeg, as proposed in this post, seems to fit more general use cases. Extracting the PTS time is not difficult. You just need to run the following command line in a Windows command prompt with INPUT to get OUTPUT.txt containing the result.

ffmpeg -hide_banner -i INPUT -vf "select='gt(scene,0.4)',metadata=print:file='OUTPUT.txt'" -f null NUL

OUTPUT.txt looks like this:

frame:0    pts:9510    pts_time:9.51
lavfi.scene_score=0.557776
frame:1    pts:15016   pts_time:15.016
lavfi.scene_score=0.691152
frame:2    pts:20021   pts_time:20.021
lavfi.scene_score=0.690279
frame:3    pts:21522   pts_time:21.522
lavfi.scene_score=0.532986
frame:4    pts:23524   pts_time:23.524
lavfi.scene_score=0.537670
frame:5    pts:28529   pts_time:28.529
lavfi.scene_score=0.619934
...

What is difficult, though, is putting the generated pts_time info to work in StaxRip. We need to strip the unnecessary parts and convert pts_time to a workable format like HH:MM:SS.nnn. But since there's already a tool that does this - PySceneDetect - maybe we'd better find a way to make use of it. As you may know, Av1an also uses this tool to get the cut info.
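For illustration only, here is a minimal sketch of the strip-and-convert step, assuming the log has exactly the layout shown in the OUTPUT.txt sample above. The helper names `parse_scene_log` and `format_timestamp` are made up for this example and are not part of ffmpeg, StaxRip or PySceneDetect.

```python
import re
from decimal import Decimal

# "pts_time:" never matches the plain "pts:" field, so only the seconds value is captured.
_PTS_TIME_RE = re.compile(r"pts_time:(\d+(?:\.\d+)?)")
_SCORE_RE = re.compile(r"lavfi\.scene_score=(\d+(?:\.\d+)?)")


def parse_scene_log(text):
    """Return (pts_time_string, scene_score) pairs from the metadata=print log."""
    cuts = []
    pending = None  # pts_time of the most recent "frame:" line
    for line in text.splitlines():
        match = _PTS_TIME_RE.search(line)
        if match:
            pending = match.group(1)
            continue
        match = _SCORE_RE.search(line)
        if match and pending is not None:
            cuts.append((pending, float(match.group(1))))
            pending = None
    return cuts


def format_timestamp(pts_time):
    """Format a seconds value given as a string as HH:MM:SS.nnn."""
    total_ms = round(Decimal(pts_time) * 1000)  # Decimal avoids float rounding surprises
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


sample = """frame:0    pts:9510    pts_time:9.51
lavfi.scene_score=0.557776
frame:1    pts:15016   pts_time:15.016
lavfi.scene_score=0.691152
"""
for pts_time, score in parse_scene_log(sample):
    print(format_timestamp(pts_time), score)
```

Keeping pts_time as a string until formatting means the millisecond value is taken from the log as written, not from a binary float approximation.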

That said, another big hurdle is in place. Currently StaxRip is using frame number info (evenly divided total frame numbers) to put it directly in each encoder's parameters that are used for chunk encoding. But in order to adopt this new tool, an overhaul of the code is inevitable since every chunk encoding should be done via mkvextract or ffmpeg to match the cut timecodes, not frame numbers. I think this is really a big matter and will take a lot of time. Big food for thought. 🙄
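As a side note, and only as a sketch under strong assumptions: if the source is constant frame rate and its first frame has a timestamp of 0, a pts_time value can be mapped back to a frame number, which is the unit the current chunking code works with. The function name `pts_time_to_frame` is hypothetical, and the 24000/1001 frame rate below is just an example value. This would not hold for variable frame rate sources.

```python
from fractions import Fraction


def pts_time_to_frame(pts_time, fps):
    """Map a pts_time string (seconds) to the nearest frame index.

    Assumes constant frame rate and a first frame at timestamp 0.
    """
    # Fraction parses decimal strings exactly, so no float error accumulates.
    return round(Fraction(pts_time) * fps)


fps = Fraction(24000, 1001)  # example frame rate, an assumption
print(pts_time_to_frame("9.51", fps))
```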

Last but not least, there's a critical problem with this ffmpeg scene option approach: it fails on some sources. For example, this Dolby Vision trailer - Chameleon.m2ts on Dolby Trailers - does not work with this method even after the m2ts file is remuxed to mkv. It yields this error message and the OUTPUT.txt file is simply empty.

[hevc @ 00000153cd5aca00] Invalid NAL unit 36, skipping.

I don't know if PySceneDetect is free of this kind of issue, but if not, then it's not reliable enough for general use. That's a big hurdle. 🤔

stax76 commented 3 years ago

I wonder if the index file created by ffms2 and L-Smash-Works contains info about I-Frames (I guess so) and if the format of the index file is easy to understand. It could not only be useful for chunk encoding, but also for cutting without re-encoding.

JJKylee commented 3 years ago

@stax76, that’s right. I’m wondering if the authors are willing to change the format. Hmm...

stax76 commented 3 years ago

Probably not. Vapoursynth is modern and powerful and generally has rich metadata support, so a source filter could provide this info and make it accessible through the vapoursynth API. Maybe it's already supported, or it can be requested from ffms2, l-smash and dgdecnv. But reading it from the index file would be significantly faster since it would not require requesting all frames, and maybe the index format isn't so complex.

DJATOM commented 3 years ago

From my experience... Don't try to split and merge open-GOP HEVC streams, it will produce bad results.

JJKylee commented 3 years ago

Yeah, esp. in stream copy. In that respect, an I-frame list or a scene-detected frame (timecode) list alone may cause problems for stream copy cutting of open-GOP streams such as HEVC.

Since chunk encoding also involves stream copy cutting (either by the encoder itself at frame indexes, or via mkvextract/ffmpeg for timecode-based cutting), it may raise an issue in the same vein. 🙄

So at this point, another issue comes up. Can we extract only IDR frames which have good scene values? To do that, maybe we need to include another criterion that identifies whether a given frame is IDR or not. Food for thought. 🤔
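One way to picture that extra criterion, as a sketch only: given a list of scene-cut frame numbers and a separate list of IDR frame numbers, keep a cut only when an IDR frame sits at or shortly before it. How the IDR list would be obtained is left open here. `cuts_on_idr`, both input lists and the `max_distance` tolerance are hypothetical.

```python
from bisect import bisect_right


def cuts_on_idr(scene_cuts, idr_frames, max_distance=0):
    """Return IDR frame numbers that lie at most max_distance frames before a scene cut."""
    idr = sorted(set(idr_frames))
    result = []
    for cut in sorted(set(scene_cuts)):
        pos = bisect_right(idr, cut)
        if pos == 0:
            continue  # no IDR frame at or before this cut
        nearest = idr[pos - 1]  # closest IDR frame at or before the cut
        close_enough = cut - nearest <= max_distance
        if close_enough and (not result or result[-1] != nearest):
            result.append(nearest)
    return result


scene_cuts = [228, 360, 480]      # example values, assumed
idr_frames = [0, 228, 350, 600]   # example values, assumed
print(cuts_on_idr(scene_cuts, idr_frames))
print(cuts_on_idr(scene_cuts, idr_frames, max_distance=12))
```

With `max_distance=0` only exact matches survive. A small tolerance trades a slightly early cut for a safe split point.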

JJKylee commented 3 years ago

On second thought, frame index cutting by the encoder may not be a problem. Since the encoder receives decoded frames served by the frameserver via an avs/vpy script, it's not a stream copy cutting.

OTOH, cutting by mkvextract or ffmpeg does not involve any prior decoding process, so it's basically a stream copy cutting.

Therefore, it seems that timecode-based cutting for chunk encoding raises another issue in this regard. Hmm...

Ding-adong commented 2 years ago

Any updates on this?

Presently I use a roundabout way of chunking at a scene change.

Use Vdub to get the precise frame number.
In StaxRip, cut the first half and create a job - filename1
Cut the second half and create a job - filename2
Start filename1
Open another instance of StaxRip
Start filename2
Merge filename1 and filename2

I do wonder if this could be automatically processed?

staxrip / staxrip

Use ffmpeg scene detection to improve chunked encoding #619