Several experiments were conducted to figure out what makes the downsample-upsample process misalign frames. FFmpeg outputs were collected after running the downsample-upsample process in order to determine the final positions of the dropped/duplicated frames.
With my current understanding (derived from analyzing the outputs), what happens is as follows. Numbers represent frame positions: the source on the left, the rendition on the right; 'x' marks a dropped frame.
Downsample | Upsample | Result |
---|---|---|
0->0 | 0->0 | 0->0 |
1->1 | 1->1 | 1->1 |
2->2 | 2->2 | 2->2 |
3->x | 2->3 | 2->3 |
4->3 | 4->4 | 5->4 |
5->4 | 5->5 | 6->5 |
6->5 | 6->6 | 7->6 |
7->6 | 7->7 | 9->7 |
8->x | 8->8 | 10->8 |
9->7 | 8->9 | 10->9 |
10->8 | 10->10 | 12->10 |
11->9 | 11->11 | 14->11 |
12->10 | 12->12 | 15->12 |
13->x | ... | ... |
14->11 | ... | ... |
15->12 | ... | ... |
... | ... | ... |
Upsample | Downsample | Result |
---|---|---|
0->0 | 0->0 | 0->0 |
1->1 | 1->1 | 1->1 |
1->2 | 2->x | 3->2 |
3->3 | 3->2 | 4->3 |
4->4 | 4->3 | 5->4 |
5->5 | 5->4 | 7->5 |
6->6 | 6->x | 8->6 |
6->7 | 7->5 | 9->7 |
8->8 | 8->6 | 11->8 |
9->9 | 9->7 | 11->9 |
10->10 | 10->x | 13->10 |
11->11 | 11->8 | 15->11 |
11->12 | 12->9 | ... |
13->13 | 13->10 | ... |
14->14 | 14->x | ... |
15->15 | 15->11 | ... |
... | ... | ... |
This clearly explains the shuffling effect. Sample outputs can be found here:
https://app.zenhub.com/files/172597245/fc8f32b1-317e-4501-929d-aac8b1e36ce1/download
https://app.zenhub.com/files/172597245/a0d9f3ca-dfe1-4774-8abc-162150ee1504/download
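The drop/duplicate positions above are consistent with an fps-style filter picking, for each output timestamp, the nearest input frame. Below is a minimal sketch of that idea, assuming an idealized round-half-up model rather than ffmpeg's actual implementation (the function name and fps values are illustrative):

```python
import math

# Toy model of an fps-style filter: output slot n wants timestamp
# n / out_fps and takes the input frame whose timestamp is nearest.
# This is an idealized approximation, not ffmpeg's exact algorithm.
def fps_filter_map(in_fps, out_fps, num_in_frames):
    num_out_frames = round(num_in_frames * out_fps / in_fps)
    mapping = []
    for n in range(num_out_frames):
        # nearest input index to output timestamp n / out_fps
        src = min(math.floor(n * in_fps / out_fps + 0.5), num_in_frames - 1)
        mapping.append((src, n))
    return mapping

# Downsampling 30fps -> 24fps drops roughly one input frame in five:
for src, dst in fps_filter_map(30, 24, 15):
    print(f"{src}->{dst}")  # 0->0, 1->1, 3->2, ... (inputs 2, 7, 12 dropped)
```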
The results described in this comment tell us the following. Let:

- `d(fps, video)` be the downsampling operation, where `fps` is the FPS of the output video and `video` is the input video that should be downsampled
- `u(fps, video)` be the upsampling operation, where `fps` is the FPS of the output video and `video` is the input video that should be upsampled
- `orig_fps` be the FPS of the original video
- `target_fps` be the target FPS of the original video

Then:

- `u(orig_fps, d(target_fps, video)) != video`
- `d(orig_fps, u(target_fps, video)) != video`
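Under the same idealized nearest-timestamp model sketched earlier, the non-identity of the round trip is easy to demonstrate on frame indices (a toy check, not ffmpeg itself):

```python
import math

# Toy check that u(orig_fps, d(target_fps, video)) != video on frame
# indices, using an idealized nearest-timestamp model of the fps filter.
def remap(in_fps, out_fps, n_in):
    n_out = round(n_in * out_fps / in_fps)
    return [min(math.floor(n * in_fps / out_fps + 0.5), n_in - 1)
            for n in range(n_out)]

down = remap(30, 24, 15)                          # d(24, video): kept indices
up = [down[i] for i in remap(24, 30, len(down))]  # u(30, d(24, video))
print(up)                     # [0, 1, 3, 3, 4, 5, 6, 8, 8, 9, 10, 11, 13, 13, 14]
print(up == list(range(15)))  # False: frames 2, 7, 12 lost; others duplicated
```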
While the above is useful information, we still want to understand the following: if we apply the ffmpeg FPS filter to the source so that it has the same FPS as a rendition transcoded by the ffmpeg CLI (using the ffmpeg command described in the OP), will the frames of the intermediate source (after applying the FPS filter) align with the frames of the rendition? In other words, will the verifier's TPR/FNR scores be comparable to those obtained when comparing a source and rendition that already have the same FPS?
Looks like the frame-averaging branch has the necessary code for applying the ffmpeg FPS filter on the source, so we should be able to just use that for the experiment.
TODO: Get TPR/FNR scores of the verifier when applying the ffmpeg FPS filter on the source and then comparing the intermediate source against the rendition (ex. upsample 25fps source to 30fps intermediate source and comparing the intermediate source with the 30fps rendition).
A series of experiments was conducted with all of the values for the rounding parameter of the fps filter. The table below shows the resulting TPRs when an intermediate resampled source, created from the original source, is passed to the verifier:
Source FPS -> Rendition FPS | zero | inf | down | up | near |
---|---|---|---|---|---|
24 -> 30 | 0.535 | 0.737 | 0.631 | 0.733 | 0.731 |
30 -> 25 | 0.607 | 0.750 | 0.627 | 0.760 | 0.809 |
30 -> 30 | 0.757 | 0.707 | 0.752 | 0.771 | 0.929 |
The code that generates the intermediate upsampled/downsampled source is:

```python
subprocess.call(['ffmpeg', '-y', '-i', video_file, '-filter:v', 'fps=fps={}:round={}'.format(fps, rounding), resampled_video_file])
```

where `fps` is the target rendition's frame rate and `rounding` is each of the possible values accepted by the fps filter's rounding parameter according to the documentation (http://ffmpeg.org/ffmpeg-filters.html#fps).
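For reference, the full sweep over rounding modes can be scripted as below (a sketch; `source.mp4` and the target fps are placeholders, while the rounding values are the ones listed in the fps filter documentation):

```python
import subprocess

video_file, fps = 'source.mp4', 30  # placeholders

# Generate one intermediate resampled source per rounding mode of the
# fps filter; each output is then passed to the verifier as the source.
for rounding in ['zero', 'inf', 'down', 'up', 'near']:
    resampled_video_file = 'resampled_{}.mp4'.format(rounding)
    subprocess.call(['ffmpeg', '-y', '-i', video_file,
                     '-filter:v', 'fps=fps={}:round={}'.format(fps, rounding),
                     resampled_video_file])
```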
The introduction of the intermediate rendition does indeed seem to give a set of aligned frames, as the outputs of the verifier indicate. However, it is not clear what exactly the fps filter is doing.
In order to rule out the possibility of errors in the random sampling algorithm, experiments were also run without the intermediate source. We achieved the expected accuracy of 0.988 when no resampling is involved (30fps source -> 30fps rendition).
> Table below shows the results using an intermediate resampled source created from the original source and passed to the verifier
Just to clarify, are the values in the table the verifier's TPR values?
If the ffmpeg CLI command used to transcode the rendition was:

```
ffmpeg -i <INPUT> -vsync 0 -vf fps=<FPS>,scale=w=<WIDTH>:h=<HEIGHT> -c:v libx264 <OUTPUT>
```

then it makes sense that the verifier's TPR values are highest when the intermediate source is created using the FPS filter with the rounding parameter set to `near`, since the ffmpeg CLI transcoding operation should also use the rounding parameter `near` for the FPS filter.
But, even when the rounding parameter is set to `near`, it looks like the verifier's TPR values are lower than expected: 0.731 when the source FPS = 24 and the rendition FPS = 30, and 0.809 when the source FPS = 30 and the rendition FPS = 25. You mention that this experiment did yield a set of aligned frames between the intermediate source and the rendition - wouldn't these TPR values say otherwise, since they are lower than the TPR value when the source FPS = 30 and the rendition FPS = 30?
> Just to clarify, are the values in the table the verifier's TPR values?
Yes, I have updated the comment. Thanks for pointing it out :)
> But, even when the rounding parameter is set to `near`, it looks like the verifier's TPR values are lower than expected: 0.731 when the source FPS = 24 and the rendition FPS = 30, and 0.809 when the source FPS = 30 and the rendition FPS = 25. You mention that this experiment did yield a set of aligned frames between the intermediate source and the rendition - wouldn't these TPR values say otherwise, since they are lower than the TPR value when the source FPS = 30 and the rendition FPS = 30?
The timestamps are now aligned, as they should be, since both the source and the rendition have the same frame rate in the eyes of the verifier. I pointed that fact out because it proves we are using the intermediate fps-filter-generated source. However, the frames themselves are indeed not aligned, as the TPR values indicate. This is reinforced by the 30fps->30fps experiments run with the same sampling algorithm, which show that inserting the intermediate source introduces some artifacts.
My hope was that at least one of the rounding methods would yield useful values, so we could figure out what it is that the fps filter is doing.
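One way to see the distinction between aligned timestamps and aligned frame content is to dump the timestamps with ffprobe. A rough sketch, assuming placeholder filenames (the frame field is `pts_time` in recent ffprobe builds and `pkt_pts_time` in older ones):

```python
import subprocess

def frame_timestamps(path):
    # Dump per-frame presentation timestamps; adjust the entry name
    # ('pts_time' vs 'pkt_pts_time') to match your ffprobe version.
    out = subprocess.check_output(
        ['ffprobe', '-v', 'error', '-select_streams', 'v:0',
         '-show_entries', 'frame=pts_time', '-of', 'csv=p=0', path],
        text=True)
    return [float(x) for x in out.split() if x]

# Matching timestamps only prove the intermediate source was used;
# the frame *content* at those timestamps can still be misaligned.
print(frame_timestamps('intermediate_source.mp4') ==
      frame_timestamps('rendition.mp4'))
```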
Spent some time investigating this issue further and I believe that applying the ffmpeg FPS filter on the source to create an intermediate source with the same FPS as a rendition may resolve #93 if:

- The `-vsync 0` option is used when applying the ffmpeg FPS filter on the source. The `-vsync 0` option will cause frames to be passed with their original timestamps from the demuxer to the muxer. In previous experiments, the default value for `-vsync` was used when applying the FPS filter, which is `cfr` (duplicate/drop frames to achieve a constant frame rate) or `vfr` (pass through frames or drop them to avoid 2 frames with the same timestamp) depending on the setting. This is important because at the moment the transcoding behavior of Livepeer transcoders is most closely matched by supplying the `-vsync 0` option when transcoding with the ffmpeg CLI (as noted here).
- The same ffmpeg version is used for both applying the FPS filter on the source and transcoding the rendition, to ensure that the FPS adjustment algorithm is the same. See the notes below about issues I encountered when using different ffmpeg versions.
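Concretely, the resampling step under these two conditions looks something like the sketch below (filenames and the 60fps target are placeholders; the binary must be the same ffmpeg build the transcoder uses):

```python
import subprocess

# Resample the source to the rendition FPS with -vsync 0 so frames are
# passed through with their original demuxer timestamps.
subprocess.check_call(
    ['ffmpeg', '-y', '-i', 'source.mp4', '-vsync', '0',
     '-vf', 'fps=60', '-c:v', 'libx264', 'intermediate_source.mp4'])
```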
The test videos were set up using the following commands:
```bash
# Download the 1080p 30fps video
wget https://storage.googleapis.com/lp_testharness_assets/bbb_sunflower_1080p_30fps_normal_2min.mp4

# Segment the 2min video into 60 2s segments
ffmpeg -i bbb_sunflower_1080p_30fps_normal_2min.mp4 -map 0 -c copy -f segment -segment_time 2 output_%d.mp4

# Transcode the source to 720p 60fps
for i in {0..59}
do
    ffmpeg -i output_${i}.mp4 -vsync 0 -vf fps=60,scale=w=1280:h=720 -c:v libx264 output_720p_60fps_${i}.mp4
done

# Transcode the source to 720p 25fps
for i in {0..59}
do
    ffmpeg -i output_${i}.mp4 -vsync 0 -vf fps=25,scale=w=1280:h=720 -c:v libx264 output_720p_25fps_${i}.mp4
done
```
This Python script was used to run the verifier API.
In all of the below scenarios, the source was a 1080p 30fps video segment that was resampled to the rendition FPS before running verification.
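Tying this to the test setup above, the resampling pass over all 60 segments can be sketched as follows (filenames mirror the setup commands; the script itself is illustrative):

```python
import subprocess

# Resample each 30fps source segment to each rendition FPS (with
# -vsync 0) before handing the pairs to the verifier.
for i in range(60):
    for fps in (60, 25):
        subprocess.check_call(
            ['ffmpeg', '-y', '-i', 'output_{}.mp4'.format(i),
             '-vsync', '0', '-vf', 'fps={}'.format(fps),
             '-c:v', 'libx264', 'resampled_{}fps_{}.mp4'.format(fps, i)])
```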
Scenario 1

- `-vsync 0` for FPS filter

720p 60fps: Passes: 19, Fails: 41, TPR: 0.316
720p 25fps: Passes: 20, Fails: 40, TPR: 0.33

Scenario 2

- `-vsync 0` for FPS filter

720p 60fps: Passes: 57, Fails: 3, TPR: 0.95
720p 25fps: Passes: 35, Fails: 25, TPR: 0.58

Scenario 3

- `-vsync 0` for FPS filter

720p 60fps: Passes: 57, Fails: 3, TPR: 0.95
720p 25fps: Passes: 56, Fails: 4, TPR: 0.93

Scenario 4

- `-vsync 0` for FPS filter

720p 60fps: Passes: 58, Fails: 2, TPR: 0.96
720p 25fps: Passes: 56, Fails: 4, TPR: 0.93
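The TPR values above appear to be computed as passes / (passes + fails) over the 60 segments (inferred from the reported numbers), e.g.:

```python
# Scenario 1, 720p 60fps: 19 passes, 41 fails out of 60 segments
print(19 / (19 + 41))  # 0.3166... -> reported as 0.316
# Scenario 4, 720p 60fps: 58 passes, 2 fails
print(58 / (58 + 2))   # 0.9666... -> reported as 0.96
```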
The verifier API changes for scenario 4 are on this branch, which is branched off the frame-averaging branch. It uses a base image that contains the same version of ffmpeg that LPMS uses.
Some additional areas of investigation/possible improvement:
> The verifier API changes for scenario 4 are on this branch which is branched off the frame-averaging. It uses a base image that contains the same version of ffmpeg that LPMS uses.

Using the code from the same branch I obtain slightly different results. My test setup uses the verifier connected to the broadcaster node; I then read the number of negative 'tamper' outputs in the verifications.logs file. I understand that to switch random frame sampling 'off' you are leaving `max_samples` as -1.
My results for scenario 4:

720p 25fps: Passes: 43, Fails: 13, TPR: 0.77

And for an additional scenario 5:

720p 60fps: Passes: 17, Fails: 41, TPR: 0.29
720p 25fps: Passes: 18, Fails: 44, TPR: 0.29
Closed by #111
A candidate solution to #93 is to apply ffmpeg's FPS filter to upsample/downsample the source to match the rendition FPS. An initial experiment yielded poor results:
The renditions were transcoded using an LP orchestrator/transcoder.
Broadcaster configuration used in this experiment:

```
livepeer -broadcaster -verifierUrl http://localhost:5000/verify -transcodingOptions P240p30fps16x9,P360p30fps16x9,P720p30fps16x9 -verifierPath ~/Epic/livepeer/verification-classifier/stream -orchAddr 127.0.0.1:8935 -httpAddr :8936
```
Orchestrator configuration used in this experiment:

```
livepeer -orchestrator -transcoder -pricePerUnit 1 -serviceAddr 127.0.0.1:8935 -cliAddr :7936 -v 99
```
The original hypothesis was that applying ffmpeg's FPS filter to the source, such that the intermediate source and rendition have the same FPS, would create alignment between the frames of both videos. However, it could be the case that, while the LPMS transcoder (used by an LP orchestrator/transcoder) uses the libavfilter FPS filter (the same one ffmpeg uses), the implementation might cause frame misalignment between the source and rendition in other ways.
It would be helpful to see if comparing an intermediate upsampled source against renditions transcoded using the ffmpeg CLI (instead of the LPMS transcoder) yields better results. The LPMS transcoder currently does the equivalent of:

```
ffmpeg -i <INPUT> -vsync 0 -vf fps=<FPS>,scale=w=<WIDTH>:h=<HEIGHT> -c:v libx264 <OUTPUT>
```

We can create a set of renditions transcoded using the ffmpeg CLI and compare them against an intermediate upsampled source. If we observe better results in this experiment then we can investigate how to accommodate the LPMS transcoder behavior in the verifier. If we do not observe better results then we'll need to explore other areas of investigation.
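A sketch of that experiment, assuming placeholder filenames and a 720p 30fps target (the verifier invocation itself is left out):

```python
import subprocess

def transcode(src, dst, fps, w, h):
    # Mirror of the LPMS-equivalent ffmpeg CLI command above.
    subprocess.check_call(
        ['ffmpeg', '-y', '-i', src, '-vsync', '0',
         '-vf', 'fps={},scale=w={}:h={}'.format(fps, w, h),
         '-c:v', 'libx264', dst])

# Rendition transcoded with the ffmpeg CLI instead of the LPMS transcoder.
transcode('source.mp4', 'rendition_cli.mp4', 30, 1280, 720)

# Intermediate source resampled to the rendition FPS.
subprocess.check_call(
    ['ffmpeg', '-y', '-i', 'source.mp4', '-vsync', '0',
     '-vf', 'fps=30', '-c:v', 'libx264', 'intermediate_source.mp4'])

# intermediate_source.mp4 vs rendition_cli.mp4 then goes through the
# verifier, and the resulting TPR is compared against the LPMS-transcoded case.
```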