Several experiments were conducted to figure out what makes the downsample-upsample process misalign frames. FFmpeg outputs were collected after running the downsample-upsample process in order to determine the final positions of the dropped/duplicated frames.
With my current understanding (derived from analyzing the outputs), what happens is as follows. Numbers represent frame positions: the source on the left, the rendition on the right; 'x' marks a dropped frame.
Downsample | Upsample | Result |
---|---|---|
0->0 | 0->0 | 0->0 |
1->1 | 1->1 | 1->1 |
2->2 | 2->2 | 2->2 |
3->x | 2->3 | 2->3 |
4->3 | 4->4 | 5->4 |
5->4 | 5->5 | 6->5 |
6->5 | 6->6 | 7->6 |
7->6 | 7->7 | 9->7 |
8->x | 8->8 | 10->8 |
9->7 | 8->9 | 10->9 |
10->8 | 10->10 | 12->10 |
11->9 | 11->11 | 14->11 |
12->10 | 12->12 | 15->12 |
13->x | ... | ... |
14->11 | ... | ... |
15->12 | ... | ... |
... | ... | ... |
Upsample | Downsample | Result |
---|---|---|
0->0 | 0->0 | 0->0 |
1->1 | 1->1 | 1->1 |
1->2 | 2->x | 3->2 |
3->3 | 3->2 | 4->3 |
4->4 | 4->3 | 5->4 |
5->5 | 5->4 | 7->5 |
6->6 | 6->x | 8->6 |
6->7 | 7->5 | 9->7 |
8->8 | 8->6 | 11->8 |
9->9 | 9->7 | 11->9 |
10->10 | 10->x | 13->10 |
11->11 | 11->8 | 15->11 |
11->12 | 12->9 | ... |
13->13 | 13->10 | ... |
14->14 | 14->x | ... |
15->15 | 15->11 | ... |
... | ... | ... |
This clearly explains the shuffling effect. Sample outputs can be found here:
https://app.zenhub.com/files/172597245/fc8f32b1-317e-4501-929d-aac8b1e36ce1/download
https://app.zenhub.com/files/172597245/a0d9f3ca-dfe1-4774-8abc-162150ee1504/download
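The drop/duplicate positions above are consistent with an fps-style filter picking, for each output timestamp, the nearest input frame. Below is a minimal sketch of that idea, assuming an idealized round-half-up model rather than ffmpeg's actual implementation (the function name and fps values are illustrative):

```python
import math

# Toy model of an fps-style filter: output slot n wants timestamp
# n / out_fps and takes the input frame whose timestamp is nearest.
# This is an idealized approximation, not ffmpeg's exact algorithm.
def fps_filter_map(in_fps, out_fps, num_in_frames):
    num_out_frames = round(num_in_frames * out_fps / in_fps)
    mapping = []
    for n in range(num_out_frames):
        # nearest input index to output timestamp n / out_fps
        src = min(math.floor(n * in_fps / out_fps + 0.5), num_in_frames - 1)
        mapping.append((src, n))
    return mapping

# Downsampling 30fps -> 24fps drops roughly one input frame in five:
for src, dst in fps_filter_map(30, 24, 15):
    print(f"{src}->{dst}")  # 0->0, 1->1, 3->2, ... (inputs 2, 7, 12 dropped)
```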
The results described in this comment tell us the following. Let:

- `d(fps, video)` be the downsampling operation, where `fps` is the FPS of the output video and `video` is the input video that should be downsampled
- `u(fps, video)` be the upsampling operation, where `fps` is the FPS of the output video and `video` is the input video that should be upsampled
- `orig_fps` be the FPS of the original video
- `target_fps` be the target FPS of the original video

Then:

- `u(orig_fps, d(target_fps, video)) != video`
- `d(orig_fps, u(target_fps, video)) != video`
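Under the same idealized nearest-timestamp model sketched earlier, the non-identity of the round trip is easy to demonstrate on frame indices (a toy check, not ffmpeg itself):

```python
import math

# Toy check that u(orig_fps, d(target_fps, video)) != video on frame
# indices, using an idealized nearest-timestamp model of the fps filter.
def remap(in_fps, out_fps, n_in):
    n_out = round(n_in * out_fps / in_fps)
    return [min(math.floor(n * in_fps / out_fps + 0.5), n_in - 1)
            for n in range(n_out)]

down = remap(30, 24, 15)                          # d(24, video): kept indices
up = [down[i] for i in remap(24, 30, len(down))]  # u(30, d(24, video))
print(up)                     # [0, 1, 3, 3, 4, 5, 6, 8, 8, 9, 10, 11, 13, 13, 14]
print(up == list(range(15)))  # False: frames 2, 7, 12 lost; others duplicated
```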
While the above is useful information, we still want to understand the following: if we apply the ffmpeg FPS filter to the source so that it has the same FPS as a rendition transcoded by the ffmpeg CLI (using the ffmpeg command described in the OP), will the frames of the intermediate source (after applying the FPS filter) align with the frames of the rendition? In other words, will the verifier's TPR/FNR scores be comparable to those obtained when comparing a source and rendition that already have the same FPS?
Looks like the frame-averaging branch has the necessary code for applying the ffmpeg FPS filter on the source, so we should be able to just use that for the experiment.
TODO: Get TPR/FNR scores of the verifier when applying the ffmpeg FPS filter on the source and then comparing the intermediate source against the rendition (ex. upsample 25fps source to 30fps intermediate source and comparing the intermediate source with the 30fps rendition).
A series of experiments was conducted with all of the values for the rounding parameter of the fps filter. The table below shows the resulting TPRs when an intermediate resampled source, created from the original source, is passed to the verifier:
Source FPS -> Rendition FPS | zero | inf | down | up | near |
---|---|---|---|---|---|
24 -> 30 | 0.535 | 0.737 | 0.631 | 0.733 | 0.731 |
30 -> 25 | 0.607 | 0.750 | 0.627 | 0.760 | 0.809 |
30 -> 30 | 0.757 | 0.707 | 0.752 | 0.771 | 0.929 |
The code that generates the intermediate upsampled/downsampled source is:

```python
subprocess.call(['ffmpeg', '-y', '-i', video_file, '-filter:v', 'fps=fps={}:round={}'.format(fps, rounding), resampled_video_file])
```

where `fps` is the target rendition's frame rate and `rounding` is each of the possible values accepted by the fps filter's rounding parameter according to the documentation (http://ffmpeg.org/ffmpeg-filters.html#fps).
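For reference, the full sweep over rounding modes can be scripted as below (a sketch; `source.mp4` and the target fps are placeholders, while the rounding values are the ones listed in the fps filter documentation):

```python
import subprocess

video_file, fps = 'source.mp4', 30  # placeholders

# Generate one intermediate resampled source per rounding mode of the
# fps filter; each output is then passed to the verifier as the source.
for rounding in ['zero', 'inf', 'down', 'up', 'near']:
    resampled_video_file = 'resampled_{}.mp4'.format(rounding)
    subprocess.call(['ffmpeg', '-y', '-i', video_file,
                     '-filter:v', 'fps=fps={}:round={}'.format(fps, rounding),
                     resampled_video_file])
```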
The introduction of the intermediate rendition does indeed seem to give a set of aligned frames, as the outputs of the verifier indicate. However, it is not clear what exactly the fps filter is doing.
In order to rule out the possibility of errors in the random sampling algorithm, experiments were also run without the intermediate source. We achieved the expected accuracy of 0.988 when no resampling is involved (30fps source -> 30fps rendition).
> Table below shows the results using an intermediate resampled source created from the original source and passed to the verifier
Just to clarify, are the values in the table the verifier's TPR values?
If the ffmpeg CLI command used to transcode the rendition was:

```
ffmpeg -i <INPUT> -vsync 0 -vf fps=<FPS>,scale=w=<WIDTH>:h=<HEIGHT> -c:v libx264 <OUTPUT>
```

then it makes sense that the verifier's TPR values are highest when the intermediate source is created using the FPS filter with the rounding parameter set to `near`, since the ffmpeg CLI transcoding operation should also use the rounding parameter `near` for the FPS filter.
But, even when the rounding parameter is set to `near`, it looks like the verifier's TPR values are lower than expected: 0.731 when the source FPS = 24 and the rendition FPS = 30, and 0.809 when the source FPS = 30 and the rendition FPS = 25. You mention that this experiment did yield a set of aligned frames between the intermediate source and the rendition - wouldn't these TPR values say otherwise, since they are lower than the TPR value when the source FPS = 30 and the rendition FPS = 30?
> Just to clarify, are the values in the table the verifier's TPR values?
Yes, I have updated the comment. Thanks for pointing it out :)
> But, even when the rounding parameter is set to `near`, it looks like the verifier's TPR values are lower than expected: 0.731 when the source FPS = 24 and the rendition FPS = 30, and 0.809 when the source FPS = 30 and the rendition FPS = 25. You mention that this experiment did yield a set of aligned frames between the intermediate source and the rendition - wouldn't these TPR values say otherwise, since they are lower than the TPR value when the source FPS = 30 and the rendition FPS = 30?
The timestamps are now aligned, as they should be, since both the source and the rendition have the same frame rate in the eyes of the verifier. I pointed that fact out because it proves we are using the intermediate fps-filter-generated source. However, the frames themselves are indeed not aligned, as the TPR values indicate. This is reinforced by the 30fps->30fps experiments run with the same sampling algorithm, which show that inserting the intermediate source introduces some artifacts.
My hope was that at least one of the rounding methods would yield useful values, so we could figure out what it is that the fps filter is doing.
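One way to see the distinction between aligned timestamps and aligned frame content is to dump the timestamps with ffprobe. A rough sketch, assuming placeholder filenames (the frame field is `pts_time` in recent ffprobe builds and `pkt_pts_time` in older ones):

```python
import subprocess

def frame_timestamps(path):
    # Dump per-frame presentation timestamps; adjust the entry name
    # ('pts_time' vs 'pkt_pts_time') to match your ffprobe version.
    out = subprocess.check_output(
        ['ffprobe', '-v', 'error', '-select_streams', 'v:0',
         '-show_entries', 'frame=pts_time', '-of', 'csv=p=0', path],
        text=True)
    return [float(x) for x in out.split() if x]

# Matching timestamps only prove the intermediate source was used;
# the frame *content* at those timestamps can still be misaligned.
print(frame_timestamps('intermediate_source.mp4') ==
      frame_timestamps('rendition.mp4'))
```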
Spent some time investigating this issue further and I believe that applying the ffmpeg FPS filter on the source to create an intermediate source with the same FPS as a rendition may resolve #93 if:

- The `-vsync 0` option is used when applying the ffmpeg FPS filter on the source. The `-vsync 0` option will cause frames to be passed with their original timestamps from the demuxer to the muxer. In previous experiments, the default value for `-vsync` was used when applying the FPS filter, which is `cfr` (duplicate/drop frames to achieve a constant frame rate) or `vfr` (pass through frames or drop them to avoid 2 frames with the same timestamp) depending on the setting. This is important because at the moment the transcoding behavior of Livepeer transcoders is most closely matched by supplying the `-vsync 0` option when transcoding with the ffmpeg CLI (as noted here).
- The same ffmpeg version is used for both applying the FPS filter on the source and transcoding the rendition, to ensure that the FPS adjustment algorithm is the same. See the notes below about issues I encountered when using different ffmpeg versions.
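Concretely, the resampling step under these two conditions looks something like the sketch below (filenames and the 60fps target are placeholders; the binary must be the same ffmpeg build the transcoder uses):

```python
import subprocess

# Resample the source to the rendition FPS with -vsync 0 so frames are
# passed through with their original demuxer timestamps.
subprocess.check_call(
    ['ffmpeg', '-y', '-i', 'source.mp4', '-vsync', '0',
     '-vf', 'fps=60', '-c:v', 'libx264', 'intermediate_source.mp4'])
```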
The test videos were set up using the following commands:
```bash
# Download the 1080p 30fps video
wget https://storage.googleapis.com/lp_testharness_assets/bbb_sunflower_1080p_30fps_normal_2min.mp4

# Segment the 2min video into 60 2s segments
ffmpeg -i bbb_sunflower_1080p_30fps_normal_2min.mp4 -map 0 -c copy -f segment -segment_time 2 output_%d.mp4

# Transcode the source to 720p 60fps
for i in {0..59}
do
    ffmpeg -i output_${i}.mp4 -vsync 0 -vf fps=60,scale=w=1280:h=720 -c:v libx264 output_720p_60fps_${i}.mp4
done

# Transcode the source to 720p 25fps
for i in {0..59}
do
    ffmpeg -i output_${i}.mp4 -vsync 0 -vf fps=25,scale=w=1280:h=720 -c:v libx264 output_720p_25fps_${i}.mp4
done
```
This Python script was used to run the verifier API.
In all of the below scenarios, the source was a 1080p 30fps video segment that was resampled to the rendition FPS before running verification.
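Tying this to the test setup above, the resampling pass over all 60 segments can be sketched as follows (filenames mirror the setup commands; the script itself is illustrative):

```python
import subprocess

# Resample each 30fps source segment to each rendition FPS (with
# -vsync 0) before handing the pairs to the verifier.
for i in range(60):
    for fps in (60, 25):
        subprocess.check_call(
            ['ffmpeg', '-y', '-i', 'output_{}.mp4'.format(i),
             '-vsync', '0', '-vf', 'fps={}'.format(fps),
             '-c:v', 'libx264', 'resampled_{}fps_{}.mp4'.format(fps, i)])
```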
Scenario 1

- `-vsync 0` for FPS filter

720p 60fps: Passes: 19, Fails: 41, TPR: 0.316
720p 25fps: Passes: 20, Fails: 40, TPR: 0.33

Scenario 2

- `-vsync 0` for FPS filter

720p 60fps: Passes: 57, Fails: 3, TPR: 0.95
720p 25fps: Passes: 35, Fails: 25, TPR: 0.58

Scenario 3

- `-vsync 0` for FPS filter

720p 60fps: Passes: 57, Fails: 3, TPR: 0.95
720p 25fps: Passes: 56, Fails: 4, TPR: 0.93

Scenario 4

- `-vsync 0` for FPS filter

720p 60fps: Passes: 58, Fails: 2, TPR: 0.96
720p 25fps: Passes: 56, Fails: 4, TPR: 0.93
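The TPR values above appear to be computed as passes / (passes + fails) over the 60 segments (inferred from the reported numbers), e.g.:

```python
# Scenario 1, 720p 60fps: 19 passes, 41 fails out of 60 segments
print(19 / (19 + 41))  # 0.3166... -> reported as 0.316
# Scenario 4, 720p 60fps: 58 passes, 2 fails
print(58 / (58 + 2))   # 0.9666... -> reported as 0.96
```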
The verifier API changes for scenario 4 are on this branch, which is branched off the frame-averaging branch. It uses a base image that contains the same version of ffmpeg that LPMS uses.
Some additional areas of investigation/possible improvement:
> The verifier API changes for scenario 4 are on this branch which is branched off the frame-averaging. It uses a base image that contains the same version of ffmpeg that LPMS uses.

Using the code from the same branch I obtain slightly different results. My test setup uses the verifier connected to the broadcaster node; I then read the number of negative 'tamper' outputs in the verifications.logs file. I understand that to switch random frame sampling 'off' you are leaving `max_samples` as -1.
My results for scenario 4:

720p 25fps: Passes: 43, Fails: 13, TPR: 0.77

And for an additional scenario 5:

720p 60fps: Passes: 17, Fails: 41, TPR: 0.29
720p 25fps: Passes: 18, Fails: 44, TPR: 0.29
Closed by #111
A candidate solution to #93 is to apply ffmpeg's FPS filter to upsample/downsample the source to match the rendition FPS. An initial experiment yielded poor results:
The renditions were transcoded using an LP orchestrator/transcoder.
Broadcaster configuration used in this experiment:

```
livepeer -broadcaster -verifierUrl http://localhost:5000/verify -transcodingOptions P240p30fps16x9,P360p30fps16x9,P720p30fps16x9 -verifierPath ~/Epic/livepeer/verification-classifier/stream -orchAddr 127.0.0.1:8935 -httpAddr :8936
```
Orchestrator configuration used in this experiment:

```
livepeer -orchestrator -transcoder -pricePerUnit 1 -serviceAddr 127.0.0.1:8935 -cliAddr :7936 -v 99
```
The original hypothesis was that applying ffmpeg's FPS filter to the source, such that the intermediate source and rendition have the same FPS, would create alignment between the frames of both videos. However, it could be the case that, while the LPMS transcoder (used by an LP orchestrator/transcoder) uses the libavfilter FPS filter (the same one ffmpeg uses), the implementation might cause frame misalignment between the source and rendition in other ways.
It would be helpful to see if comparing an intermediate upsampled source against renditions transcoded using the ffmpeg CLI (instead of the LPMS transcoder) yields better results. The LPMS transcoder currently does the equivalent of:

```
ffmpeg -i <INPUT> -vsync 0 -vf fps=<FPS>,scale=w=<WIDTH>:h=<HEIGHT> -c:v libx264 <OUTPUT>
```

We can create a set of renditions transcoded using the ffmpeg CLI and compare them against an intermediate upsampled source. If we observe better results in this experiment then we can investigate how to accommodate the LPMS transcoder behavior in the verifier. If we do not observe better results then we'll need to explore other areas of investigation.
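A sketch of that experiment, assuming placeholder filenames and a 720p 30fps target (the verifier invocation itself is left out):

```python
import subprocess

def transcode(src, dst, fps, w, h):
    # Mirror of the LPMS-equivalent ffmpeg CLI command above.
    subprocess.check_call(
        ['ffmpeg', '-y', '-i', src, '-vsync', '0',
         '-vf', 'fps={},scale=w={}:h={}'.format(fps, w, h),
         '-c:v', 'libx264', dst])

# Rendition transcoded with the ffmpeg CLI instead of the LPMS transcoder.
transcode('source.mp4', 'rendition_cli.mp4', 30, 1280, 720)

# Intermediate source resampled to the rendition FPS.
subprocess.check_call(
    ['ffmpeg', '-y', '-i', 'source.mp4', '-vsync', '0',
     '-vf', 'fps=30', '-c:v', 'libx264', 'intermediate_source.mp4'])

# intermediate_source.mp4 vs rendition_cli.mp4 then goes through the
# verifier, and the resulting TPR is compared against the LPMS-transcoded case.
```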