xiph / rav1e

The fastest and safest AV1 encoder.
BSD 2-Clause "Simplified" License
3.71k stars 253 forks source link

Disabling CDEF causes a > 80% regression on metrics #1399

Open tmatth opened 5 years ago

tmatth commented 5 years ago

Using this patch:

diff --git a/src/encoder.rs b/src/encoder.rs
index 8734473..4f3f130 100644
--- a/src/encoder.rs
+++ b/src/encoder.rs
@@ -213,7 +213,7 @@ impl Sequence {
       enable_ref_frame_mvs: false,
       enable_warped_motion: false,
       enable_superres: false,
-      enable_cdef: config.speed_settings.cdef && config.chroma_sampling != ChromaSampling::Cs422,
+      enable_cdef: false,
       enable_restoration: config.chroma_sampling != ChromaSampling::Cs422 &&
         config.chroma_sampling != ChromaSampling::Cs444, // FIXME: not working yet
       operating_points_cnt_minus_1: 0,

leads to this mess:

https://beta.arewecompressedyet.com/?job=master-f9ad26f55ebd5e3a827213b18f5b2534550ead39&job=master-no-cdef%402019-06-28T13%3A12%3A08.894Z

This probably points to a bug in our CDEF disabled codepath.

tdaede commented 5 years ago

Did you run cargo test --features=decode_test?

tmatth commented 5 years ago
running 85 tests
test api::test::flush_unlimited_reorder_scene_change_detection ... ok
test cdef::test::check_max_element ... ok
test cdef::test::test_padded_frame_copy ... ok
test cdef::test::test_padded_frame_copy_outside_input ... ok
test context::test::cdf_map ... ok
test context::test::cfl_joint_sign ... ok
test ec::test::booleans ... ok
test ec::test::cdf ... ok
test ec::test::mixed ... ok
test ec::test::update_cdf_4_sse2 ... ok
test encoder::test::check_partition_types_order ... ok
test frame::plane::test::copy_from_raw_u8 ... ok
test frame::plane::test::test_plane_pad ... ok
test api::test::flush_low_latency_scene_change_detection ... ok
test me::test::get_sad_same_u16 ... ok
test predict::test::pred_matches_u8 ... ok
test predict::test::pred_max ... ok
test quantize::test::gen_divu_table ... ok
test me::test::get_sad_same_u8 ... ok
test quantize::test::test_tx_log_scale ... ok
test rate::test::bexp64_vectors ... ok
test api::test::flush_unlimited_low_latency_no_scene_change ... ok
test rate::test::blog64_vectors ... ok
test rdo::estimate_rate_test ... ok
test quantize::test::test_divu_pair ... ok
test api::test::flush_unlimited_reorder_no_scene_change ... ok
test api::test::flush_reorder_scene_change_detection ... ok
test api::test::flush_low_latency_no_scene_change ... ok
test api::test::flush_unlimited_low_latency_scene_change_detection ... ok
test api::test::flush_reorder_no_scene_change ... ok
test rate::test::blog64_bexp64_round_trip ... ok
test test_encode_decode::keyframes_aom ... ok
test test_encode_decode::chroma_sampling_422_aom ... ok
test test_encode_decode::high_bit_depth_12_aom ... ok
test test_encode_decode::large_dimension::dimension_512x512_aom ... ok
test test_encode_decode::odd_size_frame_with_full_rdo_aom ... ignored
test test_encode_decode::bitrate_aom ... ok
test test_encode_decode::chroma_sampling_444_aom ... ok
test test_encode_decode::chroma_sampling_420_aom ... ok
test test_encode_decode::high_bit_depth_10_aom ... test test_encode_decode::high_bit_depth_10_aom has been running for over 60 seconds
test test_encode_decode::large_dimension::dimension_1024x1024_aom ... test test_encode_decode::large_dimension::dimension_1024x1024_aom has been running for over 60 seconds
test test_encode_decode::high_bit_depth_10_aom ... ok
test test_encode_decode::quantizer_100_aom ... ok
test test_encode_decode::large_dimension::dimension_2048x2048_aom ... test test_encode_decode::large_dimension::dimension_2048x2048_aom has been running for over 60 seconds
test test_encode_decode::reordering_short_video_aom ... ok
test test_encode_decode::quantizer_120_aom ... ok
test test_encode_decode::quantizer_60_aom ... ok
test test_encode_decode::quantizer_80_aom ... ok
test test_encode_decode::small_dimension::dimension_256x256_aom ... ok
test test_encode_decode::small_dimension::dimension_258x258_aom ... ok
test test_encode_decode::small_dimension::dimension_260x260_aom ... ok
test test_encode_decode::speed_0_aom ... ignored
test test_encode_decode::speed_10_aom ... ignored
test test_encode_decode::speed_1_aom ... ignored
test test_encode_decode::speed_2_aom ... ignored
test test_encode_decode::speed_3_aom ... ignored
test test_encode_decode::speed_4_aom ... ignored
test test_encode_decode::speed_5_aom ... ignored
test test_encode_decode::speed_6_aom ... ignored
test test_encode_decode::speed_7_aom ... ignored
test test_encode_decode::speed_8_aom ... ignored
test test_encode_decode::speed_9_aom ... ignored
test test_encode_decode::small_dimension::dimension_262x262_aom ... ok
test test_encode_decode::tiny_dimension::dimension_128x128_aom ... ok
test test_encode_decode::tiny_dimension::dimension_16x16_aom ... ok
test test_encode_decode::reordering_aom ... ok
test test_encode_decode::tiny_dimension::dimension_32x32_aom ... ok
test test_encode_decode::tiny_dimension::dimension_8x8_aom ... ok
test tiling::tiler::test::test_tile_area ... ok
test tiling::tiler::test::test_tile_blocks_area ... ok
test tiling::tiler::test::test_tile_blocks_write ... ok
test tiling::tiler::test::test_tile_iter_len ... ok
test tiling::tiler::test::test_tile_motion_vectors_write ... ok
test tiling::tiler::test::test_tile_restoration_edges ... ok
test tiling::tiler::test::test_tile_restoration_write ... ok
test tiling::tiler::test::test_tile_write ... ok
test tiling::tiler::test::test_tiling_info_from_tile_count ... ok
test transform::test::log_tx_ratios ... ok
test transform::test::roundtrips_u16 ... ok
test transform::test::roundtrips_u8 ... ok
test util::sanity ... ok
test test_encode_decode::tiny_dimension::dimension_64x64_aom ... ok
test test_encode_decode::small_dimension::dimension_264x264_aom ... ok
test test_encode_decode::small_dimension::dimension_265x265_aom ... ok
test test_encode_decode::tile_encoding_with_stretched_restoration_units_aom ... ok
test test_encode_decode::low_bit_depth_aom ... ok
test test_encode_decode::large_dimension::dimension_1024x1024_aom ... ok
test test_encode_decode::large_dimension::dimension_2048x2048_aom ... ok

test result: ok. 73 passed; 0 failed; 12 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/rav1e-b6a91d22f794d941

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/aom-089995dff9b2c60d

running 13 tests
test bindgen_test_layout___fsid_t ... ok
test bindgen_test_layout_aom_codec_ctx ... ok
test bindgen_test_layout_aom_codec_ctx__bindgen_ty_1 ... ok
test bindgen_test_layout_aom_codec_dec_cfg ... ok
test bindgen_test_layout_aom_codec_frame_buffer ... ok
test bindgen_test_layout_aom_codec_stream_info ... ok
test bindgen_test_layout_aom_image_rect ... ok
test bindgen_test_layout_aom_image ... ok
test bindgen_test_layout_aom_inspect_init ... ok
test bindgen_test_layout_aom_postproc_cfg ... ok
test bindgen_test_layout_av1_ref_frame ... ok
test bindgen_test_layout_imaxdiv_t ... ok
test bindgen_test_layout_cfg_options ... ok

test result: ok. 13 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests rav1e

running 1 test
test src/lib.rs - version (line 101) ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
rzumer commented 5 years ago

No desyncs that I can see with CDEF disabled. The rate is simply much higher for the same quality. The curves make sense at least; high gains at low rates, smaller gains at high rates. I thought that something may be messing with the RD cost calculation in RDO, but disabling cdef_dist did not change results significantly.

Red kayak does not seem to care about CDEF much (disabling cdef_dist actually slightly improves its PSNR with CDEF disabled). It seems that massive losses are concentrated in mostly static scenes.

tmatth commented 5 years ago

No desyncs that I can see with CDEF disabled. The rate is simply much higher for the same quality. Are the numbers really that odd? The curves make sense at least; high gains at low rates, smaller gains at high rates. I thought that something may be messing with the RD cost calculation in RDO, but disabling cdef_dist did not change results significantly.

Red kayak does not seem to care about CDEF much (disabling cdef_dist actually slightly improves its PSNR with CDEF disabled). It seems that massive losses are concentrated in mostly static scenes.

@rzumer so I tested disabling it at the frame header level, and it seems like there are code paths that break with only that change.

rzumer commented 5 years ago

After looking at bit distribution in the analyzer it seems the self-guided filter is running loose with CDEF disabled; every frame allocated over 6000 bits to it at q=252 on Kristen. The normal distribution with CDEF enabled seems to be a few thousand bits on intra frames, and nothing on inter frames.

Comparative run (no LRF + no CDEF => only no CDEF): https://beta.arewecompressedyet.com/?job=master-no-cdef-lrf%402019-07-23T22%3A54%3A26.767Z&job=master-no-cdef%402019-07-23T19%3A16%3A53.957Z

KyleSiefring commented 5 years ago

https://github.com/xiph/rav1e/blob/master/src/rdo.rs#L1454 I believe best_cost[pli] will be < 0 if cdef is disabled. This would cause sgr to always be used.

tmatth commented 5 years ago

https://github.com/xiph/rav1e/blob/master/src/rdo.rs#L1454 I believe best_cost[pli] will be < 0 if cdef is disabled. This would cause sgr to always be used.

@KyleSiefring yeah, accounting for that may have done it (although this fix may be too naive):

diff --git a/src/rdo.rs b/src/rdo.rs
index 4889b54..be42f93 100644
--- a/src/rdo.rs
+++ b/src/rdo.rs
@@ -1497,7 +1497,7 @@ pub fn rdo_loop_decision<T: Pixel>(tile_sbo: TileSuperBlockOffset, fi: &FrameInv
           let rate = cw.count_lrf_switchable(w, &ts.restoration.as_const(), current_lrf, pli);
           let frame_bo = ts.to_frame_super_block_offset(tile_sbo).block_offset(0, 0);
           let cost = compute_rd_cost(fi, frame_bo, BlockSize::BLOCK_64X64, rate, err);
-          if best_cost[pli] < 0. || cost < best_cost[pli] {
+          if (fi.sequence.enable_cdef && best_cost[pli] < 0.) || cost < best_cost[pli] {
             best_cost[pli] = cost;
             best_lrf[pli] = current_lrf;
             lrf_change = true;

AWCY results with the above fix + CDEF disabled:

https://beta.arewecompressedyet.com/?job=master-8b5936e33ea49a9320a13b0ef0103956684340e1&job=cdef-disabled%402019-08-07T04%3A20%3A03.200Z

vibhoothi commented 3 years ago

On checking now, https://beta.arewecompressedyet.com/?job=master-e19269752954379f4aa1033010e84489e85ef73d&job=rav1e-no-cdef%402021-05-23T00%3A26%3A47.352Z

An average of 4-12% is noticed, I think this seems normal