xiph / rav1e

The fastest and safest AV1 encoder.
BSD 2-Clause "Simplified" License
3.72k stars 253 forks source link

Fix and reenable get_sad_16x16_sse2 (crashes when width not mod 64) #1715

Closed Draghmar closed 5 years ago

Draghmar commented 5 years ago

Hi I was doing some tests with rav1e windows build 0.1.0 (20190926-19-ge11f06a) and found one video that I couldn't encode. The process started as usual with:

INFO  rav1e > Using y4m decoder: 550x872p @ 30000/1001 fps, 4:2:0, 8-bit
INFO  rav1e > Encoding settings: keyint_min=12 keyint_max=240 quantizer=150 bitrate=0 min_quantizer=0 low_latency=false tune=Psychovisual rdo_lookahead_frames=40 min_block_size=4x4 multiref=true fast_deblock=false reduced_tx_set=false tx_domain_distortion=false tx_domain_rate=false encode_bottomup=true rdo_tx_decision=true prediction_modes=Complex-All include_near_mvs=true no_scene_detection=false diamond_me=true cdef=true quantizer_rdo=true use_satd_subpel=true non_square_partition=false
INFO  rav1e::api::config > CPU Feature Level: SSSE3
INFO  rav1e::api::internal > Using 4 tiles (1x4)

and stopped here without saying why. Output was not readable by media player and had only 32KB. Is there some debug option I could turn on to get what's happening? The parameters I used are: -v -s 1 --threads 4 --tile-rows 4 --quantizer 150 I tried changing -s and tiles.

Also worth mention is the fact I have the same video but in 1920x1080 and it works just fine. The one here is simply resized and cropped. Maybe it is somehow related to the resolution? Like SvtAv1Enc that works only with dimensions that can be divided by 8?

rzumer commented 5 years ago

I tested encoding a clip with the same resolution and parameters and had no problem, so will need a sample to try to reproduce.

shssoichiro commented 5 years ago

Could you test if it works if you try using 1 tile instead of 4? That is my first suspicion for what might be causing this.

Draghmar commented 5 years ago

@rzumer I was afraid it may have something to do with this specific file. And I can't unfortunately share it because that's some internal thing. :( That's why I asked about some way to debug this, to get an idea why it chooses to stop.

@shssoichiro Nope, it didn't work.

Draghmar commented 5 years ago

I've made some additional tests by using different video with similar parameters (fullHD, not long etc.) and put it through the same pipeline, which is:

ffmpeg -i "%%a" -preset slow -crf 0 -tune film -vf scale=-1:872,crop=550:in_h "in\%%~na_temp.mp4"
ffmpeg -i "%%a" "in\%%~na.y4m"
rav1e -v -s 1 --threads 4 --tile-rows 4 --quantizer 150 "%%a" -o "out\%%~na.ivf"

And the result are the same like with the previous file - encoding stops. Changing -crf doesn't make any difference. The weirdest part is that I tried ffmpeg to econde into AV1 and it stops always at 53 frame. And it work properly some time ago when I was playing with AV1 from ffmpeg, rav1e and SvtAv1Enc.

I will make another attempt when my friend that made this video show up. I will ask him to render it directly to 550x872 to remove ffmpeg from the variables.

Draghmar commented 5 years ago

I got 550x872 mp4 file rendered by Premiere Pro that I converted to y4m with ffmpeg and rav1e still won't encode it. The same procedure works for 1920x1080. So I have no idea what is going on. Especially that I took some stock fullHD clip and do the same thing with it getting the same effect. This one I can attach so here you go. Three files from the three steps: original, cropped and y4m.

rom1v commented 5 years ago

I tried to encode the y4m from your zipfile:

rav1e -v -s 1 --threads 4 --tile-rows 4 --quantizer 150 Pexels_Videos_1572432_mobile.y4m -o test.ivf

It works for me.

 INFO  rav1e > Using y4m decoder: 550x872p @ 24000/1001 fps, 4:2:0, 8-bit
 INFO  rav1e > Encoding settings: keyint_min=12 keyint_max=240 quantizer=150 bitrate=0 min_quantizer=0 low_latency=false tune=Psychovisual rdo_lookahead_frames=40 min_block_size=4x4 multiref=true fast_deblock=false reduced_tx_set=false tx_domain_distortion=false tx_domain_rate=false encode_bottomup=true rdo_tx_decision=true prediction_modes=Complex-All include_near_mvs=true no_scene_detection=false diamond_me=true cdef=true quantizer_rdo=true use_satd_subpel=true non_square_partition=false
 INFO  rav1e::api::config > CPU Feature Level: AVX2

Only happens for SSSE3?

Draghmar commented 5 years ago

Hm...it may be because inspired by your suggestion I did test that stock clip on my tablet that has Intel(R) Core(TM) m3-6Y30 CPU @ 0.90GHz (it's not as slow as one may think XD) and rav1e uses AVX2 for CPU Feature Level here. Is there a way to change/enforce which CPU Feature Level will be used?

shssoichiro commented 5 years ago

You can set the "RAV1E_CPU_TARGET" environment variable (set to "rust" to disable all ASM), although it won't consistently apply to all ASM until #1717 gets merged.

shssoichiro commented 5 years ago

The issue seems to be with one of the SSE2 functions which we also have an AVX function for. I ran this against the #1717 branch:

➜ RAV1E_CPU_TARGET=avx2 cargo test --release --features decode_test_dav1d,check_asm

running 149 tests
...

test result: ok. 130 passed; 0 failed; 19 ignored; 0 measured; 0 filtered out
➜ RAV1E_CPU_TARGET=sse2 cargo test --release --features decode_test_dav1d,check_asm

running 149 tests
test api::test::large_width_assert ... ok
...
test test_encode_decode::low_bit_depth_dav1d ... ignored
test test_encode_decode::odd_size_frame_with_full_rdo_dav1d ... ignored
error: process didn't exit successfully: `/Users/jholmer/repos/rav1e/target/release/deps/rav1e-28480a92d3ad6253` (signal: 11, SIGSEGV: invalid memory reference)
➜ RAV1E_CPU_TARGET=ssse3 cargo test --release --features decode_test_dav1d,check_asm

running 149 tests
test api::test::large_width_assert ... ok
...
test test_encode_decode::low_bit_depth_dav1d ... ignored
test test_encode_decode::odd_size_frame_with_full_rdo_dav1d ... ignored
error: process didn't exit successfully: `/Users/jholmer/repos/rav1e/target/release/deps/rav1e-28480a92d3ad6253` (signal: 11, SIGSEGV: invalid memory reference)
➜ RAV1E_CPU_TARGET=rust cargo test --release --features decode_test_dav1d,check_asm

running 149 tests
...

test result: ok. 130 passed; 0 failed; 19 ignored; 0 measured; 0 filtered out
shssoichiro commented 5 years ago

I've tracked the segfault as best I can down to rav1e_sad16x16_sse2. It does not crash on every call to it, only with certain input data. This can be reproduced against the test test_encode_decode::quantizer_100_dav1d.

shssoichiro commented 5 years ago

I was able to produce a backtrace, below, off of my branch for #1717. I also determined that the condition causing this to happen is when the width of the video is not a multiple of 64. The height of the video does not have any effect. My ASM skills are very lacking, so I'm unclear as to how to fix it, but I get the feeling someone familiar with ASM could fix this easily.

rav1e-0afb1c11afed9f10`rav1e_sad16x16_sse2.loop
->  0x555555d7a8db <+20>: psadbw (%rdi), %xmm1

frame #1: 0x00005555559457b4 rav1e-0afb1c11afed9f10`rav1e::me::full_search::h6a2ee690f2d8e627 at dist.rs:122:8
   119    let dist = match T::type_enum() {
   120      PixelType::U8 => match SAD_FNS[cpu.as_index()][to_index(bsize)] {
   121        Some(func) => unsafe {
-> 122          (func)(
   123            src.data_ptr() as *const _,
   124            T::to_asm_stride(src.plane_cfg.stride),
   125            dst.data_ptr() as *const _,
frame #2: 0x0000555555945226 rav1e-0afb1c11afed9f10`rav1e::me::full_search::h6a2ee690f2d8e627(fi=&0x7ffff014eb10, x_lo=-20, x_hi=22, y_lo=-16, y_hi=16, bsize=BLOCK_16X16, p_org=&0x7ffff7b0d5f0, p_ref=&0x7ffff00c67d8, best_mv=&0x7ffff7b0a248, lowest_cost=&0x7ffff7b0a240, po=PlaneOffset {
x: 2, 
y: 0
}, step=1, lambda=0, pmv=[MotionVector {
row: 0, 
col: 0
}, MotionVector {
row: 0, 
col: 0
}], allow_high_precision_mv=false) at me.rs:976
   973    // Select rectangular regions within search region with vert+horz windows
   974    for vert_window in search_region.vert_windows(blk_h).step_by(step) {
   975      for ref_window in vert_window.horz_windows(blk_w).step_by(step) {
-> 976        let sad = get_sad(
   977          &plane_org,
   978          &ref_window,
   979          bsize,
frame #3: 0x0000555555949bb6 rav1e-0afb1c11afed9f10`rav1e::me::estimate_motion_ss4::h2c1ffaf9be5098c2(fi=&0x7ffff014eb10, ts=&0x7ffff7b0ad68, bsize=BLOCK_64X64, ref_idx=0, tile_bo=TileBlockOffset(BlockOffset {
x: 16, 
y: 0
})) at me.rs:1065:4
   1062     // Divide by 16 to account for subsampling, 0.125 is a fudge factor
   1063     let lambda = (fi.me_lambda * 256.0 / 16.0 * 0.125) as u32;
   1064
-> 1065     full_search(
   1066       fi,
   1067       x_lo,
   1068       x_hi,
frame #4: 0x00005555558b789e rav1e-0afb1c11afed9f10`rav1e::api::internal::ContextInner$LT$T$GT$::compute_lookahead_motion_vectors::_$u7b$$u7b$closure$u7d$$u7d$::h719b05ade4c22c28 at encoder.rs:2750:14
   2747           let r = fi.ref_frames[i] as usize;
   2748           if pmvs[r].is_none() {
   2749             pmvs[r] =
-> 2750               estimate_motion_ss4(fi, ts, BlockSize::BLOCK_64X64, r, bo);
   2751           }
   2752         }
   2753         frame_pmvs.push(pmvs);
frame #5: 0x00005555558b72b4 rav1e-0afb1c11afed9f10`rav1e::api::internal::ContextInner$LT$T$GT$::compute_lookahead_motion_vectors::_$u7b$$u7b$closure$u7d$$u7d$::h719b05ade4c22c28(ctx=TileContextMut<u8> @ 0x00007ffff7b0ad68) at internal.rs:596
   593          let ts = &mut ctx.ts;
   594 
   595          // Compute the quarter-resolution motion vectors.
-> 596          let tile_pmvs = build_coarse_pmvs(fi, ts, &inter_cfg);
   597 
   598          // Compute the half-resolution motion vectors.
   599          let mut half_res_pmvs = Vec::with_capacity(ts.sb_height * ts.sb_width);
ycho commented 5 years ago

Isn't this now resolved by #1758?

shssoichiro commented 5 years ago

The crash is resolved, but the ASM is still broken. I've removed this as a blocker on the release, but I'm leaving this open to represent fixing and reenabling the broken ASM function.

ycho commented 5 years ago

The crash is resolved, but the ASM is still broken. I've removed this as a blocker on the release, but I'm leaving this open to represent fixing and reenabling the broken ASM function.

Right, thanks for revising the title of issue.