xiph / rav1e

The fastest and safest AV1 encoder.
BSD 2-Clause "Simplified" License
3.68k stars 250 forks source link

Valgrind reports tons of uninitialized reads and some memory leaks #2310

Open dwbuiten opened 4 years ago

dwbuiten commented 4 years ago

Log file (too long/large to paste in here): valgrind.log Source file: test_source.zip (zip because GitHub requires it)

Command line used (to reproduce): valgrind --leak-check=full --track-origins=yes ./target/debug/rav1e -o test.ivf --threads 1 --limit 2 --speed 10 test.y4m 2>&1 | tee valgrind.log

lu-zero commented 4 years ago

I'm seeing

==491008==  Uninitialised value was created by a stack allocation
==491008==    at 0x2F6BC4: rav1e::util::align::Aligned<T>::uninitialized (align.rs:35)

That means we aren't writing all the area we are supposedly reading, we should probably doublecheck this. (assuming it isn't a false positive, but I doubt it).

The memory leaks should be lazy-static buffers used by the threadpool and such, e.g.:

==491008== 288 bytes in 1 blocks are possibly lost in loss record 29 of 35
==491008==    at 0x4838B65: calloc (vg_replace_malloc.c:762)
==491008==    by 0x4011661: allocate_dtv (dl-tls.c:286)
==491008==    by 0x4011FAD: _dl_allocate_tls (dl-tls.c:532)
==491008==    by 0x4A21C0E: allocate_stack (allocatestack.c:622)
==491008==    by 0x4A21C0E: pthread_create@@GLIBC_2.2.5 (pthread_create.c:662)
==491008==    by 0x5404AD: std::sys::unix::thread::Thread::new (thread.rs:68)
==491008==    by 0x4591D6: std::thread::Builder::spawn_unchecked (mod.rs:492)
==491008==    by 0x459469: std::thread::Builder::spawn (mod.rs:386)
==491008==    by 0x455DBD: <rayon_core::registry::DefaultSpawn as rayon_core::registry::ThreadSpawn>::spawn (registry.rs:101)
==491008==    by 0x4562FE: rayon_core::registry::Registry::new (registry.rs:257)
==491008==    by 0x45A387: rayon_core::thread_pool::ThreadPool::build (mod.rs:69)
==491008==    by 0x464C64: rayon_core::ThreadPoolBuilder<S>::build (lib.rs:201)
==491008==    by 0x331BAE: rav1e::api::config::Config::new_context (mod.rs:143)
lu-zero commented 4 years ago

MemorySanitizer seems happy though:

# RUSTFLAGS="-Z sanitizer=memory -Z sanitizer-memory-track-origins=2" cargo -Zbuild-std run --target x86_64-unknown-linux-gnu -- -o test.ivf --threads 1 --limit 2 --speed 10 /tmp/test.y4m
tdaede commented 4 years ago

The first is pretty concerning. We do intentionally create uninitialized stack data, but we should never be reading from it.

For the latter, we should check if these are deliberate leaks from rayon. If so, it might make sense to commit a valgrind suppression file in the repo.

barrbrain commented 4 years ago

The errors are somewhat confusing because valgrind is tracking the taint of uninitialised reads. The first such read is from the reconstruction frame.

barrbrain commented 4 years ago

The issue appears to involve intra prediction assembly. With RAV1E_CPU_TARGET=rust, no uninitialised read is reported on the first frame. More specifically, if I disable asm for DC_PRED and CFL_PRED where tx_size.width() < 32 then the errors in diff() and related to get_satd() no longer occur.

diff --git a/src/asm/x86/predict.rs b/src/asm/x86/predict.rs
index 050437e3..a735a59a 100644
--- a/src/asm/x86/predict.rs
+++ b/src/asm/x86/predict.rs
@@ -110,6 +110,8 @@ pub fn dispatch_predict_intra<T: Pixel>(
     return call_rust(dst);
   }

+  let valgrind = true;
+
   unsafe {
     let dst_ptr = dst.data_ptr_mut() as *mut _;
     let stride = dst.plane_cfg.stride as libc::ptrdiff_t;
@@ -121,7 +123,7 @@ pub fn dispatch_predict_intra<T: Pixel>(

     if cpu >= CpuFeatureLevel::AVX2 {
       match mode {
-        PredictionMode::DC_PRED => {
+        PredictionMode::DC_PRED if !valgrind || tx_size.width() >= 32 => {
           (match variant {
             PredictionVariant::NONE => rav1e_ipred_dc_128_avx2,
             PredictionVariant::LEFT => rav1e_ipred_dc_left_avx2,
@@ -187,7 +189,7 @@ pub fn dispatch_predict_intra<T: Pixel>(
         PredictionMode::PAETH_PRED => {
           rav1e_ipred_paeth_avx2(dst_ptr, stride, edge_ptr, w, h, angle);
         }
-        PredictionMode::UV_CFL_PRED => {
+        PredictionMode::UV_CFL_PRED if !valgrind || tx_size.width() >= 32 => {
           let ac_ptr = ac.as_ptr() as *const _;
           (match variant {
             PredictionVariant::NONE => rav1e_ipred_cfl_128_avx2,
barrbrain commented 4 years ago

I ran the same test except at speed 3, and received the get_satd() adjacent error again. So there must be an additional source of "uninitialised reads."