scottlamb / moonfire-nvr

Moonfire NVR, a security camera network video recorder

'should always be an unindexed sample' panic #113

Closed jlpoolen closed 3 years ago

jlpoolen commented 3 years ago

This issue is being opened just as a placeholder for the moment.

History

I installed Moonfire-nvr in June 2020 on a new, clean Raspberry Pi 4 purchased just for this software. My setup included 8 TB of disk space to hold the feeds of 4 Reolink cameras at high resolutions (e.g. 1920x1080) and high frame rates (e.g. 27+ frames/second). It performed extremely well. Occasionally there were some hiccups, and I would restart one or more cameras.

I then upgraded Moonfire-nvr and probably performed several other upgrades to my Raspberry Pi environment. Since then, I have had intermittent problems with one or more cameras' feeds not being preserved and the web interface showing little, if anything, for some cameras. It did not seem camera-specific; however, I have not fully analyzed the matter.

This issue is being opened to start tracking my investigation into what the problem may be. Yesterday I opened issue #112 to document what I was doing to capture and preserve colorized logs. Scott's comments there note that he does not colorize output, so the colorized output is coming from ffmpeg.

Current

Here is an example of my web interface showing two cameras (garage_west & Peck_west) down:

[Screenshot: Screenshot_2021-03-09_0836AM_Moonfire NVR]

Here is a link to a 14 MB HTML-formatted log preserving the coloration: https://drive.google.com/file/d/1UrQgLzgetLfCT8681uOOO9u1a_gGGKjU/view?usp=sharing

I have not reviewed the logs carefully; I just wanted to open this issue to set up a place where I can share my findings and react to suggestions and other comments. Scott had mentioned recreating the ffmpeg command directly in a console against one of the cameras to see if the problems repeat themselves. I want to try building Moonfire-nvr in a Gentoo-based VM, where I have more control over my environment, and see if the same results occur there as on the Raspberry Pi. I suspect the problem I am facing is not necessarily related to Moonfire-nvr and is a problem with dependencies. Since the Raspberry Pi is a suggested platform for running this software, it merits further investigation.

scottlamb commented 3 years ago

I can think of two potentially-relevant changes in Moonfire NVR itself between the two versions:

I would first try building without the analytics feature and see if that solves the problem. It's not well-tested, potentially uses a lot of CPU which might be a cause of connection drops (especially on a Raspberry Pi), and doesn't do much useful yet (it doesn't save any results of its work to the database yet).

It'd also be helpful to compare to logs from the old version, if you still have any around. In particular, I'd like to see the old log's match of the following new log lines, so we know what's changed on ffmpeg's side.

I0304 104038.168 main moonfire_ffmpeg] Initialized ffmpeg. Versions:
avutil: running=56.22.100 compiled=56.22.100
avcodec: running=58.35.100 compiled=58.35.100
avformat: running=58.20.100 compiled=58.20.100

scottlamb commented 3 years ago

I also just noticed this error:

thread 's-peck_west-main' panicked at 'should always be an unindexed sample', /usr/local/src/moonfire-nvr/server/db/writer.rs:750:54
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

That's definitely a bug in Moonfire NVR itself. Could you set the environment variable it mentions to get some more info?

jlpoolen commented 3 years ago

After running for 1.5 hours with "RUST_BACKTRACE=1", I spotted a few backtraces. The current web interface has all four cameras showing, one with an infinity entry; otherwise all look normal.

1.5 hours of log at: https://pastebin.com/9KCUEagm

jlpoolen commented 3 years ago

Here's a log file representing 19 hours. It is just the log file, with ANSI color codes, and continues from the previous log file I posted in https://github.com/scottlamb/moonfire-nvr/issues/113#issuecomment-794558888

https://drive.google.com/file/d/1D9ZRn6aWfbuLqFca-gozmMuwO7d5JB-4/view?usp=sharing

The current state of the web interface is that 2 cameras (this time garage_east & garage_west) have no content. The Reolink client displays all four cameras, and the Reolink cached events include files from all four cameras; nothing in the 307 cached files for March 10th (8+ hours) suggests a failure.

[Screenshot: Screenshot_2021-03-10_0822_ Moonfire_NVR]

If you think you have identified what may be causing the problem and want to try pushing a change into a development version, I can clone such and build and run to test. Or I can patch my existing instance:

jlpoole@raspberrypi:/usr/local/src/moonfire-nvr $ git show
commit ed521521a411d97c40cee67dd9831237b2fef6a4 (HEAD -> master, origin/master, origin/HEAD)
Author: Scott Lamb <slamb@slamb.org>
Date:   Thu Feb 11 20:27:12 2021 -0800

    fix SQLite3 integrity check

diff --git a/server/db/check.rs b/server/db/check.rs
index e6b96f5..bbd814f 100644
--- a/server/db/check.rs
+++ b/server/db/check.rs
@@ -60,9 +60,15 @@ pub fn run(conn: &mut rusqlite::Connection, opts: &Options) -> Result<i32, Error
     let mut printed_error = false;

     info!("Checking SQLite database integrity...");
-    if let Err(e) = conn.execute("pragma check_integrity", params![]) {
-        error!("Database integrity error: {}", e);
-        printed_error = true;
+    {
+        let mut stmt = conn.prepare("pragma integrity_check")?;
+        let mut rows = stmt.query(params![])?;
+        while let Some(row) = rows.next()? {
+            let e: String = row.get(0)?;
+            if e == "ok" { continue; }
+            error!("{}", e);
+            printed_error = true;
+        }
     }
     info!("...done");

jlpoole@raspberrypi:/usr/local/src/moonfire-nvr $

scottlamb commented 3 years ago

Haven't figured this out yet, but if you run the latest version the error messages will be prettier. 🤷‍♂️ I'm going to look more today.

jlpoolen commented 3 years ago

Sorry, I'm slow to remember to pick up your latest builds; thank you for the hint. To that end I performed the following:

  sudo pi
  cd /usr/local/src/moonfire-nvr
  git pull
  git status
  cd server
  cargo build --release
  exit

and I have launched a new session:

sudo moonfire-nvr
cd /usr/local/src/moonfire-nvr/server/target/release 
screen -t MoonfireShell
export START_TIME=`date +"%Y-%b-%d_%H_%M"`
export AV_LOG_FORCE_COLOR=1
export RUST_BACKTRACE=1

script --flush /tmp/moonfire-nvr_${START_TIME}.log
./moonfire-nvr run

[To leave screen (and script): Ctrl-d a]

scottlamb commented 3 years ago

No worries; I was just making fun of myself for fixing the cosmetic stuff before the bug. Those changes shouldn't be necessary to figure out what's going on. I'll let you know if I do need you to pick up logging changes for that.

scottlamb commented 3 years ago

The problem has to be here:

https://github.com/scottlamb/moonfire-nvr/blob/ed521521a411d97c40cee67dd9831237b2fef6a4/server/db/writer.rs#L662

The comment above says we must restore the invariant on all exit paths, but I missed one. If the offset from the previous pts doesn't fit in a u32 (it jumps forward by 2^31 or more, or backward by more than 2^31), the invariant isn't restored.
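
For readers following along, here is a minimal sketch of the kind of bug described above. It is not the code from writer.rs or from any commit in this repository: `Writer`, `Unindexed`, `write_buggy`, and `write_fixed` are hypothetical names made up for illustration, and it assumes the duration is converted with `i32::try_from`, since the quoted range (forward by 2^31 or more, backward by more than 2^31) matches a signed 32-bit integer. The idea is that the pending sample is taken out of an `Option`, and an early return on a failed conversion leaves it empty, so the next call hits the `expect`.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the writer's pending ("unindexed") sample.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Unindexed {
    pts_90k: i64,
}

/// Hypothetical writer. Invariant: `unindexed` is `Some` between calls.
struct Writer {
    unindexed: Option<Unindexed>,
    durations: Vec<i32>,
}

impl Writer {
    fn new(first_pts_90k: i64) -> Self {
        Writer {
            unindexed: Some(Unindexed { pts_90k: first_pts_90k }),
            durations: Vec::new(),
        }
    }

    /// Buggy pattern: `?` returns early after `take()`, so `unindexed`
    /// stays `None` and the next call panics in `expect`.
    fn write_buggy(&mut self, pts_90k: i64) -> Result<(), String> {
        let prev = self
            .unindexed
            .take()
            .expect("should always be an unindexed sample");
        let delta = pts_90k - prev.pts_90k;
        let duration = i32::try_from(delta)
            .map_err(|_| format!("pts jump {} doesn't fit in i32", delta))?;
        self.durations.push(duration);
        self.unindexed = Some(Unindexed { pts_90k });
        Ok(())
    }

    /// One possible fix: put the previous sample back before returning the error.
    fn write_fixed(&mut self, pts_90k: i64) -> Result<(), String> {
        let prev = self
            .unindexed
            .take()
            .expect("should always be an unindexed sample");
        let delta = pts_90k - prev.pts_90k;
        let duration = match i32::try_from(delta) {
            Ok(d) => d,
            Err(_) => {
                self.unindexed = Some(prev); // restore the invariant on this exit path
                return Err(format!("pts jump {} doesn't fit in i32", delta));
            }
        };
        self.durations.push(duration);
        self.unindexed = Some(Unindexed { pts_90k });
        Ok(())
    }
}
```

In this sketch, calling `write_buggy` with a pts that jumps forward by 2^31 returns an error and leaves `unindexed` empty, so any later call would panic with the message from the issue title. `write_fixed` returns the same error but keeps the writer usable for the next sample.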

scottlamb commented 3 years ago

e66a88a should fix this.