scottlamb / moonfire-nvr

Moonfire NVR, a security camera network video recorder

shutdown should end retry loops #117

scottlamb closed 2 years ago

scottlamb commented 3 years ago

As discussed in #84 (see this comment), Moonfire NVR doesn't shut down properly on SIGTERM or SIGINT (as when pressing ctrl-C) if it's retrying writes to recording files. This can happen when the disk is out of space or the filesystem is corrupted. The symptom is that you see a bunch of log lines like this:

W20210401 11:21:02.362 s-driveway-main moonfire_base::clock] sleeping for Duration { secs: 1, nanos: 0 } after error: No space left on device (os error 28)

(set environment variable RUST_BACKTRACE=1 to see backtraces)
W20210401 11:21:03.362 s-driveway-main moonfire_base::clock] sleeping for Duration { secs: 1, nanos: 0 } after error: No space left on device (os error 28)

(set environment variable RUST_BACKTRACE=1 to see backtraces)
W20210401 11:21:04.363 s-driveway-main moonfire_base::clock] sleeping for Duration { secs: 1, nanos: 0 } after error: No space left on device (os error 28)

(set environment variable RUST_BACKTRACE=1 to see backtraces)
W20210401 11:21:05.363 s-driveway-main moonfire_base::clock] sleeping for Duration { secs: 1, nanos: 0 } after error: No space left on device (os error 28)

(set environment variable RUST_BACKTRACE=1 to see backtraces)
W20210401 11:21:06.364 s-driveway-main moonfire_base::clock] sleeping for Duration { secs: 1, nanos: 0 } after error: No space left on device (os error 28)
...

so you tell it to shut down. It logs this:

I20210401 11:21:07.542 main moonfire_nvr::cmds::run] Shutting down streamers.

but never actually shuts down, and keeps logging the sleeping message. Sending another SIGINT or SIGTERM does nothing.

The problem is that the streamer threads and sample file syncer threads get into a "retry forever" loop, and they really mean forever. Instead, they should give up once shutdown is requested.
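To illustrate the idea, here is a minimal sketch of a retry loop that gives up on shutdown. It is not Moonfire NVR's actual code: `retry_until_shutdown`, `ShutdownError`, and the choice of a `std::sync::mpsc` channel as the shutdown signal are all assumptions made for this example. The loop waits on the channel instead of sleeping unconditionally, so it wakes as soon as shutdown is signaled (here, by sending a message or dropping the sender).

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Returned when shutdown is requested before the operation succeeds.
/// Hypothetical type for this sketch; not part of Moonfire NVR.
#[derive(Debug, PartialEq, Eq)]
pub struct ShutdownError;

/// Retries `op` until it succeeds or shutdown is signaled on `shutdown_rx`.
///
/// Shutdown is signaled either by sending `()` or by dropping the sender
/// (an assumption of this sketch, not the project's actual mechanism).
pub fn retry_until_shutdown<T, E: std::fmt::Display>(
    shutdown_rx: &Receiver<()>,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, ShutdownError> {
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) => eprintln!("sleeping for {:?} after error: {}", delay, e),
        }
        // Wait for up to `delay`, but return early if shutdown is signaled.
        match shutdown_rx.recv_timeout(delay) {
            // Nothing arrived within `delay`: try the operation again.
            Err(RecvTimeoutError::Timeout) => continue,
            // A message arrived or the sender was dropped: give up.
            Ok(()) | Err(RecvTimeoutError::Disconnected) => return Err(ShutdownError),
        }
    }
}
```

With something like this, the signal handler would only need to drop (or send on) the sender; each retrying thread would then see `Err(ShutdownError)` within one wait and could unwind normally.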

Workaround: send it a SIGQUIT (ctrl-\ if Moonfire NVR is running from a terminal) or SIGKILL. These will kill it immediately.

scottlamb commented 3 years ago

It'd also be nice if it spammed the logs less during the retry loop. It's pretty verbose now, especially if you have backtraces enabled. Maybe only log once per minute instead of on every single attempt.
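One possible shape for the once-per-minute idea, as a rough sketch only: `LogThrottle` and its `check` method are hypothetical names invented for this example, not anything that exists in Moonfire NVR. The caller passes in the current time, and the throttle reports whether to log now and how many messages were skipped since the last one that was logged.

```rust
use std::time::{Duration, Instant};

/// Allows at most one log message per `interval`.
/// Hypothetical helper for this sketch; not part of Moonfire NVR.
pub struct LogThrottle {
    interval: Duration,
    last: Option<Instant>,
    suppressed: u64,
}

impl LogThrottle {
    pub fn new(interval: Duration) -> Self {
        LogThrottle { interval, last: None, suppressed: 0 }
    }

    /// Returns `Some(n)` if the caller should log now, where `n` is the
    /// number of messages suppressed since the last logged one.
    /// Returns `None` if this message should be suppressed.
    pub fn check(&mut self, now: Instant) -> Option<u64> {
        match self.last {
            // Still inside the quiet period: count the message and skip it.
            Some(last) if now.saturating_duration_since(last) < self.interval => {
                self.suppressed += 1;
                None
            }
            // First message, or the interval has elapsed: log and reset.
            _ => {
                self.last = Some(now);
                Some(std::mem::take(&mut self.suppressed))
            }
        }
    }
}
```

In a retry loop, the error branch would call `check(Instant::now())` and only log when it returns `Some(n)`, perhaps mentioning `n` so the suppressed attempts are still visible in the log.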