scottlamb / moonfire-nvr

Moonfire NVR, a security camera network video recorder

repeated failures with Reolink cameras #144

Closed: jlpoolen closed this issue 3 years ago

jlpoolen commented 3 years ago

**Describe the bug** 6 Reolink cameras (3 models), system running from a Docker image; within a 2-minute session, 2824 "panics" in the log. [edit by scottlamb: the logs refer to panic handler stack frames but the system didn't actually panic.] Recordings are subject to gaps in time, some of 9 minutes or more. The system looks to be working well, but I started to notice some major gaps, e.g. several minutes missing from the recording sequences.

Running moonfire-nvr in Docker. I can start and stop it, and through the web interface all seems well; it works for the most part. However, there are gaps in recordings and the log shows panics.

**To Reproduce** Steps to reproduce the behavior:

See "Consoles" below and the log attached at the end of this entry.

Note: nvr2 differs by having an additional line for an LVM mount.

**Expected behavior** No "panics" and no gaps of several minutes in recordings.

Bug policy: https://github.com/scottlamb/moonfire-nvr/blob/master/guide/troubleshooting.md#panic-errors states: "Errors like the one below indicate a serious bug in Moonfire NVR. Please file a bug if you see one."

(Should the word "panic" be the trigger to file a bug, or only when a "panic" occurs in writer.rs?)

**Screenshots** Here is a screenshot showing a Raspberry Pi console streaming "date" output, overlaid on a live stream in Moonfire NVR's live mode that displays the camera's white date/time stamp.
2021-08-14_09-47-35

**Server (please complete the following information):**

- Raspberry Pi 4 8GB Complete Starter Kit - 32GB Storage (purchased May/June 2020)
- Western Digital 12TB WD Purple Surveillance Internal Hard Drive HDD - 7200 RPM, SATA 6 Gb/s, 256 MB Cache, 3.5" - WD121PURZ
- [Independent Control Switch] SISUN Aluminum USB 3.0 5 Bay 3.5 inch SATA Hard Drive Enclosure, Support 5 x 12TB Drive + 2x USB 3.0 HUB (USB 3.0/5 Bay Hard Drive Enclosure - Black)

Rebuilt (on the same SD HC) in August 2021; it had crashed in the winter of 2021 - a defective SD HC?

 jlpoole@raspberrypi:~ $ uname -a
 Linux raspberrypi 5.10.52-v7l+ #1441 SMP Tue Aug 3 18:11:56 BST 2021 armv7l GNU/Linux
 jlpoole@raspberrypi:~ $

See Log at end of this entry.

I'm not sure I needed to set RUST_BACKTRACE when launching from Docker as it looks like the Docker log has backtraces. Nevertheless, I did as requested:

CONSOLE 1:
jlpoole@raspberrypi:~ $ echo $RUST_BACKTRACE
1
jlpoole@raspberrypi:~ $ date; sudo nvr2 start
Sat 14 Aug 2021 11:00:56 AM PDT
moonfire-nvr
jlpoole@raspberrypi:~ $ date; sudo nvr2 stop
Sat 14 Aug 2021 11:01:39 AM PDT
moonfire-nvr
jlpoole@raspberrypi:~ $

CONSOLE 2:
jlpoole@raspberrypi:/usr/local/src/moonfire-nvr $ date;sudo docker logs --follow --timestamps --since=10s moonfire-nvr >/tmp/session.log 2>&1; date;
Sat 14 Aug 2021 11:00:58 AM PDT
Sat 14 Aug 2021 11:01:41 AM PDT
jlpoole@raspberrypi:/usr/local/src/moonfire-nvr $

**Camera (please complete the following information):** I have six cameras, two each of three different models:

1) Device Name: GarageWest Model: Reolink RLC-420 Build No.: build 17112700 Hardware No.: IPC_3816M Configuration Version: v2.0.0.0 Firmware Version: v2.0.0.889_17112700 Details: IPC_3816M110000000000000 Client Version: v1.0.227

2) Device Name: GarageEast Model: RLC-420 Build No.: build 17112700 Hardware No.: IPC_3816M Configuration Version: v2.0.0.0 Firmware Version: v2.0.0.889_17112700 Details: IPC_3816M110000000000000 Client Version: v1.0.227

3) Device Name: Peck Alley West Model: Reolink RLC-420-5MP Build No.: build 19013001 Hardware No.: IPC_51516M5M Configuration Version: v2.0.0.0 Firmware Version: v2.0.0.354_19013001 Details: IPC_51516M5M110000000100000 Client Version: v1.0.239

4) Device Name: Peck Alley East Model: Reolink RLC-420-5MP Build No.: build 19013001 Hardware No.: IPC_51516M5M Configuration Version: v2.0.0.0 Firmware Version: v2.0.0.354_19013001 Details: IPC_51516M5M110000000100000 Client Version: v1.0.239

5) Camera Name: PeckPear Model: Reolink RLC-410-5MP UID: 95270002JS5EWXMZ Build No.: build 20121100 Hardware No.: IPC_515B16M5M Config Version: v3.0.0.0 Firmware Version: v3.0.0.136_20121100 Details: IPC_515B16M5MS10E1W01100000001

6) Camera Name: Maple1 Model: Reolink RLC-410-5MP UID: 95270002JS5IEVPW Build No.: build 20121100 Hardware No.: IPC_515B16M5M Config Version: v3.0.0.0 Firmware Version: v3.0.0.136_20121100 Details: IPC_515B16M5MS10E1W01100000001

**Desktop (please complete the following information):** N/A

**Smartphone (please complete the following information):** N/A

**Additional context** I've been seeing cameras go offline for several minutes at a time, e.g. 9 minutes or more.

Below may be helpful if time synchronization is involved:

I have all the Reolink cameras using the time sync feature, pointing to time-b.nist.gov.
2021-08-14_10-04-35

Log File (numbered columns added) session_numbered.zip

For others using Docker: I accessed the log using this command:

 sudo docker logs --follow --timestamps --since=1s moonfire-nvr

You want to use "--since=[# of seconds back]s"; otherwise you may get a dump of a very long log. Think of the "--since" flag as a "tail"-style truncator.
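
For example, the rough "panic" count I cited above can be reproduced with a simple grep over a captured session (the file name is just whatever you redirected the log to; adjust as needed):

```
# Count how many lines of the captured session mention "panic".
grep -c panic /tmp/session.log
```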

scottlamb commented 3 years ago

Errors like this one?

``` W20210814 11:00:57.434 s-GarageWest-main moonfire_nvr::streamer] GarageWest-main: sleeping for Duration { secs: 1, nanos: 0 } after error: Expected ssrc=Some(21cd90d7) seq=Some(8f4a) got ssrc=6d87df1a seq=88b9 ts=1866533263 (mod-2^32: 1866533263), npt 0.000 at [192.168.1.88:33514(me)->192.168.1.48:554@2021-08-14T11:00:57 pos=1510@2021-08-14T11:00:57] Backtrace: 0: backtrace::backtrace::libunwind::trace at .cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.59/src/backtrace/libunwind.rs:90:5 backtrace::backtrace::trace_unsynchronized at .cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.59/src/backtrace/mod.rs:66:5 backtrace::backtrace::trace at .cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.59/src/backtrace/mod.rs:53:14 1: backtrace::capture::Backtrace::create at .cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.59/src/capture.rs:176:9 backtrace::capture::Backtrace::new_unresolved at .cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.59/src/capture.rs:170:9 2: failure::backtrace::internal::InternalBacktrace::new at .cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.8/src/backtrace/internal.rs:46:44 3: failure::backtrace::Backtrace::new at .cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.8/src/backtrace/mod.rs:121:35 ::default at .cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.8/src/backtrace/mod.rs:140:13 4: >::from at .cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.8/src/error/error_impl.rs:19:17 >::from at .cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.8/src/error/mod.rs:36:18 failure::error_message::err_msg at .cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.8/src/error_message.rs:12:5 retina::client::rtp::StrictSequenceChecker::rtp at .cargo/registry/src/github.com-1ecc6299db9ec823/retina-0.0.4/src/client/rtp.rs:128:13 5: retina::client::Session::handle_data at .cargo/registry/src/github.com-1ecc6299db9ec823/retina-0.0.4/src/client/mod.rs:728:41 as futures_core::stream::Stream>::poll_next at .cargo/registry/src/github.com-1ecc6299db9ec823/retina-0.0.4/src/client/mod.rs:760:31 6: ::poll_next at .cargo/registry/src/github.com-1ecc6299db9ec823/retina-0.0.4/src/client/mod.rs:824:55 7: as futures_core::stream::Stream>::poll_next at .cargo/registry/src/github.com-1ecc6299db9ec823/futures-core-0.3.15/src/stream.rs:120:9 futures_util::stream::stream::StreamExt::poll_next_unpin at .cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.15/src/stream/stream/mod.rs:1416:9 as core::future::future::Future>::poll at .cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.15/src/stream/stream/next.rs:32:9 moonfire_nvr::stream::RetinaOpener::play::{{closure}} at src/server/src/stream.rs:394:51 8: as core::future::future::Future>::poll at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/future/mod.rs:80:19 as core::future::future::Future>::poll at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/time/timeout.rs:149:33 ::open::{{closure}} at src/server/src/stream.rs:304:21 as core::future::future::Future>::poll at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/future/mod.rs:80:19 9: tokio::runtime::task::core::CoreStage::poll at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/core.rs:225:13 tokio::runtime::task::harness::poll_future::{{closure}} at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/harness.rs:422:23 as 
core::ops::function::FnOnce<()>>::call_once at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:344:9 std::panicking::try::do_call at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:379:40 std::panicking::try at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:343:19 std::panic::catch_unwind at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:431:14 tokio::runtime::task::harness::poll_future at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/harness.rs:409:19 10: tokio::runtime::task::harness::Harness::poll_inner at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/harness.rs:89:9 tokio::runtime::task::harness::Harness::poll at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/harness.rs:59:15 11: tokio::runtime::task::raw::RawTask::poll at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/raw.rs:66:18 tokio::runtime::task::Notified::run at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/mod.rs:171:9 tokio::runtime::thread_pool::worker::Context::run_task::{{closure}} at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/thread_pool/worker.rs:349:13 tokio::coop::with_budget::{{closure}} at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/coop.rs:106:9 std::thread::local::LocalKey::try_with at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/thread/local.rs:376:16 std::thread::local::LocalKey::with at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/thread/local.rs:352:9 12: tokio::coop::with_budget at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/coop.rs:99:5 tokio::coop::budget at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/coop.rs:76:5 tokio::runtime::thread_pool::worker::Context::run_task at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/thread_pool/worker.rs:348:9 13: tokio::runtime::thread_pool::worker::Context::run 14: tokio::runtime::thread_pool::worker::run::{{closure}} at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/thread_pool/worker.rs:303:17 tokio::macros::scoped_tls::ScopedKey::set at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/macros/scoped_tls.rs:61:9 15: tokio::runtime::thread_pool::worker::run at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/thread_pool/worker.rs:300:5 16: tokio::runtime::thread_pool::worker::Launch::launch::{{closure}} at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/thread_pool/worker.rs:279:45 as core::future::future::Future>::poll at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/blocking/task.rs:42:21 tokio::runtime::task::core::CoreStage::poll::{{closure}} at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/core.rs:235:17 tokio::loom::std::unsafe_cell::UnsafeCell::with_mut at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/loom/std/unsafe_cell.rs:14:9 17: tokio::runtime::task::core::CoreStage::poll at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/core.rs:225:13 tokio::runtime::task::harness::poll_future::{{closure}} at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/harness.rs:422:23 as core::ops::function::FnOnce<()>>::call_once at 
/rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:344:9 std::panicking::try::do_call at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:379:40 std::panicking::try at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:343:19 18: std::panic::catch_unwind at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:431:14 tokio::runtime::task::harness::poll_future at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/harness.rs:409:19 tokio::runtime::task::harness::Harness::poll_inner at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/harness.rs:89:9 tokio::runtime::task::harness::Harness::poll at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/harness.rs:59:15 19: tokio::runtime::task::raw::RawTask::poll at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/raw.rs:66:18 tokio::runtime::task::Notified::run at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/mod.rs:171:9 tokio::runtime::blocking::pool::Inner::run at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/blocking/pool.rs:278:17 20: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}} at .cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.6.0/src/runtime/blocking/pool.rs:258:17 std::sys_common::backtrace::__rust_begin_short_backtrace at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/sys_common/backtrace.rs:125:18 21: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}} at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/thread/mod.rs:481:17 as core::ops::function::FnOnce<()>>::call_once at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:344:9 std::panicking::try::do_call at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:379:40 std::panicking::try at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:343:19 std::panic::catch_unwind at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:431:14 std::thread::Builder::spawn_unchecked::{{closure}} at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/thread/mod.rs:480:30 core::ops::function::FnOnce::call_once{{vtable.shim}} at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/ops/function.rs:227:5 22: as core::ops::function::FnOnce>::call_once at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/alloc/src/boxed.rs:1546:9 as core::ops::function::FnOnce>::call_once at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/alloc/src/boxed.rs:1546:9 std::sys::unix::thread::Thread::new::thread_start at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/sys/unix/thread.rs:71:17 ```

That's not actually a panic, despite the word appearing in the middle. tokio has some code for catching panics and handling them relatively gracefully (keeping the thread running), and you're seeing stack frames related to that.

I think this log message is complaining about the thing I mentioned in this Apr 29th comment. I think at the Retina layer I can add a feature that just ignores these (except maybe logging a packet/byte count every minute or something).

jlpoolen commented 3 years ago

Yes. I apologize; here's what I faced. I have web UI rows showing intermittent files for a camera: sometimes several hours in succession are recorded sequentially, then there is a gap. I looked at your troubleshooting readme, and when I watched the logs go scrolling by, I immediately focused on capturing a log file for the first 2 minutes. After doing that, I searched it for the word "panic", and when the count came back at over 2,000, I just assumed something was horribly wrong and decided to let you look. I had forgotten about the problem you described that Reolink cameras have.

What I'll do now is generate a new log file and try to whittle it down to the entries that correspond to documented gaps in recordings. I'll try to craft a Perl regular expression to cull the entries that have to do with sleep so we can see if there is also something else. The backtrace on each error makes filtering the log a challenge. (I'm hoping I can halt the backtracing for an initial run by setting RUST_BACKTRACE to "0" or removing it, but I suspect that in this Docker instance it's wired to be active.)
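
As a first cut, plain grep may be enough instead of Perl: with backtraces off, each entry should be a single line, so dropping the "sleeping for Duration { ... } after error" retry lines should leave only whatever else is going on. A sketch (file names as in my runs; adjust as needed):

```
# Filter out the per-camera reconnect/sleep entries and see what remains.
grep -v 'sleeping for Duration' /tmp/session_stderr.log > /tmp/session_other.log
wc -l /tmp/session_other.log
```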

I'll see if I can assess what a log run looks like statistically (are they all sleeps?) and then report back here.

Thank you.

scottlamb commented 3 years ago

If you feel like playing with a new tool, this sort of analysis is what lnav is good for.
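
Roughly (I haven't tried it against this exact log format), you'd open the captured file and filter interactively:

```
# Open the session log in lnav; inside it, a command like
#   :filter-out sleeping for
# hides the reconnect/sleep entries so you can see what else is there.
lnav /tmp/session.log
```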

jlpoolen commented 3 years ago

1232 "Opening output" entries for 3 cameras out of six in 24 minutes.

I ran moonfire-nvr for 24 minutes (no RUST_BACKTRACE). I filtered the log to show only those lines containing "Opening input".

ran for 24 minutes: 1232 "Opening input"
start: 2021-08-15T02:20:20.390944000Z I20210814 19:20:20.390
end:   2021-08-15T02:44:26.078480000Z I20210814 19:44:26.077

jlpoole@raspberrypi:~ $ cat /tmp/session_stderr.log |grep "Opening input" |wc
   1232   11088  215423
jlpoole@raspberrypi:~ $

This is over 3 cameras: GarageEast, GarageWest & PeckAlleyEast.
The "Garage" cameras are the oldest; the PeckAlley cameras are about 2 years old.

session_OpeningInput_only.zip

I can provide the log with the additional entry after each "Opening input" showing what was received; zipped up, that log is 4.8 MB.

Question: what can I provide to give you the ability to assess the problem more thoroughly? I can redo a run, use RUST_BACKTRACE, or make whatever modifications to the Docker runtime environment you want.

The "Garage" may exhibit slower network response time given that they are connected via a Ubiquiti aerial connection. But I do not recall in earlier version that this issue manifested itself to such a degree as it does now. The PeckAlley are on a direct connection.

I'm going to follow up on your suggestion of replacing the SD HC with a USB-driven drive; I want to read your posted recommendation more closely. I'm also seeing that my 5-disk array is no longer available, the company seems to have disappeared, and there are some bad reviews, so the USB circuitry for that array may be a contributing factor. For people reading this, the lesson learned is that video places very high demands on your hardware, so cutting corners and pushing the envelope may end up costing more in everyone's time than the dollars saved.

Note: I did find PeckAlleyEast running at 100% CPU load in the Reolink admin interface, so I rebooted it.

scottlamb commented 3 years ago

> Question: what can I provide to give you the ability to assess the problem more thoroughly?

I think your earlier log has everything I need for now. All of the errors in there look consistent with something that scottlamb/retina#17 should improve significantly. You'll likely still have occasional disconnects (that seems inevitable with a flaky wifi connection), but right now it goes into a spiral after a disconnect, whereas after that change it should recover properly.

I also (finally) ordered my own Reolink camera so I can keep experimenting with it. It should arrive tomorrow.

The "Garage" may exhibit slower network response time given that they are connected via a Ubiquiti aerial connection. But I do not recall in earlier version that this issue manifested itself to such a degree as it does now.

The difference is probably the RTSP library change, then. You can go back to the old behavior by adding --rtsp-library=ffmpeg to the docker run command in /usr/local/bin/nvr, then running nvr stop; nvr rm; nvr run to make it take effect.
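
For reference, a rough sketch of where that flag goes (the exact contents of /usr/local/bin/nvr depend on how you set it up; the image name and other docker options here are only placeholders):

```
# Hypothetical excerpt of /usr/local/bin/nvr (other docker options omitted).
# The key point: --rtsp-library=ffmpeg comes after the "run" subcommand,
# so it is passed to moonfire-nvr itself rather than to docker.
docker run \
    --name=moonfire-nvr \
    scottlamb/moonfire-nvr:latest \
    run --rtsp-library=ffmpeg
```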

> I'm also seeing that my 5-disk array is no longer available, the company seems to have disappeared, and there are some bad reviews, so the USB circuitry for that array may be a contributing factor.

That's too bad. :-( 2021 is a tough year for ordering hardware, too. I think the ODROID-H2+ is an excellent compromise between reliability and performance vs. price, power, and space, but it's just not available right now. When the chip situation gets back to normal, there will likely be some other nice SBCs coming out too, but not much new is being released at the moment, and stocks are low / prices are up for what was already released.

Some of these problems can be worked around. E.g., in several places in the docs I recommend disabling UAS with USB SATA adapters because of problems with my own hardware. Maybe that would help with yours too.
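
If you want to try that, the usual approach on a Raspberry Pi is a usb-storage quirk on the kernel command line (the vendor:product ID below is a placeholder; take yours from lsusb):

```
# 1. Find the USB-SATA adapter's vendor:product ID (e.g. "aaaa:bbbb"):
lsusb
# 2. Append a quirks entry to the single line in /boot/cmdline.txt and reboot:
#      usb-storage.quirks=aaaa:bbbb:u
#    The trailing ":u" tells the kernel to skip UAS for that device.
```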

scottlamb commented 3 years ago

Okay, I believe that if you build from git HEAD (and undo the --rtsp-library=ffmpeg change if you made it), it will work much better. The "Unexpected RTSP message Data(...)" and "Expected ssrc=Some(...) seq=Some(...) got ssrc=... seq=..." causes of connection failure should be eliminated. Can you give it a try and let me know how it goes? If it's a significant improvement, I'll prepare a release soon after.

jlpoolen commented 3 years ago

I'll try this evening. I'm on a new image on the Raspberry Pi. I successfully built moonfire-nvr but ran into an error in the npm UI; it may be a harmless one. I haven't fired up moonfire-nvr in the new environment yet; I still have to configure and mount my Purple drive. I did not try reverting to ffmpeg in the former environment (on the SD HC card).

jlpoolen commented 3 years ago

While I flopped around trying to get the web UI working, I was starting and stopping nvr many times. I noticed that when I first started up, everything in the log looked clean. I then Ctrl-C'd to stop, and started up again within a minute or two. On the second run, it looked like every camera was out of sync. I halted nvr, waited several minutes, and when I started up for a third time, things looked good. I did this kind of start/stop with interspersed waits and concluded that one has to wait several minutes before restarting to ensure a clean start. I suspect this is because the cameras are still sending packets from the prior, closed sessions. I'll let it run through the night and update tomorrow.

scottlamb commented 3 years ago

Hmm, I'd hoped that cooling-off period would no longer be necessary.

The brand-new Reolink camera I just bought—unlike the one I borrowed from a neighbor a while ago—doesn't display exactly these symptoms. So I haven't directly tested it on a problematic camera myself.

jlpoolen commented 3 years ago

Aug_19_clean_start_then_failures.zip

Ouch. No sooner was I getting ready to retire than all of a sudden yellow went blazing across the screen. Here's the log file. I stopped the server and will resume tomorrow morning.

scottlamb commented 3 years ago

Shoot. Thanks for trying it out, and sorry it's still not working well.

```
W20210819 21:57:47.115 s-PeckPear-main moonfire_nvr::streamer] PeckPear-main: sleeping for Duration { secs: 1, nanos: 0 } after error: [192.168.1.88:51560(me)->192.168.1.54:554@2021-08-19T21:52:15, 115975788@2021-08-19T21:57:47] RTSP framing error: Invalid RTSP message; next 128 of 651 buffered bytes are:
0000:   b9 e1 69 a7  3a d0 b9 d4  e6 e5 f3 d5  33 10 54 6c   ..i.:.......3.Tl
0010:   ae fd 33 83  3d 7f 99 74  c8 2b eb ec  5d 80 c3 d2   ..3.=..t.+..]...
0020:   7d 7b 0a 7d  1f 18 a5 30  9f 89 5e 95  0f e7 13 37   }{.}...0..^....7
0030:   42 e1 be c3  63 6b 3d 45  c4 bf 73 85  33 ce 10 d1   B...ck=E..s.3...
0040:   f0 1b 5f 5d  73 72 f1 b3  d7 8d 20 f8  ea e1 d0 d4   .._]sr.... .....
0050:   24 6d 99 00  8e df e9 36  b2 ca 6f 8d  92 af 19 ce   $m.....6..o.....
0060:   2c 13 be 84  4d 0c bf 7c  4d 50 56 53  a7 c4 8c c7   ,...M..|MPVS....
0070:   02 d7 b0 d2  c7 a1 2a 69  66 1b 90 5a  b3 81 ca 1b   ......*if..Z....
```

I'm not sure yet whether this is a bug in the client (Moonfire/Retina) or the server (Reolink). I'll take a look at it. I might need a packet capture or similar to figure this out. The log shows the bytes above (which indeed aren't the start of a correct RTSP message), but I might need the preceding ones also. Maybe Retina erroneously dropped half a message or something.
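
If it comes to that, a capture from the Pi along these lines would probably be enough (the interface name is an assumption; the camera IP and port are taken from the log above):

```
# Capture the RTSP session with the PeckPear camera into a file that can be
# opened in Wireshark; Ctrl-C to stop once the error has reproduced.
sudo tcpdump -i eth0 -s 0 -w /tmp/peckpear-rtsp.pcap 'host 192.168.1.54 and tcp port 554'
```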

scottlamb commented 3 years ago

It looks like, if Reolink is sending malformed RTSP messages, they wouldn't be the first. There's a gstreamer issue at gst-plugins-base#653 to work around a problem like that. They didn't say what brand; my best guess is Uniview, from looking at the URLs in the bug reporter's packet capture.

scottlamb commented 3 years ago

As this is now understood to be an RTSP-level interop problem with these buggy cameras, let's continue discussion at https://github.com/scottlamb/retina/issues/17 as necessary. The current best solution is to use the new --rtsp-transport=udp argument.
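
As with the earlier flag, it goes after the run subcommand in the wrapper's docker run line; a minimal sketch (image name and other docker options are placeholders, same caveats as above):

```
# Hypothetical excerpt of /usr/local/bin/nvr with the UDP transport enabled.
docker run \
    --name=moonfire-nvr \
    scottlamb/moonfire-nvr:latest \
    run --rtsp-transport=udp
```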