scottlamb / moonfire-nvr

Moonfire NVR, a security camera network video recorder

support "monitor mode" / live view of unrecorded streams #120


IronOxidizer commented 3 years ago

**Describe the bug**

Live view produces a black screen when record is disabled in the config. This can be worked around by enabling record, but I would prefer that the data never touch the drives, in consideration of drive longevity. No errors are reported on the client or server; the websocket is connected with response 101, and pings are sent regularly to keep the connection alive.

**To Reproduce**

Steps to reproduce the behavior:

  1. enable sub stream in config
  2. disable record
  3. start server
  4. go to live view, black screen

**Expected behavior**

Live stream working without recording. This resembles a "monitoring" mode in other NVRs.

**Additional context**

This seems to originate from here, as a result of the `Receiver` never yielding a `LiveSegment`:

https://github.com/scottlamb/moonfire-nvr/blob/master/server/src/web.rs#L495
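The starved-receiver behavior can be illustrated with a minimal sketch. This uses plain `std::sync::mpsc` and an invented `next_segment` helper, not Moonfire's actual channel types: when recording is disabled, nothing ever sends a segment, so the live-view side just waits with no error ever surfacing.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical stand-in for the live-view path: block until the recorder
/// hands over a segment, or give up after `timeout`.
fn next_segment(
    rx: &mpsc::Receiver<String>,
    timeout: Duration,
) -> Result<String, RecvTimeoutError> {
    rx.recv_timeout(timeout)
}

fn main() {
    let (tx, rx) = mpsc::channel::<String>();
    // Recording disabled: the writer side never sends, so live view starves.
    drop(tx);
    let got = next_segment(&rx, Duration::from_millis(50));
    assert!(got.is_err());
    println!("live view starved: {:?}", got);
}
```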

scottlamb commented 3 years ago

I'm going to call this an enhancement rather than a bug, because this is an expected part of the "just record everything" design today.

> This can be worked around by enabling record, but I would prefer that the data never touches the drives in consideration of drive longevity.

Do you have any evidence that lowering the write rate significantly improves the reliability of classic (PMR, perpendicular magnetic recording) disks? SMR ones (shingled magnetic recording) have a significantly lower rated workload, but the hardware guide already notes that and cautions against them. I think this rating is because they may have an onboard SSD that everything gets staged through, or if they have a PMR staging area, then they do a lot of extra seeking and rewriting.

My assumption to date has been that SMR drive failures don't happen due to write throughput (predominantly, or maybe at all). They happen because of other things: manufacturing defects (in the platter or seal), catastrophic events (mishandling), bearings wearing out (sometimes they just don't spin up after a power outage), and maybe, to a lesser extent, a high seek rate (head movement). I just skimmed the Wikipedia article on hard drive failures and don't see any mention of writes accelerating failure.

Starting from this assumption, I decided to just record everything, and that shows up several times in the current design:

I've broadened my use cases a bit though since originally designing this and would like to eventually support monitor mode. A couple reasons for the change of heart:

Anyway, off-hand to support monitor mode, these changes need to happen:

An alternative design would be to write the sample files to tmpfs, perhaps changing less of the code. Never flush those recordings to the SQLite database either; keep them pending until they're dropped. But that has its own compromises: it'd either use a lot of RAM or would need shorter recordings, which might cause complications like "anchoring" the stream's wall time sooner (thus having the time further from correct). (Or maybe they'd never be anchored; maybe live view doesn't really need anchoring anyway.)
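A minimal sketch of the "pending until dropped" idea. `EphemeralSample` is an illustrative name, not a Moonfire type, and the OS temp dir stands in for a tmpfs mount such as `/dev/shm`: the sample file is written to RAM-backed storage, never registered in the database, and unlinked as soon as it's dropped.

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::{Path, PathBuf};

/// A sample file that lives only until dropped; nothing is ever flushed
/// to the SQLite database or to durable storage.
struct EphemeralSample {
    path: PathBuf,
}

impl EphemeralSample {
    fn create(dir: &Path, name: &str, data: &[u8]) -> std::io::Result<Self> {
        let path = dir.join(name);
        File::create(&path)?.write_all(data)?;
        Ok(EphemeralSample { path })
    }
}

impl Drop for EphemeralSample {
    fn drop(&mut self) {
        // Best-effort unlink; the recording was never recorded in the DB,
        // so there is nothing else to clean up.
        let _ = fs::remove_file(&self.path);
    }
}

fn main() -> std::io::Result<()> {
    // On Linux this would be a tmpfs path; use the temp dir so the
    // sketch runs anywhere.
    let dir = std::env::temp_dir();
    let path;
    {
        let s = EphemeralSample::create(&dir, "cam1-sample.bin", b"fake GOP bytes")?;
        path = s.path.clone();
        assert!(path.exists());
    } // dropped here: file unlinked, nothing persisted
    assert!(!path.exists());
    println!("ephemeral sample cleaned up");
    Ok(())
}
```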

I don't think this refactoring is in my idea of a "minimum viable product" feature set, so I don't plan to work on it for a while. If you want to work on it, I'd happily help you through it. It's a significant project though so it might be wise to tackle a couple small bugs or something to acclimate more to the codebase before getting into this.

IronOxidizer commented 3 years ago

> Do you have any evidence that lowering the write rate significantly improves the reliability of classic (PMR, perpendicular magnetic recording) disks? SMR ones (shingled magnetic recording) have a significantly lower rated workload, but the hardware guide already notes that and cautions against them. I think this rating is because they may have an onboard SSD that everything gets staged through, or if they have a PMR staging area, then they do a lot of extra seeking and rewriting.

I didn't have any evidence; I mostly went on the notion that some drives are rated for a maximum amount of writes per day. I did some further research and found this white paper from Micron comparing the endurance of SSDs vs. HDDs.

https://www.micron.com/-/media/client/global/documents/products/white-paper/5210_ssd_vs_hdd_endurance_white_paper.pdf

They don't specifically say that HDDs have a rated DWPD (drive writes per day), but they do use DWPD as a measure of hard drive endurance.

> I don't think this refactoring is in my idea of a "minimum viable product" feature set, so I don't plan to work on it for a while. If you want to work on it, I'd happily help you through it. It's a significant project though so it might be wise to tackle a couple small bugs or something to acclimate more to the codebase before getting into this.

Seems like quite a large undertaking; I might revisit this in the future. For now I'll see if I can help with things more relevant to the 1.0 release.

scottlamb commented 3 years ago

Interesting paper. They say, "as their capacities increased, some HDD designs began to adopt a 'workload limit' rating as part of their standard specifications." I think the HDD designs in question are all SMR, but I'm not certain. Another reason SMR drives may have a workload rating (in addition to what I said above) is that I think they need some slack spindle time for rebalancing stuff behind the scenes. They don't need it at any particular instant, but it needs to happen sometime; thus the talk of workload over long timescales rather than a limit of x MB/s.

scottlamb commented 3 years ago

fwiw, I've been running Moonfire NVR on a WD Purple for five years, a WD Green for five years, and a WD Purple for two years, without problems. I also tried it on a couple of Seagate SMR drives, less successfully. In one case, I exceeded the rated workload and the drive just couldn't keep up (writes took so long that the RTSP connections would time out), so I gave up on it within days. In another case, I ran it for about three years, keeping it under the rated workload, but the drive failed twice (once under warranty, once not) through a rapid buildup of bad sectors. I recently replaced it with a cheap refurbished WP Arsenal 14 TB PMR drive. Small sample size, but continuous recording with Moonfire NVR doesn't seem too harmful to PMR drives. It might kill SMR drives, or maybe they just suck no matter how you use them.

scottlamb commented 3 years ago

I might implement this while adding audio support (#34). I'll likely store a GOP of video frames in a row, then the audio that happened while they were received, and repeat. That will require building them up in a buffer and being able to continue doing live view from that in-memory buffer. That's essentially the same work I described doing for monitor mode in #120.
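The in-memory GOP buffer described above can be sketched roughly as follows; `Frame` and `GopBuffer` are illustrative names, not Moonfire's actual types. The idea is that live view is served from the newest complete GOP (which starts with a keyframe, so a viewer can decode immediately) without anything touching disk.

```rust
use std::collections::VecDeque;

/// Toy frame: keyframes start a new GOP.
#[derive(Clone)]
struct Frame {
    key: bool,
    data: Vec<u8>,
}

/// Bounded in-memory buffer of complete GOPs, the shape of structure a
/// monitor mode could serve live view from without touching disk.
struct GopBuffer {
    gops: VecDeque<Vec<Frame>>,
    current: Vec<Frame>,
    max_gops: usize,
}

impl GopBuffer {
    fn new(max_gops: usize) -> Self {
        GopBuffer { gops: VecDeque::new(), current: Vec::new(), max_gops }
    }

    /// On each keyframe, seal the current GOP and start a new one,
    /// evicting the oldest GOP once past capacity.
    fn push(&mut self, f: Frame) {
        if f.key && !self.current.is_empty() {
            self.gops.push_back(std::mem::take(&mut self.current));
            if self.gops.len() > self.max_gops {
                self.gops.pop_front();
            }
        }
        self.current.push(f);
    }

    /// A live viewer starts from the newest complete GOP, which begins
    /// with a keyframe, so it can decode immediately.
    fn latest_gop(&self) -> Option<&Vec<Frame>> {
        self.gops.back()
    }
}

fn main() {
    let mut buf = GopBuffer::new(2);
    // Keyframes at 0, 3, 6: GOPs [0,1,2] and [3,4,5] get sealed;
    // [6,7,8] is still open when we stop pushing.
    for i in 0..9u8 {
        buf.push(Frame { key: i % 3 == 0, data: vec![i] });
    }
    let latest = buf.latest_gop().unwrap();
    assert_eq!(latest[0].data, vec![3]);
    println!("buffered {} complete GOPs", buf.gops.len());
}
```

Interleaving audio would follow the same pattern: append the audio received during each GOP alongside it before sealing.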

scottlamb commented 2 years ago

I'm dropping the "schema change" label for this. That part was done with schema version 7. The stream config is now a more flexible JSON object with a string `mode` (rather than a boolean `record`). We can claim the mode `monitor` for this feature.
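For illustration, a per-stream config fragment under this scheme might look like the following. Only the `mode` field comes from the comment above; `monitor` is the proposed value for this feature, not yet an implemented one.

```json
{
  "mode": "monitor"
}
```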