scottlamb / moonfire-nvr

Moonfire NVR, a security camera network video recorder
Other
1.24k stars 137 forks source link

Capacity Problems & Ehancement Requests #84

Closed jlpoolen closed 3 years ago

jlpoolen commented 4 years ago

Running on a Raspberry Pi 4. I have only one camera designated and I allocated 2 GB of space for saving recordings at /var/tmp/video/garage_west. My camera is at maximum resolution (2560 x 1440) and frames per second (30). Yes, this is an unreasonable capacity setting given the frame size and maximum number of frames per second. But I'd like to learn now, rather than later, how a "full" condition is handled.

The problems encountered is that the list of videos in the Web interface are empty and have the infinity symbol.

chrome_2020-06-08_11-49-59

I would expect that older recordings would be removed to allow room for the new recordings.

Also, if I select one of the early recordings, when there was sufficient disk space, my attempt to view it is prevented after displaying the first second.

2020-06-08_11-50-56

Perhaps the disk allocation to the camera also serves as a working area for the MP4 creation and with the working area at capacity, the MP4 stream halts?

Also, when there is a filled up database, what can a user do to remove files and/or entries? How about a server command that the administrator would run to "flush" older content with settings such as percentages to minimize loss of older recordings?

I had a power outage yesterday and the Raspberry Pi was not on a protected circuit or have shutdown protocols established (yet). I found my database corrupted, tried to export my camera and disk allocation using Sqlitebrowser. I ended up creating a new database and then using the Sqlitebrowser (while no moonriver-nvr process was running, of course) tried to import my settings. I currently have 4 cameras and expect to end up with 10 or 12. It would be helpful to have some way to preserves,export, and import entries so that I do not have to manually enter everything all over again.

scottlamb commented 4 years ago

The problems encountered is that the list of videos in the Web interface are empty and have the infinity symbol.

I think this means that you have a recording with exactly one video frame. RTSP streams don't give the duration of each video frame; they just give its starting timestamp. So Moonfire NVR calculates the duration of each frame using the time of the following frame and assumes 0 for the final frame. You should see a total recording duration of 0 if and only if there's only one video frame in the whole recording. Duration of 0 would cause it to show frames per second of infinity.

This much is working as intended.

I would expect that older recordings would be removed to allow room for the new recordings. ... Perhaps the disk allocation to the camera also serves as a working area for the MP4 creation and with the working area at capacity, the MP4 stream halts?

That's right. Segments that are currently in progress aren't accounted for. guide/install.md says "Leave a little slack" for this reason. I think the guidance of "100 MB per camera" is incorrect in your case. I see 18.2 Mbps in one entry in your screenshot. The first recording in a run might be up to 2 minutes long; that'd be 273 MB at that rate. There should be at least that much slack.

Actually, I can't remember off the top of my head, but segments that are fully written to disk but not flushed to the SQLite database might also not be accounted for. So higher flush_if_sec might further increase the requirements for slack space.

Currently the code will stubbornly try to write each video frame (to the OS's buffer) right after it's received it, on error retrying once per second forever. It doesn't do anything like trying to free more old recordings if it seems full. So yeah, it's pretty much expected for it to get stuck until you free up some space.

The docs at least can be improved.

As for code changes...I could modify it to account for the segments being written. The catch is that I don't want to force frequent database flushes (increasing load on the SSD), so say freeing up exactly as much space as needed for the next video frame doesn't seem like the right approach. It'd require some thought, which is why I've punted on it so far.

One idea: it can maintain an estimate of how much slack space is required for what's in flight. After a write hits ENOSPC it could abort the current segment, wait some cooling-off period, increase that estimate, ensure there's enough space, and try again. Although it might go through this whole cooling off period a couple times on every restart, which would be annoying. Every idea I've had so far has had some caveat to it.

I'd be interested to know if there's some other NVR system that solves this better and how.

Also, if I select one of the early recordings, when there was sufficient disk space, my attempt to view it is prevented after displaying the first second.

This probably happened because the recording was deleted as you were viewing it. You could confirm this by restarting the daemon with the --read-only argument. Then everything that's in the list should stay viewable as long as you're running in this mode.

This is a UI problem...as I've mentioned the current UI is really a prototype. I'd like to have a slick scrub bar UI (#32). I'd expect the scrub bar to shrink and your viewing position jump if video is deleted as you are viewing it. Right now if one of the recordings backing a .mp4 file is deleted, your next HTTP request on that .mp4 file will fail. Or even your current HTTP request can be abruptly aborted if it reaches a byte position that requires a disk file that no longer exists.

Also, when there is a filled up database, what can a user do to remove files and/or entries? How about a server command that the administrator would run to "flush" older content with settings such as percentages to minimize loss of older recordings?

You can shut down the server, free up space via moonfire-nvr config, and restart. (This procedure didn't used to work; see #65. But it should be fine now.) There's no API to do this yet on a running server, but I expect to implement one as part of #35.

I had a power outage yesterday and the Raspberry Pi was not on a protected circuit or have shutdown protocols established (yet). I found my database corrupted...

I want the database to be robust to power failure. No UPS or "proper" shutdown required. Does your Raspberry Pi have journaling enabled on its filesystem? You can check via dumpe2fs. If it does, please open a separate issue about this corruption with more details. It shouldn't happen.

Looking at SQLite's docs, I wonder if I should upgrade from pragma synchronous = 2 to pragma synchronous = 3. Otherwise I think Moonfire NVR does about everything possible to maintain integrity.

It would be helpful to have some way to preserves,export, and import entries so that I do not have to manually enter everything all over again.

Feel free to add a note on that (sub)feature at #35. That'd certainly make sense as part of a good config UI.

jlpoolen commented 4 years ago

Update: since opening this ticket, I let the moonfire-nvr just run and then did a refresh using the Chrome browser a few minutes ago. (This ticket was opened over an hour ago.) Looks like clean-up did occur as depicted by this screenshot:

chrome_2020-06-08_13-15-19

jlpoolen commented 4 years ago

Answering question re: journaled file system.

Caution: I'm running on a fresh-out-of-the-box Raspberry Pi 4 w/8GBs. I do not have a hard disk attached, as yet (won't have until 6/10), so everything is working off of the SDHC card that came with the unit. Perhaps running off an SDHC card is unreasonable? As I read the documentation, the hard drive is optional.

Can't run dumpe2fs against the live partition. However, to answer your question about a journaled file system, I referenced dmesg which reveals:

`[ 0.563929] Waiting for root device /dev/mmcblk0p7...

[ 0.610016] random: fast init done

[ 0.622715] mmc1: new high speed SDIO card at address 0001

[ 0.664867] mmc0: new ultra high speed DDR50 SDHC card at address aaaa

[ 0.665959] mmcblk0: mmc0:aaaa SL32G 29.7 GiB

[ 0.669952] mmcblk0: p1 p2 < p5 p6 p7 >

[ 0.689231] EXT4-fs (mmcblk0p7): mounted filesystem with ordered data mode. Opts: (null)

[ 0.689275] VFS: Mounted root (ext4 filesystem) readonly on device 179:7. ` ext4 is a journaled file system; it is the file system shipping on the SDHC card. I also recall after the power outage, my root file system was full, so I pell mell removed the storage subdirectories under /var/tmp/video to reclaim space.

jlpoolen commented 4 years ago

Just found this command which can run againt a live partition: pi@raspberrypi:/home/jlpoole $ sudo debugfs -R features /dev/mmcblk0p7 debugfs 1.44.5 (15-Dec-2018)

Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file uninit_bg dir_nlink extra_isize

pi@raspberrypi:/home/jlpoole $

scottlamb commented 4 years ago

Perhaps running off an SDHC card is unreasonable?

Maybe. I've never tried it. Obviously the capacity is severely limited. I'm not sure about the card's write cycle rating; you might burn out your SDHC card pretty quick that way. I'm also not sure about bandwidth; you check that rating too, and maybe do some IO benchmarks with it in the Raspberry Pi, to make sure you're far away from that limit. Otherwise I think it would work.

Also, having the video samples on the same filesystem as the database and the rest of your system means that my idea above about reacting to ENOSPC is really no good. It would have to be more proactively stay within its limits or database commits won't go through, other parts of your system will start to break, etc.

I also recall after the power outage, my root file system was full, so I pell mell removed the storage subdirectories under /var/tmp/video to reclaim space.

Okay, if what you mean by corrupted is just that it referred to a bunch of files you'd removed (and certainly moonfire-nvr check would complain about this), there's no bug.

jlpoolen commented 4 years ago

I've been running now for several weeks against a Western Digital Purple allocating 2TB per camera. Guess what, all four cameras used up their allocated space. putty_2020-07-08_19-53-50 It looks like I may have killed my session when I negligently closed my puTTY session to the Raspberry Pi. I tried starting up and the console displays continual error messages: W0708 200435.447 s-garage_west-main moonfire_base::clock] sleeping for Duration { secs: 1, nanos: 0 } after error: Os { code: 28, kind: Other, message: "No space left on device" }

So, what do I do now to minimize loss of the several weeks of video? I'd like to be able to remove the earliest and have moonfire-nvr manage the "buffer" of 2TB per camera so it can erase old making room for new.

I thought by having a value of 2 minutes (120 seconds) for the flush_if_sec would take care of any potential lock-ups. Perhaps my sudden closure of my remote session and the moonfire-nvr process thereunder threw a monkey wrench into the system?

jlpoolen commented 4 years ago

Another interesting item: for the same camera "garage_west" whose "Edit camera" screen shot is included in the posting immediately prior to this one, I have this screen shot of the garage_west file directories:

putty_2020-07-08_20-11-51

What is interesting is that the "usage" column shows "1T 1023G 988M 397K" with the following column having a limit of "2T". I guess the "usage" column depicts amounts in Terabytes, Gigabyte, Megabytes, and Kilobytes?

scottlamb commented 4 years ago

So, what do I do now to minimize loss of the several weeks of video? I'd like to be able to remove the earliest and have moonfire-nvr manage the "buffer" of 2TB per camera so it can erase old making room for new.

From that "Edit retention" screen in moonfire-nvr config, you should be able to just lower the limit a little. It will delete only what it needs to get under the new limit, and it always deletes starting from the earliest. After that, restarting the server should work fine.

Perhaps my sudden closure of my remote session and the moonfire-nvr process thereunder threw a monkey wrench into the system?

That shouldn't cause any lasting trouble.

jlpoolen commented 4 years ago

The task of resetting the current limits to something slightly less worked. I took screen shots to document. What would be the best way to submit some documentation? The easiest for me would be to create a wiki page on my fork of the project and then make a pull request. I think the "smart" memory specifications interface needs to show correct and incorrect attempts. The silent rejection of invalid values, e.g. "1TB 900GB 200M 100k" (lowercase "k" is unacceptable) by simply highlighting the Cancel button instead of the Confirm left me wondering why my values were not acceptable. It would be helpful to have some blaring feedback "The values you entered: "1TB 900GB 200M 100k" are illegal/unacceptable/unrecognized.

My original settings for space were simply "2T", Now, to get out of the doldrums, my settings are "1TB 900GB 200M 100K" or some other value slightly less than 2T. What I'll do is safely stop the resurrected process, go back into the menu to set the spaces and change the values back to "2T".

Thank you!

On Wed, Jul 8, 2020 at 9:53 PM Scott Lamb notifications@github.com wrote:

So, what do I do now to minimize loss of the several weeks of video? I'd like to be able to remove the earliest and have moonfire-nvr manage the "buffer" of 2TB per camera so it can erase old making room for new.

From that "Edit retention" screen in moonfire-nvr config, you should be able to just lower the limit a little. It will delete only what it needs to get under the new limit, and it always deletes starting from the earliest. After that, restarting the server should work fine.

Perhaps my sudden closure of my remote session and the moonfire-nvr process thereunder threw a monkey wrench into the system?

That shouldn't cause any lasting trouble.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/scottlamb/moonfire-nvr/issues/84#issuecomment-655898932, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABXP4JCLIXSCAUYO3XIB63R2VEOJANCNFSM4NYV7I6A .

-- John L. Poole

707-812-1323 jlpoole56@gmail.com

jlpoolen commented 4 years ago

Oops. Although moonfire started up, it soon resumed to printing errors such as: W0709 041655.160 s-garage_east-main moonfire_base::clock] sleeping for Duration { secs: 1, nanos: 0 } after error: Os { code: 28, kind: Other, message: "No space left on device" } I was running the session in the utility "screen" and my control-Cs were ignored -- probably a race condition? I ended up having to open a new console and performing a: sudo kill -HUP 7268

I'm going to revise the space settings first by reducing them 1 GB, then running moonfire-nvr so it can do its housekeeping, then close moonfire-nvr and restore my "2T" settings for each of the drives.

An aside: I have four cameras so I have to go through the change settings four times. I wonder if there might be a feature that does a global change?

jlpoolen commented 4 years ago

A sign of a healthy resurrection of the moonfire-nvr server after a run-out-of-disk-space event is: `I0709 042931.124 sync-/video/peck_east moonfire_db::db] Flush 7 (why: 120 sec after start of 1 minute 1 second peck_east-main recording 3/38570): /video/peck_east: added 24M 819K 551B in 1 recordings (3/38570), deleted 24M 775B in 1 (3/50), GCed 1 recordings (3/49). /video/peck_west: added 23M 1023K 623B in 1 recordings (4/38605), deleted 23M 498K 359B in 1 (4/81), GCed 1 recordings (4/80). /video/garage_east: added 58M 574K 134B in 1 recordings (2/39115), deleted 58M 610K 361B in 1 (2/1238), GCed 1 recordings (2/1237). /video/garage_west: added 58M 582K 132B in 1 recordings (1/39453), deleted 58M 613K 424B in 1 (1/1929), GCed 1 recordings (1/1928).

` above. In my set-up, I have four (4) cameras which are referenced as "1","2","3", and "4" above in the snipped from the on-going log. The posting of "added [file size] in 1 recordings ([camera number]/[space limit?]), deleted [space size?] in 1 ([not sure what the ratio is here], GCed 1 recordings ([again, not sure what the ratio represents])" appears to be a good sign of health. I've been running the window for over 8 minutes which is 4 times the 120 seconds interals I have specified, so it looks like the logic that does garbage collecting ("GC") is working. I'll now gracefully shutdown the process and reinstate my "2T" limits on each of the disk space settings for all cameras.

Lesson learned: run the process in a "screen" session to prevent crashing the process due to an unexpcted closure of a ssh session.

jlpoolen commented 4 years ago

Gotcha: I allocated 8 TBs of a 12 TB disk (Western Digital Purple) and exposed them to my Raspberry Pi under the mount /mnt/purple. What I was not wary of that my attempt to allocate 8 TB really translates into 7.6 TBs due to allocation for managing the space segments.
jlpoole@raspberrypi:~ $ df -lh Filesystem Size Used Avail Use% Mounted on /dev/root 27G 17G 8.2G 68% / devtmpfs 3.8G 0 3.8G 0% /dev tmpfs 3.9G 0 3.9G 0% /dev/shm tmpfs 3.9G 401M 3.5G 11% /run tmpfs 5.0M 4.0K 5.0M 1% /run/lock tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup /dev/mmcblk0p6 253M 52M 202M 21% /boot tmpfs 797M 0 797M 0% /run/user/1000 tmpfs 797M 4.0K 797M 1% /run/user/1001 /dev/mapper/vgpurple-cameras 8.0T 7.6T 11G 100% /mnt/purple [redacted other entries not relevant] jlpoole@raspberrypi:~ $ When I went into the four disk settings, 1 disk for each camera, I was able to successfully allocate 2 TBs to the first two cameras, but was prevented from allocating 2TBs to the others because there really was not 4 TBs left to allocate, there was something less, e.g. 3.6 TBs. Screenshot from 2020-07-09 04-42-33

The above screenshot show that the bottom line indicates the available space remaining and thus sets an upper limit as to how much space you can specify. In the screenshot above, my only action available was "Cancel", so I had to change my memory allocation to something like 1TG 380GB in order to successfully activate the "Confirm" button.

What's weird is that I thought I had successfully allocated 2TB for all four cameras when I first brought online the /mnt/purple mount of 8TBs. Perhaps my use and full consumption of the space on the 8TB mount actually caused a reduction of "available" space in later sessions of the disk space configuration interface?

scottlamb commented 4 years ago

I only have a few minutes right now, but I'm trying to list all the ideas here:

Did I get everything?

fwiw, I only use one filesystem and sample file directory per hard drive. So only being able to edit retention for one sample file directory at a time is less of a problem for me (two directories for six cameras right now), and I can easily shift capacity from one camera to another.

scottlamb commented 4 years ago

Oh, and your question here:

What's weird is that I thought I had successfully allocated 2TB for all four cameras when I first brought online the /mnt/purple mount of 8TBs. Perhaps my use and full consumption of the space on the 8TB mount actually caused a reduction of "available" space in later sessions of the disk space configuration interface?

I think so? If I'm understanding correctly, you set one up, then adjusted the volume group to break out a new logical volume, and so on? I'm not really an expert on LVM but I can imagine this might have shrunk the existing filesystems.

jlpoolen commented 4 years ago

On 7/9/2020 7:11 AM, Scott Lamb wrote:

I only have a few minutes right now, but I'm trying to list all the ideas here:

  • As discussed on Jun 8 https://github.com/scottlamb/moonfire-nvr/issues/84#issuecomment-640862000: Moonfire NVR needs some slack between the limit you set and the true capacity of the filesystem. I agree it'd be much more polished to not need this. It isn't super easy because there are several reasons it needs slack. I think most of the slack is because it doesn't account for in-flight recordings. I'm thinking now maybe it can get the max bitrate from the camera on startup and use that to get a pessimistic estimate of how much space the extra recordings will count for. It's not quite top of my list; in the meantime doc improvements would be welcome: making it better describe how much slack you need, making the section more obvious.
  • There should be docs about how to recover from an out-of-space condition—maybe in |guide/troubleshooting.md|?
  • bad UI in |moonfire nvr config| o this comment https://github.com/scottlamb/moonfire-nvr/issues/84#issuecomment-655873717 describes not knowing how to read what it's saying. o this comment https://github.com/scottlamb/moonfire-nvr/issues/84#issuecomment-656065208 mentions it doesn't give good feedback when your data entry doesn't meet its (strict and somewhat arbitrary) standards. o this comment https://github.com/scottlamb/moonfire-nvr/issues/84#issuecomment-656070096 suggests allowing a change in retention for all sample file directories at once. I think this should be possible. o one I've noticed myself: if you're deleting tons of files it should give a progressbar. Right now it seems hung/broken.
  • From this comment https://github.com/scottlamb/moonfire-nvr/issues/84#issuecomment-656065208: need some...meta-documentation / how to contribute documentation. (As for the wiki: can you directly edit it on my repo? I thought that was the idea of github wikis but I haven't actually tested it...)

Did I get everything?

fwiw, I only use one filesystem and sample file directory per hard drive. So only being able to edit retention for one sample file directory at a time is less of a problem for me (two directories for six cameras right now), and I can easily shift capacity from one camera to another.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/scottlamb/moonfire-nvr/issues/84#issuecomment-656150961, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABXP4P27ULPBSKIOPD3JA3R2XFXVANCNFSM4NYV7I6A.

Yes, I think you've covered all my points.

I returned to my session hours later which had been operating fine after the readjustment of the space limits and encountered new out-of-disk error messages fleeting by on the console.  I shut down the process and restarted and moonfire-nvr and it seems to be running normally now, new files are being added an old ones removed.  All four cameras (garage_east, garage_west, peck_east, peck_west) accounted for:

I0709 092017.132 sync-/video/garage_east moonfire_db::db] Flush 42
(why: 120 sec after start of 59 seconds *garage_east*-main recording
2/39228):
/video/*peck_east*: added 59M 998K 833B in 1 recordings (3/38683),
deleted 0B in 0 (), GCed 0 recordings ().
/video/*peck_west*: added 59M 1017K 566B in 1 recordings (4/38718),
deleted 0B in 0 (), GCed 0 recordings ().
/video/*garage_east*: added 117M 205K 685B in 2 recordings (2/39228,
2/39229), deleted 117M 161K 696B in 2 (2/1275, 2/1276), GCed 1
recordings (2/1274).
/video/*garage_west*: added 58M 596K 544B in 1 recordings (1/39566),
deleted 58M 535K 696B in 1 (1/1969), GCed 1 recordings (1/1968).

I've set my puTTY console to 50k lines to see if I can capture the moment when it starts to run out of disk space should that condition reappear.

-John

jlpoolen commented 4 years ago

Following up on my previous post, things are looking well. The disk space management is deleting old files to make room for the new, there has not been any "No space left of device" messages. Here's a milestone after 24 hours of operation:

I0710 133732.996 sync-/video/peck_east moonfire_db::db] Flush 1447 (why: 120 sec after start of 59 seconds peck_east-main recording 3/40390):
/video/peck_east: added 119M 1023K 954B in 2 recordings (3/40390, 3/40391), deleted 118M 860K 946B in 2 (3/1945, 3/1946), GCed 1 recordings (3/1944).
/video/peck_west: added 60M 850B in 1 recordings (4/40415), deleted 60M 997B in 1 (4/1928), GCed 1 recordings (4/1927).
/video/garage_east: added 58M 622K 835B in 1 recordings (2/40924), deleted 58M 584K 794B in 1 (2/2979), GCed 1 recordings (2/2978).
/video/garage_west: added 58M 608K 207B in 1 recordings (1/41262), deleted 58M 247K 355B in 1 (1/3958), GCed 1 recordings (1/3957).`
jlpoolen commented 4 years ago

I decided to go ahead and install moonfire-nvr as a service. I Controlled-C and it gracefully shut down as follows:

I0710 181801.890 sync-/video/garage_west moonfire_db::db] Flush 1679 (why: 120 sec after start of 59 seconds garage_west-main recording 1                                                             /41542):
/video/peck_east: added 60M 3K 60B in 1 recordings (3/40671), deleted 60M 153B in 1 (3/2227), GCed 1 recordings (3/2226).
/video/peck_west: added 60M 2K 224B in 1 recordings (4/40696), deleted 60M 2K 238B in 1 (4/2208), GCed 2 recordings (4/2207, 4/2206).
/video/garage_east: added 58M 609K 783B in 1 recordings (2/41204), deleted 58M 588K 275B in 1 (2/3688), GCed 1 recordings (2/3687).
/video/garage_west: added 117M 302K 998B in 2 recordings (1/41542, 1/41543), deleted 117M 162K 326B in 2 (1/4238, 1/4239), GCed 1 recordi                                                             ngs (1/4237).
^CI0710 181847.551 main moonfire_nvr::cmds::run] Shutting down streamers.
I0710 181847.553 s-peck_east-main moonfire_nvr::streamer] peck_east-main: shutting down
I0710 181847.636 s-peck_west-main moonfire_nvr::streamer] peck_west-main: shutting down
I0710 181847.644 s-garage_east-main moonfire_nvr::streamer] garage_east-main: shutting down
I0710 181847.686 s-garage_west-main moonfire_nvr::streamer] garage_west-main: shutting down
I0710 181847.784 main moonfire_nvr::cmds::run] Waiting for HTTP requests to finish.
I0710 181847.784 main moonfire_nvr::cmds::run] Exiting.
I0710 181847.806 main moonfire_db::db] Flush 1680 (why: drop):
/video/peck_east: added 77M 175K 908B in 2 recordings (3/40672, 3/40673), deleted 59M 1006K 30B in 1 (3/2228), GCed 1 recordings (3/2227)                                                             .
/video/peck_west: added 61M 94K 539B in 2 recordings (4/40697, 4/40698), deleted 60M 528B in 1 (4/2209), GCed 1 recordings (4/2208).
/video/garage_east: added 89M 124K 697B in 2 recordings (2/41205, 2/41206), deleted 58M 642K 1010B in 1 (2/3689), GCed 1 recordings (2/36                                                             88).
/video/garage_west: added 45M 150K 871B in 1 recordings (1/41544), deleted 0B in 0 (), GCed 2 recordings (1/4238, 1/4239).
I0710 181847.962 main moonfire_nvr] Success.
jlpoole@raspberrypi:~ $ sudo systemctl start moonfire-nvr
[sudo] password for jlpoole:
jlpoole@raspberrypi:~ $ 

then I tailed /var/log/daemon and saw errors and issued a "sudo systemctl stop moonfire-nvr". Here's the log at the end of the stop:

Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]: s-garage_east-main moonfire_base::clock] sleeping for Duration { secs: 1, nanos: 0 } after error: Os { code: 28, kind: Other, message: "No space left on device" }
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:    0: backtrace::backtrace::libunwind::trace
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at home/pi/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:       backtrace::backtrace::trace_unsynchronized
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at home/pi/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:       backtrace::backtrace::trace
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at home/pi/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:53
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:    1: backtrace::capture::Backtrace::create
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at home/pi/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/capture.rs:164
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:       backtrace::capture::Backtrace::new_unresolved
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at home/pi/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/capture.rs:158
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:    2: failure::backtrace::internal::InternalBacktrace::new
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at home/pi/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.7/src/backtrace/internal.rs:46
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:    3: failure::backtrace::Backtrace::new
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at home/pi/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.7/src/backtrace/mod.rs:121
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:       <failure::backtrace::Backtrace as core::default::Default>::default
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at home/pi/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.7/src/backtrace/mod.rs:140
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:    4: <failure::error::error_impl::ErrorImpl as core::convert::From<F>>::from
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at home/pi/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.7/src/error/error_impl.rs:19
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:       <failure::error::Error as core::convert::From<F>>::from
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at home/pi/.cargo/registry/src/github.com-1ecc6299db9ec823/failure-0.1.7/src/error/mod.rs:36
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:       <T as core::convert::Into<U>>::into
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at rustc/49cae55760da0a43428eba73abcb659bb70cf2e4/src/libcore/convert/mod.rs:559
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:       moonfire_base::clock::retry_forever
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at usr/local/src/moonfire-nvr/base/clock.rs:64
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:    5: moonfire_db::writer::Writer<C,D>::write
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at usr/local/src/moonfire-nvr/db/writer.rs:751
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:    6: moonfire_nvr::streamer::Streamer<C,S>::run_once
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at src/streamer.rs:189
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:       moonfire_nvr::streamer::Streamer<C,S>::run
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at src/streamer.rs:100
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:    7: moonfire_nvr::cmds::run::run::{{closure}}::{{closure}}
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at src/cmds/run.rs:259
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:       std::sys_common::backtrace::__rust_begin_short_backtrace
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at rustc/49cae55760da0a43428eba73abcb659bb70cf2e4/src/libstd/sys_common/backtrace.rs:130
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:    8: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at rustc/49cae55760da0a43428eba73abcb659bb70cf2e4/src/libstd/thread/mod.rs:475
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:       <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at rustc/49cae55760da0a43428eba73abcb659bb70cf2e4/src/libstd/panic.rs:318
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:       std::panicking::try::do_call
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at rustc/49cae55760da0a43428eba73abcb659bb70cf2e4/src/libstd/panicking.rs:331
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:       std::panicking::try
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at rustc/49cae55760da0a43428eba73abcb659bb70cf2e4/src/libstd/panicking.rs:274
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:       std::panic::catch_unwind
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at rustc/49cae55760da0a43428eba73abcb659bb70cf2e4/src/libstd/panic.rs:394
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:       std::thread::Builder::spawn_unchecked::{{closure}}
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at rustc/49cae55760da0a43428eba73abcb659bb70cf2e4/src/libstd/thread/mod.rs:474
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:       core::ops::function::FnOnce::call_once{{vtable.shim}}
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at rustc/49cae55760da0a43428eba73abcb659bb70cf2e4/src/libcore/ops/function.rs:232
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:    9: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at rustc/49cae55760da0a43428eba73abcb659bb70cf2e4/src/liballoc/boxed.rs:1008
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:       <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at rustc/49cae55760da0a43428eba73abcb659bb70cf2e4/src/liballoc/boxed.rs:1008
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:       std::sys::unix::thread::Thread::new::thread_start
Jul 10 18:25:20 raspberrypi moonfire-nvr[13006]:              at src/libstd/sys/unix/thread.rs:87
Jul 10 18:25:21 raspberrypi systemd[1]: moonfire-nvr.service: State 'stop-sigterm' timed out. Killing.
Jul 10 18:25:21 raspberrypi systemd[1]: moonfire-nvr.service: Killing process 13006 (moonfire-nvr) with signal SIGKILL.
Jul 10 18:25:21 raspberrypi systemd[1]: moonfire-nvr.service: Main process exited, code=killed, status=9/KILL
Jul 10 18:25:21 raspberrypi systemd[1]: moonfire-nvr.service: Failed with result 'timeout'.
Jul 10 18:25:21 raspberrypi systemd[1]: Stopped Moonfire NVR.
Jul 10 18:25:21 raspberrypi systemd[1]: moonfire-nvr.service: Consumed 6.860s CPU time.
jlpoolen commented 4 years ago

I decided rather than run as a service, I'd execute a command from the console. The "No space left on device" resumed. When I tried to Control-C, nothing happened. I waited for about 5 seconds, and then in another console I executed:

jlpoole@raspberrypi:~ $ ps -a
  PID TTY          TIME CMD
  910 tty1     00:00:00 bash
13194 pts/0    00:00:00 sudo
13201 pts/0    00:00:05 moonfire-nvr
13217 pts/4    00:00:00 ps
jlpoole@raspberrypi:~ $ sudo kill -HUP 13201

Here's the start-up from the console that I had to kill with the preceding command:

jlpoole@raspberrypi:/var/log $ sudo systemctl stop  moonfire-nvr
jlpoole@raspberrypi:/var/log $ sudo -u moonfire-nvr moonfire-nvr run
[sudo] password for jlpoole:
I0710 190737.822 main moonfire_db::db] Loading video sample entries
I0710 190737.822 main moonfire_db::db] Loaded 2 video sample entries
I0710 190737.822 main moonfire_db::db] Loading sample file dirs
I0710 190737.823 main moonfire_db::db] Loaded 5 sample file dirs
I0710 190737.823 main moonfire_db::db] Loading cameras
I0710 190737.824 main moonfire_db::db] Loaded 4 cameras
I0710 190737.824 main moonfire_db::db] Loading streams
I0710 190737.824 main moonfire_db::db] Loaded 4 streams
I0710 190737.825 main moonfire_db::db] Loading recordings for camera garage_west stream MAIN
I0710 190738.283 main moonfire_db::db] Loaded 37305 recordings for camera garage_west stream MAIN
I0710 190738.283 main moonfire_db::db] Loading recordings for camera garage_east stream MAIN
I0710 190738.709 main moonfire_db::db] Loaded 37517 recordings for camera garage_east stream MAIN
I0710 190738.710 main moonfire_db::db] Loading recordings for camera peck_east stream MAIN
I0710 190739.147 main moonfire_db::db] Loaded 38445 recordings for camera peck_east stream MAIN
I0710 190739.147 main moonfire_db::db] Loading recordings for camera peck_west stream MAIN
I0710 190739.585 main moonfire_db::db] Loaded 38489 recordings for camera peck_west stream MAIN
W0710 190739.585 main moonfire_base::clock] database operation took PT1.763225458S!
I0710 190739.585 main moonfire_nvr::cmds::run] Database is loaded.
I0710 190739.618 main moonfire_nvr::cmds::run] Directories are opened.
I0710 190739.618 main moonfire_nvr::cmds::run] Resolved timezone: America/Los_Angeles
I0710 190739.619 main moonfire_ffmpeg] Initialized ffmpeg. Versions:
avutil: running=56.22.100 compiled=56.22.100
avcodec: running=58.35.100 compiled=58.35.100
avformat: running=58.20.100 compiled=58.20.100
I0710 190739.619 main moonfire_nvr::cmds::run] Starting syncer for path /video/garage_west
I0710 190739.619 main moonfire_nvr::cmds::run] Starting syncer for path /video/garage_east
I0710 190739.619 main moonfire_nvr::cmds::run] Starting syncer for path /video/peck_east
I0710 190739.619 main moonfire_nvr::cmds::run] Starting syncer for path /video/peck_west
I0710 190739.717 main moonfire_db::db] Flush 1 (why: synchronous deletion): no recording changes
I0710 190739.802 main moonfire_db::db] Flush 2 (why: synchronous deletion): no recording changes
I0710 190739.886 main moonfire_db::db] Flush 3 (why: synchronous deletion): no recording changes
I0710 190739.957 main moonfire_db::db] Flush 4 (why: synchronous deletion): no recording changes
I0710 190739.958 main moonfire_nvr::cmds::run] Starting streamer for garage_west-main
I0710 190739.958 main moonfire_nvr::cmds::run] Starting streamer for garage_east-main
I0710 190739.958 main moonfire_nvr::cmds::run] Starting streamer for peck_east-main
I0710 190739.958 main moonfire_nvr::cmds::run] Starting streamer for peck_west-main
I0710 190739.958 s-garage_west-main moonfire_nvr::streamer] garage_west-main: Opening input: rtsp://admin:redacted@192.168.1.48:554/h264Preview_01_main
I0710 190739.959 main moonfire_nvr::cmds::run] Ready to serve HTTP requests
I0710 190739.958 s-garage_east-main moonfire_nvr::streamer] garage_east-main: Opening input: rtsp://admin:redacted@192.168.1.49:554/h264Preview_01_main
I0710 190739.959 s-peck_west-main moonfire_nvr::streamer] peck_west-main: Opening input: rtsp://admin:redacted@192.168.1.180:554/h264Preview_01_main
I0710 190739.959 s-peck_east-main moonfire_nvr::streamer] peck_east-main: Opening input: rtsp://admin:redacted@192.168.1.181:554/h264Preview_01_main
I0710 190741.243 s-garage_west-main moonfire_nvr::stream] Discarding the first packet to work around https://trac.ffmpeg.org/ticket/5018
W0710 190741.243 s-garage_west-main moonfire_base::clock] opening rtsp://admin:redacted@192.168.1.48:554/h264Preview_01_main took PT1.284280002S!
I0710 190741.646 s-garage_east-main moonfire_nvr::stream] Discarding the first packet to work around https://trac.ffmpeg.org/ticket/5018
W0710 190741.646 s-garage_east-main moonfire_base::clock] opening rtsp://admin:redacted@192.168.1.49:554/h264Preview_01_main took PT1.686763223S!
I0710 190741.663 s-peck_west-main moonfire_nvr::stream] Discarding the first packet to work around https://trac.ffmpeg.org/ticket/5018
W0710 190741.663 s-peck_west-main moonfire_base::clock] opening rtsp://admin:redacted@192.168.1.180:554/h264Preview_01_main took PT1.703539613S!
I0710 190741.747 s-peck_east-main moonfire_nvr::stream] Discarding the first packet to work around https://trac.ffmpeg.org/ticket/5018
W0710 190741.747 s-peck_east-main moonfire_base::clock] opening rtsp://admin:redacted@192.168.1.181:554/h264Preview_01_main took PT1.787758814S!
[rtsp @ 0xaf1008f0] CSeq 5 expected, 0 received.
[rtsp @ 0xaf1008f0] CSeq 5 expected, 0 received.
[rtsp @ 0xaf1008f0] CSeq 5 expected, 0 received.
[rtsp @ 0xaf1008f0] CSeq 5 expected, 0 received.
W0710 190742.407 s-garage_west-main moonfire_base::clock] sleeping for Duration { secs: 1, nanos: 0 } after error: Os { code: 28, kind: Other, message: "No space left on device" }
W0710 190742.886 s-garage_east-main moonfire_base::clock] sleeping for Duration { secs: 1, nanos: 0 } after error: Os { code: 28, kind: Other, message: "No space left on device" }
W0710 190743.022 s-peck_west-main moonfir

and the result of receiving the SIGHUP:

W0710 190830.426 s-garage_west-main moonfire_base::clock] sleeping for Duration { secs: 1, nanos: 0 } after error: Os { code: 28, kind: Other, message: "No space left on device" }
W0710 190830.906 s-garage_east-main moonfire_base::clock] sleeping for Duration { secs: 1, nanos: 0 } after error: Os { code: 28, kind: Other, message: "No space left on device" }
Hangup
jlpoole@raspberrypi:/var/log $ 

I then wanted to experiment with sending a INT, so I started moonfire-nvr in the console, and again, it went into a cycle of errors that could not be interrupted by Control-C, so in another terminal:

jlpoole@raspberrypi:~ $ ps -a
  PID TTY          TIME CMD
  910 tty1     00:00:00 bash
13235 pts/0    00:00:00 sudo
13240 pts/0    00:00:06 moonfire-nvr
13254 pts/4    00:00:00 ps
jlpoole@raspberrypi:~ $ sudo kill -INT 13201
kill: (13201): No such process
jlpoole@raspberrypi:~ $ sudo kill -INT 13240
jlpoole@raspberrypi:~ $ sudo kill -HUP 13240
jlpoole@raspberrypi:~ $

The -INT didn't work (I'm punting here and not certain "-INT" is a valid signal acronym, nothing happened in the console where the process was running, so I had to send a "-HUP" to stop it. I saw in run.rs Lines 278-9 you handled "SIGINT" and "SIGTERM".

Since I have this fortunate circumstance of being in a mode where moonfire-nvr just runs into disk space problems, I thought I'd test the various interrupts:

jlpoole@raspberrypi:/var/log $ kill -L
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
11) SIGSEGV     12) SIGUSR2     13) SIGPIPE     14) SIGALRM     15) SIGTERM
16) SIGSTKFLT   17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGIO       30) SIGPWR
31) SIGSYS      34) SIGRTMIN    35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3
38) SIGRTMIN+4  39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7
58) SIGRTMAX-6  59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX
jlpoole@raspberrypi:/var/log $

jlpoole@raspberrypi:~ $ sudo kill -15 13465
jlpoole@raspberrypi:~ $ sudo kill -2 13465
jlpoole@raspberrypi:~ $ sudo kill -1 13465
jlpoole@raspberrypi:~ $

Only "-1" = "HUP" worked, the other two did not. For what it's worth.

jlpoolen commented 4 years ago

Here are my settings:

short_name      path                    MB      Flush
garage_west     /video/garage_west      2199023 120
garage_east     /video/garage_east      2199023 120
peck_east       /video/peck_east        1953136 120
peck_west       /video/peck_west        1952062 120

from

 1  select c.short_name, d.path,
 2  s.retain_bytes/1000000 as "MB",
 3  s.flush_if_sec as "Flush"
 4  from camera c, sample_file_dir d,
 5  stream s
 6  where s.camera_id = c.id
 7  and s.sample_file_dir_id = d.id

and usage:

jlpoole@raspberrypi:/var/log $ sudo du -h -L -c /video
1.8T    /video/peck_west
16K     /video/lost+found
2.1T    /video/garage_west
2.1T    /video/garage_east
1.8T    /video/peck_east
7.6T    /video
7.6T    total
jlpoole@raspberrypi:/var/log $
scottlamb commented 4 years ago

Interesting point about the signal behavior on out-of-space. Did it log a Shutting down streamers. line? I don't see it in your logs in this comment, but I see you cut the logs up into chunks, so maybe it was just lost along the way.

The code is written to set shutdown_streamers—that is, ask the streamer threads to shut down—then wait for them to honor that request. Likewise, it waits for the syncer threads to drain their queue and finish, and it does a last database flush if necessary. And it waits for any HTTP requests in flight to finish. The goal is to finish everything it can and write it fully out to disk. But if any thread is stuck in a retry_forever loop that's not resolving, the shutdown won't complete. You should still see should the Shutting down streamers. log line in response to the first signal.

kill -INT is valid, by the way.

SIGHUP worked because my code doesn't handle it. The default action (as you can see in the table in man 7 signal) is to terminate the process immediately. There are several other signals that should do the same. SIGQUIT is particularly handy because you can send it from the terminal by pressing ctrl-\.

I probably should make the retry loop stop on shutdown_streamers, so the threads properly exit even if the error persists forever.

Another thing some programs do is to understand a second termination signal as "stop trying to shut down gracefully. Just shut down right now."

jlpoolen commented 4 years ago

Yes, I selected portions of the log since so much of it seemed repetitive. To answer your question in terms of the daemon run (I did some runs directly from the console thereafter):

 jlpoole@raspberrypi:/var/log $ sudo tail -n 200000 /var/log/daemon.log |nl| grep -i Shutting
 180133  Jul 10 18:23:50 raspberrypi moonfire-nvr[13006]: main moonfire_nvr::cmds::run] Shutting down streamers.
 jlpoole@raspberrypi:/var/log $

Let me know if you want me to go through a full console run, I can redirect the output to a log and share via pastebin. I'm thinking this is an excellent opportunity to try some things as creating the predicament I'm in is probably not an easy task.

scottlamb commented 4 years ago

Okay, then all makes sense. And no need to save all the logs. I think we can recreate this behavior at will. It's just a matter of being willing to fly quite close to the sun in terms of set limits vs filesystem capacity. ;-)

jlpoolen commented 4 years ago

I noticed that the allocation I gave is 100% utilized, so maybe the inability to write to the directories is thwarting a cleanup attempt? /mnt/purple goes to my 12 TB Western Digital purple where I created 8 TB. I have a link /video to /mnt/purple..

jlpoole@raspberrypi:/var/log $ df -lh
Filesystem                    Size  Used Avail Use% Mounted on
/dev/root                      27G   16G   10G  61% /
devtmpfs                      3.8G     0  3.8G   0% /dev
tmpfs                         3.9G     0  3.9G   0% /dev/shm
tmpfs                         3.9G  401M  3.5G  11% /run
tmpfs                         5.0M  4.0K  5.0M   1% /run/lock
tmpfs                         3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/mmcblk0p6                253M   52M  202M  21% /boot
tmpfs                         797M     0  797M   0% /run/user/1000
tmpfs                         797M  4.0K  797M   1% /run/user/1001
/dev/mapper/vgpurple-cameras  8.0T  7.6T     0 100% /mnt/purple
[redacted]
jlpoole@raspberrypi:/var/log $
scottlamb commented 4 years ago

Did a remove operation fail? I thought it was just that it isn't trying to remove files at the moment. Either it's currently under the limit (and as mentioned before it only compares the fully written/flushed files, so if you don't have enough slack for files that are in flight or being garbage-collected it can be "under the limit" even when the filesystem is completely full) or the remove operation is queued behind a write that's in a retry loop.

btw, keep in mind that raw size of the partition can be pretty different from how much is available to non-root users to write. There's filesystem overhead, and there's normally a 5% reserve for root (configurable in the mkfs.ext4 command). The latter is why df says 7.6T / 8.0T (aka 95%) is 100% use. Moonfire NVR handles this part right; it won't let you set limits that total more than 7.6T.

jlpoolen commented 4 years ago

When I first configured the new disk (spanking new), I'm pretty sure I had been able to specify "2T" for all four cameras; it was a very quick walk-through the ncurses menu operation specifying "2T". I do not remember running into a problem with the last one after setting the first three to "2T". But, it is possible that in my enthusiasm to get moonfire-nvr running on the new disk space I might have concluded, yes, I've hit the reserve limit on this last, #4, disk and I'll deal with it later. I'm pretty sure I would have made a mental note of that. Later on, when I was reducing the sizes to work around the frozen state after all four disks had filled up, I did encounter the reduced space availability on the space settings and then had to assess the existing space looking at your Terabyte-Gigabyte-Megabyte-Kilobyte syntax ("TGMK") and determining what I would enter in the field using the TGMK syntax that would have less space. I remember thinking about the TGMK syntax and considering it when I first came across it and that was during the lowering the specifications, not when I was at the 4th disk specifying the allocations for the first time. This leads me to wonder if an existing drive filled up with files (for instance, the garage_west drive has 37,307 files) gives a different status than an area of 2T specified? I did use LVM.

A test on this discrepancy might be for me to try and specify four identical spaces using logical volumes https://wiki.gentoo.org/wiki/LVM#LV_.28Logical_Volume.29 ("lvcreate") within the remaining the 4Terabytes of my 12 TB disk and using values which, if added, up would exceed the entire partition. But see below, I do not have 4TB! The administrative overhead appears to have consumed space leaving less than 4TB, e.g. "3.20 TB". Here's my current status (I have a Western Digital red in my external disk housing, reports of which are redacted below):

jlpoole@raspberrypi:~ $ sudo pvdisplay --units R --- Physical volume --- PV Name /dev/sdb VG Name vgpurple PV Size 12.00 TB / not usable 4.19 MB Allocatable yes PE Size 4.19 MB Total PE 2861055 Free PE 763903 Allocated PE 2097152 PV UUID ye603Z-iPMW-8dQA-2DFi-NQOY-cP0k-UuZXAy

--- Physical volume --- PV Name /dev/sda VG Name vgred

jlpoole@raspberrypi:~ $ sudo vgdisplay --units R --- Volume group --- VG Name vgpurple System ID Format lvm2 Metadata Areas 1 Metadata Sequence No 3 VG Access read/write VG Status resizable MAX LV 0 Cur LV 1 Open LV 1 Max PV 0 Cur PV 1 Act PV 1 VG Size 12.00 TB PE Size 4.19 MB Total PE 2861055 Alloc PE / Size 2097152 / <8.80 TB Free PE / Size 763903 / 3.20 TB VG UUID ziVCX7-xVv2-Hecn-j0BQ-dQoh-fIrU-mLD1c9

--- Volume group --- VG Name vgred

jlpoole@raspberrypi:~ $ sudo lvdisplay --units R --- Logical volume --- LV Path /dev/vgpurple/cameras LV Name cameras VG Name vgpurple LV UUID swI29O-uPH0-cxJN-KXVt-6oBn-TZdM-1aLgZr LV Write Access read/write LV Creation host, time raspberrypi, 2020-06-10 13:43:34 -0700 LV Status available

open 1

LV Size <8.80 TB Current LE 2097152 Segments 1 Allocation inherit Read ahead sectors auto

jlpoole@raspberrypi:~ $

In the above outputs, I'm seeing administrative space has exceeded the 8TBs, e.g. "8.80 TB". I'm pretty certain I had thought I was allocating 8TB because that would be four 2TB and it would be a nice sum in my mind: 4 x 2 = 8. I would not have used some amount like 10 or 11 that was not integer-like divisible by 4; and I know I would have thought I was leaving some space, e.g. 4TB, as extra for testing or for some of the other cameras I plan to bring online into my network. So, 12 TB - 8TB (allocated) != 4 TB remaining. I guess more careful planning and accounting for the actual sizes and the 5% you referenced was in order. This is a kind of trap others might blythely run into, so I'm hoping the details I'm sharing here will serve as a cautionary tale.

On Fri, Jul 10, 2020 at 10:20 PM Scott Lamb notifications@github.com wrote:

Did a remove operation fail? I thought it was just that it isn't trying to remove files at the moment. Either it's currently under the limit (and as mentioned before it only compares the fully written/flushed files, so if you don't have enough slack for files that are in flight or being garbage-collected it can be "under the limit" even when the filesystem is completely full) or the remove operation is queued behind a write that's in a retry loop.

btw, keep in mind that raw size of the partition can be pretty different from how much is available to non-root users to write. There's filesystem overhead, and there's normally a 5% reserve for root (configurable in the mkfs.ext4 command). That's why df says 7.6T / 8.0T (in other words, 95%) and 100%. Moonfire NVR does handle this part right; it won't let you set limits that total more than 7.6T.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/scottlamb/moonfire-nvr/issues/84#issuecomment-656992824, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABXP4MXBUT4BR4AVPD3P5LR27ZDJANCNFSM4NYV7I6A .

-- John L. Poole

707-812-1323 jlpoole56@gmail.com

jlpoolen commented 4 years ago

For reference: Using the Disk Capacity Calculator at http://www.endmemo.com/data/diskcapacity.php I compiled this table.

 Advertised Actual  GB Actual TB  Percent
 12 TB = 11175.8709 GB 10.9139TB  9.05
 8  TB = 7450.5806  GB 7.276  TB  9.05
 4  TB = 3725.2903  GB 3.638  TB  9.05
 2  TB = 1862.6451  GB 1.819  TB  9.05
 1  TB = 931.3226   GB 0.9095 TB  9.05

The current version of the instructions for setting up disk space are:

Assign disk space to your cameras back in "Directories and retention". Leave a little slack (at least 100 MB per camera) between the total limit and the filesystem capacity, even if you store nothing else on the disk. There are several reasons this is needed:

The limit currently controls fully-written files only. There will be up to two minutes of video per camera of additional video.
The rotation happens after the limit is exceeded, not proactively.
Moonfire NVR currently doesn't account for the unused space in the final filesystem block at the end of each file.
Moonfire NVR doesn't account for the space used for directory listings.
If a file is open when it is deleted (such as if a HTTP client is downloading it), it stays around until the file is closed. Moonfire NVR currently doesn't account for this.

When I have in mind specifying 2TB, should I have used used a value of 1819GB instead of 2TB as the value?

scottlamb commented 4 years ago

When I have in mind specifying 2TB, should I have used used a value of 1819GB instead of 2TB as the value?

Yeah, that's part of where your space went as well. There's no documentation/help text explaining this now, but the config interface's T, G, M, and K correspond to the binary prefixed units TiB, GiB, MiB, and KiB, respectively, decimal prefix units TB, GB, MB, KB like the hard drive manufacturers use.

Maybe it'd be better to just use powers-of-10 units. The hard drive manufacturers use these, and network bandwidth (as in the cameras' bitrate) is always expressed in powers-of-10 units. The comparison with powers-of-2 df -h will seem off, but I just realized there's also a df -H variant that uses powers of 10. /shruggie

What's your goal now? Are you trying to determine the correct amount of slack space to use? I'd say:

  1. start from Moonfire NVR's idea of how much filesystem capacity there is. This will be expressed in the same units you enter the configured amount back in, and it will have already subtracted the appropriate amount for filesystem overhead and the superuser reserve.
  2. Then subtract out the right amount of slack space.

The "100 MB per camera" guesstimate of required slack space was probably too low. I have a lot more slack on my own main system's two drives:

image

(That's ~100 GiB or ~2% slack on that drive.)

image

(That's ~300 GiB or ~5% slack on that drive. Usage hasn't hit the limit yet because I replaced a failed drive not that long ago.)

What's the configured max bitrate on your cameras? (and are you just doing the main stream on each camera or also the sub stream?) I think you want to have enough space for whatever recording is being written—up to two minutes per camera at that bitrate—plus however many recordings are pending removal. Maybe another recording per camera—so now we're up to four minutes—because it's not proactive. And then add in about flush_if_sec seconds because the garbage collection can get deferred by up to that long. So now we're at 4*60 + flush_if_sec seconds at that bitrate. And then maybe add in that 100 MB again to conservatively handle the other things. I think this is a pretty modest overhead on a 2 TB (or 2 TiB) partition.

I guess there's also the filesystem block overhead part, which isn't really a fixed overhead but on average half a block times the number of recordings. It should be relatively easy though for me to just fix the logic for this part. I can keep track of the space for each recording rounded up to a filesystem block, and then this doesn't have to be documented / manually accounted for anymore.

scottlamb commented 4 years ago

Maybe it'd be easier if the config interface displayed and accepted everything in terms of percentages of filesystem capacity, instead of byte sizes.

jlpoolen commented 4 years ago

I like the idea of determining bitrate and using that as a value in arriving at a total. It is also something a user ought to address. The concept of maximum bit-rate is really like knowing how many gallons a fire hose shoots out and an assessment of that ought to be made by the person configuring the storage space. I had not given it much thought until now. My bit rates at ~2:00 p.m. in the afternoon are 8.2 Mbps; at night it gets lower. garage_west main StartEndResolutionFPSStorageBitRate 07/12/20 13:50 07/12/20 13:58 2560x1440 30 556.4 MB 8.2 Mbps garage_east main StartEndResolutionFPSStorageBitRate 07/12/20 13:50 07/12/20 13:58 2560x1440 30 541.7 MB 8.2 Mbps peck_east main StartEndResolutionFPSStorageBitRate 07/12/20 13:50 07/12/20 13:58 2560x1920 30 542 MB 8.4 Mbps peck_west main StartEndResolutionFPSStorageBitRate 07/12/20 13:50 07/12/20 13:58 2560x1920 30 526.8 MB 8.4 Mbps

I have not addressed substreams as I'm not sure what those are. I'm only aware of a stream of video and a stream of audio. Maybe my camera offers more compressed streams at lower bit rates, but I'm pushing the limit using the maximums. My streams are as above using the h.264 codec. All of this analysis should include audio so at such time as Moonfire-nvr supports audio storage, then one would not have to reconfigure. The tricky part looks to be alloting enough scratch/working space to handle the deleting the old and buffering the just-captured along with the garbage collection. Your point about the garbage collecting being deferred for up to the flush_if_sec which in my case is 120 seconds is a good one.

I think I want to set up a really small test area on the remaining space of my purple drive and work through the calculations and incorporate your points. It looks like the number of cameras utilizing the same disk is going to be part of the equation.

I feel badly distracting you with this area, it takes away your focus on things like incorporating audio streams (I like your suggestion of simply being a pass-through for audio - once you have that, then improvements such as compression can come later).

Cheers. On Sun, Jul 12, 2020 at 1:51 PM Scott Lamb notifications@github.com wrote:

Maybe it'd be easier if the config interface displayed and accepted everything in terms of percentages of filesystem capacity, instead of byte sizes.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/scottlamb/moonfire-nvr/issues/84#issuecomment-657273401, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABXP4ODYSYHVY3IRZHIWLDR3IO4DANCNFSM4NYV7I6A .

-- John L. Poole

707-812-1323 jlpoole56@gmail.com

scottlamb commented 4 years ago

I feel badly distracting you with this area

No worries about that. These sorts of polish things are important to figure out before any kind of release announcement. I don't think other folks will be as patient with these rough edges are you are.

scottlamb commented 4 years ago

My bit rates at ~2:00 p.m. in the afternoon are 8.2 Mbps; at night it gets lower.

You should be able to find in your camera's UI a configured maximum rate which is probably a little higher than this. It's actually possible for Moonfire NVR to learn it from the camera in either of a couple ways:

Really we should start doing the latter when we are writing to automatically calculate the worst-case bytes we'll write before the next flush.

I just pushed a commit to take care of the filesystem block usage thing, although I realize now it's a tiny contribution to the required slack space.

scottlamb commented 3 years ago

I think everything here has been spun off into other issues or addressed now. In particular, of the stuff I mentioned in this comment:

Please let me know if I missed anything.