Open scottlamb opened 3 years ago
Actually, that's a lot better than ffmpeg already. /shruggie Watching ffmpeg for a minute, the buffer filled ~8,000 times per second:
$ sudo ./recvsize.bt "$(pidof moonfire-nvr)"
Attaching 3 probes...
@full_read_sizes:
[1] 161433 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2, 4) 159932 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[4, 8) 59 | |
[8, 16) 12876 |@@@@ |
[16, 32) 5692 |@ |
[32, 64) 27575 |@@@@@@@@ |
[64, 128) 11172 |@@@ |
[128, 256) 6422 |@@ |
[256, 512) 7465 |@@ |
[512, 1K) 10387 |@@@ |
[1K, 2K) 78285 |@@@@@@@@@@@@@@@@@@@@@@@@@ |
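The ~8,000/second figure follows from the histogram: sum the bucket counts and divide by the one-minute capture. A quick check (bucket counts transcribed from the output above):

```rust
fn main() {
    // Bucket counts from the @full_read_sizes histogram above.
    let counts = [161_433u64, 159_932, 59, 12_876, 5_692, 27_575,
                  11_172, 6_422, 7_465, 10_387, 78_285];
    let total: u64 = counts.iter().sum();
    let per_sec = total as f64 / 60.0; // one-minute capture
    assert_eq!(total, 481_298);
    println!("{total} buffer-filling reads ≈ {per_sec:.0}/s");
}
```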
Looking at actual CPU rates (by diffing /sys/fs/cgroup/cpu/system.slice/moonfire-nvr.service/cpuacct.usage):
config | cpu usage (% of one core)
---|---
ffmpeg | 16%
retina, multi-threaded tokio runtime, 4 threads | 21%
retina, multi-threaded tokio runtime, 1 thread | 14%
retina, current-thread tokio runtime | 12%
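The percentages above come from sampling cpuacct.usage (cumulative CPU time in nanoseconds) twice and dividing by wall time. A minimal sketch of that arithmetic (the function name is mine, not Moonfire NVR's):

```rust
/// Percent of one core used between two samples of a cumulative
/// CPU-time counter (e.g. cpuacct.usage, which is in nanoseconds).
fn cpu_percent(prev_ns: u64, cur_ns: u64, wall_ns: u64) -> f64 {
    (cur_ns - prev_ns) as f64 / wall_ns as f64 * 100.0
}

fn main() {
    // 1.6 s of CPU time over a 10 s window ≈ 16% of one core,
    // matching the ffmpeg row in the table above.
    let pct = cpu_percent(0, 1_600_000_000, 10_000_000_000);
    println!("{pct:.0}%"); // prints "16%"
}
```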
tl;dr of investigating where that CPU is going: it has more to do with Moonfire NVR's current tokio and thread setup than with retina.
A flamegraph shows tokio wasting a bit of CPU on sched_yield here; it's less with fewer threads, and it goes away with the current-thread runtime. I think this sched_yield loop is silly, and maybe I'll eventually convince the tokio folks of that.
There's also a fair amount of thread-handoff overhead: currently I'm using retina on the tokio threads and handing off every frame to a per-stream writer thread. Eventually I'll have one writer thread per sample file directory (2 instead of 12 in this deployment) and write only once per GOP (every 1 or 2 seconds instead of every 1/10th to 1/30th of a second), which will reduce memory.
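The planned change — handing the writer thread a whole GOP at once instead of one frame at a time — can be sketched with a simple batching step. All type and field names here are hypothetical, not Moonfire NVR's actual API:

```rust
/// A hypothetical encoded frame; `is_key` marks the start of a GOP.
struct Frame {
    is_key: bool,
    data: Vec<u8>,
}

/// Group a frame stream into GOPs: each inner Vec starts at a keyframe.
/// One channel send (thread handoff) per GOP replaces one per frame.
fn batch_into_gops(frames: Vec<Frame>) -> Vec<Vec<Frame>> {
    let mut gops: Vec<Vec<Frame>> = Vec::new();
    for frame in frames {
        if frame.is_key || gops.is_empty() {
            gops.push(Vec::new());
        }
        gops.last_mut().unwrap().push(frame);
    }
    gops
}

fn main() {
    // Simulate 90 frames at 30 fps with 1 s GOPs: 3 handoffs, not 90.
    let frames: Vec<Frame> = (0..90)
        .map(|i| Frame { is_key: i % 30 == 0, data: vec![0u8; 100] })
        .collect();
    let gops = batch_into_gops(frames);
    assert_eq!(gops.len(), 3);
    assert!(gops.iter().all(|g| g.len() == 30 && g[0].is_key));
    println!("{} handoffs for 90 frames", gops.len());
}
```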
I was mildly surprised Moonfire NVR's CPU usage didn't go down noticeably when switching from ffmpeg to retina. Not much CPU is spent in retina's code itself, but I think it's making too many syscalls because it's reading into buffers with too little available space. Counting reads that filled the buffer (and thus required a follow-up syscall), bucketed by the buffer's available space when the read started: the buffer filled ~230 times per second (across 12 video streams), and in most cases the read was into a buffer with less than 1 KiB available.
I'm letting tokio_util::codec::Framed do the buffer management now, but I think I should do it myself instead. Or at least call reserve(4096) before returning from Codec::decode (regardless of whether it was able to pull a full message).
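The reserve(4096) idea, sketched against a plain Vec<u8> rather than the BytesMut that tokio_util actually hands to decode (the helper name is mine): guarantee the buffer always has at least 4 KiB of spare capacity before the next read, so a small leftover tail never forces a tiny read and a follow-up syscall.

```rust
/// Ensure `buf` has at least `min` bytes of spare capacity, so the
/// next read syscall can pull a large chunk instead of a tiny tail.
/// (With tokio_util this would be a `reserve` call on the codec's
/// `BytesMut` before `decode` returns.)
fn ensure_spare_capacity(buf: &mut Vec<u8>, min: usize) {
    let spare = buf.capacity() - buf.len();
    if spare < min {
        // Vec::reserve takes the *additional* capacity needed.
        buf.reserve(min);
    }
}

fn main() {
    let mut buf: Vec<u8> = Vec::with_capacity(512);
    buf.extend_from_slice(&[0u8; 300]); // leftover partial message
    ensure_spare_capacity(&mut buf, 4096);
    assert!(buf.capacity() - buf.len() >= 4096);
    println!("len={}, capacity={}", buf.len(), buf.capacity());
}
```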