use larger read buffers

scottlamb / retina

High-level RTSP multimedia streaming library, in Rust

Apache License 2.0


[slamb@nuc ~]$ cat recvsize.bt
#!/usr/bin/bpftrace

tracepoint:syscalls:sys_enter_recvfrom /pid == (uint64) $1/ {
  @sizes[tid] = (int64) args->size;
}

tracepoint:syscalls:sys_exit_recvfrom /pid == (uint64) $1/ {
  if (@sizes[tid] > 0 && args->ret == @sizes[tid]) {
    @full_read_sizes = hist(@sizes[tid]);
  }
  delete(@sizes[tid]);
}

interval:s:60 {
  exit()
}
[slamb@nuc ~]$ sudo ./recvsize.bt "$(pidof moonfire-nvr)"
Attaching 3 probes...

@full_read_sizes:
[1]                    1 |                                                    |
[2, 4)                 0 |                                                    |
[4, 8)                 6 |                                                    |
[8, 16)               16 |                                                    |
[16, 32)              47 |                                                    |
[32, 64)             125 |                                                    |
[64, 128)            267 |@                                                   |
[128, 256)           531 |@@@                                                 |
[256, 512)          1119 |@@@@@@                                              |
[512, 1K)           8319 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1K, 2K)            3132 |@@@@@@@@@@@@@@@@@@@                                 |
[2K, 4K)             149 |                                                    |
[4K, 8K)              42 |                                                    |
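
For readers unfamiliar with the metric: the script above counts a "full read" whenever recvfrom returns exactly as many bytes as the caller asked for, which suggests more data was waiting and another syscall will follow. A minimal Rust sketch of the same idea, using an in-memory reader as a stand-in for the socket (the function name and sizes are hypothetical, not retina's code):

```rust
use std::io::{self, Read};

/// Reads `src` to EOF with a `buf_size`-byte buffer and counts the reads
/// that filled the buffer completely (the "full reads" the script tallies).
/// Hypothetical helper for illustration; not part of retina.
fn count_full_reads<R: Read>(mut src: R, buf_size: usize) -> io::Result<usize> {
    let mut buf = vec![0u8; buf_size];
    let mut full = 0;
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            return Ok(full); // EOF
        }
        if n == buf_size {
            // The read used the whole buffer, so more data may have been waiting.
            full += 1;
        }
    }
}
```

With 10,000 bytes available, a 1 KiB buffer is filled repeatedly, while a 16 KiB buffer takes everything in one read that never fills it. A real socket delivers data in bursts, so the counts there depend on timing as well as buffer size.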

Actually, that's already a lot better than ffmpeg. ¯\_(ツ)_/¯ Running the same script against ffmpeg for a minute, the buffer filled ~8000 times per second:

$ sudo ./recvsize.bt "$(pidof moonfire-nvr)"
Attaching 3 probes...

@full_read_sizes:
[1]               161433 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2, 4)            159932 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[4, 8)                59 |                                                    |
[8, 16)            12876 |@@@@                                                |
[16, 32)            5692 |@                                                   |
[32, 64)           27575 |@@@@@@@@                                            |
[64, 128)          11172 |@@@                                                 |
[128, 256)          6422 |@@                                                  |
[256, 512)          7465 |@@                                                  |
[512, 1K)          10387 |@@@                                                 |
[1K, 2K)           78285 |@@@@@@@@@@@@@@@@@@@@@@@@@                           |
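
The ~8000 per second figure can be checked by summing the histogram buckets and dividing by the script's 60-second window. A small sketch of that arithmetic, with the bucket counts copied from the two runs above (the function and constant names are made up for illustration):

```rust
/// Average full reads per second, given histogram bucket counts
/// collected over `window_secs` seconds.
fn full_reads_per_second(bucket_counts: &[u64], window_secs: u64) -> f64 {
    bucket_counts.iter().sum::<u64>() as f64 / window_secs as f64
}

/// Bucket counts from the ffmpeg run.
const FFMPEG: [u64; 11] = [
    161433, 159932, 59, 12876, 5692, 27575, 11172, 6422, 7465, 10387, 78285,
];

/// Bucket counts from the retina run.
const RETINA: [u64; 13] = [1, 0, 6, 16, 47, 125, 267, 531, 1119, 8319, 3132, 149, 42];
```

The ffmpeg buckets sum to 481,298, or roughly 8,000 full reads per second; the retina buckets sum to 13,754, or roughly 230 per second.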

Looking at actual CPU rates (by diffing /sys/fs/cgroup/cpu/system.slice/moonfire-nvr.service/cpuacct.usage):

config	cpu usage (% of one core)
ffmpeg	16%
retina, multi-threaded tokio runtime, 4 threads	21%
retina, multi-threaded tokio runtime, 1 thread	14%
retina, current thread tokio runtime	12%
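
For anyone reproducing these numbers: under cgroup v1, cpuacct.usage holds the cgroup's cumulative CPU time in nanoseconds, so sampling it twice and dividing the difference by the elapsed wall-clock time gives the fraction of one core. A rough sketch of that calculation (the helper names are hypothetical, and this is not necessarily how the figures above were produced):

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::Duration;

/// Reads a cgroup v1 `cpuacct.usage` file: cumulative CPU time in nanoseconds.
fn read_cpuacct_usage(path: &Path) -> io::Result<u64> {
    let text = fs::read_to_string(path)?;
    text.trim()
        .parse::<u64>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// CPU used between two samples, as a percentage of one core.
fn percent_of_one_core(before_ns: u64, after_ns: u64, elapsed: Duration) -> f64 {
    let used_ns = after_ns.saturating_sub(before_ns) as f64;
    100.0 * used_ns / elapsed.as_nanos() as f64
}
```

For example, 7.2 seconds of CPU time accumulated over a 60-second window works out to 12% of one core.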

tl;dr version of investigating where that CPU is going: it has more to do with Moonfire NVR's current tokio and thread setup than with retina.

A flamegraph shows tokio wasting a bit of CPU on sched_yield here; the waste shrinks with fewer threads and goes away with the current-thread runtime. I think this sched_yield loop is silly, and maybe I'll eventually convince the tokio folks of that.

There are also a fair number of thread handoffs, because I'm currently running retina on the tokio threads and handing off to a per-stream thread on every frame to write data. Eventually I'll have one writer thread per sample file directory (2 instead of 12 in this deployment) and only write once per GOP (every 1 or 2 seconds instead of every 1/10th to 1/30th of a second), which will reduce the handoffs.
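
To illustrate the per-GOP idea, here is a minimal sketch using only the standard library: frames are buffered on the receiving side and sent to a single writer thread once per GOP rather than once per frame. The `Frame` type, the function name, and the byte-counting "writer" are all hypothetical stand-ins, not Moonfire NVR's actual design.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical frame: payload plus whether it is a key frame (starts a GOP).
struct Frame {
    is_key: bool,
    data: Vec<u8>,
}

/// Hands frames to one writer thread, one channel message per complete GOP.
/// Returns (number of handoffs, total bytes the writer received).
fn write_by_gop(frames: Vec<Frame>) -> (usize, usize) {
    let (tx, rx) = mpsc::channel::<Vec<Frame>>();

    // Stand-in for a per-directory writer thread; it only counts bytes.
    let writer = thread::spawn(move || {
        let mut bytes = 0;
        for gop in rx {
            bytes += gop.iter().map(|f| f.data.len()).sum::<usize>();
        }
        bytes
    });

    let mut handoffs = 0;
    let mut gop: Vec<Frame> = Vec::new();
    for frame in frames {
        // A key frame begins a new GOP, so flush the previous one first.
        if frame.is_key && !gop.is_empty() {
            tx.send(std::mem::take(&mut gop)).expect("writer thread exited");
            handoffs += 1;
        }
        gop.push(frame);
    }
    if !gop.is_empty() {
        tx.send(gop).expect("writer thread exited");
        handoffs += 1;
    }
    drop(tx); // close the channel so the writer's loop ends
    let bytes = writer.join().expect("writer thread panicked");
    (handoffs, bytes)
}
```

With a key frame every 30 frames, 60 frames produce 2 handoffs instead of 60. The trade-off is that a full GOP sits in memory until it is flushed.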

use larger read buffers #5