motioneye-project / motioneye

A web frontend for the motion daemon.
GNU General Public License v3.0

RPi camera modules | libcamerify causes motion sh childs to become zombies #2900

Open kni-bo opened 8 months ago

kni-bo commented 8 months ago

Hello,

Brief description:

With motionEye dev, zombie processes remain on the system. Once there are about 700 zombies, memory fills up and the system hangs.

Detailed description:

My hardware is a Raspberry Pi 3 B+ with a V1 IR camera (ov5647). The software is Raspberry Pi OS Lite 64-bit with motionEye dev. I had to use libcamerify and a Python venv to get this running. As reported for the Raspberry Pi 4, the H.264/OMX codec does not work on the Pi 3 either, so I use H.264. The camera is detected and I can record images and videos.

One minute after restarting motioneye (systemctl restart motioneye.service), the processes look like this:

root@pi3:~# ps aux | head -1 # only for header
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root@pi3:~# ps aux | grep motion
motion       604  1.8  4.7 477116 44568 ?        SLsl Dec21  20:10 /usr/bin/motion
motion    134807  1.8  9.8 357920 91972 ?        Ssl  13:46   0:03 /home/python-venv/bin/python /home/python-venv/bin/meyectl startserver -c /etc/motioneye/motioneye.conf
motion    134836 22.7  7.8 849104 73076 ?        SLl  13:46   0:40 /usr/bin/motion -n -c /etc/motioneye/motion.conf -d 5
motion    134849  0.0  8.0 136496 74712 ?        S    13:46   0:00 /home/python-venv/bin/python /home/python-venv/bin/meyectl startserver -c /etc/motioneye/motioneye.conf
motion    134902  0.0  0.0      0     0 ?        Zs   13:47   0:00 [sh] <defunct>
motion    134960  0.0  0.0      0     0 ?        Zs   13:47   0:00 [sh] <defunct>
motion    134961  0.0  0.0      0     0 ?        Zs   13:47   0:00 [sh] <defunct>
root      135191  0.0  0.2   6088  1928 pts/0    S+   13:49   0:00 grep motion

After a few hours or a day (depending on the number of motion events detected) there are over 700 zombies. After that the number of zombies stops growing, but memory fills up and the system hangs.

I have written a small watchdog so that I don't always have to pull the power plug. It collects some data and reboots the system before memory is full and the system hangs. At that point the processes look like this:

root@pi3:~# ps aux | head -1 # only for header
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root@pi3:~/watchdog# grep motion debug_2023-12-21_19-26-02.txt | head
motion       569  0.2  5.2 368968 48840 ?        Ssl  Dec20   2:35 /home/python-venv/bin/python /home/python-venv/bin/meyectl startserver -c /etc/motioneye/motioneye.conf
motion       586  1.8  1.8 477116 17572 ?        SLsl Dec20  21:42 /usr/bin/motion
motion       970 22.7 43.3 2763312 403000 ?      SLl  Dec20 267:09 /usr/bin/motion -n -c /etc/motioneye/motion.conf -d 5
motion       980  0.0  1.9 136496 17936 ?        S    Dec20   0:01 /home/python-venv/bin/python /home/python-venv/bin/meyectl startserver -c /etc/motioneye/motioneye.conf
motion       984  0.0  0.0      0     0 ?        Zs   Dec20   0:00 [sh] <defunct>
motion      1046  0.0  0.0      0     0 ?        Zs   Dec20   0:00 [sh] <defunct>
motion      1047  0.0  0.0      0     0 ?        Zs   Dec20   0:00 [sh] <defunct>
motion      1177  0.0  0.0      0     0 ?        Zs   Dec20   0:00 [sh] <defunct>
motion      1266  0.0  0.0      0     0 ?        Zs   Dec20   0:00 [sh] <defunct>
motion      1267  0.0  0.0      0     0 ?        Zs   Dec20   0:00 [sh] <defunct>

root@pi3:~/watchdog# grep Zs debug_2023-12-21_19-26-02.txt | wc -l
741

I found out that restarting motionEye (systemctl restart motioneye.service) is enough to kill all zombies and free the memory, so I changed my watchdog to do that instead of rebooting.
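A watchdog check along these lines could be sketched as follows. This is only an illustration, not the watchdog actually used here: the function names and the limit of 500 are assumptions, and the parsing relies on the 11-column `ps aux` layout shown in the listings above.

```python
def count_defunct(ps_output: str, user: str = "motion") -> int:
    """Count zombie processes owned by `user` in the text printed by `ps aux`."""
    count = 0
    for line in ps_output.splitlines():
        # ps aux prints 11 columns; maxsplit keeps COMMAND (with spaces) in one field.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        owner, stat = fields[0], fields[7]
        # A STAT value starting with "Z" marks a zombie (for example "Zs").
        if owner == user and stat.startswith("Z"):
            count += 1
    return count


def should_restart(ps_output: str, limit: int = 500) -> bool:
    """Hypothetical policy: ask for a service restart once `limit` zombies exist."""
    return count_defunct(ps_output) >= limit
```

The caller would feed it the output of `ps aux` and, when `should_restart` returns true, run the `systemctl restart motioneye.service` command mentioned above. The limit should sit well below the roughly 700 zombies at which the system was seen to hang.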

The following lines are repeated every 10 seconds in the journal:

Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [WRN] [ALL] mlp_retry: Retrying until successful connection with camera
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] vid_start: Opening V4L2 device
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_device_open: Using videodevice /dev/video0 and input -1
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_device_capability: - VIDEO_CAPTURE
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_device_capability: - READWRITE
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_device_capability: - STREAMING
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_input_select: Name = "unicam-image"- CAMERA
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_norm_select: Device does not support specifying PAL/NTSC norm
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_try: Unable to use YU12 (640x480)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: Configuration palette index 17 (YU12) for 640x480 doesn't work.
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: Supported palettes:
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (0) YUYV (YUYV 4:2:2)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (1) UYVY (UYVY 4:2:2)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (2) YVYU (YVYU 4:2:2)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (3) VYUY (VYUY 4:2:2)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (4) RGBP (16-bit RGB 5-6-5)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (5) RGBR (16-bit RGB 5-6-5 BE)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (6) RGBO (16-bit A/XRGB 1-5-5-5)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (7) RGBQ (16-bit A/XRGB 1-5-5-5 BE)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (8) RGB3 (24-bit RGB 8-8-8)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (9) BGR3 (24-bit BGR 8-8-8)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (10) RGB4 (32-bit A/XRGB 8-8-8-8)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (11) BA81 (8-bit Bayer BGBG/GRGR)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (12) GBRG (8-bit Bayer GBGB/RGRG)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (13) GRBG (8-bit Bayer GRGR/BGBG)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (14) RGGB (8-bit Bayer RGRG/GBGB)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (15) pBAA (10-bit Bayer BGBG/GRGR Packed)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (16) BG10 (10-bit Bayer BGBG/GRGR)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (17) pGAA (10-bit Bayer GBGB/RGRG Packed)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (18) GB10 (10-bit Bayer GBGB/RGRG)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (19) pgAA (10-bit Bayer GRGR/BGBG Packed)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (20) BA10 (10-bit Bayer GRGR/BGBG)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (21) pRAA (10-bit Bayer RGRG/GBGB Packed)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (22) RG10 (10-bit Bayer RGRG/GBGB)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (23) pBCC (12-bit Bayer BGBG/GRGR Packed)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (24) BG12 (12-bit Bayer BGBG/GRGR)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (25) pGCC (12-bit Bayer GBGB/RGRG Packed)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (26) GB12 (12-bit Bayer GBGB/RGRG)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (27) pgCC (12-bit Bayer GRGR/BGBG Packed)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (28) BA12 (12-bit Bayer GRGR/BGBG)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (29) pRCC (12-bit Bayer RGRG/GBGB Packed)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (30) RG12 (12-bit Bayer RGRG/GBGB)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (31) pBEE (14-bit Bayer BGBG/GRGR Packed)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (32) BG14 (14-bit Bayer BGBG/GRGR)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (33) pGEE (14-bit Bayer GBGB/RGRG Packed)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (34) GB14 (14-bit Bayer GBGB/RGRG)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (35) pgEE (14-bit Bayer GRGR/BGBG Packed)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (36) GR14 (14-bit Bayer GRGR/BGBG)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (37) pREE (14-bit Bayer RGRG/GBGB Packed)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (38) RG14 (14-bit Bayer RGRG/GBGB)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (39) GREY (8-bit Greyscale)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (40) Y10P (10-bit Greyscale (MIPI Packed))
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (41) Y10  (10-bit Greyscale)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (42) Y12P (12-bit Greyscale (MIPI Packed))
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (43) Y12  (12-bit Greyscale)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (44) Y14P (14-bit Greyscale (MIPI Packed))
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (45) Y14  (14-bit Greyscale)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [NTC] [VID] v4l2_pixfmt_try: Testing palette Y12  (640x480)
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [ERR] [VID] v4l2_pixfmt_set: Error setting pixel format.: Device or resource busy
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [ERR] [VID] v4l2_pixfmt_select: Palette selection failed for format Y12
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [ERR] [VID] v4l2_pixfmt_select: Unable to find a compatible palette format.
Dec 27 12:00:10 pi3 motion[604]: [1:ml1] [ERR] [VID] vid_start: V4L2 device failed to open
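Most of the repeated block above is the palette listing at notice level. To pull out only the warnings and errors, a small filter can help. The sketch below assumes the line layout seen in this journal excerpt (`motion[PID]: [thread] [LEVEL] [TYPE] function: message`); the function name is made up for illustration.

```python
import re

# Layout assumed from the journal excerpt in this thread.
LINE_RE = re.compile(
    r"motion\[(?P<pid>\d+)\]: \[(?P<thread>[^\]]+)\] "
    r"\[(?P<level>[A-Z]{3})\] \[(?P<type>[A-Z]{3})\] "
    r"(?P<func>\w+): (?P<msg>.*)"
)


def notable_lines(journal_text, levels=("WRN", "ERR")):
    """Return (pid, level, function, message) for lines at the given levels."""
    rows = []
    for line in journal_text.splitlines():
        match = LINE_RE.search(line)
        if match and match.group("level") in levels:
            rows.append(
                (
                    int(match.group("pid")),
                    match.group("level"),
                    match.group("func"),
                    match.group("msg"),
                )
            )
    return rows
```

Applied to the block above, this would leave the `mlp_retry` warning and the four error lines ending in "V4L2 device failed to open", all from PID 604.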

What can I do to further narrow down and solve the problem?

Many thanks in advance. Michael

MichaIng commented 8 months ago

What does your service look like? The logs show motion having trouble connecting to the camera and getting the stream, but there are also two concurrent motionEye processes.

systemctl cat motioneye
journalctl -u motion

Is the camera attached via the CSI port, and do you have KMS enabled (the default on RPi OS)?

kni-bo commented 8 months ago

Hello, first of all, thank you very much for your quick reply.

root@pi3:~# systemctl cat motioneye
# /etc/systemd/system/motioneye.service
[Unit]
Description=motionEye Server
After=network.target local-fs.target remote-fs.target

[Service]
User=motion
RuntimeDirectory=motioneye
LogsDirectory=motioneye
StateDirectory=motioneye
ExecStart=/usr/bin/libcamerify /home/python-venv/bin/meyectl startserver -c /etc/motioneye/motioneye.conf
Restart=on-abort

[Install]
WantedBy=multi-user.target

Above I posted the part of the journal that keeps repeating, which is why the journal is huge. Apart from those messages, I don't see anything conspicuous.

The camera is connected via the flat ribbon cable. I think this is the CSI port.

I'm not aware of having configured anything like KMS. If it is the default, it should still be enabled.

Thanks Michael

kni-bo commented 8 months ago

I found this:

root@pi3:~# grep kms /boot/config.txt 
dtoverlay=vc4-kms-v3d
disable_fw_kms_setup=1

MichaIng commented 8 months ago

The 64-bit RPi OS does not ship the OMX driver anymore, and it does not work with KMS, so it generally makes sense that other models show the same issue as the RPi 4. You did add it as a V4L2 camera instead of MMAL, right? Out of interest, which /dev/video* device did you select, or does libcamerify do some magic to expose a single camera device that works?

Regarding the venv, I wonder whether it needs to be loaded within a shell before an application can really run through it. Generally it is also possible to bypass the install blocker on modern Debian/Ubuntu and install motionEye natively system-wide; see the updated install instructions in our dev/beta README. But I guess it makes sense to update them again and switch to venv at some point.
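On the venv question: a venv's interpreter can be invoked by its path without activating the environment in a shell first, which is how the unit in this thread starts meyectl. A minimal POSIX-only sketch that checks this with a throwaway environment (the function name is made up; nothing here is specific to motionEye):

```python
import subprocess
import tempfile
import venv
from pathlib import Path


def runs_in_venv_without_activation() -> bool:
    """Create a temporary venv and call its interpreter directly by path."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "venv"
        # with_pip=False keeps this fast and avoids needing ensurepip.
        venv.create(env_dir, with_pip=False, symlinks=True)
        python = env_dir / "bin" / "python"
        # Inside a venv, sys.prefix differs from sys.base_prefix.
        check = "import sys; print(sys.prefix != sys.base_prefix)"
        result = subprocess.run(
            [str(python), "-c", check],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip() == "True"
```

Console scripts that pip installs into a venv's bin directory start with a shebang pointing at that venv's interpreter, so calling them by absolute path behaves the same way.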

From the service log, can you check whether there are lines from the libcamerify/python/meyectl binaries or the wrapping systemd handler that show the main process in a restart loop (rather than only a motion process)? The unit has Restart=on-abort, so one of the cascaded binaries may be raising an abort signal and causing a service restart, which would not be related to motion. The "Device or resource busy" error also sounds more like concurrent motionEye > motion processes blocking each other, where a single one would work just fine.
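The first `ps` listing in this thread shows two `/usr/bin/motion` processes, one without arguments and one started with `-c /etc/motioneye/motion.conf`. A quick way to spot such concurrent daemons is sketched below; the function name is an assumption and the parsing relies on the `ps aux` column layout.

```python
import os


def motion_daemons(ps_output: str):
    """Return (pid, command) for every process whose executable is named `motion`."""
    daemons = []
    for line in ps_output.splitlines():
        # ps aux prints 11 columns; the last one is the full command line.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        command = fields[10].strip()
        argv = command.split()
        if argv and os.path.basename(argv[0]) == "motion":
            daemons.append((int(fields[1]), command))
    return daemons
```

More than one entry in the result means two daemons may be competing for the same camera device, which would fit the "Device or resource busy" error.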

kni-bo commented 8 months ago

Hi, as far as I understand it, since the switch to libcamera the only way to use the Raspberry Pi camera with motionEye is via libcamerify and V4L2. I found the hint about libcamerify here. As described, I added the first camera in motionEye.

[Screenshot: Screenshot_2023-12-27_19-09-36_422x191]

When I did the installation, the instructions had not yet been updated and I got this message:

error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.

    If you wish to install a non-Debian-packaged Python package,
    create a virtual environment using python3 -m venv path/to/venv.
    Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
    sure you have python3-full installed.

    For more information visit http://rptl.io/venv

note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.

Then I did the following:

python -m venv /home/python-venv
/home/python-venv/bin/pip install 'https://github.com/motioneye-project/motioneye/archive/dev.tar.gz'
/home/python-venv/bin/motioneye_init

vim /etc/systemd/system/motioneye.service
- ExecStart=/usr/local/bin/meyectl startserver -c /etc/motioneye/motioneye.conf
+ ExecStart=/usr/bin/libcamerify /home/python-venv/bin/meyectl startserver -c /etc/motioneye/motioneye.conf

systemctl daemon-reload
systemctl enable motioneye --now

As I had struggled from problem to problem until then, I was glad that it worked :-)
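The ExecStart edit above can also be scripted. The following is a rough sketch only: the function name is hypothetical, it only prepends the wrapper and does not change the meyectl path, and the result would still need to be written back to the unit file and followed by `systemctl daemon-reload`.

```python
def wrap_exec_start(unit_text: str, wrapper: str = "/usr/bin/libcamerify") -> str:
    """Prefix the ExecStart command of a systemd unit with `wrapper`, once."""
    lines = []
    for line in unit_text.splitlines():
        key, sep, value = line.partition("=")
        command = value.strip()
        # Only touch ExecStart lines that are not wrapped already.
        if sep and key.strip() == "ExecStart" and not command.startswith(wrapper):
            line = f"ExecStart={wrapper} {command}"
        lines.append(line)
    return "\n".join(lines) + "\n"
```

Because already wrapped lines are left alone, running it twice gives the same result as running it once.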

I restarted the system once and searched the journal for entries from the user motion. Except for the messages repeated every 10 seconds as posted above, I don't see any messages from motion. However, 200 zombies were created during this time.

Thanks, Michael

kni-bo commented 8 months ago

Good morning, one more thing that may be important: the number of zombies only seems to increase when movement is detected and video is recorded.

Thanks Michael
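As background on the observation above: on Linux, a child process that has exited stays in the process table as a zombie (state Z) until its parent collects the exit status with wait(). The sketch below demonstrates that general behaviour with a short-lived `sh` child. It is Linux-only, the function names are made up, and it is not a diagnosis of what motion or libcamerify actually do.

```python
import subprocess
import time
from pathlib import Path


def proc_state(pid: int):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        stat = Path(f"/proc/{pid}/stat").read_text()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state is the first field after the parenthesised command name.
    return stat.rsplit(")", 1)[1].split()[0]


def zombie_then_reaped():
    """Start a child that exits at once, observe it unreaped, then reap it."""
    child = subprocess.Popen(["sh", "-c", "exit 0"])
    deadline = time.monotonic() + 5
    state = proc_state(child.pid)
    # The child has exited, but nobody has called wait() on it yet.
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.05)
        state = proc_state(child.pid)
    before = state
    child.wait()  # the parent reaps the child; the zombie entry disappears
    after = proc_state(child.pid)
    return before, after
```

If a parent starts a helper shell per event and never waits for it, one zombie per event would remain, which would match a count that grows only when motion is detected.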

kni-bo commented 8 months ago

Hi, I went through the journal from boot up to the loop and tried to remove everything that doesn't seem relevant to this case. Maybe there are a few interesting lines.

Dec 27 14:57:40 pi3 kernel: Linux version 6.1.0-rpi7-rpi-v8 (debian-kernel@lists.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT Debian 1:6.1.63-1+rpt1 (2023-11-24)
Dec 27 14:57:40 pi3 kernel: random: crng init done
Dec 27 14:57:40 pi3 kernel: Machine model: Raspberry Pi 3 Model B Plus Rev 1.3
...
Dec 27 14:57:43 pi3 kernel: mc: Linux media interface: v0.10
Dec 27 14:57:43 pi3 kernel: vc_sm_cma: module is from the staging directory, the quality is unknown, you have been warned.
...
Dec 27 14:57:43 pi3 kernel: videodev: Linux video capture interface: v2.00
...
Dec 27 14:57:43 pi3 kernel: snd_bcm2835: module is from the staging directory, the quality is unknown, you have been warned.
...
Dec 27 14:57:43 pi3 kernel: bcm2835_mmal_vchiq: module is from the staging directory, the quality is unknown, you have been warned.
...
Dec 27 14:57:43 pi3 kernel: bcm2835_isp: module is from the staging directory, the quality is unknown, you have been warned.
Dec 27 14:57:43 pi3 kernel: bcm2835_v4l2: module is from the staging directory, the quality is unknown, you have been warned.
...
Dec 27 14:57:43 pi3 kernel: bcm2835-isp bcm2835-isp: Device node output[0] registered as /dev/video13
Dec 27 14:57:43 pi3 kernel: bcm2835-isp bcm2835-isp: Device node capture[0] registered as /dev/video14
Dec 27 14:57:43 pi3 kernel: bcm2835-isp bcm2835-isp: Device node capture[1] registered as /dev/video15
Dec 27 14:57:43 pi3 kernel: ov5647 10-0036: Consider updating driver ov5647 to match on endpoints
Dec 27 14:57:43 pi3 kernel: bcm2835-isp bcm2835-isp: Device node stats[2] registered as /dev/video16
Dec 27 14:57:43 pi3 kernel: bcm2835-isp bcm2835-isp: Register output node 0 with media controller
Dec 27 14:57:43 pi3 kernel: bcm2835-isp bcm2835-isp: Register capture node 1 with media controller
Dec 27 14:57:43 pi3 kernel: bcm2835-isp bcm2835-isp: Register capture node 2 with media controller
Dec 27 14:57:43 pi3 kernel: bcm2835-isp bcm2835-isp: Register capture node 3 with media controller
Dec 27 14:57:43 pi3 kernel: bcm2835-isp bcm2835-isp: Device node output[0] registered as /dev/video20
Dec 27 14:57:43 pi3 kernel: bcm2835-isp bcm2835-isp: Device node capture[0] registered as /dev/video21
Dec 27 14:57:43 pi3 kernel: bcm2835-isp bcm2835-isp: Device node capture[1] registered as /dev/video22
Dec 27 14:57:43 pi3 kernel: bcm2835-isp bcm2835-isp: Device node stats[2] registered as /dev/video23
Dec 27 14:57:43 pi3 kernel: bcm2835-isp bcm2835-isp: Register output node 0 with media controller
Dec 27 14:57:43 pi3 kernel: bcm2835-isp bcm2835-isp: Register capture node 1 with media controller
Dec 27 14:57:43 pi3 kernel: bcm2835-isp bcm2835-isp: Register capture node 2 with media controller
Dec 27 14:57:43 pi3 kernel: bcm2835-isp bcm2835-isp: Register capture node 3 with media controller
Dec 27 14:57:43 pi3 kernel: bcm2835-isp bcm2835-isp: Loaded V4L2 bcm2835-isp
...
Dec 27 14:57:43 pi3 kernel: bcm2835_codec: module is from the staging directory, the quality is unknown, you have been warned.
...
Dec 27 14:57:43 pi3 kernel: bcm2835-codec bcm2835-codec: Device registered as /dev/video10
Dec 27 14:57:43 pi3 kernel: bcm2835-codec bcm2835-codec: Loaded V4L2 decode
...
Dec 27 14:57:43 pi3 kernel: bcm2835-codec bcm2835-codec: Device registered as /dev/video11
Dec 27 14:57:43 pi3 kernel: bcm2835-codec bcm2835-codec: Loaded V4L2 encode
Dec 27 14:57:43 pi3 kernel: bcm2835-codec bcm2835-codec: Device registered as /dev/video12
Dec 27 14:57:43 pi3 kernel: bcm2835-codec bcm2835-codec: Loaded V4L2 isp
...
Dec 27 14:57:43 pi3 kernel: bcm2835-codec bcm2835-codec: Device registered as /dev/video18
Dec 27 14:57:43 pi3 kernel: bcm2835-codec bcm2835-codec: Loaded V4L2 image_fx
...
Dec 27 14:57:43 pi3 kernel: bcm2835-codec bcm2835-codec: Device registered as /dev/video31
Dec 27 14:57:43 pi3 kernel: bcm2835-codec bcm2835-codec: Loaded V4L2 encode_image
...
Dec 27 14:57:44 pi3 kernel: vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops [vc4])
...
Dec 27 14:57:44 pi3 kernel: rc rc0: vc4-hdmi as /devices/platform/soc/3f902000.hdmi/rc/rc0
Dec 27 14:57:44 pi3 kernel: input: vc4-hdmi as /devices/platform/soc/3f902000.hdmi/rc/rc0/input0
...
Dec 27 14:57:44 pi3 kernel: vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4])
Dec 27 14:57:44 pi3 kernel: vc4-drm soc:gpu: bound 3f004000.txp (ops vc4_txp_ops [vc4])
Dec 27 14:57:45 pi3 kernel: vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops [vc4])
Dec 27 14:57:45 pi3 kernel: vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops [vc4])
Dec 27 14:57:45 pi3 kernel: vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops [vc4])
Dec 27 14:57:45 pi3 kernel: vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops [vc4])
...
Dec 27 14:57:45 pi3 kernel: [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
Dec 27 14:57:45 pi3 kernel: vc4-drm soc:gpu: [drm] Cannot find any crtc or sizes
Dec 27 14:57:45 pi3 kernel: vc4-drm soc:gpu: [drm] Cannot find any crtc or sizes
...
Dec 27 14:57:46 pi3 systemd[1]: Starting motion.service - Motion - Security camera monitoring software....
Dec 27 14:57:46 pi3 systemd[1]: Started motioneye.service - motionEye Server.
...
Dec 27 14:57:47 pi3 systemd-logind[479]: Watching system buttons on /dev/input/event0 (vc4-hdmi)
...
Dec 27 14:57:51 pi3 motion[567]: [0:motion] [NTC] [ALL] conf_load: Processing thread 0 - config file /etc/motion/motion.conf
Dec 27 14:57:51 pi3 motion[567]: [0:motion] [NTC] [ALL] motion_startup: Logging to syslog
Dec 27 14:57:51 pi3 motion[567]: [0:motion] [NTC] [ALL] motion_startup: Motion 4.6.0 Started
Dec 27 14:57:51 pi3 motion[567]: [0:motion] [NTC] [ALL] motion_startup: Using default log type (ALL)
Dec 27 14:57:51 pi3 motion[567]: [0:motion] [NTC] [ALL] motion_startup: Using log type (ALL) log level (NTC)
Dec 27 14:57:51 pi3 motion[567]: [0:motion] [NTC] [STR] webu_start_strm: Starting all camera streams on port 8081
Dec 27 14:57:51 pi3 motion[567]: [0:motion] [NTC] [STR] webu_strm_ntc: Started camera 0 stream on port 8081
Dec 27 14:57:51 pi3 motion[567]: [0:motion] [NTC] [STR] webu_start_ctrl: Starting webcontrol on port 8080
Dec 27 14:57:51 pi3 motion[567]: [0:motion] [NTC] [STR] webu_start_ctrl: Started webcontrol on port 8080
Dec 27 14:57:51 pi3 motion[567]: [0:motion] [NTC] [ENC] ffmpeg_global_init: ffmpeg libavcodec version 59.37.100 libavformat version 59.27.100
Dec 27 14:57:51 pi3 motion[567]: [0:motion] [NTC] [ALL] translate_init: Language: English
Dec 27 14:57:51 pi3 motion[567]: [0:motion] [NTC] [ALL] motion_start_thread: Camera ID: 0 is from /etc/motion/motion.conf
Dec 27 14:57:51 pi3 motion[567]: [0:motion] [NTC] [ALL] motion_start_thread: Camera ID: 0 Camera Name: (null) Device: /dev/video0
Dec 27 14:57:51 pi3 motion[567]: [0:motion] [NTC] [ALL] main: Waiting for threads to finish, pid: 567
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [ALL] motion_init: Camera 0 started: motion detection Enabled
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] vid_start: Opening V4L2 device
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_device_open: Using videodevice /dev/video0 and input -1
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_device_capability: - VIDEO_CAPTURE
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_device_capability: - READWRITE
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_device_capability: - STREAMING
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_input_select: Name = "unicam-image"- CAMERA
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_norm_select: Device does not support specifying PAL/NTSC norm
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_try: Unable to use YU12 (640x480)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: Configuration palette index 17 (YU12) for 640x480 doesn't work.
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: Supported palettes:
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (0) YUYV (YUYV 4:2:2)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (1) UYVY (UYVY 4:2:2)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (2) YVYU (YVYU 4:2:2)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (3) VYUY (VYUY 4:2:2)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (4) RGBP (16-bit RGB 5-6-5)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (5) RGBR (16-bit RGB 5-6-5 BE)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (6) RGBO (16-bit A/XRGB 1-5-5-5)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (7) RGBQ (16-bit A/XRGB 1-5-5-5 BE)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (8) RGB3 (24-bit RGB 8-8-8)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (9) BGR3 (24-bit BGR 8-8-8)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (10) RGB4 (32-bit A/XRGB 8-8-8-8)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (11) BA81 (8-bit Bayer BGBG/GRGR)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (12) GBRG (8-bit Bayer GBGB/RGRG)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (13) GRBG (8-bit Bayer GRGR/BGBG)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (14) RGGB (8-bit Bayer RGRG/GBGB)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (15) pBAA (10-bit Bayer BGBG/GRGR Packed)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (16) BG10 (10-bit Bayer BGBG/GRGR)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (17) pGAA (10-bit Bayer GBGB/RGRG Packed)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (18) GB10 (10-bit Bayer GBGB/RGRG)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (19) pgAA (10-bit Bayer GRGR/BGBG Packed)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (20) BA10 (10-bit Bayer GRGR/BGBG)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (21) pRAA (10-bit Bayer RGRG/GBGB Packed)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (22) RG10 (10-bit Bayer RGRG/GBGB)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (23) pBCC (12-bit Bayer BGBG/GRGR Packed)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (24) BG12 (12-bit Bayer BGBG/GRGR)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (25) pGCC (12-bit Bayer GBGB/RGRG Packed)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (26) GB12 (12-bit Bayer GBGB/RGRG)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (27) pgCC (12-bit Bayer GRGR/BGBG Packed)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (28) BA12 (12-bit Bayer GRGR/BGBG)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (29) pRCC (12-bit Bayer RGRG/GBGB Packed)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (30) RG12 (12-bit Bayer RGRG/GBGB)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (31) pBEE (14-bit Bayer BGBG/GRGR Packed)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (32) BG14 (14-bit Bayer BGBG/GRGR)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (33) pGEE (14-bit Bayer GBGB/RGRG Packed)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (34) GB14 (14-bit Bayer GBGB/RGRG)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (35) pgEE (14-bit Bayer GRGR/BGBG Packed)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (36) GR14 (14-bit Bayer GRGR/BGBG)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (37) pREE (14-bit Bayer RGRG/GBGB Packed)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (38) RG14 (14-bit Bayer RGRG/GBGB)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (39) GREY (8-bit Greyscale)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (40) Y10P (10-bit Greyscale (MIPI Packed))
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (41) Y10  (10-bit Greyscale)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (42) Y12P (12-bit Greyscale (MIPI Packed))
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (43) Y12  (12-bit Greyscale)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (44) Y14P (14-bit Greyscale (MIPI Packed))
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: (45) Y14  (14-bit Greyscale)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_try: Testing palette Y12  (640x480)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_set: Using palette Y12  (640x480)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [VID] v4l2_pixfmt_select: Selected palette Y12
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [ERR] [VID] v4l2_fps_set: Error setting fps. Return code -1
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [ERR] [VID] v4l2_mmap_set: Error starting stream. VIDIOC_STREAMON: Invalid argument
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [ERR] [VID] vid_start: V4L2 device failed to open
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [WRN] [ALL] motion_init: Could not fetch initial image from camera
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [WRN] [ALL] motion_init: Motion continues using width and height from config file(s)
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [ALL] image_ring_resize: Resizing pre_capture buffer to 1 items
Dec 27 14:57:51 pi3 kernel: unicam 3f801000.csi: Failed to start media pipeline: -22
Dec 27 14:57:51 pi3 motion[567]: [1:ml1] [NTC] [ALL] image_ring_resize: Resizing pre_capture buffer to 4 items
...
Dec 27 14:57:54 pi3 libcamerify[560]: configure_logging cmd motioneye: False
Dec 27 14:57:54 pi3 libcamerify[560]: configure logging to file: None
Dec 27 14:57:54 pi3 libcamerify[560]:     INFO: Hallo! Dies ist ein MotionEye-Server 0.43.0
...
Dec 27 14:57:59 pi3 motion[966]: [0:motion] [NTC] [ALL] conf_load: Processing thread 0 - config file /etc/motioneye/motion.conf
Dec 27 14:57:59 pi3 motion[966]: [0:motion] [NTC] [ALL] config_camera: Processing camera config file camera-1.conf
Dec 27 14:57:59 pi3 motion[966]: [0:motion] [NTC] [ALL] motion_startup: Logging to syslog
Dec 27 14:57:59 pi3 motion[966]: [0:motion] [NTC] [ALL] motion_startup: Motion 4.6.0 Started
Dec 27 14:57:59 pi3 motion[966]: [0:motion] [NTC] [ALL] motion_startup: Using default log type (ALL)
Dec 27 14:57:59 pi3 motion[966]: [0:motion] [NTC] [ALL] motion_startup: Using log type (ALL) log level (WRN)
Dec 27 14:57:59 pi3 motion[966]: [1:ml1:Camera1] [ERR] [VID] v4l2_fps_set: Error setting fps. Return code -1
Dec 27 14:58:00 pi3 libcamerify[560]:     INFO: Reinigung hat begonnen
Dec 27 14:58:00 pi3 libcamerify[560]:     INFO: wsswitch wurde gestartet
Dec 27 14:58:00 pi3 libcamerify[560]:     INFO: Aufgaben wurden gestartet
Dec 27 14:58:00 pi3 libcamerify[560]:     INFO: Der mjpg-Client-Garbage-Collector wurde gestartet
Dec 27 14:58:00 pi3 libcamerify[560]:     INFO: Der Server wurde gestartet
...
Dec 27 14:58:33 pi3 motion[567]: [1:ml1] [NTC] [ALL] mlp_capture: Video signal lost - Adding grey image
...
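The kernel lines in the excerpt above show which driver registered which /dev/video* node, which bears on the earlier question of which device to select. A small parser for those lines is sketched below; the regular expression is derived from this excerpt only and the function name is an assumption.

```python
import re

# Matches both "Device node output[0] registered as ..." and "Device registered as ...".
NODE_RE = re.compile(
    r"kernel: (?P<driver>[\w-]+) [\w-]+: Device "
    r"(?:node (?P<role>\S+) )?registered as (?P<node>/dev/video\d+)"
)


def video_nodes(kernel_log: str) -> dict:
    """Map each /dev/videoN node to the driver (and role, if given) that registered it."""
    nodes = {}
    for line in kernel_log.splitlines():
        match = NODE_RE.search(line)
        if not match:
            continue
        label = match.group("driver")
        if match.group("role"):
            label += " " + match.group("role")
        nodes[match.group("node")] = label
    return nodes
```

In this excerpt the registered nodes all belong to bcm2835-isp and bcm2835-codec. /dev/video0, which motion opens and reports as "unicam-image", has no such registration line in the trimmed log.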

crankyfish commented 8 months ago

Note that the problem also happens with plain old motion 4.3.2 on bullseye with libcamerify.

hrfried commented 7 months ago

Phew, this was a whirlwind to track down. Just wanted to add for anyone looking here that libcamerify is part of the libcamera-tools package. I didn't see that mentioned anywhere except in the link below, and I wasn't about to just go installing every single libcamera package in the repos lol.

https://github.com/Motion-Project/motion/issues/1434#issuecomment-1060755121

Then just running/modifying the service to prepend <path>/libcamerify to the service command as above seems to work fine.

i.e., for me (installed this in a venv to test) ExecStart=/usr/bin/libcamerify /home/dietpi/motioneye-test/motioneye-venv/bin/meyectl startserver -c /etc/motioneye/motioneye.conf
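One way to apply this without editing the packaged unit file is a systemd drop-in. The sketch below only prints the drop-in text; the function name `render_dropin`, the drop-in file name, and the `/usr/local/bin/meyectl` path in the usage comment are assumptions for illustration, so copy the real command from `systemctl cat motioneye` on your system.

```shell
#!/bin/sh
# Print a systemd drop-in that wraps an existing ExecStart command in libcamerify.
# $1: the service's current ExecStart command line (hypothetical example below).
# $2: path to libcamerify (optional; defaults to the path quoted in this thread).
render_dropin() {
    cmd=$1
    wrapper=${2:-/usr/bin/libcamerify}
    printf '[Service]\n'
    # An empty ExecStart= clears the packaged command before the new one is set.
    printf 'ExecStart=\n'
    printf 'ExecStart=%s %s\n' "$wrapper" "$cmd"
}

# Example (run as root; the drop-in file name is arbitrary):
#   mkdir -p /etc/systemd/system/motioneye.service.d
#   render_dropin "/usr/local/bin/meyectl startserver -c /etc/motioneye/motioneye.conf" \
#       > /etc/systemd/system/motioneye.service.d/libcamerify.conf
#   systemctl daemon-reload && systemctl restart motioneye
```

Removing the drop-in file and reloading systemd restores the original command, which makes it easy to compare behaviour with and without libcamerify.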

Will likely work with motioneye from dietpi-software now, since I think it's pulled from the same version as of v9.0.2, as long as libcamera-tools is installed...

@MichaIng this might be decent temporary fix on DietPi if it works on your end. Working for me on an Rpi4 8gig, latest patch with a raspberry pi v2.1 camera. Just realized I'm in motioneye repos right now but I know you're around. ;)

MichaIng commented 7 months ago

@hrfried Thanks, valuable information. I think we'll go with something like this: https://github.com/motioneye-project/motioneye/pull/2765 A Raspberry Pi 5 with a camera module 3 just arrived here, so I am now able to test all this. Likely I'll add a prompt to the motioneye_init command to prompt for whether a camera module is used, and in case install libcamera-tools and set the libcamerify option, if available, otherwise inform users that it is needed for camera modules to work with the modern interface.

AntiSol commented 5 months ago

Hello, thanks for motioneye, it's incredibly useful to me and I've been using it for years :)

I'm also experiencing this issue.

I have installed motioneye system-wide as per the instructions in the git repo's readme (using pip --pre) on two almost*-identical raspberry pis running bookworm and see this issue on both of them. They both have raspberry pi camera v3 modules installed (one is wide/noir, the other a standard camera), and I've modified the systemd service to run motioneye using libcamerify and added the camera as a v4l device.

meyectl -v reports motionEye 0.43.1b1

After some time (as @kni-bo points out, depending on how many times motion is detected) they freeze up because they are out of memory. Sometimes, if I'm very patient, the OOM killer will kick in and kill the process, and then things will be OK for a while. If I do a ps aux I see a bunch of defunct sh processes much like the output @kni-bo has posted.

Interestingly, I am not seeing the errors in the journal that @kni-bo reports, if I do journalctl --follow or dmesg --follow I don't see anything particularly interesting, and certainly no repeating messages or messages every ~10s.

One of my devices is in a position that sees regular motion, and this one tends to freeze up every couple of hours. The other sees motion quite rarely and frequently runs for days or even weeks before it hangs.

@kni-bo I'd appreciate it a lot if you could share the code for your watchdog script? that would be super helpful as an interim fix :)

Perhaps the fact that I have 2 devices might be useful for helping to track this down?

AntiSol commented 5 months ago

I've made a little bit of progress in tracking down what's going on here: I did some investigating and the zombie shells are running commands like these:

/bin/sh -c /usr/local/lib/python3.11/dist-packages/motioneye/scripts/relayevent.sh "/etc/motioneye/motioneye.conf" picture_save 1 /var/lib/motioneye/Camera1/2024-04-12/19-37-30.jpg  &

I now think that the number of zombies and the time to memory exhaustion is only coincidentally related to the number of motion events - as you can see this is a scheduled picture_save event, and it is not related to motion being detected.

But I think there must be something about motion events that makes them more likely to cause a problem somewhere, because when the system freezes, it is almost always showing motion in the picture (I stream the video via http from port 9081 and in most cases when the system hangs there is motion in some part of the image, indicated by a red square)

I find it very strange that this relayevent.sh script is getting into the zombie state, because under the hood it's not doing much except hitting the python server with a curl request. This curl request specifies a 5 second timeout and silent failure, so I would expect it to exit gracefully if the endpoint is hanging or crashing for some reason. I can only guess that the response coming from the python http server is very strange and that this might be triggering a bug somewhere in curl or maybe in motion (which is what is calling these events, I can see them configured in /etc/motioneye/camera-1.conf).

I looked at the invocation for a bunch of defunct sh processes and all the ones I looked at were for a picture_save event, but I expect the same thing is happening for motion events, too, and that picture_save simply outnumbers start and stop events.
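To see which process is failing to reap the defunct shells, it can help to group zombies by parent PID. This is a minimal sketch, assuming the procps version of `ps`; the function name `zombies_by_parent` is made up for illustration.

```shell
#!/bin/sh
# Read `ps -eo pid=,ppid=,stat=,comm=` output on stdin and print one line per
# parent PID: "<ppid> <number of zombie children>".
zombies_by_parent() {
    # Field 3 is the process state; zombies have a state starting with Z.
    awk '$3 ~ /^Z/ { n[$2]++ } END { for (p in n) print p, n[p] }' | sort -n
}

# Usage:
#   ps -eo pid=,ppid=,stat=,comm= | zombies_by_parent
```

A zombie only disappears once its parent calls wait() for it (or the parent itself exits), so the PID with the growing count is the process to look at.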

I also wrote my own watchdog which restarts motioneye if the number of zombies gets above 150, I thought it might be helpful as a temporary measure if anybody else is having the same problem:

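#!/bin/bash
# Watchdog: restart motioneye when too many defunct (zombie) sh processes pile up.
limit=150
while true; do
    # Count processes whose state starts with Z and whose command name is sh.
    zombies=$(ps -eo stat=,comm= | awk '$1 ~ /^Z/ && $2 == "sh" { n++ } END { print n + 0 }')
    if [ "$zombies" -gt "$limit" ]; then
        echo "$(date): restarting motioneye ($zombies zombies)"
        systemctl restart motioneye
    else
        echo "$zombies zombies"
    fi
    # Check again in 10 minutes.
    sleep 600
done
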
AntiSol commented 5 months ago

I've confirmed that you still get zombie processes even if you modify the relayevent.sh script so that the second line is a simple exit 0

I now think that this issue is actually a problem somewhere in either motion or libcamerify and probably not an issue with motioneye, see: https://github.com/Motion-Project/motion/issues/1522

kbingham commented 4 months ago

Also referencing here in case anyone wants to investigate.

Dec 27 14:57:59 pi3 motion[966]: [0:motion] [NTC] [ALL] conf_load: Processing thread 0 - config file /etc/motioneye/motion.conf
Dec 27 14:57:59 pi3 motion[966]: [0:motion] [NTC] [ALL] config_camera: Processing camera config file camera-1.conf
Dec 27 14:57:59 pi3 motion[966]: [0:motion] [NTC] [ALL] motion_startup: Logging to syslog
Dec 27 14:57:59 pi3 motion[966]: [0:motion] [NTC] [ALL] motion_startup: Motion 4.6.0 Started
Dec 27 14:57:59 pi3 motion[966]: [0:motion] [NTC] [ALL] motion_startup: Using default log type (ALL)
Dec 27 14:57:59 pi3 motion[966]: [0:motion] [NTC] [ALL] motion_startup: Using log type (ALL) log level (WRN)
Dec 27 14:57:59 pi3 motion[966]: [1:ml1:Camera1] [ERR] [VID] v4l2_fps_set: Error setting fps. Return code -1
Dec 27 14:58:00 pi3 libcamerify[560]:     INFO: cleanup started
Dec 27 14:58:00 pi3 libcamerify[560]:     INFO: wsswitch started
Dec 27 14:58:00 pi3 libcamerify[560]:     INFO: tasks started
Dec 27 14:58:00 pi3 libcamerify[560]:     INFO: mjpg client garbage collector started
Dec 27 14:58:00 pi3 libcamerify[560]:     INFO: server started

Setting the FPS (i.e. the call that fails "Error setting fps" above) was started but doesn't look like it's been completed. If anyone wants to continue that work - the patch is at https://patchwork.libcamera.org/patch/15392/

AntiSol commented 4 months ago

Setting the FPS (i.e. the call that fails "Error setting fps" above) was started but doesn't look like it's been completed. If anyone wants to continue that work - the patch is at https://patchwork.libcamera.org/patch/15392/

Just a note to take into account before anybody does choose to donate their time to looking into this libcamera patch, @kbingham has made no attempt to replicate or investigate the zombie issue, is simply guessing that this error message may be related, and has made no attempt to provide a serious rationale as to why this inability to set fps in libcamera would cause zombies. I can't say for sure that they're not related, but @kbingham certainly can't say that they are.

kbingham commented 4 months ago

Setting the FPS (i.e. the call that fails "Error setting fps" above) was started but doesn't look like it's been completed. If anyone wants to continue that work - the patch is at https://patchwork.libcamera.org/patch/15392/

Just a note to take into account before anybody does choose to donate their time to looking into this libcamera patch, @kbingham has made no attempt to replicate or investigate the zombie issue, is simply guessing that this error message may be related, and has made no attempt to provide a serious rationale as to why this inability to set fps in libcamera would cause zombies. I can't say for sure that they're not related, but @kbingham certainly can't say that they are.

I completely agree. I have not investigated this, nor do I have time to do so.

I'm trying to provide what information I'm aware of from the libcamera project and previous attempts to use libcamerify that were not completed for anyone who /does/ want to investigate this.

AntiSol commented 4 months ago

@kbingham it's interesting that raspberry pi are selling devices which seem to exclusively rely on this library (without any mention of that on the packaging) but don't seem to have the resources/will to bother investigating issues caused by said library.

kbingham commented 4 months ago

I do not work for Raspberry Pi.

AntiSol commented 4 months ago

wow so their response is even more lacklustre than I thought!

MichaIng commented 4 months ago

it's interesting that raspberry pi are selling devices which seem to exclusively rely on this library (without any mention of that on the packaging) but don't seem to have the resources/will to bother investigating issues caused by said library.

It is the opposite: libcamera is an open source cross OS and device compatible library to access camera devices. Previously, the closed source firmware camera stack and libraries were used, which were RPi specific. Every computer with such DSI camera requires some additional driver/library to access it, like your laptop requires an additional driver to access its internal camera. USB cameras do more processing internally, so they do not require this, also not on RPi. So you can say that RPis have an additional hardware feature to attach cost effective small camera modules, which always require additional drivers/libraries, but RPi Ltd. moved from a closed source camera stack to an open approach with generic KMS + libcamera.

The problem is that much software has adapted to the RPi specific firmware driver/API to access RPi camera modules, and with Bullseye, this stack was deprecated, with Bookworm it was removed, and the camera module 3 never supported it. So libcamerify tries to address this, wrapping the new API into the old one.

I see you found out that it is a motion + libcamerify issue. Nasty indeed, as this is currently the only way to make the RPi camera module 3 work with motionEye, and supports RPi OS Bookworm. I do have an RPi 5 and camera module 3 to play around now, but I am also no C programmer and have no experience with either the camera module or libcamera. But if I can help to test or verify something, let me know.

AntiSol commented 4 months ago

RPi Ltd. moved from a closed source camera stack to an open approach with generic KMS + libcamera.

Fair enough, but I'm actually more concerned about how they moved from something that did v4l compatibility well (and was thus usable with most camera software written in the last 2 decades) to something new / obscure / not well supported by most software, which also has buggy v4l compatibility, and don't seem to have anybody attached to their libcamera repo who can even be bothered trying to replicate the problem.

I'll keep at it. The response on the motion issue tracker has been much better.

MichaIng commented 4 months ago

moved from something that did v4l compatibility

The old stack did not support V4L2, but required those closed source (obscure) raspivid/raspistill blobs, or do I misunderstand what you mean? But at least it did provide a usable MJPEG stream via /dev/video0. Now these device nodes provide other functionality that is indeed weird/non-intuitive, as usually one expects one node per camera: https://www.raspberrypi.com/documentation/computers/camera_software.html#device-nodes-when-using-libcamera So a good question is why there is not one /dev/video* node left that can be used just like the old one, as direct full camera stream input, but instead requiring a complex library to process those multiple nodes correctly.

AntiSol commented 4 months ago

My understanding is that /dev/video0 is a standard v4l device.

The bcm2835_v4l2 kernel module gives you v4l compatibility for the older camera modules:

$ v4l2-ctl --list-devices
bcm2835-codec-decode (platform:bcm2835-codec):
    /dev/video10
    /dev/video11
    /dev/video12

mmal service 16.1 (platform:bcm2835-v4l2):
    /dev/video0

And I for one can attest that it works great.

So a good question is why there is not one /dev/video* node left that can be used just like the old one, as direct full camera stream input, but instead requiring a complex library to process those multiple nodes correctly.

The answer to this question is known as the CADT model of software development. You wouldn't bother being compatible with every piece of camera software written over the span of 2 decades, instead you just expect those hundreds and hundreds of projects to adjust to your shiny new library that you can't even be bothered supporting properly. What could possibly go wrong?

MichaIng commented 4 months ago

My understanding is that /dev/video0 is a standard v4l device.

Ah yes, you are right. I somehow thought that e.g. v4l2-ctl is not able to control the camera modules (with the old driver), but at least it lists them properly. Makes sense. I wonder whether this means that we could drop mmalctl.py as the camera modules are listed (and properly labelled) within the V4L2 camera list as well, or whether there is other special handling for those, or functions in v4l2ctl.py which do not work with MMAL devices.

The answer to this question is known as the CADT model of software development.

Not exactly the same, as this is about closed bug reports after a rewrite, while here it is about missing (or non reliable/not drop-in libcamerify) backwards compatibility after a "rewrite". However, a little similar, though at least generally moving from the closed source Broadcom driver and tools to open (source) ones was generally needed at some point, and not done "for fun". I am just not sure about the reason that e.g. /dev/video0 could not be left as identically usable device node.

AntiSol commented 4 months ago

I wonder whether this means that we could drop mmalctl.py as the camera modules are listed (and properly labelled) within the V4L2 camera list as well, or whether there is other special handling for those, or functions in v4l2ctl.py which do not work with MMAL devices.

I would strongly discourage this, at least until the MMAL interface is obscure or broken somehow. More compatibility is always better.

Not exactly the same

That's what happens when there is no incentive for people to do the parts of programming that aren't fun. Fixing bugs isn't fun; going through the bug list isn't fun; but rewriting everything from scratch is fun (because "this time it will be done right", ha ha) and so that's what happens, over and over again.

You're missing the subtext. It is the same. In a couple of years there will be a shiny new camera module using some other shiny new library. And when that happens (if not before) the bugs I've filed against libcamera will be closed, never having been investigated.

MichaIng commented 4 months ago

I would strongly discourage this, at least until the MMAL interface is obscure or broken somehow. More compatibility is always better.

But as far as I can see, it is really just the vcgencmd detection, which got already broken 2 times due to some changes of the output, after I joined the project. So if v4l2-ctl --list-devices does exactly the same, i.e. either showing the camera module or not, but in a generic reliable way, then I see no point to additionally have another camera type which checks with a less stable and deprecated tool whether one is available or not. But as said, probably there are other places where those camera types are handled differently, I did not further go through the code.
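As a rough illustration of the generic check described above, the listing format posted earlier in this thread can be parsed for the camera's device nodes. This is a sketch only: the function name `nodes_for_driver` is hypothetical, it assumes the header/indented-node layout shown in that output, and it says nothing about whether the rest of the MMAL handling could be dropped.

```shell
#!/bin/sh
# Read `v4l2-ctl --list-devices` output on stdin and print the device nodes
# listed under every header line that matches the pattern given in $1.
nodes_for_driver() {
    awk -v pat="$1" '
        NF == 0        { next }                          # skip blank separator lines
        $0 !~ /^[ \t]/ { in_match = ($0 ~ pat); next }   # unindented line: a device header
        in_match       { print $1 }                      # indented line: a node of that device
    '
}

# Usage:
#   v4l2-ctl --list-devices | nodes_for_driver 'bcm2835-v4l2'
```

An empty result would mean no matching camera is present, which is the same yes/no answer the vcgencmd-based detection is used for.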

You're missing the subtext. It is the same. In a couple of years there will be a shiny new camera module using some other shiny new library. And when that happens (if not before) the bugs I've filed against libcamera will be closed, never having been investigated.

Yeah similar. But as said, the rewrite was not done for fun, but as part of a consistent long term project to move from all those closed sources RPi-only Broadcom drivers/libs/APIs to open (source) drivers and standards, which are or can be the same on every hardware. Similarly the switch from legacy GPU framebuffer driver to KMS/DRM, which also broke some legacy RPi-only tools well known, but ultimately means that you can e.g. configure console display resolution and orientation via native Linux video= command-line parameter, instead of having to use some RPi-only setting in RPi-only config.txt, which so often caused RPi users to wonder why this does not work on their Odroid/NanoPi/PC whatever other system. The old drivers surely were also a burden for the RPi developer staff, needing to maintain everything on their own, or relying on Broadcom, while now it is just done in upstream Linux development. And this switch basically implied the camera stack switch, being part of the same whole GPU driver stack.

However, there is no point to overly discuss this here, as we need to live with what we have now. While I agree with some of your points, I just think it is unfair to throw their work on KMS and libcamera into the same box as this CADT rant, as, while we can discuss particular implementation details, it is overall a huge benefit and the absolutely correct thing to do, addressing the one major criticism point from major parts of the Linux and open source community, which the Raspberry Pi was always suffering from, and still is when it comes to the bootloader, all being full of closed source binary blobs into which no one else has or can have insight.

However, stopping this topic here. Let's hope the motion guys have some idea how the zombies issue is caused in combination with libcamerify. I hope to find some time soon to generally play around with my camera module 3 and motionEye. Maybe I can find a workaround for the issue. Sadly, even though there are Python bindings for libcamera, support is most importantly missing in motion itself. Implementing support for MotionPlus is probably the only sane solution, which natively supports libcamera sources: https://github.com/Motion-Project/motionplus But this is certainly much more work than wrapping either motionEye itself (optionally) or the particular motion processes for MMAL cameras into libcamerify.

AntiSol commented 4 months ago

But as far as I can see, it is really just the vcgencmd detection, which got already broken 2 times due to some changes of the output, after I joined the project. So if v4l2-ctl --list-devices does exactly the same, i.e. either showing the camera module or not, but in a generic reliable way, then I see no point to additionally have another camera type which checks with a less stable and deprecated tool whether one is available or not. But as said, probably there are other places where those camera types are handled differently, I did not further go through the code.

Perhaps there is a good case for deprecating it after all, as long as there's not a userbase being left out in the cold.

the rewrite was not done for fun, but as part of a consistent long term project to move from all those closed sources RPi-only Broadcom drivers/libs/APIs to open (source) drivers and standards

The thing is, those horrible closed-source drivers actually work.

The correct solution is to use/write a driver that has support for the well-established and widely-adopted standard that is video4linux, and not some new, immature, buggy thing that:

  1. provides no simple compatibility with v4l (i.e providing a /dev/video* device, requiring the use of this libcamerify wrapper to make the vast majority of software work),
  2. provides a buggy/incomplete implementation of what v4l compatibility they do provide, and
  3. has nobody who can be bothered to investigate issue reports.

As I've said elsewhere, I'll grant that responsibility for the issue does not stem from the libcamera team as such - responsibility lies with raspberry pi for switching to using a nonstandard, immature, buggy library which is clearly not ready for the primetime, and in particular for not providing resources to support said library and help it mature to the point where it is ready.

Maybe I'm wrong and it's not CADT. I hope I am. I guess time will tell as to whether this library sticks around or falls into obscurity and a pile of unread bug reports.

So far, magic 8-ball says, "outlook does not look good".

AntiSol commented 4 months ago

@MichaIng also, back on topic, you mentioned a workaround: one nasty-ass workaround is a watchdog script similar to what I posted here. This does at least give you a system that doesn't lock up every few hours. Motioneye should be able to just restart motion rather than needing to restart motioneye (I guess I could do this, too, if I replaced the motion executable with a script)

However this workaround has some pretty significant downsides, e.g when motion is restarted, any motion event / video recording will be interrupted, mjpeg streams disappear temporarily, etc etc. But it does at least stop the whole system from becoming unresponsive.

A possible improvement to this would be to check whether there's a motion event currently in progress before doing the zombie check, and if so deferring the check until that motion event has finished. Note that I say "deferring", not "skipping" - you want the check to run immediately after the timeout as the motion event ends, not "at the next 10min interval", because that could cause restarts to never happen.
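A minimal sketch of that "defer, don't skip" idea follows. It assumes a hypothetical flag file that exists while an event is in progress, for example created by a command hooked into motion's on_event_start and removed by one hooked into on_event_end; the file path, the environment variable names, and the function names are all made up for illustration, and a single shared flag would not be accurate with several cameras.

```shell
#!/bin/bash
# Hypothetical flag file: assumed to be created when a motion event starts
# and removed when it ends (e.g. via on_event_start / on_event_end commands).
flag=${EVENT_FLAG:-/tmp/motion-event-active}
limit=${ZOMBIE_LIMIT:-150}

# Count defunct processes whose command name is sh.
count_zombies() {
    ps -eo stat=,comm= | awk '$1 ~ /^Z/ && $2 == "sh" { n++ } END { print n + 0 }'
}

# Block while an event is in progress, polling every few seconds, so the
# check runs right after the event ends instead of waiting for the next interval.
wait_for_idle() {
    while [ -e "$flag" ]; do sleep "${POLL_SECONDS:-5}"; done
}

# One deferred check: wait for idle, then restart if over the limit.
check_once() {
    wait_for_idle
    zombies=$(count_zombies)
    if [ "$zombies" -gt "$limit" ]; then
        echo "$(date): restarting motioneye ($zombies zombies)"
        systemctl restart motioneye
    fi
}

# Run the loop only when asked to, so the functions can be sourced and tried out.
if [ "${1:-}" = "--run" ]; then
    while true; do check_once; sleep 600; done
fi
```

One thing to watch with this design: if the flag file is ever left behind (say, motion crashes mid-event), the watchdog waits forever, so a real version would probably want an upper bound on the wait.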

Another possible improvement would be to do some fine-tuning to maybe raise the number of allowed zombies to reduce the number of restarts without compromising system stability, but I suspect that might be difficult as I expect that's going to be very dependent on the hardware you're running on (e.g. more RAM probably means more zombies are tolerable)

kbingham commented 4 months ago

The thing is, those horrible closed-source drivers actually work.

"For the cameras you have so far". They do not work for the camera V3 - nor any other third party camera you may wish to buy or connect to the Raspberry Pi ecosystem.

AntiSol commented 4 months ago

They do not work for the camera V3

And as I have been pointing out for some time now, that (and the decision to use libcamera rather than something mature) was a choice made by the raspberry pi people, not some inherent property of the universe.

nor any other third party camera you may wish to buy or connect to the Raspberry Pi ecosystem.

This is very true - the vast majority of cameras that I might want to plug into a raspberry pi (i.e just about any usb camera made in the last 20 years, or pretty much anything other than a camera module 3) will just work with the mature / stable / well-supported video4linux.

But anyway, @kbingham, as MichaIng pointed out, we're actually not here to argue, we're trying to resolve this issue, so I'd suggest that if you have the time to read and respond to this off-topic discussion further, then perhaps your time might be better spent making a basic attempt to replicate the issue and actually contributing to the solution, rather than continuing an argument which has already run its course :)

kbingham commented 4 months ago

I would love to see this issue resolved. And I'm here to help (with my spare time) anyone who is willing to put the effort in. But you didn't seem to want to do that, just have someone else do it for you. I don't run motion, so replicating this isn't something I can do.

Meanwhile everything you say sounds like an attack on a project I and others work very hard to support and you continue to slander due to the lack of understanding of the technical issues that cause us to do the work in the first place (your replies above make that clear).

But you're absolutely right. it's off topic here, and it sounds like you can resolve this by fixing the issue in the motion project.

I'll happily help Mr-Dave from Motion at great lengths if he needs any support on the libcamera side. Good luck!

AntiSol commented 4 months ago

You could have set up the software and made an attempt to replicate the problem in far far less time than you've already spent responding to this issue. I'd have been more than happy to assist with that if you were struggling. I'd have happily supplied you with an sd card image that you could have flashed in a matter of minutes, had you bothered to ask.

Instead you decided to try to conscript me to test out an unrelated patch in the vain hope that it might solve the problem, based on nothing more than a hunch and a totally unrelated error message, and without taking any time at all to even come up with a justification as to why you think said patch is related, and then refused to provide any assistance at all with doing so or to answer any of my questions about the flimsiness of said hunch.

You'll note that the so-called "attacks" and "slander" about your immature, unstable, unnecessary, unsupported project didn't start until after you admitted that you couldn't be bothered looking into the problem, and had already spent more time trolling than it would have taken to actually look into the issue.

Perhaps in future when you can't be bothered assisting, you'll choose to spend your time on something more constructive than trolling, and then you won't have to hear the awful "slanderous" opinions of the people that you can't be bothered to help (due to your participation in the CADT development methodology).

kbingham commented 4 months ago

I think at this stage let's just ask the @motioneye-project maintainers to lock this issue for further comments.

AntiSol commented 4 months ago

The issue remains outstanding, largely because you can't be bothered to look into it, and has only been derailed because you chose (twice) to derail it. A more appropriate course of action would be for you to refrain from further unconstructive comments.

AntiSol commented 4 months ago

just a note pointing at the fix: https://github.com/Motion-Project/motion/commit/629b3babf0d0375592de61f2f05c85b460efd65c

For anybody trying to solve this, currently you'll need to build motion from source to get the fix, but it is fixed - I've been running it on 2x pis for more than a day now. 0 zombies, no side-effects that I've noticed, ~and 0 totally unrelated random patches applied to libcamera based on the wild guesses of someone too lazy to bother looking~.

All praise and glory to motion's MrDave for tracking it down! :grin:

MichaIng commented 4 months ago

I still wonder why this is no issue without libcamerify. I do not understand the code well enough, but probably it changes the context in which this SIGCHLD "ignore handler" is applied. The positive side about it is that it seems to have been legacy cruft anyway, not needed in current motion anymore. So aside from fixing this particular issue, it was a little cleanup as well, and probably this caused some other issues in certain rare circumstances as well 🙂.

AntiSol commented 4 months ago

Yeah this would be nice to know, but since it seems the libcamera people aren't interested in the quality of their code, and I can't be bothered to do their work for them, I guess we never will :shrug:

MichaIng commented 4 months ago

As far as I understand, it is/was indeed a motion issue. While libcamerify did have an effect on it, to me it does not look like something that is easy to diagnose from their end, without investing immense time to understand motion code, and how/why it intentionally ignored SIGCHLD in the HTTP thread. And now that it has been fixed, I am not sure whether it is worth it to invest this time to understand why it caused this different motion behaviour.

So be careful blaming anyone here. It doesn't help anyone, and most of us are volunteers, where such is doubly destructive and demotivating: when you donate your spare time to a project, to be used by others for free, and do not get real compensation but blame instead.

AntiSol commented 4 months ago

The issue could still be in libcamera. The problem only occurred when libcamerify is being used. The way motion was doing things might be perfectly valid. We don't know and can't be sure without investing even more time than a libcamera person would have needed to, to fully understand what is going on in both motion and libcamera.

This is not going to happen because, as we've established, nobody from libcamera is interested in looking or discussing it. Which is exactly equivalent to not being interested in the quality of their code, which is the core issue Jamie Zawinski so poetically pointed out decades ago when he called out their CADT model. If it's "demotivating" to hear these truths, then frankly the FOSS world is better off without that type of contributor.

Anyway, speaking of demotivating, the dismissive way I have been treated (by others, not you @MichaIng, you've been great) during this whole saga has actually been quite demotivating and depressing, and I don't have the energy to argue about it anymore. I just wanted to contribute my time to getting the problem fixed, it's fixed, yay me. And libcamera might still be buggy, but nobody cares :shrug: . I say close the issue.

MichaIng commented 4 months ago

And again, you are using free (of cost) software, which is by its license (typical short copyright notice for (L)GPL):

distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE

from people who spend their spare time to make this available to you for free. You could blame RPi Ltd. for requiring the library too early for their camera modules on RPi, or for not investing enough of their own developer's time to make it a better in-place replacement for the previous API. But even then, strictly/contract-wise seen you paid for hardware, not for software, especially not for every piece of hardware addons and software to work with every other piece of software (which is impossible).

And another "again": motion contained code to explicitly ignore SIGCHLD, which, as far as I understand, implies zombies. I do not understand why this was not the case without libcamerify, but wrapping one piece of software into another, which mounts stuff around to put one API over another, might simply imply a certain change which caused this, without being a bug.

Whichever way you see it, each individual point is IMO enough to make your blames absolutely inappropriate, non-productive/destructive anyway the way you phrase them. Still, I am thankful that you took the time to make all sides aware of it, investigating and testing it, which led to the solution. It would however have happened the same way without blaming libcamera devs or libcamera in general.

AntiSol commented 4 months ago

  1. Regardless, a maintainer has an ethical obligation. If you're supporting your project, you need to actually support it, not alienate your users and potential contributors by refusing to even attempt to address reasonable questions put to you.
  2. "You could blame RPi Ltd. for requiring the library too early for their camera modules on RPi, or for not investing enough of their own developer's time to make it a better in-place replacement for the previous API." Yep, I did that, before the "inappropriate behaviour" started. I was happy to leave it there. If you bother to re-read the entire horror show, you'll note that I was very polite for a long time: even after, for example, the libcamera maintainers demonstrated that they didn't understand how their own mailing list (the one they insisted I join) worked, and even after one of them tried to hijack this issue and my time to have his unrelated patch tested. I didn't start the inappropriate behaviour here; all I did was call out inappropriate behaviour.
  3. "But even then, strictly/contract-wise seen you paid for hardware, not for software" It would seem you're not familiar with my country's consumer protection laws.
  4. Your word salad does nothing at all to negate my previous point.
  5. No, (deeply) inappropriate would be failing to make any attempt to respond to questions put to you, then coming uninvited to the issue tracker of another project, soliciting its users and maintainers to test a completely unrelated patch, wasting their time when you can't be bothered to spend any time at all on even coming up with a sensible rationalisation for why you think said patch might help, or on answering any questions about your hunch, much less actually bothering to look at the issue, and then trolling when you are called out for this behaviour and continuing to troll despite being asked to keep it on topic.

zagrim commented 4 months ago

@AntiSol With all kindness :heart: and respect :pray: , you really need to cool down now. Neither you nor anyone else can demand anything from some other person on the Internet just on the basis that they have shared some of their work with others for free, or decided to donate some of their time to a project they like. Seems like you have a number of your own OSS projects, so you should already know this. Yes, one can argue about ethical or moral obligations, but you have no idea what else is going on in that maintainer's life at the moment, and you certainly are not in any contract with them other than the license of that piece of software, which certainly tells you the provider of that software has zero obligations to you (and yes, in the grand scale of the software industry, that thing does suck badly).

If you don't like off-the-cuff quick suggestions for the possible avenues to inspect the issue, would you like total silence any better?

I do understand your frustration, though, but I do not understand your behaviour now. All respect to you for actually being part of hunting the (assumed) root cause down, but isn't this a bit too much already? :disappointed:

AntiSol commented 4 months ago

would you like total silence any better?

Absolutely yes.

isn't this a bit too much already?

Yep. I was over it days ago, and said as much.

AntiSol commented 3 weeks ago

Hmmm, I'm dubious that your problem is related, @LaurentChemla. I've got 2 Pis running the fixed motion build, and I don't think either of them has restarted more than a couple of times since April, and I think 100% of those were due to power failure.

When you say "After some time", how long are you talking? hours? days? weeks? My devices currently have an uptime of 18 days and are going strong.

The zombie issue will lock up your entire machine, not just motion. I think you probably have a different issue.

AntiSol commented 3 weeks ago

I think you're probably better off filing a new bug report with as much detail as you can; I don't think your symptoms sound the same. I've never seen this "Fix the cause of camera/system locking and restart Motion" message you mention, and my Pis are currently enjoying weeks to months of uptime with no issues.

Do you see 'defunct' motion processes when you run sudo ps aux | grep motion? If not, it's not the same issue.
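
For anyone who wants to script that check, here is a minimal sketch in Python. It assumes the standard `ps aux` column order shown at the top of this issue (STAT is the eighth column, and a zombie has a STAT starting with `Z`). The function names `find_zombies` and `motion_zombies` are made up for illustration and are not part of motion or motionEye.

```python
import subprocess


def find_zombies(ps_output, needle="motion"):
    """Return (pid, command) for every zombie row in `ps aux` text that mentions `needle`.

    Like `ps aux | grep motion`, a row matches if `needle` appears anywhere in
    the line, so children owned by the `motion` user are included.
    """
    zombies = []
    for line in ps_output.splitlines():
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":
            continue  # skip the header and malformed rows
        if needle in line and fields[7].startswith("Z"):
            zombies.append((int(fields[1]), fields[10]))
    return zombies


def motion_zombies():
    """Run `ps aux` on this machine and return the matching zombie rows."""
    out = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout
    return find_zombies(out)
```

Calling `motion_zombies()` on an affected system should return one entry per `[sh] <defunct>` row; an empty list suggests a different problem from the one tracked in this issue.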