kerberos-io / machinery

(DEPRECATED) An open source image processing framework, which uses your USB-, IP- or RPi-camera to recognize events (e.g. motion).
https://www.kerberos.io

machinery 2.6 constantly crashes #148

Open hank opened 6 years ago

hank commented 6 years ago

Running with docker as per instructions. This is what the docker log shows:

2018-06-02 15:59:01,641 INFO spawned: 'machinery' with pid 1195
2018-06-02 15:59:02,657 INFO success: machinery entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-06-02 16:00:10,363 INFO exited: machinery (terminated by SIGABRT (core dumped); not expected)
2018-06-02 16:00:11,365 INFO spawned: 'machinery' with pid 1253
2018-06-02 16:00:12,868 INFO success: machinery entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-06-02 16:00:51,897 INFO exited: machinery (terminated by SIGABRT (core dumped); not expected)
2018-06-02 16:00:52,899 INFO spawned: 'machinery' with pid 1370
2018-06-02 16:00:54,419 INFO success: machinery entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-06-02 16:01:18,449 INFO exited: machinery (terminated by SIGABRT (core dumped); not expected)
2018-06-02 16:01:19,451 INFO spawned: 'machinery' with pid 1455
2018-06-02 16:01:20,466 INFO success: machinery entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-06-02 16:01:48,301 INFO exited: machinery (terminated by SIGABRT (core dumped); not expected)
2018-06-02 16:01:49,303 INFO spawned: 'machinery' with pid 1520
2018-06-02 16:01:50,912 INFO success: machinery entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-06-02 16:03:04,957 INFO exited: machinery (terminated by SIGABRT (core dumped); not expected)
2018-06-02 16:03:05,959 INFO spawned: 'machinery' with pid 1712
2018-06-02 16:03:07,410 INFO success: machinery entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-06-02 16:03:21,428 INFO exited: machinery (terminated by SIGABRT (core dumped); not expected)
2018-06-02 16:03:22,431 INFO spawned: 'machinery' with pid 1763
2018-06-02 16:03:23,446 INFO success: machinery entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-06-02 16:04:36,480 INFO exited: machinery (terminated by SIGABRT (core dumped); not expected)
2018-06-02 16:04:37,481 INFO spawned: 'machinery' with pid 1826
2018-06-02 16:04:39,152 INFO success: machinery entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-06-02 16:04:48,167 INFO exited: machinery (terminated by SIGABRT (core dumped); not expected)
2018-06-02 16:04:49,169 INFO spawned: 'machinery' with pid 1877
2018-06-02 16:04:50,184 INFO success: machinery entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

I have a core dump from the machine, but for security reasons I don't want to upload it here. Let me know if you'd like a copy.

hank commented 6 years ago

Here's more:

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/kerberosio'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f112fdf1c37 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007f112fdf1c37 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f112fdf5028 in __GI_abort () at abort.c:89
#2  0x00007f1130b1b535 in __gnu_cxx::__verbose_terminate_handler() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007f1130b196d6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007f1130b19703 in std::terminate() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f1130b19922 in __cxa_throw ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00000000005ff8d4 in kerberos::healthContinuously(void*) ()
#7  0x00007f1131383184 in start_thread (arg=0x7f112a9e9700)
    at pthread_create.c:312
#8  0x00007f112feb903d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)

hank commented 6 years ago

Here's a sanitized capture.xml: capture.xml.txt The camera is a FOSCAM FI8918W

cedricve commented 6 years ago

Hey @hank, thanks for the detailed info. It looks like the camera is blocking, as it's crashing in the healthcheck function. Do you have this camera on wifi? And does it have other endpoints, like RTSP?
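For readers following the backtrace: frames #4 and #5 (`__cxa_throw` going straight into `std::terminate()`) are consistent with a C++ exception leaving a thread's entry function without being caught, which ends in `abort()` and SIGABRT. Below is a minimal sketch of catching such an exception at the thread boundary and reporting it instead. The names `runGuarded` and `checkCamera` are hypothetical and are not taken from the machinery source; whether the machinery should behave this way is a separate design question.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical helper (not part of the machinery): runs `body` on a new thread
// and returns the message of any std::exception it throws, or an empty string
// if the body finished normally.
std::string runGuarded(const std::function<void()>& body)
{
    std::string error;
    std::thread worker([&]() {
        try {
            body();
        } catch (const std::exception& e) {
            // Caught inside the thread, so std::terminate() is never reached.
            error = e.what();
        }
    });
    worker.join();  // after join(), the write to `error` is visible here
    return error;
}

// Hypothetical stand-in for a health check that gives up on the camera.
void checkCamera()
{
    throw std::runtime_error("no frame received within timeout");
}
```

Calling `runGuarded(checkCamera)` returns the error message to the caller, which can then log it and shut down cleanly rather than dumping core.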

cedricve commented 6 years ago

btw how did you generate the core dump?

hank commented 6 years ago

The docker image will apparently core dump by default to /core.

The camera is indeed on wifi and it does NOT have RTSP unfortunately. Is there a way I can increase the timeout for the healthCheck so it is more resilient to short dropouts?

Thanks!

cedricve commented 6 years ago

Thanks hank.

Actually it's hardcoded: if it isn't able to fetch the next frame within 5 seconds, an error is thrown and the machinery stops (to prevent a hanging camera from blocking it). https://github.com/kerberos-io/machinery/blob/master/src/kerberos/capture/Capture.cpp#L138
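As a rough illustration of the check described here, the sketch below compares the age of the last frame against a limit, with the limit passed in as a setting instead of a constant. `HealthSettings`, `frameTimeout` and `frameTimedOut` are assumed names for illustration only; the real logic lives in the linked Capture.cpp and may differ.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical settings struct. The 5-second default mirrors the hardcoded
// value mentioned above; the struct and field names are assumptions.
struct HealthSettings {
    std::chrono::milliseconds frameTimeout{5000};
};

// Returns true when the last frame is older than the configured timeout,
// i.e. the camera should be treated as hanging.
bool frameTimedOut(Clock::time_point lastFrame,
                   Clock::time_point now,
                   const HealthSettings& settings)
{
    return (now - lastFrame) > settings.frameTimeout;
}
```

With the default settings a 6-second gap counts as a timeout, while the same gap passes if `frameTimeout` is raised to, say, 30 seconds for a camera on an unreliable wifi link.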

I would recommend using Ethernet or buying a camera with RTSP support; it uses less bandwidth.

hank commented 6 years ago

I may just recompile with that changed - if you could make it an option that would be awesome. Bandwidth isn't a concern since it's only on my LAN. The camera works perfectly in a browser using the MJPG stream, and with other software like IVideon. Thanks!

cedricve commented 6 years ago

yeah we might make it configurable in the xml file. good idea!

jonlar commented 5 years ago

This happens to me as well, using an el-cheapo D-link camera DCS-932L. Any progress on making the machinery more resilient and handle faults more gracefully?

stroobl commented 5 years ago

I have the same issue with a Robin SmartView SIP 5MP IP Camera. I suspect something goes wrong during the initialization of the stream and then the container enters a crash loop. All connections are wired in my case. I switched back to the machinery v2.2.2 container and it runs stably there.

stroobl commented 3 years ago

Old issue, but I still had this problem with the latest version of kerberos.io, so I started looking into it. In my case the problem was resolved by replacing io.xml with the default version. (I stumbled on a comment in #121 with a remark about io.xml, and mine also had some missing parts.)