Describe the bug
Getting a segfault while running the marie server. This is problematic because it leaves behind a defunct (zombie) process and causes the kernel to leave a task stuck in uninterruptible "D" state. A task in that state cannot be killed, even with kill -9.
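To confirm which processes are affected, the state can be read from /proc. The following is a minimal sketch, assuming a Linux host with /proc mounted; the helper names `parse_state` and `find_tasks` are made up for illustration and are not part of marie. It only inspects each thread-group leader, so a stuck worker thread would need a similar scan over /proc/<pid>/task/.

```python
import os


def parse_state(stat_text):
    """Return (comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    left, right = stat_text.rsplit(")", 1)
    comm = left.split("(", 1)[1]
    state = right.split()[0]
    return comm, state


def find_tasks(states=("D", "Z"), proc="/proc"):
    """Yield (pid, comm, state) for processes whose state is in `states`."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc, entry, "stat")) as fh:
                comm, state = parse_state(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if state in states:
            yield int(entry), comm, state


if __name__ == "__main__":
    # D = uninterruptible sleep, Z = zombie (defunct)
    for pid, comm, state in find_tasks():
        print(pid, comm, state)
```

A process reported as D here will not react to signals until the kernel operation it is waiting on completes; a Z entry only disappears once its parent reaps it.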
Log output from dmesg:
[227392.162828] perf: interrupt took too long (2510 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
[240010.209731] marie[111619]: segfault at 7f2987800b00 ip 00007f30a0f7895b sp 00007f2a655f71d0 error 4 in cv2.abi3.so[7f30a0745000+2f4f000] likely on CPU 10 (core 20, socket 0)
[240010.209740] Code: 48 63 4d 00 48 8b 7c 24 08 89 da 44 8d 43 01 49 03 7f 28 48 8b 47 18 48 8b b7 d0 00 00 00 85 c9 0f 8e a1 16 00 00 4d 8b 4f 18 <45> 8b 34 89 48 8b 4c 24 20 80 3c 19 00 0f 85 52 fe ff ff c7 45 00
[240215.295317] INFO: task marie:110689 blocked for more than 120 seconds.
[240215.295324] Tainted: P OE 6.2.0-37-generic #38~22.04.1-Ubuntu
[240215.295326] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[240215.295328] task:marie state:D stack:0 pid:110689 ppid:110658 flags:0x00000002
[240215.295331] Call Trace:
[240215.295332] <TASK>
[240215.295335] __schedule+0x2b7/0x5f0
[240215.295339] schedule+0x68/0x110
[240215.295341] do_exit+0xf3/0x6c0
[240215.295343] do_group_exit+0x35/0x90
[240215.295347] get_signal+0x8a5/0x8d0
[240215.295349] ? __f_unlock_pos+0x12/0x20
[240215.295352] arch_do_signal_or_restart+0x2a/0x120
[240215.295355] ? exit_to_user_mode_prepare+0x3b/0xd0
[240215.295357] exit_to_user_mode_loop+0xaf/0x140
[240215.295358] exit_to_user_mode_prepare+0xb9/0xd0
[240215.295359] irqentry_exit_to_user_mode+0x9/0x20
[240215.295361] irqentry_exit+0x43/0x50
[240215.295363] sysvec_reschedule_ipi+0x7b/0x120
[240215.295365] asm_sysvec_reschedule_ipi+0x1b/0x20
[240215.295367] RIP: 0033:0x5634e4bfe3d3
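The segfault line in the log above carries enough information to narrow down the faulting instruction. The sketch below is a hedged example, not an official tool: `decode_segfault` is a hypothetical helper that parses the x86 "segfault at" line, decodes the page-fault error code bits, and computes the instruction pointer's offset from the start of the reported mapping. That offset is relative to the mapping, not necessarily to the start of the .so file; translating it to a symbol would also need the mapping's file offset from /proc/<pid>/maps.

```python
import re

# Matches the x86 "segfault at" line printed by the kernel.
# All numeric fields except the PID are hexadecimal.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Parse one dmesg segfault line; return a dict, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    info = {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "ip": int(m["ip"], 16),
        # x86 page-fault error code bits
        "present": bool(err & 0x1),  # 0: page not present, 1: protection fault
        "write": bool(err & 0x2),    # 0: read access, 1: write access
        "user": bool(err & 0x4),     # 1: fault happened in user mode
        "lib": m["lib"],
        "offset": None,
    }
    if m["base"] is not None:
        # Offset of the faulting instruction from the start of the mapping.
        info["offset"] = info["ip"] - int(m["base"], 16)
    return info


if __name__ == "__main__":
    line = (
        "[240010.209731] marie[111619]: segfault at 7f2987800b00 "
        "ip 00007f30a0f7895b sp 00007f2a655f71d0 error 4 in "
        "cv2.abi3.so[7f30a0745000+2f4f000] likely on CPU 10 (core 20, socket 0)"
    )
    info = decode_segfault(line)
    print(info["lib"], hex(info["offset"]), info["user"], info["write"])
```

For the line in this report, error 4 decodes to a user-mode read of a page that was not mapped, and the faulting instruction sits at offset 0x83395b inside the cv2.abi3.so mapping.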
Describe how you solve it
Environment
pip versions of opencv:
marie# pip list | grep opencv
opencv-python 4.8.1.78
opencv-python-headless 4.8.1.78
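Both opencv-python and opencv-python-headless are installed here. The opencv-python project README advises selecting only one of its packages per environment, since they all provide the same cv2 module. Whether that is related to this segfault is not established. The check below is a small sketch using the standard library's importlib.metadata; `installed_opencv_wheels` is a hypothetical helper name.

```python
from importlib import metadata

# The four PyPI distributions that all ship the same `cv2` module.
OPENCV_WHEELS = {
    "opencv-python",
    "opencv-python-headless",
    "opencv-contrib-python",
    "opencv-contrib-python-headless",
}


def installed_opencv_wheels(names=None):
    """Return the sorted OpenCV wheel names found in `names`.

    If `names` is None, the distributions installed in the current
    environment are inspected instead.
    """
    if names is None:
        names = [dist.metadata["Name"] for dist in metadata.distributions()]
    # Normalize case and underscores so "OpenCV_Python" style names match.
    normalized = {n.lower().replace("_", "-") for n in names if n}
    return sorted(normalized & OPENCV_WHEELS)


if __name__ == "__main__":
    found = installed_opencv_wheels()
    if len(found) > 1:
        print("More than one OpenCV wheel installed:", ", ".join(found))
    else:
        print("OpenCV wheels installed:", found)
```

Run inside the container, more than one entry in the result means several wheels are competing for the cv2 module.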
This could possibly be related to the following error seen in the logs:
gbugaj@asp-gpu032:~$ docker logs marieai-dev-server-corr | grep 'Exception' -A 10 | head
Exception ignored when trying to write to the signal wakeup fd:
Traceback (most recent call last):
File "/usr/lib/python3.10/asyncio/selector_events.py", line 115, in _read_from_self
data = self._ssock.recv(4096)
BlockingIOError: [Errno 11] Resource temporarily unavailable
Exception ignored when trying to write to the signal wakeup fd:
Traceback (most recent call last):
File "/usr/lib/python3.10/asyncio/selector_events.py", line 115, in _read_from_self
data = self._ssock.recv(4096)
BlockingIOError: [Errno 11] Resource temporarily unavailable