I am experiencing a problem where openhpid becomes unresponsive to all client connections. It returns SA_ERR_HPI_NO_RESPONSE for all API connections. This happens in both 3.2.1 and 3.4.0, using the IPMI direct plugin.
Here is what happens:
We have a rogue process that connects to openhpid via the C API, crashes, and starts up again 1 second later. It repeats this crash-and-restart cycle over and over.
This causes openhpid to not release the socket descriptors of the dead connections.
An "lsof -p" shows 1024 socket descriptors stuck in CLOSE_WAIT. (1024 is the max file descriptor limit for this user on this machine.)
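For anyone trying to watch the leak grow without lsof, here is a minimal sketch that tallies TCP socket states from text in the Linux /proc/net/tcp format. The function name `count_tcp_states` is made up for illustration, and note that /proc/net/tcp is system-wide, not per process, so it only approximates what "lsof -p" shows.

```python
from collections import Counter

# Hex state codes the Linux kernel prints in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Return a Counter of socket states found in /proc/net/tcp-style text."""
    counts = Counter()
    # The first line is the column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        code = fields[3].upper()
        # Unknown codes are kept as-is rather than dropped.
        counts[TCP_STATES.get(code, code)] += 1
    return counts
```

On the affected machine, something like `count_tcp_states(open("/proc/net/tcp").read())["CLOSE_WAIT"]` run every few seconds should show the count climbing by roughly one per client crash, assuming no other process on the box is leaking sockets too.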
When we got into this situation, I sent openhpid an ABRT signal. There were 1024 threads, almost all of which were blocked on:
(gdb) bt
0 0x00007f13236b41eb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
from /lib64/libpthread.so.0
1 0x00007f1323dc74c5 in ?? () from /usr/lib64/libgthread-2.0.so.0
2 0x00007f13238e0ebf in ?? () from /usr/lib64/libglib-2.0.so.0
3 0x00007f13238e1711 in g_async_queue_timed_pop ()
from /usr/lib64/libglib-2.0.so.0
4 0x0000000000424c0f in oh_dequeue_session_event ()
5 0x00000000004197a4 in saHpiEventGet ()
6 0x000000000040b820 in servicethread(void*, void*) ()
7 0x00007f13239342d8 in ?? () from /usr/lib64/libglib-2.0.so.0
8 0x00007f1323931db6 in ?? () from /usr/lib64/libglib-2.0.so.0
9 0x00007f13236aff05 in start_thread () from /lib64/libpthread.so.0
10 0x00007f1322d8210d in clone () from /lib64/libc.so.6
Attached to this bug is /var/log/messages with "openhpid -v". The problem starts to happen at 18:43:08.
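One possible reading of the backtrace, not confirmed against the openhpid source, is that each per-connection thread is waiting on an event queue rather than on its socket, so it never sees that the client has gone away and never closes the descriptor. The sketch below reproduces that pattern in plain Python with no HPI code involved: `service_thread` and `run_demo` are hypothetical names, and the peek at the end is just one way such a thread could detect the dead peer.

```python
import queue
import socket
import threading


def service_thread(conn, events, result):
    # Stand-in for a handler blocked in a timed queue pop. While it waits
    # here it never touches the socket, so a peer disconnect goes unnoticed
    # and the socket sits in CLOSE_WAIT.
    try:
        events.get(timeout=0.2)
    except queue.Empty:
        pass
    # Peeking at the socket reveals the disconnect: recv() returns b"" once
    # the peer has closed its end and no data is pending.
    peer_gone = conn.recv(1, socket.MSG_PEEK) == b""
    # Closing the descriptor is what lets the socket leave CLOSE_WAIT.
    conn.close()
    result.append(peer_gone)


def run_demo():
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # let the OS pick a free port
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    conn, _ = server.accept()

    result = []
    worker = threading.Thread(
        target=service_thread, args=(conn, queue.Queue(), result)
    )
    worker.start()
    client.close()  # simulates the client process crashing
    worker.join()
    server.close()
    return result[0]
```

Calling `run_demo()` returns whether the handler saw end-of-file on the socket after its queue wait timed out. If the daemon's threads loop back into the queue wait without ever making a check like this, that would be consistent with the 1024 descriptors stuck in CLOSE_WAIT, but that part is a guess.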
Reported by: trguitar