osvenskan / posix_ipc

segfault w/msg queue threaded rearm under BSD #8

Open osvenskan opened 6 years ago

osvenskan commented 6 years ago

The test test_request_notification_threaded_rearm() sometimes segfaults on FreeBSD. I don't know why it should segfault, but running with DPRINTF enabled is informative. Under Linux, the DPRINTF log shows an orderly, repeated sequence of events --

Implicit in the description above is the behavior of the threading.Event object, which the test case relies on to wake the main thread when the callback is invoked. Once the main thread wakes, it should quickly call mq.receive(). (Keep in mind that notifications should only be sent when the queue transitions from empty to non-empty.)
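
To make the expected cycle concrete, here is a minimal sketch using only the standard library. ModelQueue and run_rearm_cycle are hypothetical names made up for this illustration; they are not part of posix_ipc and only model the one-shot, empty-to-non-empty notification rule described above.

```python
import threading
from collections import deque


class ModelQueue:
    """Toy stand-in for a POSIX message queue (hypothetical, not posix_ipc)."""

    def __init__(self):
        self._messages = deque()
        self._lock = threading.Lock()
        self._callback = None  # one-shot notification registration

    def request_notification(self, callback):
        with self._lock:
            self._callback = callback

    def send(self, message):
        callback = None
        with self._lock:
            was_empty = not self._messages
            self._messages.append(message)
            if was_empty and self._callback is not None:
                # Notify only on the empty -> non-empty transition, and
                # consume the registration so that it must be rearmed.
                callback, self._callback = self._callback, None
        if callback is not None:
            # Deliver the notification on its own thread.
            threading.Thread(target=callback).start()

    def receive(self):
        with self._lock:
            return self._messages.popleft()


def run_rearm_cycle(rounds):
    """Run send -> callback/rearm -> receive `rounds` times."""
    mq = ModelQueue()
    event = threading.Event()
    callback_count = 0

    def on_notify():
        nonlocal callback_count
        callback_count += 1
        mq.request_notification(on_notify)  # rearm for the next message
        event.set()  # wake the main thread

    mq.request_notification(on_notify)
    received = []
    for i in range(rounds):
        mq.send(f"message {i}")
        if not event.wait(timeout=5):
            raise RuntimeError("callback never fired")
        event.clear()
        received.append(mq.receive())  # drain so the queue is empty again
    return callback_count, received
```

In this model each message produces exactly one callback, because the main thread empties the queue before the next send.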

Under FreeBSD, the sequence of events is different. Everything appears normal until after the first call to mq_receive(), which is immediately followed by the first call to mq_send(). After that, the log shows a loop of over 500,000 iterations of callback/rearm without mq_receive() being called.

I'm suspicious of the code in uipc_mqueue.c marked by this comment --

/*
 * if there is no receivers and message queue
 * is not empty, we should send notification
 * as soon as possible.
 */
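
A toy model of why that comment makes me suspicious. This is a guess, not verified against uipc_mqueue.c: it assumes the kernel fires the notification at registration time whenever the queue already holds a message. SuspectQueue, count_callbacks, and the notify_when_armed_nonempty flag are hypothetical names for this sketch. The model is synchronous and capped, so it only shows the callback/rearm loop, not the segfault.

```python
from collections import deque


class SuspectQueue:
    """Toy queue with a switch for the suspected behavior (hypothetical)."""

    def __init__(self, notify_when_armed_nonempty):
        self._messages = deque()
        self._callback = None  # one-shot notification registration
        self._notify_when_armed_nonempty = notify_when_armed_nonempty

    def _fire(self):
        # Consume the registration, then run the callback.
        callback, self._callback = self._callback, None
        callback()

    def request_notification(self, callback):
        self._callback = callback
        # Assumed behavior: notify at once if messages are already waiting.
        if self._notify_when_armed_nonempty and self._messages:
            self._fire()

    def send(self, message):
        was_empty = not self._messages
        self._messages.append(message)
        if was_empty and self._callback is not None:
            self._fire()


def count_callbacks(notify_when_armed_nonempty, limit=50):
    """Send one message and count callbacks; nothing ever calls receive."""
    mq = SuspectQueue(notify_when_armed_nonempty)
    count = 0

    def on_notify():
        nonlocal count
        count += 1
        if count < limit:  # cap so the model terminates
            mq.request_notification(on_notify)  # rearm before any receive

    mq.request_notification(on_notify)  # queue is empty, so nothing fires
    mq.send("one message")
    return count
```

With the flag off, one message yields one callback. With it on, every rearm fires again because the message is still in the queue, and the count runs up to the cap. That would match a log of callback/rearm repeating with no mq_receive() in between, since the main thread is waiting on the Event rather than blocked in a receive.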

osvenskan commented 6 years ago

Here are the relevant log files for Linux and FreeBSD (probably from FreeBSD 11.1). The log file for BSD has been greatly truncated to fit GitHub's attachment size limitations. The original file is ~161 MB and 4.6 million lines. This subset is about 5% of the original. I can provide the whole log on request.

rearm_log_linux.txt rearm_log_bsd_subset.txt

osvenskan commented 1 year ago

Confirmed that this still happens under FreeBSD 13.1-RELEASE-p5. The unit tests in test_message_queues.py segfault at some point, although not necessarily in test_request_notification_threaded_rearm(). If I comment out that test, the segfaults go away. I remember that the threaded notification code in posix_ipc_module.c was tricky for me to write, so I wouldn't be surprised to find out I'm doing something wrong. On the other hand, it works on other platforms, so I'm not sure what to think.