Closed pg9182 closed 2 years ago
cc @L1ghtman2k
This appears to be caused by a race condition when closing the displayfd.
In Xorg, NotifyParentProcess() writes the display number and then, in a separate write() call, a newline: https://sourcegraph.com/github.com/mirror/xserver@7b7170ecd636ae1110622e2430549f79598750ca/-/blob/os/connection.c?L189:9.
void
NotifyParentProcess(void)
{
#if !defined(WIN32)
    if (displayfd >= 0) {
        if (write(displayfd, display, strlen(display)) != strlen(display))
            FatalError("Cannot write display number to fd %d\n", displayfd);
        if (write(displayfd, "\n", 1) != 1)
            FatalError("Cannot write display number to fd %d\n", displayfd);
        close(displayfd);
        displayfd = -1;
    }
    if (RunFromSmartParent) {
        if (ParentProcess > 1) {
            kill(ParentProcess, SIGUSR1);
        }
    }
    if (RunFromSigStopParent)
        raise(SIGSTOP);
#endif
}
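In nswrap, we close our end of the displayfd as soon as we have read the display number: https://github.com/pg9182/northstar-dedicated/blame/57bdcff2bd7f6c80191a8dfb4f2647cca3d21374/src/nswrap/nswrap.c#L289. If that close lands after Xorg's first write but before its second, the newline write can fail and Xorg exits through FatalError.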
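The failure mode can be reproduced in isolation. This is a minimal sketch, assuming the displayfd is a pipe and that SIGPIPE is ignored in the writing process (both are assumptions here, not something verified against nswrap or Xorg). The function name write_after_reader_closed is made up for illustration.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical demo, not nswrap or Xorg code: write a newline to a pipe
 * whose read end has already been closed.
 * Returns the errno of the failed write, 0 if the write unexpectedly
 * succeeds, or -1 if the pipe could not be created. */
int write_after_reader_closed(void) {
    int p[2];
    if (pipe(p) != 0)
        return -1;

    /* Assumption: the writer ignores SIGPIPE, so write() reports EPIPE
     * instead of the process being killed. Note this changes the signal
     * disposition for the whole process. */
    signal(SIGPIPE, SIG_IGN);

    close(p[0]); /* the reader closes early, as in the race */

    ssize_t n = write(p[1], "\n", 1);
    int err = (n < 0) ? errno : 0;

    close(p[1]);
    return err;
}
```

With the read end gone, the write does not return 1, which is the condition that sends NotifyParentProcess() into FatalError.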
To fix this, I'm probably going to add another read and/or a delay before closing the FD. As a more permanent fix which doesn't depend on implementation details of Xorg, I'll look into using the "smart parent" SIGUSR1 logic with a signalfd.
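A rough sketch of the "another read" option, not the actual nswrap change: keep reading from the displayfd until the trailing newline (or EOF) has been consumed, and only then let the caller close the FD. The helper name read_display_number and the 16-byte buffer are assumptions made for illustration.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read the display number from the read end of the
 * displayfd, consuming bytes up to and including the newline so that the
 * X server's second write() has completed before the caller closes fd.
 * Returns the display number, or -1 on error or malformed input. */
int read_display_number(int fd) {
    char buf[16];
    size_t len = 0;

    for (;;) {
        char c;
        ssize_t n = read(fd, &c, 1);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal, retry */
            return -1;
        }
        if (n == 0 || c == '\n')
            break; /* EOF or end of the display line */
        if (c < '0' || c > '9' || len + 1 >= sizeof(buf))
            return -1; /* not a digit, or too long to be a display number */
        buf[len++] = c;
    }

    if (len == 0)
        return -1; /* nothing was written before EOF or newline */
    buf[len] = '\0';
    return atoi(buf);
}
```

The caller would close the FD only after this returns. Reading a byte at a time is slow in general but harmless for a one-off value of a few characters, and it avoids relying on both writes arriving in a single read.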
Logs from instance finer-dormammu (457ea7c023e6):
Recently, this has been occurring more frequently.