kenz-gelsoft / gecko-dev

Read-only Git mirror of the Mercurial gecko repositories at https://hg.mozilla.org. How to contribute: https://firefox-source-docs.mozilla.org/contributing/contribution_quickref.html
https://firefox-source-docs.mozilla.org/setup/index.html
Other
12 stars 1 forks source link

Do we need to implment Darwin(macOS) like workaround to avoid early FD close on IPC? #50

Open kenz-gelsoft opened 19 hours ago

kenz-gelsoft commented 19 hours ago

@waddlesplash

This is an answer to your comment at https://github.com/kenz-gelsoft/gecko-dev/issues/34#issuecomment-2311585220

I found firefox's single Unit test (GTest) IPDLTest_CrossProcess.TestShmem is smaller test case to reproduce the issue. Test failure is intermittent.

I investigated that failure on https://github.com/kenz-gelsoft/gecko-dev/issues/49 (I created this new Issue, as that was very low S/N.)

I found that test case fails only when !msg->attached_handles_.IsEmpty(). That test case calls sendmsg() muliple times, but first call fails that !msg->attached_handles_.IsEmpty() after sendmsg().

https://github.com/kenz-gelsoft/gecko-dev/issues/49#issuecomment-2377928216

For the same condition, firefox's IPC code (based on Chromium IPC) has special logic for Darwin(macOS) like this:

    ssize_t bytes_written =
        HANDLE_EINTR(corrected_sendmsg(pipe_, &msgh, MSG_DONTWAIT));
    CHROMIUM_LOG(WARNING)
        << "ChannelImpl::ProcessOutgoingMessages4: errno=" << errno
        << ", written=" << bytes_written;

    if (bytes_written < 0) {
      switch (errno) {
        case EAGAIN:
// (snip errno check)
      }
    }

    if (intentional_short_write ||
        static_cast<size_t>(bytes_written) != amt_to_write) {
// (snip, this pass didn't call at all both on success and on failure)
      return true;
    } else {
      MOZ_ASSERT(partial_write_->handles_.Length() == num_fds,
                 "not all handles were sent");
      partial_write_.reset();

      if (!msg->attached_handles_.IsEmpty()) {
        CHROMIUM_LOG(WARNING)
            << "ChannelImpl::ProcessOutgoingMessages6";
      }
#if defined(XP_DARWIN)
      if (!msg->attached_handles_.IsEmpty()) {
        pending_fds_.push_back(PendingDescriptors{
            msg->fd_cookie(), std::move(msg->attached_handles_)});
      }
#else
      if (bytes_written > 0) {
        msg->attached_handles_.Clear();
      }
#endif

      // Message sent OK!

For this Darwin(macOS) only code, I looked that git blame and I found 12yr old bugzilla why they introduced this special logic: https://bugzilla.mozilla.org/show_bug.cgi?id=868919#c7

The question is whether this is a race condition in our code, or a race condition in the kernel. It > looks like neither, if I'm reading the XNU source right. In unp_gc() in bsd/kern/uipc_usrreq.c,

   /*
    * If all refs are from msgs, and it's not marked accessible
    * then it must be referenced from some unreachable cycle
    * of (shut-down) FDs, so include it in our
    * list of FDs to remove
    */
   if (fg->fg_count == fg->fg_msgcount && !(fg->fg_flag & FMARK)) {

So, if the only active reference to a socket is sitting in a process's message queue, it is considered dead and will be closed.

BSD 4.4 lists this under "BUGS" (http://www.unix.com/man-page/FreeBSD/2/sendmsg/):

Because sendmsg() does not necessarily block until the data has been
transferred, it is possible to transfer an open file descriptor across an
AF_UNIX domain socket (see recv(2)), then close() it before it has actu-
ally been sent, the result being that the receiver gets a closed file
descriptor.  It is left to the application to implement an acknowledgment
mechanism to prevent this from happening.

So this should be easy to fix by just holding the parent fds alive until the owning channel is closed.

If Haiku's implementation follows to the same spec, it can be POSIX compliant behavior, but application code (firefox IPC code) need to handle early FD close specially for Haiku.

What do you think on this?

kenz-gelsoft commented 18 hours ago

I'll try porting these logic tonight. I hope these fixes test failures.

kenz-gelsoft commented 17 hours ago

I found that test case fails only when !msg->attachedhandles.IsEmpty(). That test case calls sendmsg() muliple times, but first call fails that !msg->attachedhandles.IsEmpty() after sendmsg().

I guess this is when sending FD before previously sent FDs are received. My debugging in https://github.com/kenz-gelsoft/gecko-dev/issues/49 supports this.

If FDs are closed before processed, they can be recycled for sending another font descriptors, it may look like a shared memory corruption.

It may also result in unreceived file descriptor is required error.