vision-dbms / vision

The master repository for the Vision database system.
https://vision-dbms.com
BSD 3-Clause "New" or "Revised" License
27 stars 12 forks source link

Vca Hangs in simultaneous large write() calls #60

Open VCommitter opened 5 years ago

VCommitter commented 5 years ago

I've been having some trouble with Vca communication locking up between two peers when large writes do not complete. Basically both peers wind up in the middle of the write call in Vca::OS::DevicePoll::doWrite. It seems like one is waiting for the other. The backtrace on both ends looks something like this:

(gdb) bt
#0  0x000000373fc0d9f0 in __write_nocancel () from /lib64/libpthread.so.0
#1  0x00002adee4e2ed35 in Vca::OS::DevicePoll::doWrite (this=0x1a41dea4, rStatus=..., rsTransfer=@0x7fff8a530c90, rArea=..., hDevice=7) at ../kernel/Vca_VDeviceFactory.cpp:6607
#2  0x00002adee4e3d9a5 in Vca::OS::Device::doWrite (this=0x1a41de80, rStatus=..., rsTransfer=@0x7fff8a530c90, rArea=...) at ../kernel/Vca_VDeviceFactory.cpp:7125
#3  0x00002adee4e3d9e7 in Vca::OS::Device::BSPut::finish (this=0x373a6078, rStatus=...) at ../kernel/Vca_VDeviceFactory.cpp:6870
#4  0x00002adee4e3da60 in Vca::OS::Device::BSPut::attempt (this=0x373a6078, rStatus=...) at ../kernel/Vca_VDeviceFactory.cpp:6855
#5  0x00002adee4e3dae9 in Vca::OS::Device::BSPut::start (this=0x373a6078, rStatus=..., rArea=...) at ../kernel/Vca_VDeviceFactory.cpp:6846
#6  0x00002adee4e3db1a in Vca::VBSMove_<Vca::VDevice_<Vca::OS::SocketDevice>::ConsumerTypes>::start_ (this=0x373a5fb0, rStatus=..., rArea=...) at ../kernel/Vca_VDevice_.h:223
#7  0x0000000000656d9b in Vca::VDevice::BSWriteFace::BSWriter::BSWrite::start (this=0x373a5fb0, rStatus=..., pUser=0x2a9c35b0, rArea=...) at ../kernel/Vca_VDevice.h:733
#8  0x00002adee4e3aa7d in Vca::VBSMove_<Vca::VDevice_<Vca::OS::SocketDevice>::ConsumerTypes>::start (this=0x373a5fb0, rStatus=..., pUser=0x2a9c35b0, rArea=...) at ../kernel/Vca_VDevice_.h:219
#9  0x00002adee4e3aaea in Vca::VBSProducerConsumerDevice_<Vca::OS::SocketDevice>::start_ (this=0x1a41ddd8, rStatus=..., pUser=0x2a9c35b0, rArea=...) at ../kernel/Vca_VBSProducerConsumerDevice_.h:75
#10 0x00002adee4e2a6b6 in Vca::VDevice::BSWriteFace::BSWriter::start (this=0x2a9c35b0, rStatus=..., rArea=...) at ../kernel/Vca_VDevice.cpp:383
#11 0x00002adee4e2a709 in Vca::VDevice::BSWriteFace::BSWriter::start (this=0x2a9c35b0, rStatus=..., pArea=0x3dfbf20e, sArea=1995938) at ../kernel/Vca_VDevice.cpp:378
#12 0x00002adee4eb05f0 in Vca::VStreamSink::transferData (this=0x2a9c33e0) at ../kernel/Vca_VStreamSink.cpp:78
#13 0x00002adee4e039aa in Vca::VBS::put (this=0x2a9c33e0, pClient=0x37447b68, pData=0x3dfbe230, sData=2000000) at ../kernel/Vca_VBS.cpp:354
#14 0x00002adee4f84507 in Vca::VcaTransportOutbound::putData (this=0x249c8280, pSerializer=0x1a3dad00, pData=0x3dfbe230, sData=2000000) at ../kernel/Vca_VcaTransport.cpp:1107
#15 0x00002adee4f775e4 in Vca::VcaSerializer::put (this=0x1a3dad00, pData=0x3dfbe230, sData=2000000) at ../kernel/Vca_VcaSerializer.cpp:113
#16 0x00002adee4ed2185 in Vca::VcaSerializerForVkDynamicArrayOfPODType<int>::putData (this=0x1a3dad00) at ../kernel/Vca_VcaSerializerForVkDynamicArrayOf_.h:136
#17 0x00002adee4f8317a in Vca::VcaTransportOutbound::transferDataFor (this=0x249c8280, pSerializer=0x1a3dad00) at ../kernel/Vca_VcaTransport.cpp:1112
#18 0x00002adee4f775a4 in Vca::VcaSerializer::transferData (this=0x1a3dad00) at ../kernel/Vca_VcaSerializer.cpp:117
...

It is not limited to writes on Vca::VBSProducerConsumerDevice_<Vca::OS::SocketDevice> - it can also lock up on writes to Vca::VBSConsumerDevice_<Vca::OS::Device>:

(gdb) bt
#0  0x0000003939c0e7cd in write () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f02983e186c in Vca::OS::DevicePoll::doWrite (this=<optimized out>, rStatus=..., rsTransfer=rsTransfer@entry=@0x7ffd6ecfed00: 0, rArea=..., hDevice=<optimized out>)
    at /home/dev/fonix/online/pool/02/src/Vca_VDeviceFactory.cxx:6593
#2  0x00007f02983eadc1 in doWrite (rArea=..., rsTransfer=@0x7ffd6ecfed00: 0, rStatus=..., this=<optimized out>) at /home/dev/fonix/online/pool/02/src/Vca_VDeviceFactory.cxx:7111
#3  Vca::OS::Device::BSPut::finish (this=0xe0b3b8, rStatus=...) at /home/dev/fonix/online/pool/02/src/Vca_VDeviceFactory.cxx:6856
#4  0x00007f02983f249f in start (rArea=..., pUser=0xe03b70, rStatus=..., this=0xe0b2f0) at /home/dev/fonix/online/pool/02/src/Vca_VDevice.h:733
#5  start (rArea=..., pUser=0xe03b70, rStatus=..., this=0xe0b2f0) at /home/dev/fonix/online/pool/02/src/Vca_VDevice_.h:219
#6  Vca::VBSConsumerDevice_<Vca::OS::Device>::start_ (this=0xe03510, rStatus=..., pUser=0xe03b70, rArea=...) at /home/dev/fonix/online/pool/02/src/Vca_VBSConsumerDevice_.h:58
#7  0x00007f02983f6a88 in Vca::VDevice::BSWriteFace::BSWriter::start (this=this@entry=0xe03b70, rStatus=..., pArea=<optimized out>, sArea=<optimized out>)
    at /home/dev/fonix/online/pool/02/src/Vca_VDevice.cxx:378
#8  0x00007f029832af49 in Vca::VStreamSink::transferData (this=0xe039a0) at /home/dev/fonix/online/pool/02/src/Vca_VStreamSink.cxx:81
...

Attaching a debugger to either peer in this process interrupts the locked write(). Once continueed the peers unlock and complete their operations with no lingering problems.

Inspecting a locked Vca::OS::DevicePoll::doWrite shows you that the write() has already written some data but seems to get stuck before completing the requested write. Here is an example where 1,172,880 bytes (rsTransfer) have been written of a call to write() of 1,995,938 bytes (sRequest):

0x00002adee4e2ed35 in Vca::OS::DevicePoll::doWrite (this=0x1a41dea4, rStatus=..., rsTransfer=@0x7fff8a530c90, rArea=..., hDevice=7) at ../kernel/Vca_VDeviceFactory.cpp:6607
6607        rsTransfer = write (hDevice, rArea.base (), sRequest);
(gdb) list
6602    //  ... and do the write:
6603        size_t sRequest = rArea.size ();
6604        if (sRequest > SSIZE_MAX)
6605            sRequest = SSIZE_MAX;
6606
6607        rsTransfer = write (hDevice, rArea.base (), sRequest);
6608
6609        return rsTransfer >= 0 || rStatus.MakeErrorStatus ();
6610    }
6611    ^L
(gdb) p sRequest
$2 = 1995938
(gdb) p rArea
$3 = (const Vca::OS::DevicePoll::BSWriteArea &) @0x373a6098: {m_pRegion = 0x3dfbf20e, m_sRegion = 1995938}
(gdb) next
6609        return rsTransfer >= 0 || rStatus.MakeErrorStatus ();
(gdb) next
6610    }
(gdb) print rsTransfer
$4 = (ssize_t &) @0x7fff8a530c90: 1172880

Proposed Fix

I propose that we add a timeout to calls to write() in Vca - specifically those calls to write() in Vca::OS::DevicePoll::doWrite. This will keep peers from blocking indefinitely on large writes. Once the hung write() is timed out Vca's I/O scheduling system is able to successfully complete the I/Os scheduled between peers.

This write() timeout will serve to make Vca's communication more robust. The timeout may resolve other hangs that I haven't reproduced in my own testing.

MichaelJCaruso commented 5 years ago

We may have seen something like this before.

VCommitter commented 5 years ago

Similar for sure. Though in this case I don't think poll() is lying to us:

       POLLOUT
              Writing is now possible, though a write larger that the avail‐
              able space in a socket or pipe will still block (unless O_NON‐
              BLOCK is set).

It has written 1.2 megs of 1.9. Presumably it's peer that is also blocked on a large write needs to do some reading to drain the TCP/IP buffers before the last 0.7 megs can be written.

Instead of a timeout based solution we could set the device to non-blocking. Right now setNonBlockingTo() is implemented at the Vca::OS::Device level which should work for a socket file descriptor. Non-blocking is supported by all file descriptors where write timeouts are only supported by sockets via setsockopt().

bool Vca::OS::Device::setNonBlockingTo (bool bNonBlocking) const {
    int iFlagsNow = fcntl (m_hDevice, F_GETFL);
    int iFlagsNew = bNonBlocking ? iFlagsNow | O_NONBLOCK : iFlagsNow & ~O_NONBLOCK;
    return iFlagsNow != -1 && (iFlagsNow == iFlagsNew || fcntl (m_hDevice, F_SETFL, iFlagsNew) != -1);
}

A Question

We may have seen something like this before.

I have a question about the code near this comment. If the fix is to make the device non-blocking it looks like a fix was made and then commented it out? Or was it implemented some other way and this experimental non-blocking attempted fix was discarded?

MichaelJCaruso commented 5 years ago

Good question. Wish the breadcrumbs weren't so stale :-). It's possible that the fix was sound, but difficult to test properly and so wasn't deployed. It's a long shot, but do the old Perforce logs still exist?

dialtr commented 5 years ago

@MichaelJCaruso Are you referring to the Perforce history? I believe we should still have that.

MichaelJCaruso commented 5 years ago

@dialtr Yes. I don't know how accessible and searchable that history is, but if the change I referenced in my before link a couple of comments back can be found, there's a small chance that the commit comment may say something more useful than checking in a bunch of changes :-). If we're lucky, it may even say something like preserving code for a proposed change that we don't have time to test now but should release once we do. It's a long shot and probably not worth spending too much time on except for the historical audit value.

jarena commented 5 years ago

@MichaelJCaruso the makeDeviceFriendly() method was introduced back on 05-FEB-2004 with the change description:

Promote:

  • O_NONBLOCK Unix/Linux devices.
  • Minor routine renaming.
  • Further vca.xfer pruning.

At that time, it included the call to makeNonBlocking().

A few weeks later on 01-MAR-2004, the call to makeNonBlocking() was commented out, but the change description was not helpful:

Move VInterfaceMember into Vca namespace, rename VInterfaceMember... to Vca_VInterfaceMember...

6+ years later, on 19-JUL-2010, the call to makeNonBlocking() was re-introduced with the change description:

Make serial Vca devices NON-BLOCKING on Unix/Linux. Historically our practice has been to leave these devices in their default blocking state and use 'poll' to determine when calls that would otherwise block can be expected to return immediately. Unfortunately, in recent Linux testing, 'poll' has failed us. The problem is easily reproduced, occuring in processes that have served as middle-men in multi-hop communication (e.g., a pool) when those processes are closing their connections prior to exiting.

This was the change that added in the large comment block that you referenced earlier.

And just 4 days later, on 23-JUL-2010, the call to makeNonBlocking() was commented out again with the change description:

Revert the decision to make devices non-blocking on Unix/Linux. I've seen more problems with non-blocking devices (at least with terminal devices) than I have with what was there and have come not to trust it as a result.

VCommitter commented 5 years ago

I pushed a branch for the timeout fix and a branch for the nonblocking fix. These are both passing preliminary tests in our problem scenario.

Unfortunately I can't share my reproducer because it's proprietary. @MichaelJCaruso perhaps you can try either of these fixes out against some of the hangs you've been seeing?

MichaelJCaruso commented 5 years ago

Thanks @jarena!!! Commit commentary shortcomings aside, that history is exactly what is needed.

VCommitter commented 5 years ago

Given the discussion here and in other threads I propose the following fix:

Non-Blocking fixes the problems I've been encountering as well as a timeout on the socket; without imposing the performance cost of 30 second timeouts.

It's also simpler to get thorough testing of nonblocking: every operation that succeeds over a socket is testing nonblocking reads and writes. Timeouts on the other hand present testing complications as different timeout sizes and I/O operations sizes will combine to many different scenarios that need coverage.

I will proceed with this fix.

VCommitter commented 5 years ago

Here's another comment @MichaelJCaruso found in Vca_VDeviceFactory.cpp that also seems related:

/********************************************************************************
 * Note: Blocking socket descriptors created using Linux's socketpair ()
 *       has problems (hangs) when writing large chunks of date. To avoid it
 *       we have written this local getSocketPair () which is called with AF_INET
 *       family, to create connected pair of INET domain sockets
 ********************************************************************************/