microsoft / msquic

Cross-platform, C implementation of the IETF QUIC protocol, exposed to C, C++, C# and Rust.
MIT License
4.05k stars 530 forks source link

secnetperf not working for big payload sizes #4144

Open PratikKumarSaha opened 8 months ago

PratikKumarSaha commented 8 months ago

Describe the bug

The QUIC version (till 7th Oct 2023) is working fine on my machine for payload size of 1.5GB, however, it is not working for the commits after 7th October 2023.

We have assigned 2 CPUs each for SERVER and CLIENT. For small payload size (eg : 150000 ) it works fine but not for bigger payload sizes. Following is the output to distinguish the results.

[root@9030bf879c51 x86_Debug_openssl]# ./secnetperf -exec:maxtput -test:tput -target:192.168.96.3 -up:150000 -ptput:1
Started!

Result: Upload 150000 bytes @ 236173 kbps (5.081 ms) .

[root@9030bf879c51 x86_Debug_openssl]# ./secnetperf -exec:maxtput -test:tput -target:192.168.96.3 -up:1500000000 -ptput:1
Started!

Aborted (core dumped) 

Affected OS

Additional OS information

MsQuic version

main

Steps taken to reproduce bug

  1. /secnetperf -exec:maxtput (Server)
  2. ./secnetperf -exec:maxtput -test:tput -target:192.168.96.3 -up:1500000000 -ptput:1 (Client)

Expected behavior

It should not have thrown core dumped since it was working properly for the commits before and om 7th October 2023.

Actual outcome

[root@9030bf879c51 x86_Debug_openssl]# ./secnetperf -exec:maxtput -test:tput -target:192.168.96.3 -up:150000 -ptput:1
Started!

Result: Upload 150000 bytes @ 236173 kbps (5.081 ms) .

[root@9030bf879c51 x86_Debug_openssl]# ./secnetperf -exec:maxtput -test:tput -target:192.168.96.3 -up:1500000000 -ptput:1
Started!

Aborted (core dumped) 

Additional details

No response

nibanks commented 8 months ago

Our output indicates a core dump. Would you mind sharing that and ideally the thread/callstack that's crashing?

PratikKumarSaha commented 8 months ago
#0  0x00007f3c88630acf in raise () from /lib64/libc.so.6
#1  0x00007f3c88603ea5 in abort () from /lib64/libc.so.6
#2  0x00007f3c89ba0918 in quic_bugcheck (File=0x7f3c89d4d088 "/msquic/src/platform/datapath_epoll.c", Line=2179, 
    Expr=0x7f3c89d4d5a8 "SendData->TotalSize + MaxBufferLength <= sizeof(SendData->Buffer)") at /msquic/src/platform/platform_posix.c:93
#3  0x00007f3c89ba92e8 in SendDataAllocBuffer (SendData=0x7f3c38017840, MaxBufferLength=<optimized out>) at /msquic/src/platform/datapath_epoll.c:2188
#4  0x00007f3c89ba25b8 in CxPlatSendDataAllocBuffer (SendData=<optimized out>, MaxBufferLength=<optimized out>) at /msquic/src/platform/datapath_linux.c:369
#5  0x00007f3c89b9c651 in QuicPacketBuilderPrepare (Builder=0x7f3c5f58ea40, NewPacketKeyType=QUIC_PACKET_KEY_1_RTT, IsTailLossProbe=<optimized out>, 
    IsPathMtuDiscovery=IsPathMtuDiscovery@entry=0 '\000') at /msquic/src/core/packet_builder.c:291
#6  0x00007f3c89b9cfb0 in QuicPacketBuilderPrepareForStreamFrames (Builder=Builder@entry=0x7f3c5f58ea40, IsTailLossProbe=<optimized out>)
    at /msquic/src/core/packet_builder.c:623
#7  0x00007f3c89b9283b in QuicSendFlush (Send=Send@entry=0x7f3c440021a0) at /msquic/src/core/send.c:1352
#8  0x00007f3c89b78445 in QuicConnDrainOperations (Connection=Connection@entry=0x7f3c440013e0) at /msquic/src/core/connection.c:7595
#9  0x00007f3c89b61953 in QuicWorkerProcessConnection (Worker=Worker@entry=0x1d6f710, Connection=0x7f3c440013e0, ThreadID=13507, 
    TimeNow=TimeNow@entry=0x7f3c5f58eeb0) at /msquic/src/core/worker.c:505
#10 0x00007f3c89b6222b in QuicWorkerLoop (Context=Context@entry=0x1d6f710, State=State@entry=0x7f3c5f58eeb0) at /msquic/src/core/worker.c:658
#11 0x00007f3c89b62626 in QuicWorkerThread (Context=0x1d6f710) at /msquic/src/core/worker.c:726
#12 0x00007f3c889af1ca in start_thread () from /lib64/libpthread.so.0
#13 0x00007f3c8861be73 in clone () from /lib64/libc.so.6
PratikKumarSaha commented 8 months ago

Hello @nibanks any update on this issue?

nibanks commented 8 months ago

Sorry, we still haven't had a chance to take a look. This is not a known issue.

PratikKumarSaha commented 8 months ago

But this issue is not seen on the code version available till 7th Oct 2023. Are your platforms working for big payload sizes (1500000000) ?

nibanks commented 8 months ago

I haven't seen any issues with big payload. Have you tried Release builds? Perhaps that assert is benign.

PratikKumarSaha commented 7 months ago

Hello @nibanks I have tested the same for release v.2.3.4 and v.2.3.5 same issue is being observed for big payload sizes. Please provide us some insight on this.

nibanks commented 7 months ago

I mean, have you tried Release (not Debug) builds?

PratikKumarSaha commented 7 months ago

Yes , I have tried the releases both v.2.3.4 and v.2.3.5 Big payload sizes are failing for RedHat 8.8 environment

nibanks commented 7 months ago

I am not referring to a particular release branch or tag, but that you aren't building for debug mode. Asserts are only compiled in for debug mode, and I think the assert you're hitting is benign. If you are compiling/running release mode, please share the callstack for what is crashing now.

PratikKumarSaha commented 7 months ago
#0  0x00007f3c88630acf in raise () from /lib64/libc.so.6
#1  0x00007f3c88603ea5 in abort () from /lib64/libc.so.6
#2  0x00007f3c89ba0918 in quic_bugcheck (File=0x7f3c89d4d088 "/msquic/src/platform/datapath_epoll.c", Line=2179, 
    Expr=0x7f3c89d4d5a8 "SendData->TotalSize + MaxBufferLength <= sizeof(SendData->Buffer)") at /msquic/src/platform/platform_posix.c:93
#3  0x00007f3c89ba92e8 in SendDataAllocBuffer (SendData=0x7f3c38017840, MaxBufferLength=<optimized out>) at /msquic/src/platform/datapath_epoll.c:2188
#4  0x00007f3c89ba25b8 in CxPlatSendDataAllocBuffer (SendData=<optimized out>, MaxBufferLength=<optimized out>) at /msquic/src/platform/datapath_linux.c:369
#5  0x00007f3c89b9c651 in QuicPacketBuilderPrepare (Builder=0x7f3c5f58ea40, NewPacketKeyType=QUIC_PACKET_KEY_1_RTT, IsTailLossProbe=<optimized out>, 
    IsPathMtuDiscovery=IsPathMtuDiscovery@entry=0 '\000') at /msquic/src/core/packet_builder.c:291
#6  0x00007f3c89b9cfb0 in QuicPacketBuilderPrepareForStreamFrames (Builder=Builder@entry=0x7f3c5f58ea40, IsTailLossProbe=<optimized out>)
    at /msquic/src/core/packet_builder.c:623
#7  0x00007f3c89b9283b in QuicSendFlush (Send=Send@entry=0x7f3c440021a0) at /msquic/src/core/send.c:1352
#8  0x00007f3c89b78445 in QuicConnDrainOperations (Connection=Connection@entry=0x7f3c440013e0) at /msquic/src/core/connection.c:7595
#9  0x00007f3c89b61953 in QuicWorkerProcessConnection (Worker=Worker@entry=0x1d6f710, Connection=0x7f3c440013e0, ThreadID=13507, 
    TimeNow=TimeNow@entry=0x7f3c5f58eeb0) at /msquic/src/core/worker.c:505
#10 0x00007f3c89b6222b in QuicWorkerLoop (Context=Context@entry=0x1d6f710, State=State@entry=0x7f3c5f58eeb0) at /msquic/src/core/worker.c:658
#11 0x00007f3c89b62626 in QuicWorkerThread (Context=0x1d6f710) at /msquic/src/core/worker.c:726
#12 0x00007f3c889af1ca in start_thread () from /lib64/libpthread.so.0
#13 0x00007f3c8861be73 in clone () from /lib64/libc.so.6

Hello @nibanks , I have verified the current release (2.3.5) and found the core dump stack is same as earlier which I have shared. Also, we have compiled it using "-Config Debug" and "-Config Release" modes. Both scenarios are having the same core dump call stack.

nibanks commented 7 months ago

SendDataAllocBuffer only has debug asserts (CXPLAT_DBG_ASSERT), so if you're hitting an assert, you must not actually be compiling or running the Release config:

_IRQL_requires_max_(DISPATCH_LEVEL)
_Success_(return != NULL)
QUIC_BUFFER*
SendDataAllocBuffer(
    _In_ CXPLAT_SEND_DATA* SendData,
    _In_ uint16_t MaxBufferLength
    )
{
    CXPLAT_DBG_ASSERT(SendData != NULL);
    CXPLAT_DBG_ASSERT(MaxBufferLength > 0);
    CxPlatSendDataFinalizeSendBuffer(SendData);
    CXPLAT_DBG_ASSERT(SendData->SegmentSize == 0 || SendData->SegmentSize >= MaxBufferLength);
    CXPLAT_DBG_ASSERT(SendData->TotalSize + MaxBufferLength <= sizeof(SendData->Buffer));
    CXPLAT_DBG_ASSERT(
        SendData->SegmentationSupported ||
        SendData->BufferCount < SendData->SocketContext->DatapathPartition->Datapath->SendIoVecCount);
    UNREFERENCED_PARAMETER(MaxBufferLength);
    if (SendData->ClientBuffer.Buffer == NULL) {
        return NULL;
    }
    SendData->ClientBuffer.Length = MaxBufferLength;
    return &SendData->ClientBuffer;
}
PratikKumarSaha commented 7 months ago

But on running in -Config Debug mode, I'm getting the same core dump call stack

nibanks commented 7 months ago

It can't be the same callstack on Release though. What is the callstack on Release?

URNOTCharlotte commented 1 month ago

hi nibanks, I also met this issue on my phone Samsung A217M (Android 12), with MsQuic v2.2.4, it's ok. But after I upgraded MsQuic to v2.3.6, this issue occurs. I have tried the releases both v.2.3.4 and v.2.3.5, also failed on my phone. I check the source code of MsQuic v2.3.6, and I think something is wrong in datapath_epoll.c:

  In the CxPlatSendDataFinalizeSendBuffer function:

  if (SendData->SegmentationSupported) {
    SendData->Iovs[0].iov_len += SendData->ClientBuffer.Length;
    if (SendData->SegmentSize == 0 ||
        SendData->ClientBuffer.Length < SendData->SegmentSize ||
        SendData->TotalSize + SendData->SegmentSize > sizeof(SendData->Buffer)) {
        SendData->ClientBuffer.Buffer = NULL;
    } else {
        SendData->ClientBuffer.Buffer += SendData->SegmentSize;
    }
} else {
    struct iovec* IoVec = &SendData->Iovs[SendData->BufferCount - 1];
    IoVec->iov_base = SendData->ClientBuffer.Buffer;
    IoVec->iov_len = SendData->ClientBuffer.Length;
    if (SendData->TotalSize + SendData->SegmentSize > sizeof(SendData->Buffer) ||       <--- it's here
        SendData->BufferCount == SendData->SocketContext->DatapathPartition->Datapath->SendIoVecCount) {
        SendData->ClientBuffer.Buffer = NULL;
    } else {
        SendData->ClientBuffer.Buffer += SendData->ClientBuffer.Length;
    }
}

My phone doesn't support segmentation, the value of "SendData->SegmentSize" is 0, so I think for devices that does't support segmentation the condition below is not sufficient: SendData->TotalSize + SendData->SegmentSize > sizeof(SendData->Buffer)

I add trace in the begin of the CxPlatSendDataFinalizeSendBuffer: QuicTraceEvent( DatapathSend_CxPlatSendDataFinalizeSendBuffer, "[data][%p] TotalSize %u bytes in %hhu buffers (segment=%hu) ClientBuffer.Length %u SegmentationSupported %u SendIoVecCount %u", SendData, SendData->TotalSize, SendData->BufferCount, SendData->SegmentSize, SendData->ClientBuffer.Length, SendData->SegmentationSupported, SendData->SocketContext->DatapathPartition->Datapath->SendIoVecCount);

[data][0x7c6c879340] TotalSize 64484 bytes in 47 buffers (segment=0) ClientBuffer.Length 0 SegmentationSupported 0 SendIoVecCount 53 [DatapathSend_CxPlatSendDataFinalizeSendBuffer:/xxx/msquic/msquic-2.3.6/src/platform/datapath_epoll.c:2166]

[ lib] ASSERT, 2221:/Users/charlotte/Projects/android-streamer/lib.srs/external/msquic/msquic-2.3.6/src/platform/datapath_epoll.c - SendData->TotalSize + MaxBufferLength <= sizeof(SendData->Buffer). [LibraryAssert:/Users/charlotte/Projects/android-streamer/lib.srs/external/msquic/msquic-2.3.6/src/platform/platform_posix.c:848]

SendData->TotalSize: 64484 MaxBufferLength: 1372 sizeof(SendData->Buffer): 65536

If running in release mode, SendData->TotalSize will be 65856, over 65536, then sendmsg will failed.

nibanks commented 1 month ago

Interesting. Would you be able to propose a fix?

URNOTCharlotte commented 1 month ago

Yes, please check the attached patch file. 006_datapath_epoll.patch

nibanks commented 1 month ago

Would you be willing to make a PR with the fix?

URNOTCharlotte commented 1 month ago

Yes, please

URNOTCharlotte commented 3 weeks ago

hi nibanks, I have made a PR with the fix. Please check it. https://github.com/microsoft/msquic/pull/4621