Open PratikKumarSaha opened 8 months ago
Our output indicates a core dump. Would you mind sharing that and ideally the thread/callstack that's crashing?
#0 0x00007f3c88630acf in raise () from /lib64/libc.so.6
#1 0x00007f3c88603ea5 in abort () from /lib64/libc.so.6
#2 0x00007f3c89ba0918 in quic_bugcheck (File=0x7f3c89d4d088 "/msquic/src/platform/datapath_epoll.c", Line=2179,
Expr=0x7f3c89d4d5a8 "SendData->TotalSize + MaxBufferLength <= sizeof(SendData->Buffer)") at /msquic/src/platform/platform_posix.c:93
#3 0x00007f3c89ba92e8 in SendDataAllocBuffer (SendData=0x7f3c38017840, MaxBufferLength=<optimized out>) at /msquic/src/platform/datapath_epoll.c:2188
#4 0x00007f3c89ba25b8 in CxPlatSendDataAllocBuffer (SendData=<optimized out>, MaxBufferLength=<optimized out>) at /msquic/src/platform/datapath_linux.c:369
#5 0x00007f3c89b9c651 in QuicPacketBuilderPrepare (Builder=0x7f3c5f58ea40, NewPacketKeyType=QUIC_PACKET_KEY_1_RTT, IsTailLossProbe=<optimized out>,
IsPathMtuDiscovery=IsPathMtuDiscovery@entry=0 '\000') at /msquic/src/core/packet_builder.c:291
#6 0x00007f3c89b9cfb0 in QuicPacketBuilderPrepareForStreamFrames (Builder=Builder@entry=0x7f3c5f58ea40, IsTailLossProbe=<optimized out>)
at /msquic/src/core/packet_builder.c:623
#7 0x00007f3c89b9283b in QuicSendFlush (Send=Send@entry=0x7f3c440021a0) at /msquic/src/core/send.c:1352
#8 0x00007f3c89b78445 in QuicConnDrainOperations (Connection=Connection@entry=0x7f3c440013e0) at /msquic/src/core/connection.c:7595
#9 0x00007f3c89b61953 in QuicWorkerProcessConnection (Worker=Worker@entry=0x1d6f710, Connection=0x7f3c440013e0, ThreadID=13507,
TimeNow=TimeNow@entry=0x7f3c5f58eeb0) at /msquic/src/core/worker.c:505
#10 0x00007f3c89b6222b in QuicWorkerLoop (Context=Context@entry=0x1d6f710, State=State@entry=0x7f3c5f58eeb0) at /msquic/src/core/worker.c:658
#11 0x00007f3c89b62626 in QuicWorkerThread (Context=0x1d6f710) at /msquic/src/core/worker.c:726
#12 0x00007f3c889af1ca in start_thread () from /lib64/libpthread.so.0
#13 0x00007f3c8861be73 in clone () from /lib64/libc.so.6
Hello @nibanks any update on this issue?
Sorry, we still haven't had a chance to take a look. This is not a known issue.
But this issue is not seen on the code version available till 7th Oct 2023. Are your platforms working for big payload sizes (1500000000) ?
I haven't seen any issues with big payload. Have you tried Release builds? Perhaps that assert is benign.
Hello @nibanks I have tested the same for release v.2.3.4 and v.2.3.5 same issue is being observed for big payload sizes. Please provide us some insight on this.
I mean, have you tried Release (not Debug) builds?
Yes , I have tried the releases both v.2.3.4 and v.2.3.5 Big payload sizes are failing for RedHat 8.8 environment
I am not referring to a particular release branch or tag, but that you aren't building for debug mode. Asserts are only compiled in for debug mode, and I think the assert you're hitting is benign. If you are compiling/running release mode, please share the callstack for what is crashing now.
#0 0x00007f3c88630acf in raise () from /lib64/libc.so.6 #1 0x00007f3c88603ea5 in abort () from /lib64/libc.so.6 #2 0x00007f3c89ba0918 in quic_bugcheck (File=0x7f3c89d4d088 "/msquic/src/platform/datapath_epoll.c", Line=2179, Expr=0x7f3c89d4d5a8 "SendData->TotalSize + MaxBufferLength <= sizeof(SendData->Buffer)") at /msquic/src/platform/platform_posix.c:93 #3 0x00007f3c89ba92e8 in SendDataAllocBuffer (SendData=0x7f3c38017840, MaxBufferLength=<optimized out>) at /msquic/src/platform/datapath_epoll.c:2188 #4 0x00007f3c89ba25b8 in CxPlatSendDataAllocBuffer (SendData=<optimized out>, MaxBufferLength=<optimized out>) at /msquic/src/platform/datapath_linux.c:369 #5 0x00007f3c89b9c651 in QuicPacketBuilderPrepare (Builder=0x7f3c5f58ea40, NewPacketKeyType=QUIC_PACKET_KEY_1_RTT, IsTailLossProbe=<optimized out>, IsPathMtuDiscovery=IsPathMtuDiscovery@entry=0 '\000') at /msquic/src/core/packet_builder.c:291 #6 0x00007f3c89b9cfb0 in QuicPacketBuilderPrepareForStreamFrames (Builder=Builder@entry=0x7f3c5f58ea40, IsTailLossProbe=<optimized out>) at /msquic/src/core/packet_builder.c:623 #7 0x00007f3c89b9283b in QuicSendFlush (Send=Send@entry=0x7f3c440021a0) at /msquic/src/core/send.c:1352 #8 0x00007f3c89b78445 in QuicConnDrainOperations (Connection=Connection@entry=0x7f3c440013e0) at /msquic/src/core/connection.c:7595 #9 0x00007f3c89b61953 in QuicWorkerProcessConnection (Worker=Worker@entry=0x1d6f710, Connection=0x7f3c440013e0, ThreadID=13507, TimeNow=TimeNow@entry=0x7f3c5f58eeb0) at /msquic/src/core/worker.c:505 #10 0x00007f3c89b6222b in QuicWorkerLoop (Context=Context@entry=0x1d6f710, State=State@entry=0x7f3c5f58eeb0) at /msquic/src/core/worker.c:658 #11 0x00007f3c89b62626 in QuicWorkerThread (Context=0x1d6f710) at /msquic/src/core/worker.c:726 #12 0x00007f3c889af1ca in start_thread () from /lib64/libpthread.so.0 #13 0x00007f3c8861be73 in clone () from /lib64/libc.so.6
Hello @nibanks , I have verified the current release (2.3.5) and found the core dump stack is same as earlier which I have shared. Also, we have compiled it using "-Config Debug" and "-Config Release" modes. Both scenarios are having the same core dump call stack.
SendDataAllocBuffer
only has debug asserts (CXPLAT_DBG_ASSERT
), so if you're hitting an assert, you must not actually be compiling or running the Release config:
_IRQL_requires_max_(DISPATCH_LEVEL)
_Success_(return != NULL)
QUIC_BUFFER*
SendDataAllocBuffer(
_In_ CXPLAT_SEND_DATA* SendData,
_In_ uint16_t MaxBufferLength
)
{
CXPLAT_DBG_ASSERT(SendData != NULL);
CXPLAT_DBG_ASSERT(MaxBufferLength > 0);
CxPlatSendDataFinalizeSendBuffer(SendData);
CXPLAT_DBG_ASSERT(SendData->SegmentSize == 0 || SendData->SegmentSize >= MaxBufferLength);
CXPLAT_DBG_ASSERT(SendData->TotalSize + MaxBufferLength <= sizeof(SendData->Buffer));
CXPLAT_DBG_ASSERT(
SendData->SegmentationSupported ||
SendData->BufferCount < SendData->SocketContext->DatapathPartition->Datapath->SendIoVecCount);
UNREFERENCED_PARAMETER(MaxBufferLength);
if (SendData->ClientBuffer.Buffer == NULL) {
return NULL;
}
SendData->ClientBuffer.Length = MaxBufferLength;
return &SendData->ClientBuffer;
}
But on running in -Config Debug mode, I'm getting the same core dump call stack
It can't be the same callstack on Release though. What is the callstack on Release?
hi nibanks, I also met this issue on my phone Samsung A217M (Android 12), with MsQuic v2.2.4, it's ok. But after I upgraded MsQuic to v2.3.6, this issue occurs. I have tried the releases both v.2.3.4 and v.2.3.5, also failed on my phone. I check the source code of MsQuic v2.3.6, and I think something is wrong in datapath_epoll.c:
In the CxPlatSendDataFinalizeSendBuffer function:
if (SendData->SegmentationSupported) {
SendData->Iovs[0].iov_len += SendData->ClientBuffer.Length;
if (SendData->SegmentSize == 0 ||
SendData->ClientBuffer.Length < SendData->SegmentSize ||
SendData->TotalSize + SendData->SegmentSize > sizeof(SendData->Buffer)) {
SendData->ClientBuffer.Buffer = NULL;
} else {
SendData->ClientBuffer.Buffer += SendData->SegmentSize;
}
} else {
struct iovec* IoVec = &SendData->Iovs[SendData->BufferCount - 1];
IoVec->iov_base = SendData->ClientBuffer.Buffer;
IoVec->iov_len = SendData->ClientBuffer.Length;
if (SendData->TotalSize + SendData->SegmentSize > sizeof(SendData->Buffer) || <--- it's here
SendData->BufferCount == SendData->SocketContext->DatapathPartition->Datapath->SendIoVecCount) {
SendData->ClientBuffer.Buffer = NULL;
} else {
SendData->ClientBuffer.Buffer += SendData->ClientBuffer.Length;
}
}
My phone doesn't support segmentation, the value of "SendData->SegmentSize" is 0, so I think for devices that does't support segmentation the condition below is not sufficient: SendData->TotalSize + SendData->SegmentSize > sizeof(SendData->Buffer)
I add trace in the begin of the CxPlatSendDataFinalizeSendBuffer: QuicTraceEvent( DatapathSend_CxPlatSendDataFinalizeSendBuffer, "[data][%p] TotalSize %u bytes in %hhu buffers (segment=%hu) ClientBuffer.Length %u SegmentationSupported %u SendIoVecCount %u", SendData, SendData->TotalSize, SendData->BufferCount, SendData->SegmentSize, SendData->ClientBuffer.Length, SendData->SegmentationSupported, SendData->SocketContext->DatapathPartition->Datapath->SendIoVecCount);
[data][0x7c6c879340] TotalSize 64484 bytes in 47 buffers (segment=0) ClientBuffer.Length 0 SegmentationSupported 0 SendIoVecCount 53 [DatapathSend_CxPlatSendDataFinalizeSendBuffer:/xxx/msquic/msquic-2.3.6/src/platform/datapath_epoll.c:2166]
[ lib] ASSERT, 2221:/Users/charlotte/Projects/android-streamer/lib.srs/external/msquic/msquic-2.3.6/src/platform/datapath_epoll.c - SendData->TotalSize + MaxBufferLength <= sizeof(SendData->Buffer). [LibraryAssert:/Users/charlotte/Projects/android-streamer/lib.srs/external/msquic/msquic-2.3.6/src/platform/platform_posix.c:848]
SendData->TotalSize: 64484 MaxBufferLength: 1372 sizeof(SendData->Buffer): 65536
If running in release mode, SendData->TotalSize will be 65856, over 65536, then sendmsg will failed.
Interesting. Would you be able to propose a fix?
Yes, please check the attached patch file. 006_datapath_epoll.patch
Would you be willing to make a PR with the fix?
Yes, please
hi nibanks, I have made a PR with the fix. Please check it. https://github.com/microsoft/msquic/pull/4621
Describe the bug
The QUIC version (till 7th Oct 2023) is working fine on my machine for payload size of 1.5GB, however, it is not working for the commits after 7th October 2023.
We have assigned 2 CPUs each for SERVER and CLIENT. For small payload size (eg : 150000 ) it works fine but not for bigger payload sizes. Following is the output to distinguish the results.
Affected OS
Additional OS information
MsQuic version
main
Steps taken to reproduce bug
Expected behavior
It should not have thrown core dumped since it was working properly for the commits before and om 7th October 2023.
Actual outcome
Additional details
No response