ultravideo / uvgRTP

An open-source library for RTP/SRTP media delivery
BSD 2-Clause "Simplified" License

Failed to flush the message queue #201

Closed: xueshenan closed this issue 11 months ago

xueshenan commented 1 year ago

Hi, I want to use uvgRTP to send H.264 frames. When I use the following send method:

sender_stream->push_frame((uint8_t *)buffer, buffer_size, RTP_NO_FLAGS);

I get a lot of error messages like the following:

[uvgRTP][ERROR][::flush_queue] Failed to flush the message queue: 14
[uvgRTP][ERROR][::log_platform_error] sendmmsg(2) failed: Bad address 14
tampsa commented 1 year ago

Hello. Sorry to hear that you have a problem. We could use a bit more information to help you debug this issue.

Are you using the latest release version v2.3.0 or the latest source code on GitHub?

The error code that you got suggests that the buffer parameter might be the cause of the problem.

Can you provide more of your sender code for us to analyze?

jrsnen commented 11 months ago

Seems like a bad address in the buffer. Closing, as no answer was given.

bcizmeci commented 10 months ago

I would like to jump into this question because I encountered the same error messages with a specific HEVC configuration (NVENC). I also tried increasing the buffer sizes to 80 MB for RCV & SND and 16 MB for the ring buffer, but larger buffers don't help at all.

The error messages happen periodically, whenever the encoder triggers an intra refresh. More specifically, I set intra_refresh_period: 250 and intra_refresh_count: 50.

I validated this behaviour with different parameters. When intra refresh is disabled, the error messages are not printed and there are no artifacts in the video.

bcizmeci commented 10 months ago

I performed another test, selecting RTP_FORMAT_GENERIC, and I was able to stream the videos with intra refresh enabled. However, when codec-specific fragmentation is enabled (RTP_FORMAT_H265), the send errors and artifacts start to occur.

jrsnen commented 10 months ago

Hi @bcizmeci

"Failed to flush the message queue" originates from the socket's send function. It seems something in the OS or the network is preventing the sending of multiple packets.

1) Do you use the RCE_FRAGMENT_GENERIC flag with RTP_FORMAT_GENERIC? Without that flag there is no fragmentation with the generic format; with IPv4, IP-level fragmentation is used instead, which is not correct for RTP, and IPv6 has no IP-level fragmentation at all. Another possibility is that the network uses large Ethernet frames but sending multiple packets is somehow problematic.

2) Are you using other RCE flags? The SCC could cause this sort of issue in some situations.

3) Did you test rate control? Intra frames are often larger than inter frames; in many cases inter frames do not need fragmentation, whereas intra frames almost always do. The problem might simply be sending more data than the network can handle.

You might also try RCE_FRAMERATE (default is 30 fps) and RCE_FRAGMEN_PACING to see if they make it easier for the network to handle a burst of packets.

Best Regards, Joni

bcizmeci commented 10 months ago

Hi Joni,

Thanks a lot for your feedback!

Regarding item 1: yes, I enabled RCE_FRAGMENT_GENERIC as well, and with generic transmission everything runs smoothly. Once I switch back to RTP_FORMAT_H265, the error messages appear only when the encoder triggers intra refresh, which is not a single I or IDR frame but a refresh of macroblocks over a certain period (parts of the P-frame are forced to intra coding in raster-scan order to recover from packet-loss errors). When the intra refresh period is over, normal P-frame transmission works smoothly, but because of the missing reference frames there are artifacts, and the stream requires an I frame to recover.

Regarding item 2: only RCE_H26X_PREPEND_SC is enabled. When I run without any flags, I still observe the error with RTP_FORMAT_H265.

Regarding item 3: yes, the CBR rate controller is always enabled. With periodic I-frame transmission everything works fine, but the main reason we wanted to replace I frames with intra refresh is exactly to avoid the transmission peaks that I frames cause. I have also checked the individual frame sizes of the intra-refresh P frames; as expected, they are very close to the size of a standard coded P frame. In my opinion the problem is not pushing too much data onto the network, because the system works with RTP_FORMAT_GENERIC and the desired intra refresh configuration.

Finally, I also played around with the frame-rate regulation flags, but that didn't change anything.

If you need any specific information or output to debug, I can provide it.

Best regards, Burak

jrsnen commented 10 months ago

I'm assuming you are using the master version. Does the previous release version work?

If that does not work, could @tampsa take a look at this? He has been more involved with uvgRTP in recent months.

BR, Joni

bcizmeci commented 10 months ago

Joni, I have tested with the latest master from yesterday. We were also using the older version 2.1.0, and the same happens there as well. I am also diving into the code and will give you more specific information.

jrsnen commented 10 months ago

Have you made any progress?

What happens if you use the RTP_NO_H26X_SCL RTP flag when pushing the frame? This would reveal whether the problem is related to SCL.

bcizmeci commented 9 months ago

@jrsnen Thanks for asking! The above didn't help on my side either. I think it is best if I open a new issue and provide a dataset so you can check the problem on your side.

dreamerns commented 8 months ago

@bcizmeci did you find out how to fix this issue? I'm using the current master and I have the same issue with the H264 encoder. Thanks.

bcizmeci commented 8 months ago

> @bcizmeci did you find out how to fix this issue? I'm using the current master and I have the same issue with the H264 encoder. Thanks.

@dreamerns Unfortunately I didn't fix it. Currently, I switched back to RTP_FORMAT_GENERIC until the bug is found.

witaly-iwanow commented 8 months ago

I see the same issue with RTP_FORMAT_H264, on both master and 2.3.0. However, when I change it to H265 the problem goes away. This is really frustrating: I like the API and ease of use, but looking at the number of H.264 issues reported here, claiming this library supports H.264 is a very bold statement; it should come with an (alpha) disclaimer.

The problem is really easy to repro, it happens on both macOS and Ubuntu 22.04, and it looks like some sort of memory-corruption issue when a big frame is fragmented. Here's the test app (forwarding RTP from one port to another):

#include <uvgrtp/lib.hh>
#include <iostream>

int main() {
    uvgrtp::context ctx;
    uvgrtp::session *sess = ctx.create_session("127.0.0.1");
    uvgrtp::media_stream *stream = sess->create_stream(8890, 8891, RTP_FORMAT_H264, RTP_NO_FLAGS);

    while (true) {
        uvgrtp::frame::rtp_frame *frame = stream->pull_frame(1000);
        if (frame) {
            auto ret = stream->push_frame(frame->payload, frame->payload_len, RTP_COPY);
            std::cout << frame->payload_len << " bytes, res: " << ret << std::endl;
            (void)uvgrtp::frame::dealloc_frame(frame);  // pulled frames must be freed by the caller
        }
    }
}

and the output is (note how the error pops up on a large NALU):

[uvgRTP][ERROR][::flush_queue] Failed to flush the message queue: 14
29564 bytes, res: -5
9 bytes, res: 0
4400 bytes, res: 0
9 bytes, res: 0
3189 bytes, res: 0
12 bytes, res: 0
9 bytes, res: 0
witaly-iwanow commented 8 months ago

I tried to debug the sendmmsg calls and I could see it fragments a NALU into something like 1200+1200+1200+...+300-byte chunks. All good up to that point, but after that there are 20-30 tiny packets with an apparently bogus destination address that it tries to send out.

jrsnen commented 8 months ago

Hi. I created new issues for H264 (#204) and H265 (#205), as they may not be the same issue; this makes them easier to discuss on our end. Both issues are acknowledged and will be fixed at some point, but we also have other responsibilities, so it may take some time. Please continue the discussion in those dedicated issues.

BR, Joni