sipcapture / homer

HOMER - 100% Open-Source SIP, VoIP, RTC Packet Capture & Monitoring
https://sipcapture.org
GNU Affero General Public License v3.0
1.61k stars 240 forks source link

Blank QoS tab and UA Report in Flow tab #536

Closed solarmon closed 2 years ago

solarmon commented 2 years ago

Hi,

Homer 7.10.18 rtrpproxy 2.2.alpha.d5b7b4c4

I have set up rtpproxy to sent RTCP captures to Homer. This is working in that the packets do get to Homer.

However, for some calls (not all calls) I have this situation. In the QoS tab, it is empty/blank:

image

In the Flow tab, I see a "UA Report". The IP are that of each media endpoint. Note the timestamp of this UA Report:

image

In the Log tab, there are RTCP packets, but they start after the timestamp of the UA Report:

image

Please can you advise what this UA Report is, and when it does appear it causes the QoS tab to be blank.

Thank you.

lmangani commented 2 years ago

We need a dataset to reproduce this in order to find the potential issue

solarmon commented 2 years ago

@lmangani,

Just tell me what you require, but I cannot expose any sensitive data on a public forum - we we need to agree on how the dataset gets to you.

lmangani commented 2 years ago

I'm afraid we cannot work on this privately but should be able to remove the sensitive data very easily in exchange for free support :) We're solving a technical problem and we need technical details. None of the PII apply here (ips, numbers, etc) just session identifiers and details.

solarmon commented 2 years ago

OK, I will try to sanitise the data - please can you let me know exactly what you need and how/where to get it from.

lmangani commented 2 years ago

If timing in the RTCP reports is the issue, you can try boost your mapping lookup range to catch them:

    {
        "source_field": "data_header.callid",
        "lookup_id": 5,
        "lookup_profile": "default",
        "lookup_field": "sid",
        "lookup_range": [
            -300,
            2000
        ]
    },
solarmon commented 2 years ago

Do you mean time/date stamp on the rtpproxy and homer servers? I've just checked and the clocks are within a second of each other.

Or do you mean the timing difference between the RTCP endpoints themselves?

Anyways, I did change the range to beteen -600 and 400:

 {
        "source_field": "data_header.callid",
        "lookup_id": 100,
        "lookup_profile": "default",
        "lookup_field": "sid",
        "lookup_range": [
            -600,
            400
        ]
    },
    {
        "source_field": "data_header.callid",
        "lookup_id": 5,
        "lookup_profile": "call",
        "lookup_field": "sid",
        "lookup_range": [
            -600,
            400
        ]
    },

I've made 10 test calls so far and none of them has show the issue that was seen before and I see both QoS stats for both legs of the call.

I mention both legs of a call because the issue seems to be occurring on the inbound leg of the call. So do you think there is a timing difference between our RTCP endpoint and the SIP provider RTCP endpoint that would cause such an issue?

solarmon commented 2 years ago

And just typical...as soon as I posted the last comment and made another test call, the same issue has occurred. Again, on the inbound SIP leg of the call coming from the SIP provider.

lmangani commented 2 years ago

Let us know if you spot the difference between the good and bad scenario

solarmon commented 2 years ago

I'm doing the exact same test call every time. At the moment It just seems intermittent/random as to whether it has this display issue or not.

What would help me troubleshoot further is understanding what this "UA Report" is why/when does it appear in the flow view. It is not a SIP method, and the flow view is showing it with the IP of the media endpoints. So it must be to do with how it analyse the RTCP packets?

lmangani commented 2 years ago

Check if your BYE or 200 OK contain a User-Agent report (X-RTP-Stats, P-RTP-Stat) or sends RTCP-XR or whatnot

solarmon commented 2 years ago

I don't see anything in the SIP packets.

Doing a capture on the rtpproxy, I see that the first RTCP packet received is a "Receiver Report" (with Source Description), the rest of the RTCP packets are "Sender Report".

The Receiver Report packet matches exactly the timestamp of the "UA Report" that appears in Homer.

image

Could this Receiver Report be messing up the RTCP stats in the QoS tab?

solarmon commented 2 years ago

On Homer, in the Message tab of the call, this UA Report packet seems to appear as:

image

Which matches the timestamp of the Sender Report in the rtpproxy capture:

image

solarmon commented 2 years ago

This is the rtpproxy capture, filtered for just RTCP:

image

lmangani commented 2 years ago

I'm sorry but we can't work with screenshots. It's hard to even guess what the issue might be based on the provided info.

solarmon commented 2 years ago

I found that you can filter by SIP and RTCP in the Message tab:

image

So this first packet doesn't seem to be classed as Method or Type of RTCP?

solarmon commented 2 years ago

Looking back at the captures in Homer of the calls (and filtering the call view just to show RTCP payload type) that do and don't have this issue, the common thing is that when there a UA Report (RTCP packet type of Receiver Report), the QoS tab is empty; and when there is no UA Report then the QoS tab is working as expected.

Something about the UA Report (RTCP packet type of Receiver Report) seems to be breaking the QoS tab and RTCP stats analysis.

lmangani commented 2 years ago

As mentioned, you need to share the actual data for us to debug, there's nothing we can do with the theory - I'm sorry.

solarmon commented 2 years ago

What data do you require?

lmangani commented 2 years ago

Up to your judgement, we need to see everything that's needed in order to replicate the same issue.

solarmon commented 2 years ago

Maybe the RTCP HEP capture packets itself? And just the one for the Receiver Report?

lmangani commented 2 years ago

But how would be then able to correlate them with a session we do not have? those all cross-reference. If you're not willing to share the full traces to reproduce it (privately or censored) we won't be able to debug this

solarmon commented 2 years ago

@lmangani

Please find attached a sanitised version of the RTCP HEP captures coming from rtpproxy. The first packet is of type 201 (Receiver Report) which seems to cause the display issue in the QoS tab. (attachment removed)

solarmon commented 2 years ago

@lmangani

Looking at the HEP capture and using the HEP decoder from https://github.com/sipcapture/hep-wireshark/tree/master/lua

The only issue I can see (although, the decoder doesn't say anything about) is that the Encapsulated Payload content for the Receiver Report packet isn't quite JSON format valid - the last item (report_count) ends with a comma character:

image

Could this be the reason for the issue?

solarmon commented 2 years ago

I think I have found the bug in the rttproxy rtcp hep module:

Line 193 doesn't have the comma character:

https://github.com/sippy/rtpproxy/blob/b9d7b4ced2e9797f7d5d3f033af288fbc1aaecd8/modules/acct_rtcp_hep/rtcp2json.c#L193

But line 217 seems to have this comma character:

https://github.com/sippy/rtpproxy/blob/b9d7b4ced2e9797f7d5d3f033af288fbc1aaecd8/modules/acct_rtcp_hep/rtcp2json.c#L217

lmangani commented 2 years ago

@solarmon sounds like a very plausible cause! nice finding!

solarmon commented 2 years ago

@lmangani

I fixed the source code, recompiled rtpproxy and the modules, and that seems to have fixed the issue:

Homer now processed the packet correctly and the QoS tab shows RTCP stats:

image

You might want to look at making Homer handle such malformed packets more gracefully. Or at least give a clear indication that the packet is malformed.

Thank you for you support, it is very much appreciated.

I suppose I should now raise a bug report on the rtpproxy Github!