sipcapture / homer

HOMER - 100% Open-Source SIP, VoIP, RTC Packet Capture & Monitoring
https://sipcapture.org
GNU Affero General Public License v3.0

Homer 10 - Call Flow - Sorting Issues/Wrong Order #625

Open tony1661 opened 6 months ago

tony1661 commented 6 months ago

I am noticing an issue in Homer 10's Call Flow dashboard. It seems that the Call Flow is not in the correct order which is making troubleshooting difficult.

I have a Freeswitch server (FusionPBX) with the portable heplify agent running `heplify -hs homer-stack-ip:9060`.

If I search by the Call-ID in the SIP headers, I see all SIP messages associated with the call. See below: image

Obviously the INVITE would have occurred before the 407; however, that is not what Grafana shows.

Based on the HEP Flow panel, the messages should be sorted oldest to newest.

If I look into each message I see that the INVITE has a later date than the 407.

INVITE Date image

407 Date image

The issue is similar to what is being experienced in Homer 7's web-ui, which leads me to think the issue may be related to heplify. In Homer 7, when searching for a call, the results are also in the wrong order; however, when I click on the Session ID to view the ladder, it is in the correct order.

Is it possible that the ladder in Homer 7 is referencing a different timestamp that is also being referenced in Homer 10?

tony1661 commented 6 months ago

I've tested using captagent instead of heplify and the issue persists in Homer 10.

tony1661 commented 6 months ago

To add to this issue, I took a packet capture of a call that has the SIP messages displayed in the wrong order in Homer 10.

It seems that in the HEP packets, the Unix Timestamp doesn't change. See below: image
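For readers who want to check the same fields in their own capture, here is a minimal sketch of pulling the two timestamp chunks out of a raw HEP3 packet. It assumes the standard HEP3 layout (a `HEP3` marker, a 2-byte total length, then chunks of vendor ID, type, length, and payload) and generic chunk types 0x0009 (seconds) and 0x000a (microsecond offset). The function name `hep3_timestamp` is hypothetical and is not part of heplify, captagent, or Homer.

```python
import struct

HEP3_MAGIC = b"HEP3"
CHUNK_TS_SEC = 0x0009   # timestamp: seconds since the Unix epoch
CHUNK_TS_USEC = 0x000A  # timestamp: microsecond offset within that second


def hep3_timestamp(packet: bytes):
    """Return (seconds, microseconds) from a HEP3 packet.

    Either element is None if the corresponding chunk is absent.
    """
    if packet[:4] != HEP3_MAGIC:
        raise ValueError("not a HEP3 packet")
    (total_len,) = struct.unpack_from("!H", packet, 4)
    end = min(total_len, len(packet))
    offset, sec, usec = 6, None, None
    while offset + 6 <= end:
        # Each chunk header: vendor ID, chunk type, chunk length (header included).
        vendor, ctype, clen = struct.unpack_from("!HHH", packet, offset)
        if clen < 6 or offset + clen > len(packet):
            raise ValueError("malformed chunk length")
        if vendor == 0 and clen == 10:  # generic chunk with a 4-byte payload
            if ctype == CHUNK_TS_SEC:
                (sec,) = struct.unpack_from("!I", packet, offset + 6)
            elif ctype == CHUNK_TS_USEC:
                (usec,) = struct.unpack_from("!I", packet, offset + 6)
        offset += clen
    return sec, usec
```

If the seconds value is identical across many messages of one call, the microsecond offset is the only field in the packet that can still distinguish their order.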

The last two messages (BYE and 200 OK) have a different timestamp, and in Homer they do indeed appear at the bottom of the ladder, but they appear in a different order in the ladder than they do in the packet capture.

It may be worth adding the Timestamp μs to the equation.

image
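To illustrate the suggestion, the sketch below compares sorting on the seconds field alone with sorting on seconds plus microseconds. The `Msg` type and the sample values are made up for illustration; this is not how Homer or the Grafana panel stores messages.

```python
from typing import NamedTuple


class Msg(NamedTuple):
    label: str
    sec: int   # Unix timestamp, whole seconds
    usec: int  # microsecond offset within that second


def micro_key(m: Msg) -> int:
    """Combine both fields into one integer count of microseconds."""
    return m.sec * 1_000_000 + m.usec


# Hypothetical messages listed in the (wrong) order they were stored.
stored = [
    Msg("407", 1700000000, 250_000),
    Msg("INVITE", 1700000000, 100_000),
    Msg("ACK", 1700000000, 300_000),
]

# Python's sort is stable, so equal seconds leave the stored order untouched.
by_seconds = [m.label for m in sorted(stored, key=lambda m: m.sec)]
by_micros = [m.label for m in sorted(stored, key=micro_key)]
```

With these sample values, `by_seconds` keeps the stored order because all three keys tie, while `by_micros` puts the INVITE first.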

lmangani commented 6 months ago

Thanks for the report @tony1661. We're investigating and will make sure this is part of the next grafana-flow release. @AlexeyOplachko could you check this after the holidays?

tony1661 commented 6 months ago

@AlexeyOplachko let me know if I can help in any way. I can provide logs, pcaps etc.

I have this on a production freeswitch server with heplify and captagent both running.

Hundreds of calls a day that we can look at.

lmangani commented 6 months ago

Hi @tony1661, we're back next week and we'll most definitely address this.

RFbkak37y3kIY commented 6 months ago

Pushed a fix for the Grafana plugin: https://github.com/metrico/grafana-flow/pull/47

It uses the [tsNs] field to sort SIP messages more accurately.

tony1661 commented 5 months ago

Hi all,

I saw that some code was merged. If I pull the latest docker images, will I be able to test this?

lmangani commented 5 months ago

As long as it's using plugin version 10.0.10. You can also update an existing setup.

lmangani commented 5 months ago

@tony1661 here's how

tony1661 commented 5 months ago

@lmangani Thanks for your quick response. I tested and the issue seems to still be there. Is there anything I can provide to help? Logs etc

AlexeyOplachko commented 5 months ago

For starters, can you please verify that your Grafana indeed got the new plugin version? your_grafana_url/plugins/qxip-flow-panel image

On our side we'll try to replicate this issue today and see if we need anything else from you

tony1661 commented 5 months ago

Hi @AlexeyOplachko ,

I have verified that I have 10.0.10 installed. See below: image

tony1661 commented 4 months ago

Anything I can help with?

lmangani commented 4 months ago

@AlexeyOplachko please provide an update

AlexeyOplachko commented 4 months ago

> Anything I can help with?

@tony1661 Can you please provide screenshots of the Message details, with all the info in them, for two messages that are in the incorrect order? image image Also, can you please check whether Sort Items is set? image

tony1661 commented 2 months ago

Hi @AlexeyOplachko sorry for the delay on this.

The Sort items is set to "Sort by Time: Oldest first".

Here is what the call flow looks like: ![image](https://github.com/sipcapture/homer/assets/5287266/cea3d73e-4d39-46f0-8f76-684d61d76323)
Here is the first message (INVITE): ![image](https://github.com/sipcapture/homer/assets/5287266/2038f4bd-9495-4f3a-80ab-b0c51ed7993e)
Here is the second message (200 OK): ![image](https://github.com/sipcapture/homer/assets/5287266/ca0ed13c-07d5-47bc-875f-b1b85ebb2424)
Here is the fourth message (that is supposed to be second - 100 Trying): ![image](https://github.com/sipcapture/homer/assets/5287266/afe06316-337d-4439-9d35-e0af80749bc3)

AlexeyOplachko commented 2 months ago

Hi @tony1661, thanks for the reply. It seems this is not a sorting issue but an issue with the data.

If you look closely, the 100 Trying message has a timestamp almost 4 minutes later than the 200 OK. All three timestamps (the one in the labels, the one in the Time field, and the nanosecond one) agree with each other and support this.
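One way to separate a data problem from a sorting problem is to check the stored timestamps against the order SIP itself requires: within one transaction, the request precedes its responses, and a provisional (1xx) response precedes the final one. The sketch below is a rough check under assumed field names (`cseq`, `status`, `ts_ns` are hypothetical, not Homer's schema), and it ignores retransmissions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SipMsg:
    cseq: str               # e.g. "1 INVITE"; groups a request with its responses
    status: Optional[int]   # None for a request, else the response status code
    ts_ns: int              # capture timestamp in nanoseconds


def find_timestamp_anomalies(msgs: List[SipMsg]) -> List[Tuple[str, int, str]]:
    """Flag responses whose timestamps contradict SIP transaction order."""
    groups = {}
    for m in msgs:
        groups.setdefault(m.cseq, []).append(m)

    anomalies = []
    for cseq, group in groups.items():
        responses = [m for m in group if m.status is not None]
        first_req = min((m.ts_ns for m in group if m.status is None), default=None)
        first_final = min((m.ts_ns for m in responses if m.status >= 200), default=None)
        for m in responses:
            if first_req is not None and m.ts_ns < first_req:
                anomalies.append((cseq, m.status, "response stamped before request"))
            if m.status < 200 and first_final is not None and m.ts_ns > first_final:
                anomalies.append((cseq, m.status, "provisional response stamped after final response"))
    return anomalies
```

Any hit from a check like this points at the capture or ingestion path rather than the panel, since no sort order can make such timestamps match the real sequence.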

tony1661 commented 2 months ago

@AlexeyOplachko Yeah, something seems off with the data. The issue happens on multiple HEP clients (heplify and captagent).

I have some screenshots from pcaps above that may assist.

I am running Freeswitch (via FusionPBX).

Dletta commented 1 month ago

@AlexeyOplachko Could you check if you ended up fixing this issue?

tony1661 commented 1 month ago

@Dletta are you also experiencing this issue?

AlexeyOplachko commented 1 month ago

> @AlexeyOplachko Could you check if you ended up fixing this issue?

Yes. From the standpoint of our frontend, there is no way for it to sort incorrectly, so it's only an issue with the data.

Dletta commented 1 month ago

@tony1661

I am not experiencing the same issue. I work with Alexey and wanted to make sure we don't let this issue go stale :)

tony1661 commented 4 weeks ago

> > @AlexeyOplachko Could you check if you ended up fixing this issue?
>
> Yes, from standpoint of our frontend there is no way for it to sort incorrectly, so it's only an issue with data

What can I do to assist? I've used multiple HEP agents and get the same results. Homer 7 does not have the issue with the same data source.