irontec / sngrep

Ncurses SIP Messages flow viewer
GNU General Public License v3.0

SIP messages out-of-order in Call Flow #408

Open jcabezas61 opened 2 years ago

jcabezas61 commented 2 years ago

Hi, very often (but not always) when I run sngrep on my OpenSIPS server, the ordering of SIP messages in the Call Flow window is badly scrambled. I'm not sure, but it seems that all the messages are displayed, just out of order.

sngrep version = 1.4.6
OS = Ubuntu 20.04.2 LTS

sngrep-out-of-order-blurred

Thanks, Julio

Kaian commented 2 years ago

Hi Julio!

Looks like time sort function is not working as expected because there are negative time diffs in the left column.
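To make the symptom concrete: assuming the left column shows each message's time difference from the message drawn just before it, a negative value means a message was displayed after one with a later timestamp. A minimal Python sketch of that check follows. The timestamps and function names are made up for illustration, and this is not sngrep's actual sorting code.

```python
def time_diffs(timestamps):
    """Return the delta between each message and the one displayed before it."""
    return [round(b - a, 6) for a, b in zip(timestamps, timestamps[1:])]


def out_of_order_positions(timestamps):
    """Return display positions whose timestamp is earlier than the previous one."""
    return [i + 1 for i, d in enumerate(time_diffs(timestamps)) if d < 0]


# Hypothetical capture times (seconds), in the order they appear on screen.
displayed = [10.000, 10.020, 10.015, 10.040]

print(time_diffs(displayed))              # deltas as shown in the left column
print(out_of_order_positions(displayed))  # positions with a negative delta
print(sorted(displayed))                  # the order a time sort should produce
```

If the display were sorted by timestamp, `out_of_order_positions` would always return an empty list.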

Can this be reproduced with an offline pcap file? Could send me one to debug the issue?

Thanks!

jcabezas61 commented 2 years ago

Hi,

Here is a "bad call" as seen on screen and in the exported pcap. For the export I selected only the bad call, but the pcap includes many other SIP messages (that were flowing through the server during the capture). You can select the relevant call messages in Wireshark using a filter like "sip.Call-ID~MWQ2".

I noticed a few things about the problem while using sngrep:

1- In my experience, the same sngrep installation on the same server works fine (message order correct) for some minutes/hours during the day, then starts to scramble things for some more minutes/hours, and then works fine again. It forms a succession of cycles of good and bad behaviour.

2- I have not yet been able to understand the duration of those cycles or what triggers/explains the change from good to bad and vice versa.

3- Besides the scrambled order of the messages in the displayed call flow, during a capture I frequently see that some messages take a random number of seconds to appear in the flow, and some appear after later messages that are already rendered on screen.

4- My procedure to produce the .pcap is to select just the one call that I want to export. It seems that a "problematic call" is exported to the pcap together with several other messages that do not belong to the selected call. On the other hand, a "good call" export contains strictly the messages that are part of the call and no extra messages.

Thanks out-of-order_19-07-22

Link to pcap: https://www.dropbox.com/s/ytyewxwm5rs4yoy/out-of-order_19-07-22.pcap?dl=0.

jcabezas61 commented 2 years ago

Hi, any news on this issue? BR

Kaian commented 2 years ago

Hi!

Sorry, I've been on holidays these weeks.

I've tested the attached pcap and the message order seems ok in both sngrep 1.4.6 and 1.5.0. Although the order is ok, the flow shows lots of messages that are probably packet retransmissions.

sngrep does not support TCP retransmission packets (#102); they are handled like normal packets, so flows may end up with a lot of duplicated arrows.
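To illustrate why unhandled retransmissions show up as duplicated arrows, here is a minimal Python sketch that flags a TCP segment as a likely retransmission when a segment with the same endpoints, sequence number and payload length has already been seen. The `Segment` type, the sample addresses and the heuristic itself are assumptions for illustration only; this is not how sngrep processes TCP.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    src: str      # "ip:port" of the sender
    dst: str      # "ip:port" of the receiver
    seq: int      # TCP sequence number
    length: int   # payload length in bytes


def split_retransmissions(segments):
    """Separate first-seen data segments from repeats of the same segment."""
    seen = set()
    unique, repeats = [], []
    for seg in segments:
        if seg.length == 0:
            unique.append(seg)   # pure ACKs carry no data, never flag them
        elif seg in seen:
            repeats.append(seg)  # same flow, seq and length seen before
        else:
            seen.add(seg)
            unique.append(seg)
    return unique, repeats


# Hypothetical traffic: the second segment is sent twice.
traffic = [
    Segment("10.0.0.1:5060", "10.0.0.2:5060", 1000, 400),
    Segment("10.0.0.1:5060", "10.0.0.2:5060", 1400, 350),
    Segment("10.0.0.1:5060", "10.0.0.2:5060", 1400, 350),
]

unique, repeats = split_retransmissions(traffic)
print(len(unique), "unique,", len(repeats), "likely retransmission(s)")
```

A viewer that treats every segment as new would draw three arrows for this traffic, although only two distinct messages were sent.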

image

image

Maybe the problem is totally related to TCP dialogs?

Regards

jcabezas61 commented 2 years ago

Hi,

You ask "Maybe the problem is totally related to TCP dialogs?" and I don't know what to say, but the fact is that sometimes, for some periods (see below), sngrep handles the TCP-based dialogs well. Btw, all my important SIP traffic is TCP.

Let's make a fresh assessment of the problem as we know it today:

There are time intervals (periods that can last for minutes or more) when all successive sngrep captures seem flawless

But there are time intervals (periods that last for minutes/hours) when all sngrep captures are defective

Also I observed that:

What could be the next step toward understanding this? Or is there some new experiment to try?

BR.

Kaian commented 2 years ago

Hi!

My guess is that the periods with defective captures are caused by network errors that generate TCP retransmissions. When those retransmissions occur, sngrep handles them as normal packets, causing errors in flows (because it only supports flawless TCP streams, as mentioned earlier).

One approach would be to try to reproduce this with an offline capture. Capture all the traffic at the same time with another raw capture tool such as tcpdump and, as soon as sngrep fails, stop the capture and check whether there have been errors in the TCP streams. Configure tcpdump to rotate captures so that you get a small number of packets to analyze. Opening that capture with sngrep will probably cause the same defective behaviour.
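As a rough sketch of the rotating capture, the Python snippet below builds a tcpdump command line that writes a ring of small files using tcpdump's `-C` (rotate after roughly that many million bytes) and `-W` (number of files to keep) options. The interface `any`, the file name, the sizes and the `tcp port 5060` filter are assumptions; adjust them to the actual server and SIP port.

```python
import shlex


def tcpdump_ring_command(interface="any", out_file="sip-capture.pcap",
                         max_file_mb=10, file_count=20, port=5060):
    """Build a tcpdump argument list that writes a ring of small capture files."""
    return [
        "tcpdump",
        "-i", interface,          # capture interface (assumed: all interfaces)
        "-n",                     # do not resolve addresses to names
        "-s", "0",                # capture full packets
        "-C", str(max_file_mb),   # start a new file after about this many MB
        "-W", str(file_count),    # keep at most this many files, then overwrite
        "-w", out_file,           # base name; tcpdump appends a file number
        f"tcp port {port}",       # capture filter (assumed SIP-over-TCP port)
    ]


cmd = tcpdump_ring_command()
print(shlex.join(cmd))
```

The printed command would normally be run with root privileges alongside sngrep. Once the problem shows up, stop tcpdump and open the most recent file offline, for example with `sngrep -I <file>`.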

Regards!