sipwise / rtpengine

The Sipwise media proxy for Kamailio
GNU General Public License v3.0

Wrong IPv session stats #1828

Closed: smititelu closed this issue 4 months ago

smititelu commented 4 months ago

rtpengine version the issue has been seen with

latest master

Used distribution and its version

debian 12

Linux kernel version used

No response

CPU architecture issue was seen on (see uname -m)

None

Expected behaviour you didn't see

Reasonable session counts in the stats

Unexpected behaviour you saw

Current sessions ipv4 only media: 18446744073704960623
Current sessions ipv6 only media: 18446744073709393906
Current sessions ip mixed media: 18446744073705726688

Steps to reproduce the problem

Let system run for a while.

Additional program output to the terminal or logs illustrating the issue

No response

Anything else?

Looking at the function statistics_update_ip46_inc_dec(), I see it already has a flag to prevent double increments or decrements. The only way these wrong stats can be printed is if multiple decrements happen. I think this may happen because no locks protect those flag checks.

As a solution, I propose adding a (new) ipv stats lock at the beginning of the function and releasing it at the end. What do you think?
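A minimal sketch of that idea in C, assuming a single mutex guarding both the flag check and the counter update. The names `call_sketch`, `ip_stats_lock`, `stats_inc`, `stats_dec` and `stats_get` are hypothetical and are not rtpengine's real identifiers; the real function and its flag may be structured differently.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-call state; not rtpengine's real struct. */
struct call_sketch {
    bool counted; /* true once this call has been added to the counter */
};

/* Hypothetical dedicated lock and counter for the IP-family stats. */
static pthread_mutex_t ip_stats_lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t sessions_counter = 0;

/* Count the call once; a repeated increment is a no-op. */
static void stats_inc(struct call_sketch *c) {
    pthread_mutex_lock(&ip_stats_lock);
    if (!c->counted) {
        c->counted = true;
        sessions_counter++;
    }
    pthread_mutex_unlock(&ip_stats_lock);
}

/* Uncount the call once; a decrement without a prior increment is a no-op. */
static void stats_dec(struct call_sketch *c) {
    pthread_mutex_lock(&ip_stats_lock);
    if (c->counted) {
        c->counted = false;
        sessions_counter--;
    }
    pthread_mutex_unlock(&ip_stats_lock);
}

/* Read the counter under the same lock. */
static uint64_t stats_get(void) {
    pthread_mutex_lock(&ip_stats_lock);
    uint64_t v = sessions_counter;
    pthread_mutex_unlock(&ip_stats_lock);
    return v;
}
```

Because the flag test and the counter change sit inside one critical section, two threads cannot both pass the check for the same call and decrement twice.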

Thanks, Stefan

rrb3942 commented 4 months ago

Are you using redis for call replication?

I have been testing HA using redis to replicate calls to a standby node and noticed similar issues with statistics on the standby.

Active Node:

rtpengine-ctl list numsessions
Current sessions own: 1
Current sessions foreign: 0
Current sessions total: 1
Current transcoded media: 0
Current sessions ipv4 only media: 1
Current sessions ipv6 only media: 0
Current sessions ip mixed  media: 0

Standby Node:

rtpengine-ctl list numsessions
Current sessions own: 0
Current sessions foreign: 1
Current sessions total: 1
Current transcoded media: 0
Current sessions ipv4 only media: 18446744073709551613
Current sessions ipv6 only media: 0
Current sessions ip mixed  media: 0

I think this is an underflow issue caused by how replication works. The standby handles changes to a call by removing the call and then restoring it again. I believe it decrements these counters on removal, but I don't think it increments them on restore.

In the above example we update the call a few times with answer commands for progress and then the final answer.
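To illustrate the suspected underflow, here is a small C sketch, assuming the counter is an unsigned 64-bit integer (which the printed values suggest). The function name `simulate_unbalanced_removals` is made up for this example. The standby value above, 18446744073709551613, equals 2^64 - 3, which is what three decrements from zero without matching increments would produce.

```c
#include <stdint.h>

/*
 * Hypothetical model of the standby node: each call update triggers a
 * removal (decrement) but the restore never increments again.
 * Unsigned arithmetic in C wraps modulo 2^64, so decrementing below
 * zero yields a very large value instead of a negative one.
 */
static uint64_t simulate_unbalanced_removals(unsigned int removals) {
    uint64_t counter = 0; /* counter was never incremented on this node */
    for (unsigned int i = 0; i < removals; i++)
        counter--;        /* wraps to 2^64 - 1 on the first decrement */
    return counter;
}
```

With three removals the function returns 18446744073709551613, matching the "ipv4 only media" value reported by the standby node.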

smititelu commented 4 months ago

Created PR #1829.

It fixed the stats in my setup. @rrb3942, maybe you can try it too.

Yes, indeed, I am using a redis restore of calls in my setup, and it looks like no increments are done for restored calls, only decrements when call_destroy() is called.

rrb3942 commented 4 months ago

@smititelu Tested PR #1829 and it seems to work fine and report the correct statistics.