Closed: smititelu closed this issue 4 months ago
Are you using redis for call replication?
I have been testing HA using redis to replicate calls to a standby node and noticed similar issues with statistics on the standby.
Active Node:
rtpengine-ctl list numsessions
Current sessions own: 1
Current sessions foreign: 0
Current sessions total: 1
Current transcoded media: 0
Current sessions ipv4 only media: 1
Current sessions ipv6 only media: 0
Current sessions ip mixed media: 0
Standby Node:
rtpengine-ctl list numsessions
Current sessions own: 0
Current sessions foreign: 1
Current sessions total: 1
Current transcoded media: 0
Current sessions ipv4 only media: 18446744073709551613
Current sessions ipv6 only media: 0
Current sessions ip mixed media: 0
I think this is an underflow issue caused by how replication works. The standby handles changes to a call by removing the call and then restoring it again. On the removal I believe it is decrementing these counters, but I don't think it is incrementing them on the restore?
In the above example we updated the call a few times with answer commands for progress, followed by the final answer.
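The wrap-around described above can be reproduced in isolation. The following is only a minimal sketch and not rtpengine code: the `demo_*` names, the struct, and the exact remove/restore sequence are assumptions made for illustration. It models a standby whose restore step either skips or performs the increment, while every removal decrements an unsigned 64-bit counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-family session counters. */
typedef struct {
    uint64_t ipv4_only;
} demo_stats;

/* Removal of a call: always decrements (assumed behaviour). */
void demo_on_remove(demo_stats *s) {
    assert(s != NULL);
    s->ipv4_only--; /* wraps to a huge value if the counter is already 0 */
}

/* Restore of a call from redis: increments only if the fix is applied. */
void demo_on_restore(demo_stats *s, bool restore_increments) {
    assert(s != NULL);
    if (restore_increments)
        s->ipv4_only++;
}

/* One initial restore, then `updates` remove+restore cycles on the standby. */
uint64_t demo_standby_counter(unsigned updates, bool restore_increments) {
    demo_stats s = { 0 };
    demo_on_restore(&s, restore_increments);
    for (unsigned i = 0; i < updates; i++) {
        demo_on_remove(&s);
        demo_on_restore(&s, restore_increments);
    }
    return s.ipv4_only;
}
```

With three updates and no increment on restore, `demo_standby_counter(3, false)` yields 2^64 - 3, the same value the standby reported above for ipv4 only media; with the increment in place, `demo_standby_counter(3, true)` stays at 1, matching the active node. That three updates happened in the test call is an inference from the reported number, not something the output states.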
created PR #1829
For my setup it fixed the stats. Maybe @rrb3942 you can try it too.
Yes, indeed I am using a redis restore of calls in my setup, and it looks like no increments are done for them, only decrements when call_destroy() is called.
@smititelu Tested PR #1829 and it seems to work fine and report the correct statistics.
rtpengine version the issue has been seen with
latest master
Used distribution and its version
debian 12
Linux kernel version used
No response
CPU architecture issue was seen on (see uname -m)
None
Expected behaviour you didn't see
Sane values for the session statistics
Unexpected behaviour you saw
Current sessions ipv4 only media: 18446744073704960623
Current sessions ipv6 only media: 18446744073709393906
Current sessions ip mixed media: 18446744073705726688
Steps to reproduce the problem
Let system run for a while.
Additional program output to the terminal or logs illustrating the issue
No response
Anything else?
Looking at the function statistics_update_ip46_inc_dec(), I see it already has a flag to prevent double increments or decrements. The only way these wrong stats can be printed is if multiple decrements happen. I think this may happen because no lock protects those flag checks.
As a solution, I propose adding a (new) ipv stats lock at the beginning of the function and releasing it at the end. What do you think?
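A rough sketch of the proposed locking, for discussion only. None of the names below come from rtpengine: `demo_call`, `demo_ip46_lock` and the helper functions are hypothetical, and a plain pthread mutex is an assumed stand-in for whatever lock type the project would actually use. The idea is that the flag check and the counter update happen as one critical section, so two threads cannot both pass the check.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-call state: has this call been counted already? */
typedef struct {
    bool counted;
} demo_call;

/* Hypothetical dedicated lock and counter for the ip46 statistics. */
static pthread_mutex_t demo_ip46_lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t demo_ipv4_sessions;

/* Increment at most once per call; check and update under one lock. */
void demo_ip46_inc(demo_call *c) {
    assert(c != NULL);
    pthread_mutex_lock(&demo_ip46_lock);
    if (!c->counted) {
        c->counted = true;
        demo_ipv4_sessions++;
    }
    pthread_mutex_unlock(&demo_ip46_lock);
}

/* Decrement only if this call was counted before; same lock. */
void demo_ip46_dec(demo_call *c) {
    assert(c != NULL);
    pthread_mutex_lock(&demo_ip46_lock);
    if (c->counted) {
        c->counted = false;
        demo_ipv4_sessions--;
    }
    pthread_mutex_unlock(&demo_ip46_lock);
}

/* Read the counter under the lock for a consistent value. */
uint64_t demo_ip46_read(void) {
    pthread_mutex_lock(&demo_ip46_lock);
    uint64_t v = demo_ipv4_sessions;
    pthread_mutex_unlock(&demo_ip46_lock);
    return v;
}
```

In this sketch, repeated calls to `demo_ip46_dec()` on the same call are harmless because the flag is cleared inside the critical section. A lock of this kind only guards against racing flag checks; it would not help if a call is decremented without ever having been counted.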
Thanks, Stefan