Open YuriyFateev opened 2 months ago
Looks like you're just out of memory. There was an skb leak fixed in 11.5.1.19. You should update.
Hello. After applying this fix (https://github.com/sipwise/rtpengine/commit/6cc7ceb1e24eb1b4be8994b51471b92443e19a16), the problem remains, but it appears much less frequently.
We also tried this:
opp = kzalloc(sizeof(*opp), GFP_KERNEL);
if (!opp) {
    set_current_state(TASK_UNINTERRUPTIBLE);
    schedule_timeout(msecs_to_jiffies(5));
    opp = kzalloc(sizeof(*opp), GFP_KERNEL);
    if (!opp) {
        set_current_state(TASK_UNINTERRUPTIBLE);
        schedule_timeout(msecs_to_jiffies(5));
        opp = kzalloc(sizeof(*opp), GFP_KERNEL);
        if (!opp) {
            printk(KERN_ERR "xt_RTPENGINE [%s:%i] Error allocate memory !!!!!", __FUNCTION__, __LINE__);
            goto err;
        }
    }
}
With this change, the problem did not recur. Since we do not know the module's internal logic, we are not entirely sure that this is correct. Can you confirm or deny this? Thank you.
I would rather offload the retry logic to user space. Return -ENOMEM when allocation fails and then handle it in the daemon.
Hello. We moved the retry handling to the daemon:
#define REQUEST_ATTEMPTS 1
GList *kernel_list() {
    char s[64];
    int fd;
    struct rtpengine_list_entry *buf;
    GList *li = NULL;
    ssize_t ret;
    if (!kernel.is_open)
        return NULL;
    sprintf(s, PREFIX "/%u/blist", kernel.table);
    fd = open(s, O_RDONLY);
    if (fd == -1)
        return NULL;
    for (;;) {
        buf = g_slice_alloc(sizeof(*buf));
        ret = read(fd, buf, sizeof(*buf));
        int count = 0;
        while ( ret == -1 && errno == ENOMEM ) {
            if ( count >= REQUEST_ATTEMPTS ) {
                ilog(LOG_ERROR, "RTPENGINE daemon [%s:%i] Error allocate memory in kernel mode!!! Request attempts: %d;", __FUNCTION__, __LINE__, count);
                break;
            }
            count++;
            usleep(5000);
            ret = read(fd, buf, sizeof(*buf));
        }
        if (ret != sizeof(*buf))
            break;
        li = g_list_prepend(li, buf);
    }
    g_slice_free1(sizeof(*buf), buf);
    close(fd);
    return li;
}
The value of REQUEST_ATTEMPTS will be determined based on the results of load tests.
Can you confirm whether this approach is correct?
Thank you.
Yes, this looks good to me. I would even be ok with completely aborting the function on failure and then just trying again on the next loop iteration of the main timer. But chances of success can vary, so feel free to experiment and determine the best approach. And then feel free to open a pull request if you want.
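For illustration only, the "abort and try again on the next timer iteration" variant could look roughly like the sketch below. It is not rtpengine code: `read_fn`, `list_status`, `read_entries`, and the `fake_*` helpers are hypothetical names introduced here, and the reader callback stands in for `read()` on the kernel list file so the logic can be exercised without the kernel module. The sketch only counts entries; real code would also have to free any partially built list before returning.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for read(2) on the kernel list file. */
typedef ssize_t (*read_fn)(void *buf, size_t len);

/* Hypothetical result codes for one pass over the list. */
enum list_status { LIST_DONE, LIST_RETRY_LATER, LIST_ERROR };

/* Reads fixed-size entries until EOF. On ENOMEM it does not sleep or
 * retry; it returns at once so the caller can try again on its next
 * timer tick. *n_read receives the number of complete entries seen. */
static enum list_status read_entries(read_fn rd, size_t entry_size,
                                     void *scratch, size_t *n_read)
{
    assert(rd && scratch && n_read);
    *n_read = 0;
    for (;;) {
        ssize_t ret = rd(scratch, entry_size);
        if (ret == 0)
            return LIST_DONE;            /* end of list */
        if (ret == -1 && errno == ENOMEM)
            return LIST_RETRY_LATER;     /* kernel allocation failed */
        if (ret != (ssize_t)entry_size)
            return LIST_ERROR;           /* short read or other error */
        (*n_read)++;
    }
}

/* Demo-only reader state: which call fails and how many entries exist. */
static int fake_calls;    /* number of calls made so far */
static int fake_fail_at;  /* call index that fails with ENOMEM, or -1 */
static int fake_total;    /* entries available before EOF */

/* Demo-only reader that simulates one ENOMEM failure. */
static ssize_t fake_read(void *buf, size_t len)
{
    int call = fake_calls++;
    if (call == fake_fail_at) {
        errno = ENOMEM;
        return -1;
    }
    if (call >= fake_total)
        return 0;
    memset(buf, 0, len);
    return (ssize_t)len;
}
```

Compared with sleeping inside `kernel_list()`, this keeps the calling thread from blocking, at the cost of a delayed list refresh whenever the kernel is short on memory.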
@rfuchs Thank you very much.
rtpengine version the issue has been seen with
11.5.1.3+0~mr11.5.1.3 git-HEAD-04b31c0
Used distribution and its version
Debian 11
Linux kernel version used
5.10.0-28-cloud-amd64
CPU architecture issue was seen on (see uname -m)
x86_64
Expected behaviour you didn't see
No response
Unexpected behaviour you saw
No response
Steps to reproduce the problem
No response
Additional program output to the terminal or logs illustrating the issue
Anything else?
No response