meetecho / janus-gateway

Janus WebRTC Server
https://janus.conf.meetecho.com
GNU General Public License v3.0
8.17k stars 2.47k forks source link

[0.x] Possible memory leaks found with libasan latest #3367

Closed arpu closed 4 months ago

arpu commented 4 months ago

What version of Janus is this happening on? v0.14.3

Have you tested a more recent version of Janus too? Yes

We use the Rust plugin janus plugin for video Audio + datachannel, and we see some memory growing over time at the moment i am not sure if this is a janus problem or the rust plugin https://github.com/networked-aframe/janus-plugin-sfu

janus log + libasan output with 2 users connceted each sending audi/video +datachannel, after both of them leave the Room i close janus with Ctrl+C

https://gist.github.com/arpu/1280532db480c8a3bd045784aef2cad9

arpu commented 4 months ago

Versions: LIBNICE="0.1.22" SRTP="2.6.0" usrsctp latest c4b52c34d4a7ecca6c992ba2f7f09607997d8ead

loop_index is set to 4

build on Ubuntu 24.04 in Docker

jnaus build with: CFLAGS="${CFLAGS} -O0 -g -ggdb3 -fno-omit-frame-pointer -fsanitize=address -fno-sanitize-recover=all -fsanitize-address-use-after-scope" LDFLAGS="-fsanitize=address" ./configure --prefix=/usr --disable-all-plugins --disable-all-handlers

lminiero commented 4 months ago

I'm sorry, but I won't debug logs related to plugins that are not part of this repo, let alone potentially proprietary ones. If you can replicate the problem with the stock code feel free to reopen.

atoppi commented 4 months ago

a few notes:

arpu commented 4 months ago

understand and thanks for the Feedback

@atoppi i add a new test and the session is closed after i hit CTRL-C

the new test was with older ubuntu 20.04 LTS and not enable event_loop https://gist.github.com/arpu/8fa1751b9473534f85e9d755d5c90946

arpu commented 4 months ago

@atoppi can i ask what version of https://github.com/sctplab/usrsctp is used in your test? because i see #2 0x555e8987d127 in janus_dtls_srtp_create_sctp /janus-gateway/dtls.c:689

i wonder if this has something to do with the datachannels

atoppi commented 4 months ago

commit c4b52c34d4a7ecca6c992ba2f7f09607997d8ead from April '24. This issue is very likely caused by the rust plugin not releasing resources.

arpu commented 4 months ago

Thank you, used the same version on test

do you have any Idea what resources, we should look at in the rust plugin? https://github.com/networked-aframe/janus-plugin-sfu/blob/master/src/lib.rs#L325

lminiero commented 4 months ago

@arpu please stop linking to the code of third party plugins. As already discussed, we won't inspect or debug code that doesn't belong to this repo.

arpu commented 4 months ago

any interests to bring this rust plugin to the official janus repo?

arpu commented 4 months ago

Question if we send a Datachannel with the plugin as message a DataChannel is opened https://github.com/meetecho/janus-gateway/blob/0.x/janus.c#L1662

do we need to use janus_dtls_srtp_destroy and free in the Plugin?

arpu commented 4 months ago

this is the cleanup after removing 2 clients from same room ( audio video datachannel) https://gist.github.com/arpu/e7687ce4e1da38c0a6d21366e11c61ea

vincentfretin commented 4 months ago

Thank you @atoppi for your notes, much appreciated. I'm happy to say we don't have memory leaks anymore in the rust sfu plugin https://github.com/networked-aframe/janus-plugin-sfu/pull/11 with loop_events option specified and the following versions

with the two following fixes:

But we have the following direct leak with latest libnice:

Direct leak of 69840 byte(s) in 2 object(s) allocated from:
    #0 0x725d9ca7e4d0 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77
    #1 0x725d9c5ec7a1 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x637a1) (BuildId: 9753724b85d60f97b5d5663181ef7f4e69a62131)
    #2 0x580861beb9a4 in janus_sctp_association_create /janus-gateway/sctp.c:189
    #3 0x580861ad1e47 in janus_dtls_srtp_create_sctp /janus-gateway/dtls.c:689
    #4 0x580861b748e6 in janus_process_incoming_request /janus-gateway/janus.c:1662
    #5 0x580861b84b37 in janus_transport_task /janus-gateway/janus.c:3478
    #6 0x725d9c61a541  (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x91541) (BuildId: 9753724b85d60f97b5d5663181ef7f4e69a62131)
    #7 0x725d9c614c81  (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x8bc81) (BuildId: 9753724b85d60f97b5d5663181ef7f4e69a62131)
    #8 0x725d9c9e1109 in asan_thread_start ../../../../src/libsanitizer/asan/asan_interceptors.cpp:234

g_malloc0 is from (in 0.x branch) https://github.com/meetecho/janus-gateway/blob/a7767ad30b803d96e11b491547bcf5660cb7a937/sctp.c#L189 As far as I can see this has nothing to do with the rust plugin and janus_sctp_association_destroy(sctp) seems to be called in all relevant cases. So I really have no clue about this one.

I tested the following libnice versions: ok libnice 2020-07-06 13:53 (post 0.1.18) 48dac0d702b134f7b11b92602c234ba1120cc75b agent: keep a track of the candidate refreshes being pruned ok libnice 2020-12-06 19:09 (post 0.1.18) b353f30cfce498ffc2f1057d2d14aeb4b183e671 udp-turn: don't allocate large arrays on the stack ko libnice 2020-12-06 19:12 (post 0.1.18) a3f535669de74ba707fbd11c268ac04e60564a65 agent: don't allocate large arrays on the stack => the two previous commits come from https://gitlab.freedesktop.org/libnice/libnice/-/merge_requests/171 ko libnice 2021-04-20 15:09 (post 0.1.18) 8ccac4b04759edaecead7470e28865aff8066921 ... ko libnice 2024-03-04 (0.1.22) ae3eb16fd7d1237353bf64e899c612b8a63bca8a

I know you guys don't support debugging third party plugin. I'm just putting this here in case it rings a bell to you and there is something obvious for you or maybe an incompatibility between libstrp and libnice versions you know of.