meetecho / janus-gateway

Janus WebRTC Server
https://janus.conf.meetecho.com
GNU General Public License v3.0

Janus crash with VideoRoom streams summary [1.x] #3249

Closed zevarito closed 1 year ago

zevarito commented 1 year ago

What version of Janus is this happening on? It happens on the latest commit, e1c7704, but it has been happening for a few months now.

Have you tested a more recent version of Janus too? Yes

Was this working before? This same exception, and a similar one in the same function, has been around since at least May, I think.

Is there a gdb or libasan trace of the issue?

Additional context I've also seen a related exception, for which I don't have the backtrace at hand, that fails in the same function but a little earlier; the core dump it generates says janus instead of hloop. My guess is that it is some sort of race condition when that list is generated and a publisher leaves the room at the same time. It crashes after approximately 2 weeks on every server, but I have the sense that the latest build might crash earlier; I updated recently and will let you know.

Let me know if you need me to provide any extra info and thank you very much for looking into this issue!
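
To illustrate the kind of race suspected above (a list being generated while a publisher leaves at the same time), here is a minimal sketch in C. It is not Janus code: `room`, `room_join`, `room_leave` and `room_summary` are hypothetical names, and a fixed-size array stands in for whatever table the VideoRoom plugin actually iterates. The only point is that the summary and the removal take the same mutex, so the iteration never observes a half-removed entry.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a room's publisher table; not Janus's real structures. */
#define MAX_PUBLISHERS 8

typedef struct room {
    pthread_mutex_t mutex;              /* guards ids[] and count */
    unsigned long ids[MAX_PUBLISHERS];  /* publisher identifiers */
    size_t count;                       /* number of valid entries in ids[] */
} room;

void room_init(room *r) {
    assert(r != NULL);
    pthread_mutex_init(&r->mutex, NULL);
    r->count = 0;
}

/* Add a publisher; returns 0 on success, -1 if the room is full. */
int room_join(room *r, unsigned long id) {
    int ret = -1;
    pthread_mutex_lock(&r->mutex);
    if (r->count < MAX_PUBLISHERS) {
        r->ids[r->count++] = id;
        ret = 0;
    }
    pthread_mutex_unlock(&r->mutex);
    return ret;
}

/* Remove a publisher; returns 0 if it was found, -1 otherwise. */
int room_leave(room *r, unsigned long id) {
    int ret = -1;
    pthread_mutex_lock(&r->mutex);
    for (size_t i = 0; i < r->count; i++) {
        if (r->ids[i] == id) {
            /* Swap-remove: move the last entry into the freed slot. */
            r->ids[i] = r->ids[--r->count];
            ret = 0;
            break;
        }
    }
    pthread_mutex_unlock(&r->mutex);
    return ret;
}

/* Copy a snapshot of the publisher ids into out[] while holding the lock,
 * so a concurrent room_leave() cannot change the table mid-iteration.
 * Returns the number of ids copied (at most max). */
size_t room_summary(room *r, unsigned long *out, size_t max) {
    pthread_mutex_lock(&r->mutex);
    size_t n = r->count < max ? r->count : max;
    for (size_t i = 0; i < n; i++)
        out[i] = r->ids[i];
    pthread_mutex_unlock(&r->mutex);
    return n;
}
```

A caller would build the summary from the copied snapshot after the lock is released, rather than walking the shared table directly. Whether the crash reported here has this exact shape is only an assumption.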

lminiero commented 1 year ago

Please test with #3247 too, since it addresses a couple of race conditions in the VideoRoom.

zevarito commented 1 year ago

Thanks @lminiero, will do.

zevarito commented 1 year ago

I know you guys are on vacation right now; I am just updating the issue to keep you in the loop for when you get back.

I've updated a few servers with the patch mentioned above, but one of them only stayed up for 2 days. It did not crash in the same way as the previous crashes, but it lasted much less time than previous revisions did. I am attaching the full backtrace; let me know if you need anything else, thank you!

#0  0x00007f91b022d843 in janus_videoroom_handler (data=<optimized out>) at plugins/janus_videoroom.c:11609
        iter = {dummy1 = 0x7f912c92e2a0, dummy2 = 0x0, dummy3 = 0x0, dummy4 = 8, dummy5 = 0, dummy6 = 0xd}
        value = 0x0
        audiocodec = <optimized out>
        vp9_profile = <optimized out>
        temp = <optimized out>
        jsep = <optimized out>
        videoroom = <optimized out>
        error_str = '\000' <repeats 344 times>...
        start = <optimized out>
        count = <optimized out>
        answer = <optimized out>
        h264_profile = <optimized out> 

https://gist.github.com/zevarito/d71a35b43d1e699f2791784a952dad2e

tmatth commented 1 year ago

> I've updated a few servers with the patch mentioned above, but one of them only stayed up for 2 days. It did not crash in the same way as the previous crashes, but it lasted much less time than previous revisions did. I am attaching the full backtrace; let me know if you need anything else, thank you!

Thanks for the backtrace, can you try with https://github.com/meetecho/janus-gateway/pull/3259 ?

atoppi commented 1 year ago

@zevarito any update? Did you try the proposed patch?

zevarito commented 1 year ago

@atoppi no servers have crashed so far with this patch; the longest-running one has been up for about 10 days now. However, previous releases sometimes took 3 months to crash, so I am deploying carefully. As for performance, the patch doesn't seem to introduce any regressions.

zevarito commented 1 year ago

@atoppi just in case, I was talking about #3259; unfortunately, #3247 keeps crashing as mentioned above.

tmatth commented 1 year ago

> @atoppi just in case, I was talking about #3259; unfortunately, #3247 keeps crashing as mentioned above.

Thanks @zevarito, I've marked #3259 as ready for review in that case.

lminiero commented 1 year ago

Closing as I merged @tmatth's patch.