Closed haozxuan closed 1 year ago
For a deadlock issue, we need the stack traces so we can tell which threads are involved in the deadlock and which functions they are blocked in.
How can I help with that? Are log level 5 or 6 and PJ_DEBUG 1 enough? Any ideas would be greatly appreciated.
While debugging the deadlock with Go's pprof, I found goroutine stacks like this:
10 @ 0x4bf2fc 0x11a5725 0x1232236 0x123233c 0x12a6717 0x12ba576 0x12ba173 0xed730e 0xed70c8 0x52b001
# 0x11a5724 git.admin.com/uvms/pjsip/pjsua2._Cfunc__wrap_Account_create__SWIG_1_pjsua2_32805695c8aaaed9+0x44 _cgo_gotypes.go:4472
# 0x1232235 git.admin.com/uvms/pjsip/pjsua2.SwigcptrAccount.Create__SWIG_1+0x55 /data/runner_builds/go/pkg/mod/git.admin.com/uvms/pjsip@v0.0.0-20230605065620-c834a241644f/pjsua2/pjsua2.go:25093
# 0x123233b git.admin.com/uvms/pjsip/pjsua2.SwigcptrAccount.Create+0xbb /data/runner_builds/go/pkg/mod/git.admin.com/uvms/pjsip@v0.0.0-20230605065620-c834a241644f/pjsua2/pjsua2.go:25099
# 0x12a6716 git.admin.com/uvms/pjsip/class.InitAccount+0x436 /data/runner_builds/go/pkg/mod/git.admin.com/uvms/pjsip@v0.0.0-20230605065620-c834a241644f/class/account.go:71
# 0x12ba575 git.admin.com/uvms/consumer-vms/service/sip.(*Sip).call+0x2f5 /data/runner_builds/eQMxWkS7/0/uvms/consumer-vms/service/sip/sip.go:107
# 0x12ba172 git.admin.com/uvms/consumer-vms/service/sip.(*Sip).handle+0x32 /data/runner_builds/eQMxWkS7/0/uvms/consumer-vms/service/sip/sip.go:95
# 0xed730d git.admin.com/uvms/utils/queue.(*Queue).start+0x1cd /data/runner_builds/go/pkg/mod/git.admin.com/uvms/utils@v1.0.2/queue/queue.go:102
# 0xed70c7 git.admin.com/uvms/utils/queue.(*Queue).Start.func1+0x107 /data/runner_builds/go/pkg/mod/git.admin.com/uvms/utils@v1.0.2/queue/queue.go:70
Yes, something like that. But deadlocks typically involve two or more threads. Example: https://github.com/pjsip/pjproject/pull/3492
OK, I will change the log level configuration, deploy it to production, and wait for the next deadlock to occur.
Thanks a lot @sauwming. Logs attached: marsking00.log marsking04.log marsking03.log marsking02.log marsking01.log. Although nothing blocked today, there was a crash. I have desensitized the production logs and attached them.
BTW: Based on my understanding of the business logic, the problems always occur when the client actively hangs up. Any suggestions on how to handle this scenario properly would be appreciated.
There are 100+ MBs in the logs. Where exactly did the crash occur?
In file marsking04.log, at this log entry: [2023-06-22T16:25:36.251+0800 ERROR class/logger.go:56 {"": "16:25:36.251 media.cpp ev_thread pjsua_conf_disconnect(id, sink.id) error: Invalid value or argument (PJ_EINVAL) (status=70004) [../src/pjsua2/media.cpp:235]"}]
I traced it to the SWIG wrapper for the player media (in the pjsua2_wrap.cxx file) not catching the thrown error.
@sauwming Thank you very much for your attention. My current theory is that both the deadlock and the crash come from my not handling the user-side hang-up properly: I could not handle both the audio-playback-complete case and the user-side hang-up case inside OnEof2. So I restructured the code: the logic that used to run inside the callback is now passed through a Go channel and processed in a separate goroutine, so that both cases are covered. I have deployed this change to production and it looks very good so far. I will keep tracking it for a while to make sure it is working.
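The restructuring described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual code from the service: `callEvent`, `dispatcher`, `notify`, and the event kinds are hypothetical names, and the pjsua2 calls themselves are left out. The callback only posts an event and returns immediately; a single worker goroutine receives both "playback finished" and "remote hang-up" events and runs cleanup at most once per call.

```go
package main

import (
	"fmt"
	"sync"
)

type eventKind int

const (
	playbackDone eventKind = iota // audio file reached EOF
	remoteHangup                  // the far end ended the call
)

// callEvent is a hypothetical message posted from a library callback.
type callEvent struct {
	callID int
	kind   eventKind
}

// dispatcher moves work out of callbacks into one worker goroutine.
type dispatcher struct {
	events  chan callEvent
	wg      sync.WaitGroup
	cleaned map[int]bool // touched only by the worker goroutine
	cleanup func(callID int, reason eventKind)
}

func newDispatcher(buffer int, cleanup func(int, eventKind)) *dispatcher {
	d := &dispatcher{
		events:  make(chan callEvent, buffer),
		cleaned: make(map[int]bool),
		cleanup: cleanup,
	}
	d.wg.Add(1)
	go d.run()
	return d
}

// notify is what a callback would call. It never blocks, so the
// calling library thread cannot get stuck waiting on application logic.
// It reports false if the buffer is full and the event was dropped.
func (d *dispatcher) notify(ev callEvent) bool {
	select {
	case d.events <- ev:
		return true
	default:
		return false
	}
}

// run handles events serially; whichever event arrives first for a
// call triggers cleanup, and later events for that call are ignored.
func (d *dispatcher) run() {
	defer d.wg.Done()
	for ev := range d.events {
		if d.cleaned[ev.callID] {
			continue
		}
		d.cleaned[ev.callID] = true
		d.cleanup(ev.callID, ev.kind)
	}
}

// close stops the worker after all queued events are processed.
func (d *dispatcher) close() {
	close(d.events)
	d.wg.Wait()
}

func main() {
	var cleanedUp []int
	d := newDispatcher(16, func(id int, _ eventKind) { cleanedUp = append(cleanedUp, id) })

	d.notify(callEvent{callID: 1, kind: remoteHangup})
	d.notify(callEvent{callID: 1, kind: playbackDone}) // duplicate for call 1, ignored
	d.notify(callEvent{callID: 2, kind: playbackDone})
	d.close()

	fmt.Println("calls cleaned up:", cleanedUp)
}
```

The design choice here is that the callback holds no application locks and does no teardown itself, which removes one way for a library thread and an application goroutine to wait on each other. Any pjsua2 calls made from the worker have their own threading requirements, which this sketch does not cover.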
Glad to hear that you managed to resolve the problem. Let us know if there's any issue that still needs to be addressed.
Describe the bug
I integrated pjsip as a SIP client on my Linux server to initiate calls to a SIP server (outbound calls only; there are no incoming-call scenarios). Testing and evaluation went well and it works normally, but deadlocks occasionally appear: the process stops accepting outbound tasks and stays stuck.
Scenario description: According to the logs, the deadlock always occurs when the client actively hangs up while the server is cleaning up tasks. I added a lot of debug logging to locate the specific behavior that may cause the deadlock, but even so the deadlock still occurs.
Any ideas or direction would be greatly appreciated
Steps to reproduce
and some code like that
PJSIP version
2.13
Context
Platform: CentOS 7.4
Config: config_site.h
Build: ./configure --enable-epoll
SWIG: -c++ -go -cgo -intgosize 64 -outcurrentdir -I/usr/local/include /data/pjproject-2.13/pjsip-apps/src/swig/pjsua2.i
Go: 1.20
Based on historical issues, I have the following guesses:
Verification process: Because the triggering conditions are very strict (a server-side callback coinciding with an active hang-up by the client), the problem only appears in production traffic for now and is not easy to reproduce locally.