Closed achaloyan closed 9 years ago
FYI this is with UniMRCP r1401.
Reported by cmrienzo
on 2010-03-04 16:11:32
I noticed the app is much more stable when I disabled RTCP. It has processed 1600
calls and 986 TTS requests with no crashes.
This reminds me of the first crash. I noticed some mpf_timer activity going on after
a session was terminated. Perhaps something didn't get cleaned up completely?
Reported by cmrienzo
on 2010-03-04 19:44:20
Chris, thanks for the detailed analyzes.
I've not got a chance to look into this closer yet, but just wanted to suggest the
same to disable RTCP. If it helps, then either the problem is in processing of RTCP
messages or in timers. May I suggest to enable RTCP back, but disable RTCP receiver
by setting rx-resolution to 0
<param name="rtcp-rx-resolution" value="0"/>
If the problem comes back, then the issue is in timers.
I'm about to finalize my changes in trunk soon. Then, I'm planning to perform several
stress tests and will try with FS either.
Reported by achaloyan
on 2010-03-04 20:25:13
So, it just crashed again without the RTCP. I guess I try again with valgrind running.
Reported by cmrienzo
on 2010-03-04 20:46:51
Then, the problem could be in the same timeouts discussed in Issue-72.
I've noticed your comment on Jira
> 1. The callback is not expected as the MRCP session was destroyed
If callback is being called, it means session isn't destroyed yet internally in
client stack.
Reported by achaloyan
on 2010-03-04 20:54:56
I found something fishy on the FS side... looks like a call terminates without
cleaning up the session. I believe this is the cause of the problem since my
stream->obj was allocated off of the call's pool.
Reported by cmrienzo
on 2010-03-05 13:04:25
The same with me. I've managed to reproduce it 2 times today. I made about 4000 calls
with SIPp -> FreeSWITCH -> UniMRCPServer and got the same backtrace. Analyzing logs,
I found a few suspicious things. See below filtered out, TTS-9973 related part.
4098010 2010-03-05 16:47:48.144169 [DEBUG] mod_unimrcp.c:622 (TTS-9973) audio queue
created
4098011 2010-03-05 16:47:48.144169 [NOTICE] mrcp_client.c:549 Create MRCP Handle
0x84a82b0 [unimrcpserver-mrcp1]
4098012 2010-03-05 16:47:48.144169 [INFO] mrcp_client_session.c:142 Create Channel
0x84a82b0 <new>
4098015 2010-03-05 16:47:48.144169 [INFO] mrcp_client_session.c:398 Receive App
Request 0x84a82b0 <new> [2]
4098017 2010-03-05 16:47:48.144169 [INFO] mrcp_client.c:901 Add MRCP Handle 0x84a82b0
4098018 2010-03-05 16:47:48.144169 [DEBUG] mrcp_client_session.c:1203 Dispatch
Application Request 0x84a82b0 <new> [2]
4098019 2010-03-05 16:47:48.144169 [NOTICE] mrcp_client_session.c:718 Add Control
Channel 0x84a82b0 <new@speechsynth>
4098020 2010-03-05 16:47:48.144169 [DEBUG] mrcp_client_session.c:762 Add RTP
Termination 0x84a82b0 <new>
4098221 2010-03-05 16:47:48.184411 [DEBUG] mrcp_client_session.c:1039 On
Termination Add 0x84a82b0 <new>
4098222 2010-03-05 16:47:48.184411 [DEBUG] mrcp_client_session.c:1039 On
Termination Add 0x84a82b0 <new>
4098223 2010-03-05 16:47:48.184411 [INFO] mrcp_client_session.c:420 Send Offer
0x84a82b0 <new> [c:0 a:1 v:0]
4098364 2010-03-05 16:47:48.208263 [INFO] mrcp_client_session.c:158 Receive Answer
0x84a82b0 <0532c32ad6abe740> [c:0 a:1 v:0]
4098365 2010-03-05 16:47:48.208263 [DEBUG] mrcp_client_session.c:1160 Modify
Termination 0x84a82b0 <0532c32ad6abe740>
Pay attention to the delay here. You may see the same timed out request, but the big
question is what was the reason.
4115033 2010-03-05 16:47:58.152993 [ERR] mod_unimrcp.c:963 (TTS-9973) Timed out
waiting for channel to be ready
4116285 2010-03-05 16:47:58.328997 [DEBUG] mrcp_client_session.c:1043 On
Termination Modify 0x84a82b0 <0532c32ad6abe740>
The expected "On Termination Modify" was eventually received. I suspect the MPF
callback was blocked from mod_unimrcp context, at least I see that all the other
threads of client stack were busy processing other messages.
4116286 2010-03-05 16:47:58.328997 [INFO] mrcp_client_session.c:459 Raise App
Response 0x84a82b0 <0532c32ad6abe740> [2] SUCCESS [0]
4116287 2010-03-05 16:47:58.328997 [DEBUG] mod_unimrcp.c:1752 (TTS-9973)
SYNTHESIZER channel is ready, codec = LPCM, sample rate = 8000
4116288 2010-03-05 16:47:58.328997 [DEBUG] mod_unimrcp.c:1469 (TTS-9973) CLOSED ==>
READY
BTW, I have been doing a few other tests in parallel now and haven't seen this issue
on other products yet (not tried Asterisk though). So I suppose it's FS related.
Hope this helps a bit.
Reported by achaloyan
on 2010-03-05 13:39:51
I have a fix to the FS core that I'm going to try today. Found some cases where a FS
call can terminate without cleaning up its mod_unimrcp session.
Reported by cmrienzo
on 2010-03-05 14:05:56
I'm going to make a change to mod_unimrcp so that it does not timeout when destroying
the session. It will be better to let it stay stuck. At least I can attach to the
process then and see what is going on.
Plus, we are now seeing how bad things can get if I let freeswitch destroy its
session without cleaning up the MRCP session.
Reported by cmrienzo
on 2010-03-05 15:52:15
Agreed, as it's indeed a significant error condition.
Reported by achaloyan
on 2010-03-05 16:27:15
Hello Arsen,
Tony checked in a fix to the FS core to prevent my crash and I checked in a fix to
mod_unimrcp to no longer allow timeout when terminating the session.
Could you send me a copy of your test setup? SIPp script + command line args, FS
dialplan with TTS request, and UniMRCP server config. I'll try to reproduce in the
lab.
Reported by cmrienzo
on 2010-03-09 19:00:59
Hello Chris,
I'm glad the issue has been fixed and will try to re-test it on my end during the days.
At the moment, I have no access to the testbed to provide you with the exact setup.
Perhaps, there was nothing extraordinary. As far as I remember, I tried 5 calls per
sec with 3 sec call duration and typical FS TTS dialplan.
Reported by achaloyan
on 2010-03-09 20:36:03
Just to let you know that I have also implemented Chris' fix into app_unimrcp.
Reported by thirionjwf
on 2010-03-10 08:11:04
FYI,
I have made 50000 calls with FreeSWITCH trunk (r16966) today ... no more crashes! It
crashed on about 4000 calls before.
./sipp -r 5 -rp 1000 -d 2500
<extension name="unispeak">
<condition field="destination_number" expression="^7004$">
<action application="answer"/>
<action application="set" data="tts_engine=unimrcp"/>
<action application="speak" data="Hello world from FreeSWITCH"/>
</condition>
</extension>
The same is true for Asterisk. I have tried the latest app_unimrcp (r1579) and
reached the same 50000 calls without any noticeable issues using MRCPSynth.
I'll try to do more tests and document setup when possible.
Reported by achaloyan
on 2010-03-11 14:17:09
Chris, I have added you as a committer to the project.
http://code.google.com/p/unimrcp/people/list
So you have privileged access to svn, wiki, tracker, etc. This would allow us to
collaborate more efficiency, I guess.
Glad to work with you!
Arsen
Reported by achaloyan
on 2010-03-11 14:26:36
Thanks for the vote of confidence, Arsen. :)
I am currently ramping up the load on my production server and I haven't seen any
issues since the last round of fixes, either. Once FreeSWITCH finishes their 1.0.5
release, I'll resume work on getting mod_unimrcp up to date with the latest rev of
UniMRCP.
Reported by cmrienzo
on 2010-03-12 02:01:39
Well and thanks for the update, Chris.
So I consider the issue has been fixed.
Reported by achaloyan
on 2010-03-12 12:39:01
Fixed
Reported by thirionjwf
on 2010-12-13 12:29:10
Verified
Originally reported on Google Code with ID 75
Reported by
cmrienzo
on 2010-03-04 16:04:10