unispeech / asterisk-unimrcp

UniMRCP modules for Asterisk
http://www.unimrcp.org/asterisk
GNU General Public License v2.0
48 stars 35 forks source link

deadlock on latest release #17

Closed scgm11 closed 6 years ago

scgm11 commented 6 years ago

Im experiencing a deadlock with the latest release 1.5

Channel Context Extension Prio State Application Data CallerID Duration Accountcode PeerAccount BridgeID SIP/QUALITY1-0000005 functions FUNC_DIGITE_O_DI 5 Up MRCPRecog "/usr/local/unimrcp/data/ 912732278 07:26:42 QUALITY1

attached traces I will compile it with better debug options and add them

core-asterisk-running-2017-09-07T16-59-48+0200-brief.txt core-asterisk-running-2017-09-07T16-59-48+0200-full.txt core-asterisk-running-2017-09-07T16-59-48+0200-locks.txt core-asterisk-running-2017-09-07T16-59-48+0200-thread1.txt

scgm11 commented 6 years ago

Seems to there is no deadlock but anyway the channel is stuck, seems to be when a grammar cant be loaded

ERROR[32387][C-00000005]: app_mrcprecog.c:1203 app_recog_exec: (ASR-13) Unable to load grammar

then the channel get stuck

core-asterisk-running-2017-09-07T19-51-35+0200-brief.txt core-asterisk-running-2017-09-07T19-51-35+0200-full.txt core-asterisk-running-2017-09-07T19-51-35+0200-locks.txt core-asterisk-running-2017-09-07T19-51-35+0200-thread1.txt

achaloyan commented 6 years ago

Could you provide Asterisk/UniMRCP logs with debug log level enabled. I assume you have built everything from source, right. Just in case, what version of Asterisk do you use?

scgm11 commented 6 years ago

it's not really a deadlock in the log attached is clear, the channels is waiting for a timeout maybe some issue on the backend, anyway I think should clear the channel earlier because even asterisk can't hangup the call neither manually using hangup request (channel. I don't see any lock hold as I show on the log.

Maybe is an engine issue but didn't happen on the previous version or at least not as usual to be aware of. ast 13.17

log.txt

achaloyan commented 6 years ago

I doubt this issue is related to the latest release. Actually, there is no response to session termination request. The following log statement is a very bad indication, anyway.

speech_channel.c:370 speech_channel_destroy: (ASR-57) MRCP session has not terminated after 30000 ms

If you enable UniMRCP client logs (set log-level to DEBUG in mrcp.conf), I'll tell you what is going on precisely. Most likely there is no response from the MRCP server you use.

scgm11 commented 6 years ago

Ill try to reproduce it again I haven't been able yet, Im closing the issue for the time being I will reopen if needed.

scgm11 commented 6 years ago

Im seeing this in a loaded environment that is hard to get all the debug, anyway why cant I hangup request the call? is there no global timeout that can be added so in case not response hangup the channel?

achaloyan commented 6 years ago

If you set the request-timeout in mrcp.conf, for example, to 5000 msec, then any request with no response received within the specified timeout will be cancelled. As a result, the channel will ultimately hang up.

scgm11 commented 6 years ago

channels are not releasing even with the request-timeout feature, I will try to get more log

scgm11 commented 6 years ago

attached full log of a call that the channel get stuck, even with the timeout params

I think there is other case that hangs the channel indefinitly this is released about 10 minutes later

fulllog.txt logother.txt

[Sep 22 13:42:18] NOTICE[21049]: src/rtsp_client.c:524 : Push RTSP Request to In-Progress Queue 0x7f33700344a8 CSeq:1 [Sep 22 13:42:18] DEBUG[21049]: src/apt_poller_task.c:244 : Wait for Messages [uni4] timeout [5000] [Sep 22 13:42:23] NOTICE[21049]: src/rtsp_client.c:764 : Cancel RTSP Request 0x7f33700344a8 CSeq:1 [408] [Sep 22 13:42:23] NOTICE[21049]: src/rtsp_client.c:549 : Pop In-Progress RTSP Request 0x7f33700344a8 CSeq:1 [Sep 22 13:42:23] DEBUG[21049]: src/apt_task.c:263 : Signal Message to [MRCP Client] [0x7f3380003330;1;0] [Sep 22 13:42:23] DEBUG[21049]: src/apt_poller_task.c:249 : Wait for Messages [uni4] [Sep 22 13:42:23] DEBUG[21043]: src/apt_task.c:335 : Process Message [MRCP Client] [0x7f3380003330;1;0] [Sep 22 13:42:23] NOTICE[21043]: src/mrcp_client_session.c:149 : Receive Answer ASR-186 [c:0 a:0 v:0] Status 408 [Sep 22 13:42:23] NOTICE[21043]: src/mrcp_client_session.c:453 : Raise App Response ASR-186 [2] FAILURE [2] [Sep 22 13:42:23] DEBUG[21043]: app_mrcprecog.c:253 speech_on_channel_add: (ASR-186) speech_on_channel_add [Sep 22 13:42:23] ERROR[21043]: app_mrcprecog.c:294 speech_on_channel_add: (ASR-186) Channel error status=2, response code=408! [Sep 22 13:42:23] DEBUG[21043]: speech_channel.c:109 speech_channel_set_state_unlocked: (ASR-186) CLOSED ==> ERROR [Sep 22 13:42:23] DEBUG[21630][C-00000049]: speech_channel.c:507 speech_channel_open: (ASR-186) Terminating MRCP session [Sep 22 13:42:23] DEBUG[21043]: src/apt_consumer_task.c:135 : Wait for Messages [MRCP Client] [Sep 22 13:42:23] DEBUG[21630][C-00000049]: src/apt_task.c:263 : Signal Message to [MRCP Client] [0x7f344c01de80;4;0]

achaloyan commented 6 years ago

While I could not find anything obviously wrong in the attached txt files, it is interesting what happened after the last log statement that you dumped in the comments.

[Sep 22 13:42:23] DEBUG[21630][C-00000049]: src/apt_task.c:263 : Signal Message to [MRCP Client] [0x7f344c01de80;4;0]

For every Signal Message, there should be a Process Message. If you have the remaining logs, look for 0x7f344c01de80;4;0 if you can see it there.

You may also to track thread identifiers. Process Message should have been called from [21043], and in your logother.txt I can see that this thread continues to process messages, but there is a gap from Sep 22 13:42:23 to Sep 22 13:48:32. Not sure what happened in the meantime.

Needless to say, it is also interesting what happened with thread [21630]. Do you see any logs from this thread afterwards.

scgm11 commented 6 years ago

I attach the logs

There is also a possibility that when this happens, the process of the Voice Engine is killed, but Im not sure yet. so it may be in the middle of something and killed the process... but why the timeout is never honored?

0x7f344.txt 21043.txt 21630.txt

achaloyan commented 6 years ago

Well, now I can track this session to the end, and there is nothing wrong in this flow, besides the fact that channel_opened returned error, which is OK. MRCPRecog did complete in all these cases. Why would you think that your problem is MRCP-related.

scgm11 commented 6 years ago

Because the channel gets stuck at MRCPRecog, and even when I see a hangup the channel is still stuck there... and cant even hangup request....

achaloyan commented 6 years ago

OK, then provide the identified of a stuck channel along with the logs.

In all the attempts I have seen so far, MRCPRecog() did exit. Here is the last statement which you may track

MRCPRecog() exiting