Closed achaloyan closed 9 years ago
Thanks for the issue report. If you experienced an issue, which is more or less
reproducibly, it makes sense to report it.
Running demo synth with your modifications, I've not got Sofia's status phrase
garbled, but I observed several issue, which should be separated.
1. Concurrency and delays caused by Sleep()
You should not use Sleep() in a thread serving multiple sessions, channels. This will
definitely not work the way you expected. However, if this helps identify problems
in
client stack, I never mind.
2. SPEAK-COMPLETE event and double mrcp_application_session_terminate() call.
Please note, mrcp_application_session_terminate() MUST be called only once,
afterward, as soon as session termination has been acknowledged by client stack,
session must be destroyed.
In fact, double mrcp_application_session_terminate() will be sent, if SPEAK-COMPLETE
event is received, and this definitely will cause memory related issues and crashes.
I believe you met this issue, as this is the only case, when sofia_session->mutex can
be NULL. Probably you should do nothing on SPEAK-COMPLETE in the modified scenario.
3. Active sessions and quit
Also you should not "quit" (destroy demo framework), while there are active sessions
in play. I guess this could be other problem you experienced with NULL pointer in
demo framework task.
4. Garbage in Sofia's status phrase string
I assume, this is the actual issue to reproduce. I wonder, if this will be still
reproducible to you, if you ensure point 2 and 3 never happens.
Also please indicate version of Sofia you use. I guess, it's 1.12.10
Reported by achaloyan
on 2009-06-15 08:03:13
Thanks for your answer. I thought I managed to cut and paste the whole log from the
dos box, but apparently this was not the case. Yes, it is sophia sip 1.12.10 (with
the "improved" tport.c)
ad (1) I had some doubts about the sleeps. I was going to write about this, but then
since you suggested yourself to do this (in "http://unimrcp.googlegroups.com/web/
demo_synth_application.c") I continued. So I decided to make short sleeps, and since
I knew that streaming the generated TTS speech would take considerably longer than
those two sleeps, I supposed it would not be too bad. (I must say I initially
thought that each session would have its own tasks/threads, and that multiple
sessions would not have trouble with each other sessions' sleeps, although I did not
dive too deep into the code and logging to figure this out. (I should add an
(optional) thread-id to the log function.)
The alternative solution would be to implement a "delayed send MRCP request, but
conditionally to the current state", but for the purpose of my test (test my plugin)
that was a little too much.
2. SPEAK-COMPLETE event. I'm not sure I fully understand the scope of what you are
saying. In my test I would not expect to see a SPEAK-COMPLETE event at all.
And additionally, I was hoping they were mutually exclusive: a SPEAK-COMPLETE should
not arrive (or at least be processed) if a stop response was received (and vice
versa).
In my case, the stop complete response should arrive well before the SPEAK-COMPLETE
event. Indeed, if I would start a lot of sessions, the sleeps might get the client
into trouble. In my log file I don't see a SPEAK-COMPLETE event ?
Note that there is a kind of race condition here. You cannot be sure what is first:
the stop from the client or the speak-complete event from the server.
I don't think I see this race condition in my log.
I don't understand the double mrcp_application_session_terminate() you see in the
log. In my logging the session pointers for different sessions (one behind the
finished/destroyed other) is the same, but the sessions are different.
I guess when I used your original pause/resume scenario I met this double
mrcp_application_session_terminate() issue, and experienced the crashes with the
null pointer (sofia->mutex). That's exactly why I commented out the
mrcp_application_channel_remove(session,channel);
I'm not sure what my preference is - as a developer i would want my program to
crash, so that I can improve it - but if i would run this in production i would
probably prefer that a problem in one session would not have an adverse effect on
the other sessions (or channels). It may turn out to be very difficult to implement
a proper state machine (client) in this multi-tasked, multi-session, multi-channel,
networked, time-constrained and real time environment. So i would prefer the "api"
to be kind of robust against some (hopefully most) cases - even if they are
developers faults.
(3) as mentioned before, it may be hard to make a proper state machine: one may have
to do some timed garbage collection on sessions/channels as you can never be sure
the server terminates a session. I would prefer to be able to stop and start the
client without a crash taking place... I know this error processing is where it
really becomes hard, and I realise you just supplied a client demo, not a finished
application.
I commented out the block of code after the resume state (so one delay less, no stop
message to be sent and only one mrcp_application_channel_remove(session,channel) -
after the speak complete event) and I still see the corruption. No crashed though.
It is occurs less frequently - though it's not so hard to be a succesful monkey.
Thanks for all your help. Perhaps more later this week...
Reported by garmt.noname
on 2009-06-17 21:24:24
Garmt,
I assume you are on vocation, but when you'll be back, please retest the same with
the latest release. I suspect this issue relates to Issue-29 and APR pools thread safety.
Reported by achaloyan
on 2009-07-23 05:02:41
Any updates? Is it reproducible with 0.7 release?
Reported by achaloyan
on 2009-09-09 18:32:03
Last week I tried to reproduce the problem (without upgrading either trunk or 0.7
release or APR). I failed to reproduce the problem (maybe I used release instead of
debug executables !?). I will give it another try soon.
Reported by garmt.noname
on 2009-09-10 11:45:24
Thanks for the update, Garmt.
Reported by achaloyan
on 2009-09-10 12:57:01
Summarizing the discussion below
http://groups.google.com/group/unimrcp/browse_thread/thread/8d756e7f15d37f98?hl=en
I admit, this issue indeed existed, but should be already fixed in the trunk.
Related changes are: r1155, r1170, r1135.
Reported by achaloyan
on 2009-10-12 17:01:02
Fixed
Reported by achaloyan
on 2010-04-27 12:02:25
Verified
Originally reported on Google Code with ID 33
Reported by
garmt.noname
on 2009-06-13 21:09:25