unispeech / unimrcp

Open source cross-platform implementation of MRCP protocol
http://www.unimrcp.org
Apache License 2.0
364 stars 161 forks source link

Memory corruption in mpf code #29

Closed achaloyan closed 9 years ago

achaloyan commented 9 years ago

Originally reported on Google Code with ID 29

What steps will reproduce the problem?

I'm using libunimrcpclient linked to my application. I use it as a
MRCPv2-only client connected to a Loquendo Speech Server 7.x. I'm doing
stress tests in my application. These stress tests simply consist in
issuing a bunch of TTS requests and a bunch of ASR requests. I have only
one application registered with unimrcp requesting ASR and TTS resources to
do the corresponding requests. Each TTS request is done in its own session.
Each ASR request is done in its own session. Each request is done in its
own MRCP connection.

Arsen, maybe the sample code you made that allowed you to reproduce the
error associated with Issue 27, can serve also to identify this one.

What version of the product are you using? On what operating system?

I use the latest (trunk) libunimrcpclient version compiled by me.

The OS is a Windows Server 2003 SP2 Enterprise Edition.

Please provide any additional information below.

NOTE: I detached this info from Issue 27, because this may be another issue
independent from the one reported at 27.

I attached the traces and also a visual studio 2005 screenshot to
illustrate the point of crash along with the stack contents for each call
from the mpf mm_timer thread context. If you need anything from my side to
help fixing this issue just ask. Thanks in advance.

Reported by cpsoares on 2009-06-01 15:46:04


achaloyan commented 9 years ago
cpsoares,
I know almost nothing about you, but I see you know what are doing and I like the way
you treat and report the issues.
I assume this should be another issue, as I easily reproduced the issue-27, clearly
saw how RTP payload overrode channel identifier and fixed that issue.
Unfortunately I've not managed to reproduce this one so far using the same framework.
However I'm testing against UniMRCP server, and I just guess the problem may occur
testing against Loquendo instead. Anyway, I see nothing extraordinary from the
message flow and logs.
Can you please test using only TTS or only ASR resource just to narrow down the issue?

Reported by achaloyan on 2009-06-01 16:38:23

achaloyan commented 9 years ago
  Hi Arsen,

  After an almost 5 hour "run", I could reproduce the problem using only ASR
requests. The crash is always on the same thread context, ie, mpf mm_timer thread,
internal to UNIMRCP.
  I send the VS2005 crash call stack screenshot along with the last trace file I got
from the MRCP side (if you need all the other trace files just ask, please). Near the
end you can see in the line that reads "5500 - Add Control Channel" the resource
identifier is garbled. Thank you for your time.

Reported by cpsoares on 2009-06-01 23:17:55


achaloyan commented 9 years ago
Hi cpsoares,
Seems I'm stuck with this issue at the moment. Today is the second day I have been
running the same stress/load test without "luck".
However I clearly see memory corruption happened on your end (vs2005 call stack and
channel identifier in logs).
For now I'd suggest to play a bit with MPF integration on your side.
At first, do nothing in recog_app_stream_read callback, frame->type will remain
MEDIA_FRAME_TYPE_NONE, and it will prevent sending any RTP packets to Loquendo, but
hopefully will help narrow down the issue.
Next, if it never crashes, try to simply fill codec buffer with 0
frame->type |= MEDIA_FRAME_TYPE_AUDIO;
memset(frame->codec_frame.buffer,0,frame->codec_frame.size);
and run the same test again
Please let me know about the results.

Reported by achaloyan on 2009-06-03 14:13:51

achaloyan commented 9 years ago
  Hi Arsen,

  Before I try with the modifications you suggested, I broke my application in two
parts to simplify and used only the media part (where the MRCP connector is standing)
to the test. What I do is make several TTS and ASR mixed requests. I put the
application running into purify. It stays running for some minutes, then it crashes
with an IPR (Invalid Pointer Read) and then with an unhandled exception from that IPR.
  The detailed description from purify for te IPR is in attachment and also are the
traces. In the traces, at the end of the "run" (timestamp 16:18:40.296) the "3852 -
Add Control Channel <000073A04A27F399Xspeechsynth> 0BA71508I <-> 0BA71470I [1]" line
has also the resource ID garbled.
  Thanks for your time. Meanwhile I'll try to reproduce the crash in the conditions
you suggest in your last message.

  Regards.

Reported by cpsoares on 2009-06-04 16:43:50


achaloyan commented 9 years ago
  Arsen,

  As per your suggestion, I tried doing nothing in recog_app_stream_read callback.
The result was I could not crash the application. As for the other scenario feeding
audio filled with zeros as samples, the application crashed after some time. So this
clearly points to some issue related to the management of audio buffers in UniMRCP
that is causing mem overruns and thus corruption. I think situations like this are
very difficult to spot event in purify (where you only get a IPR), the situation gets
worse when using apr and memory pools, because the memory is allocated in chunks, if
you overrun one buffer and still you are inside the mem chunk, purify will not complain.

  Regards.

Reported by cpsoares on 2009-06-05 13:24:02

achaloyan commented 9 years ago
Thanks for the detailed and complete information you continuously provide.

Yes, it's clearly something wrong with audio buffers. Tools such as valgring and
purify are not so helpful in case of memory pool allocation.
From other hand, if application crashes earlier running under purify, it smells like
typical race condition issue. I'll try to access purify soon to see if it helps
replicate the problem on my side. Today I've ran more than 1 million ASR and TTS
sessions and this is not the way to go.

Reported by achaloyan on 2009-06-05 14:50:43

achaloyan commented 9 years ago
  Arsen,

  Do you know an easy way to substitute de pool allocs by malloc of the exact size
as
requested? I would not bother for now about the frees as memory can grow a few MEGS
for this test before we can spot where the problem resides. If we had that I could
put the program easily running on purify to try to spot first the invalid memory
write. This could lead us to very important and useful clues about this problem.

  Regards.

Reported by cpsoares on 2009-06-08 14:29:13

achaloyan commented 9 years ago
cpsoares,
I ran the same stress test under purify and got yet another issue, which caused
client stack to crash. I've just fixed it r967. However I assume this is not the
issue you observed and I'm still trying to get it on my side.
Anyway, I'm planning to make a few changes concerning memory pools and allocators.
Currently there is no way to specify memory allocation policy in unimrcp.
More to come...

I use Win XP 32bit, I guess you use Win 2003 32bit. Please confirm it's not 64bit OS.

Reported by achaloyan on 2009-06-08 16:40:10

achaloyan commented 9 years ago
Yes 32 bit OS.

Reported by cpsoares on 2009-06-08 17:18:38

achaloyan commented 9 years ago
I remembered other thing related to the scenario where we don't deliver any sound
frames to unimrcp - The problem may not be 100% related to streaming, as in this
particular scenario, the recognizer also responds with error messages which are
significantly smaller than the usual result messages with all interpretations and
confidences. So, the problem may be anywhere :-D.

Regards.

Reported by cpsoares on 2009-06-08 17:30:21

achaloyan commented 9 years ago
Seems I know where the problem is in. I cannot say I'm happy with what I found...
I changed a few things in the code and now client crashes in a minute under the same
stress test, which used to run for ages before, strange.
More to come...

Reported by achaloyan on 2009-06-09 13:48:18

achaloyan commented 9 years ago
Several issues, which may cause client stack to crash have been fixed recently in trunk.
For now I'd suggest you to upgrade to recent trunk (r984) and re-test.
Also try to play with #define OWN_ALLOCATOR_PER_POOL option introduced in apt_pool.c
Currently it's disabled (commented), but soon I'll enable it by default.
This helps purify and valgrind be more sensitive to memory allocation problems.
By default, APR pools use one common allocator, which actually hides many problems
and looks not so convenient to me.

Reported by achaloyan on 2009-06-15 07:32:50

achaloyan commented 9 years ago
Hi,

I'm going to test the trunk version now. I'll then report the results.

Thanks

Reported by cpsoares on 2009-06-22 17:18:40

achaloyan commented 9 years ago
  Hi,

  Still the memory corruption is there somehow. The sign is as soon as the traces
show a garbled resource identifier the program crashes a few seconds after.
  I will try to do some code review to understand the whole unimrcp (client side)
internals, but this will take me a huge time :-D.
  Next is a piece of the MRCP trace. I don't know if this helps but one common thing
about this memory overrun is that the only place when the garbled resource
identifiers appear is on "Add Control Channel" lines on all stress runs I did.
  Regards.

(---------------) 968 - Receive Signaling Task Message [0]
(---------------) 968 - Receive Answer 0x17fb8050 <new> [c:1 a:1 v:0]
(---------------) 968 - Modify Control Channel 0x17fb8050 <00004F844A41FEC8>
(---------------) 968 - Modify Termination 0x17fb8050 <00004F844A41FEC8>
(---------------) 968 - Receive Signaling Task Message [0]
(---------------) 968 - Receive Answer 0x46d2558 <new> [c:1 a:1 v:0]
(---------------) 968 - Modify Control Channel 0x46d2558 <0000680C4A41FEC8>
(---------------) 968 - Modify Termination 0x46d2558 <0000680C4A41FEC8>
(---------------) 2252 - Signal Signaling Task Message
(---------------) 1364 - Process Control Message
(---------------) 968 - Receive Signaling Task Message [0]
(---------------) 2252 - Receive SIP Event [nua_i_active] Status 200 Call active
(---------------) 968 - Receive Answer 0x472d550 <new> [c:1 a:1 v:0]
(---------------) 968 - Modify Control Channel 0x472d550 <00004AA04A41FEC8>
(---------------) 968 - Modify Termination 0x472d550 <00004AA04A41FEC8>
(---------------) 3120 - Process MPF Message
(---------------) 3120 - Create Audio Bridge
(---------------) 1364 - Established TCP/MRCPv2 Connection 10.2.200.100:1209 <->
10.10.0.101:51030
(---------------) 3120 - Open RTP Transmit 10.2.200.100:59320 -> 10.10.0.101:10046
(---------------) 1364 - Add Control Channel <00004F844A41FEC8@speechrecog>
10.2.200.100:1209 <-> 10.10.0.101:51030 [1]
(---------------) 1364 - Signal Connection Task Message
(---------------) 3120 - Process MPF Message
(---------------) 3120 - Create Audio Bridge
(---------------) 3120 - Open RTP Receive 10.2.200.100:59322 <- 10.10.0.101:10048
playout [50 ms]
(---------------) 3120 - Process MPF Message
(---------------) 3120 - Create Audio Bridge
(---------------) 3120 - Open RTP Receive 10.2.200.100:59324 <- 10.10.0.101:10050
playout [50 ms]
(---------------) 968 - Receive Media Task Message [1]
(---------------) 968 - On Termination Modify 0x17fb8050 <00004F844A41FEC8>
(---------------) 968 - Receive Connection Task Message [1]
(---------------) 968 - On Control Channel Modify 0x17fb8050 <00004F844A41FEC8>
(---------------) 968 - Raise App Response 0x17fb8050 <00004F844A41FEC8> [2] SUCCESS
[0]
(---------------) 968 - Receive Media Task Message [1]
(---------------) 968 - On Termination Modify 0x46d2558 <0000680C4A41FEC8>
(---------------) 968 - Receive Media Task Message [1]
(---------------) 968 - On Termination Modify 0x472d550 <00004AA04A41FEC8>
(---------------) 1364 - Established TCP/MRCPv2 Connection 10.2.200.100:1210 <->
10.10.0.101:51008
(---------------) 1364 - Add Control Channel <0000ð4A41@'C8spe>
10.2.200.100:1210 <-> 10.10.0.101:51008 [1]
(---------------) 1364 - Signal Connection Task Message
(---------------) 968 - Receive Connection Task Message [1]
(---------------) 968 - On Control Channel Modify 0x46d2558 <0000680C4A41FEC8>
(---------------) 968 - Raise App Response 0x46d2558 <0000680C4A41FEC8> [2] SUCCESS
[0]
(---------------) 1344 - Signal Application Task Message
(---------------) 968 - Receive Application Task Message [1]
(---------------) 968 - Receive App MRCP Request 0x17fb8050 <00004F844A41FEC8>
(---------------) 968 - Send MRCP Request 0x17fb8050 <00004F844A41FEC8@speechrecog>
[1]
(---------------) 1344 - Signal Application Task Message
(---------------) 968 - Receive Application Task Message [1]
(---------------) 968 - Receive App MRCP Request 0x46d2558 <0000680C4A41FEC8>
(---------------) 968 - Send MRCP Request 0x46d2558 <0000680C4A41FEC8@speechsynth>
[1]
(---------------) 1364 - Established TCP/MRCPv2 Connection 10.2.200.100:1211 <->
10.10.0.101:51035
(---------------) 1364 - Add Control Channel <00004AA04A41FEC8@speechsynth>
10.2.200.100:1211 <-> 10.10.0.101:51035 [1]
(---------------) 1364 - Signal Connection Task Message
(---------------) 1364 - Send MRCPv2 Stream 10.2.200.100:1209 <-> 10.10.0.101:51030
[196 bytes]
(---------------) MRCP/2.0 196 RECOGNIZE 1
(---------------) Channel-Identifier: 00004F844A41FEC8@speechrecog
(---------------) N-Best-List-Length: 2
(---------------) Speech-Language: pt-pt
(---------------) Content-Type: text/uri-list
(---------------) Content-Length: 22
(---------------) 
(---------------) builtin:grammar/digits
(---------------) 1364 - Send MRCPv2 Stream 10.2.200.100:1210 <-> 10.10.0.101:51008
[141 bytes]
(---------------) MRCP/2.0 141 SPEAK 1
(---------------) Channel-Identifier: 0000680C4A41FEC8@speechsynth
(---------------) Voice-Name: Amalia
(---------------) Content-Type: text/plain
(---------------) Content-Length: 2
(---------------) 
(---------------)  0
(---------------) 1364 - Process Control Message
(---------------) 968 - Receive Connection Task Message [1]
(---------------) 968 - On Control Channel Modify 0x472d550 <00004AA04A41FEC8>
(---------------) 968 - Raise App Response 0x472d550 <00004AA04A41FEC8> [2] SUCCESS
[0]
(---------------) 1344 - Signal Application Task Message
(---------------) 968 - Receive Application Task Message [1]
(---------------) 968 - Receive App MRCP Request 0x472d550 <00004AA04A41FEC8>
(---------------) 968 - Send MRCP Request 0x472d550 <00004AA04A41FEC8@speechsynth>
[1]
(---------------) 1364 - Process Control Message
(---------------) 1364 - Send MRCPv2 Stream 10.2.200.100:1211 <-> 10.10.0.101:51035
[141 bytes]
(---------------) MRCP/2.0 141 SPEAK 1
(---------------) Channel-Identifier: 00004AA04A41FEC8@speechsynth
(---------------) Voice-Name: Amalia
(---------------) Content-Type: text/plain
(---------------) Content-Length: 2
(---------------) 
(---------------)  0
(---------------) 1364 - Receive MRCPv2 Stream 10.2.200.100:1210 <->
10.10.0.101:51008 [83 bytes]
(---------------) MRCP/2.0 83 1 200 IN-PROGRESS
(---------------) Channel-Identifier: 0000680C4A41FEC8@speechsynth
(---------------) 
(---------------) 1364 - Parsed MRCP Message [83]
(---------------) 1364 - Failed to Find Channel <0000680C4A41FEC8@speechsynth> in
Connection 10.2.200.100:1210 <-> 10.10.0.101:51008 [1]
(---------------) 1364 - Receive MRCPv2 Stream 10.2.200.100:1211 <->
10.10.0.101:51035 [83 bytes]
(---------------) MRCP/2.0 83 1 200 IN-PROGRESS
(---------------) Channel-Identifier: 00004AA04A41FEC8@speechsynth
(---------------) 
(---------------) 1364 - Parsed MRCP Message [83]
(---------------) 1364 - Signal Connection Task Message
(---------------) 968 - Receive Connection Task Message [3]
(---------------) 968 - Raise App MRCP Response 0x472d550 <00004AA04A41FEC8>
(24 10:25:39.109) 1364 - Receive MRCPv2 Stream 10.2.200.100:1209 <->
10.10.0.101:51030 [83 bytes]
(---------------) MRCP/2.0 83 1 200 IN-PROGRESS
(---------------) Channel-Identifier: 00004F844A41FEC8@speechrecog
(---------------) 
(---------------) 1364 - Parsed MRCP Message [83]
(---------------) 1364 - Signal Connection Task Message
(---------------) 968 - Receive Connection Task Message [3]
(---------------) 968 - Raise App MRCP Response 0x17fb8050 <00004F844A41FEC8>
(24 10:25:39.125) 1364 - Receive MRCPv2 Stream 10.2.200.100:1176 <->
10.10.0.101:51066 [148 bytes]
(---------------) MRCP/2.0 148 START-OF-INPUT 1 IN-PROGRESS
(---------------) Channel-Identifier: 000060F04A41FEC1@speechrecog
(---------------) Proxy-Sync-Id: 04401E1C00010733
(---------------) Input-Type: speech
(---------------) 
(---------------) 1364 - Parsed MRCP Message [148]
(---------------) 1364 - Signal Connection Task Message
(---------------) 968 - Receive Connection Task Message [3]
(---------------) 968 - Raise App MRCP Event 0x4733808 <000060F04A41FEC1>
(24 10:25:39.156) 1364 - Receive MRCPv2 Stream 10.2.200.100:1204 <->
10.10.0.101:51039 [141 bytes]
(---------------) MRCP/2.0 141 SPEAK-COMPLETE 1 COMPLETE
(---------------) Channel-Identifier: 000057984A41FEC7@speechsynth
(---------------) Completion-Cause: 000 normal
(---------------) Speech-Marker: ""
(---------------) 
(---------------) 1364 - Parsed MRCP Message [141]
(---------------) 1364 - Signal Connection Task Message
(---------------) 968 - Receive Connection Task Message [3]
(---------------) 968 - Raise App MRCP Event 0x17f910b8 <000057984A41FEC7>
(---------------) 1344 - Signal Application Task Message
(---------------) 968 - Receive Application Task Message [0]
(---------------) 968 - Receive App Request 0x17f910b8 <000057984A41FEC7> [1]
(---------------) 968 - Dispatch Application Request 0x17f910b8 <000057984A41FEC7>
[1]
(---------------) 968 - Terminate Session 0x17f910b8 <000057984A41FEC7>
(---------------) 968 - Remove Control Channel 0x17f910b8 <000057984A41FEC7>
(---------------) 968 - Subtract Channel Termination 0x17f910b8 <000057984A41FEC7>
(---------------) 1364 - Process Control Message
(---------------) 1364 - Remove Control Channel <000057984A41FEC7@speechsynth> [0]
(---------------) 1364 - Close TCP/MRCPv2 Connection 10.2.200.100:1204 <->
10.10.0.101:51039
(---------------) 968 - Subtract Termination 0x17f910b8 <000057984A41FEC7>
(---------------) 1364 - Signal Connection Task Message
(---------------) 968 - Receive Connection Task Message [2]
(---------------) 968 - On Control Channel Remove 0x17f910b8 <000057984A41FEC7>
(24 10:25:39.203) 2252 - Receive SIP Event [nua_r_bye] Status 200 OK
(---------------) 2252 - Receive SIP Event [nua_i_state] Status 200 to BYE
(---------------) 2252 - SIP Call State 0x17f910b8 [terminated]
(---------------) 2252 - Signal Signaling Task Message
(---------------) 968 - Receive Signaling Task Message [1]
(---------------) 968 - Receive Terminate Response 0x17f910b8 <000057984A41FEC7>

Reported by cpsoares on 2009-06-24 11:06:28

achaloyan commented 9 years ago
  Hi Arsen,

  After a bit of desperation, the code is really difficult to follow with all the
function pointers and friends, I confess I'm a bit frustrated right now. As I said
in
the last message I discovered a pattern that is always present when the crash is
near. In one of the last stress runs I found a situation that didn't make the
application crash. As always the problem is in the same spot so maybe you can inspect
and see if it gives you any hints. Please look at this piece of trace, on each line
the first integer is the thread id so that the thread context can be easily
identified, please see also my comments:

--> Application request to create a resource (TTS resource):

(---------------) 1340 - Receive Application Task Message [0]
(---------------) 1340 - Receive App Request 0x1c86a110 <new> [2]
(---------------) 1340 - Add MRCP Handle 0x1c86a110
(---------------) 1340 - Dispatch Application Request 0x1c86a110 <new> [2]
(---------------) 1340 - Add Control Channel 0x1c86a110 <new@speechsynth>
(---------------) 1340 - Add RTP Termination 0x1c86a110 <new>
(---------------) 5788 - Process Control Message
(---------------) 5788 - Signal Connection Task Message
(---------------) 1340 - Receive Connection Task Message [0]
(---------------) 1340 - On Control Channel Add 0x1c86a110 <new>
(---------------) 5880 - Process MPF Message
(---------------) 5880 - Add Termination
(---------------) 5880 - Add Context
(---------------) 5880 - Process MPF Message
(---------------) 1340 - Receive Media Task Message [0]
(---------------) 1340 - On Termination Add 0x1c86a110 <new>
(---------------) 5880 - Add Termination
(---------------) 1340 - Receive Media Task Message [0]
(---------------) 1340 - On Termination Add 0x1c86a110 <new>

--> This will send the INVITE to the MRCPv2 server (below the SDP)

(---------------) 1340 - Send Offer 0x1c86a110 <new> [c:1 a:1 v:0]
(---------------) 1340 - Local SDP 0x1c86a110 <new>
(---------------) v=0
(---------------) o=UniMRCPClient 0 0 IN IP4 10.2.200.100
(---------------) s=-
(---------------) c=IN IP4 10.2.200.100
(---------------) t=0 0
(---------------) m=application 9 TCP/MRCPv2 1
(---------------) a=setup:active
(---------------) a=connection:new
(---------------) a=resource:speechsynth
(---------------) a=cmid:1
(---------------) m=audio 59282 RTP/AVP 0 8
(---------------) a=rtpmap:0 PCMU/8000
(---------------) a=rtpmap:8 PCMA/8000
(---------------) a=recvonly
(---------------) a=mid:1
(---------------) 5792 - Receive SIP Event [nua_i_state] Status 0 INVITE sent
(---------------) 5792 - SIP Call State 0x1c86a110 [calling]
(30 11:27:02.562) 5792 - Receive SIP Event [nua_r_invite] Status 200 OK
(---------------) 5792 - Receive SIP Event [nua_i_state] Status 200 OK
(---------------) 5792 - SIP Call State 0x1c86a110 [ready]

--> This is the MRCPv2 Server response SDP with resource identifier and MRCP port

(---------------) 5792 - Remote SDP 0x1c86a110 <new>
(---------------) v=0
(---------------) o=MRCPv2Server 3455349925 3455349925 IN IP4 10.10.0.101
(---------------) s=-
(---------------) c=IN IP4 10.10.0.101
(---------------) t=3455349925 0
(---------------) m=application 51058 TCP/MRCPv2 1
(---------------) a=channel:00006E144A49F625@speechsynth
(---------------) a=connection:new
(---------------) a=setup:passive
(---------------) a=cmid:1
(---------------) m=audio 10074 RTP/AVP 0
(---------------) a=rtpmap:0 PCMU/8000
(---------------) a=sendonly
(---------------) a=mid:1
(---------------) 5792 - Signal Signaling Task Message
(---------------) 5792 - Receive SIP Event [nua_i_active] Status 200 Call active
(---------------) 1340 - Receive Signaling Task Message [0]
(---------------) 1340 - Receive Answer 0x1c86a110 <new> [c:1 a:1 v:0]
(---------------) 1340 - Modify Control Channel 0x1c86a110 <00006E144A49F625>
(---------------) 1340 - Modify Termination 0x1c86a110 <00006E144A49F625>
(---------------) 5880 - Process MPF Message
(---------------) 5788 - Process Control Message
(---------------) 5880 - Create Audio Bridge
(---------------) 5880 - Open RTP Receive 10.2.200.100:59282 <- 10.10.0.101:10074
playout [50 ms]
(---------------) 1340 - Receive Media Task Message [1]
(---------------) 1340 - On Termination Modify 0x1c86a110 <00006E144A49F625>

--> The UniMRCP now tries to connect to the MRCP port

(---------------) 5788 - Established TCP/MRCPv2 Connection 10.2.200.100:3166 <->
10.10.0.101:51058

--> Now, please note, the error when it happens is *ALWAYS* on this line "Add Control
Channel" after a successful connection to the MRCP port, see that the resource
identifier is "10.10.0.101" instead of "00006E144A49F625@speechsynth", this will add
a wrong control channel id in the control channel list which will produce
side-effects at the end.

(---------------) 5788 - Add Control Channel <10.10.0.101> 10.2.200.100:3166 <->
10.10.0.101:51058 [1]
(---------------) 5788 - Signal Connection Task Message
(---------------) 1340 - Receive Connection Task Message [1]

--> The thread 1340 has the correct resource id, though!

(---------------) 1340 - On Control Channel Modify 0x1c86a110 <00006E144A49F625>
(---------------) 1340 - Raise App Response 0x1c86a110 <00006E144A49F625> [2] SUCCESS
[0]
(---------------) 3212 - Signal Application Task Message
(---------------) 1340 - Receive Application Task Message [1]
(---------------) 1340 - Receive App MRCP Request 0x1c86a110 <00006E144A49F625>
(---------------) 1340 - Send MRCP Request 0x1c86a110 <00006E144A49F625@speechsynth>
[1]

--> The same thread that had an incorrect resource id now can send the correct
resource id because it was the application that built the MRCP TTS SPEAK message.

(---------------) 5788 - Process Control Message
(---------------) 5788 - Send MRCPv2 Stream 10.2.200.100:3166 <-> 10.10.0.101:51058
[141 bytes]
(---------------) MRCP/2.0 141 SPEAK 1
(---------------) Channel-Identifier: 00006E144A49F625@speechsynth
(---------------) Voice-Name: Amalia
(---------------) Content-Type: text/plain
(---------------) Content-Length: 2
(---------------) 
(---------------)  0

--> MRCPv2 Server responds with an IN-PROGRESS...

(---------------) 5788 - Receive MRCPv2 Stream 10.2.200.100:3166 <->
10.10.0.101:51058 [83 bytes]
(---------------) MRCP/2.0 83 1 200 IN-PROGRESS
(---------------) Channel-Identifier: 00006E144A49F625@speechsynth
(---------------) 
(---------------) 5788 - Parsed MRCP Message [83]

--> And here is the result: the resource id in the message cannot be correctly
identified because it was previously inserted on the list as "10.10.0.101".

(---------------) 5788 - Failed to Find Channel <00006E144A49F625@speechsynth> in
Connection 10.2.200.100:3166 <-> 10.10.0.101:51058 [1]

  Thanks in advance and best regards.

Reported by cpsoares on 2009-06-30 17:01:52

achaloyan commented 9 years ago
Hi cpsoares,

I was quite busy during the days, anyway thanks for the update and your analyzes.
Hopefully there are only a few questions left.

1. Have you tried to enable OWN_ALLOCATOR_PER_POOL. Does it make any difference to
you?
2. What version of APR do you use?
3. What if I provide small patch against the APR you use?

Reported by achaloyan on 2009-07-01 09:44:53

achaloyan commented 9 years ago
  Hi Arsen,

  1. Yes I have the macro enabled and apparently doesn't make any difference at all.
  2. I use the APR built directly from the UniMRCP libs trunk. In fact I built APR
DLL two months ago and since then I use it as is. The version is 1.3.4.0 for
libaprutil-1.dll, 1.3.3.0 for libapr-1.dll and 1.2.1.0 for libapriconv-1.dll.
  3. No problem at all, I can rebuid the apr with that patch.

  Thanks a lot.

Reported by cpsoares on 2009-07-01 10:47:35

achaloyan commented 9 years ago
Well, here it goes.

1. Apply attached apr_pools.h and apr_pools.c over your version (apr-1.3.3)
apr/include/apr_pools.h
apr/memory/unix/apr_pools.c
2. Rebuild APR
3. Make sure you have new apr.dll in the path (copy it manually or just build
prepare.vcproj again)
4. Apply patch apt_pool.c.diff to apt_pool.c
5. Rebuild UniMRCP
6. Rebuild your application
7. Finally I hope to get positive feedback at this step.

Reported by achaloyan on 2009-07-01 11:33:00


achaloyan commented 9 years ago
  Arsen,

  Just an update on this. Running now my application with the patched source as you
suggested. The application is up for 1 day and 22 hours with constant stress testing
(ASR and TTS requests) without a single issue. I have roughly since then, 800000 MRCP
sessions and the same ammount of calls. My milestone is to run 1000000 MRCP sessions
and same ammount of calls and I'll proclaim this solved for good :-). I'll keep in
touch. BTW, if this proves to solve the issue what will be the official fix? Will it
include a patch in APR? I'm not worried as I can live perfectly with a patch on APR!!

  Thanks and regards.

Reported by cpsoares on 2009-07-03 11:09:42

achaloyan commented 9 years ago
Thanks for the update and good news.
Actually this is a patch we should live with regardless it fixes this issue or not.

Probably this is the biggest mistake I did in the project. I've never thought that
APR memory allocation isn't thread safe. APR allocators are protected by mutex, but
still apr_palloc is NOT thread safe. If I introduce wrapper around apr_palloc, it
doesn't help, because apr_palloc is used internally in APR routines too. Actually I
have no other option but to make APR pools thread safe themselves.
More over, channel identifier was corrupted as a result of typical race, where I fork
the request from application to MPF and MRCPv2 connection tasks and both use the same
pool to allocate memory from. I'm thinking also about subpools. Most probably I'll
create a subpool for MPF context, which should help avoid addition, non-desirable
memory pool locks (performance). 
Anyway memory pool must be thread safe to me. More over, there are a number of
patches available for Sofia-SIP out there. Most probably, I'll end up with dependency
pack for unimrcp, which will include fixes and additions over the released versions
of libraries unimrcp uses.

Reported by achaloyan on 2009-07-03 12:10:04

achaloyan commented 9 years ago
Hi cpsoares,

About your test : can you confirm to us you run a MRCPv2 test with SIP over UDP and

MRCP over TCP (a TCP socket is created for each session) ?

Thanks

Reported by amasse.atwork on 2009-07-03 14:21:59

achaloyan commented 9 years ago
  Hi amasse,

  Yes, SIP over UDP and MRCP over TCP with no connection sharing (unimrcp have issues
when MRCP connection sharing is enabled).

Regards.

Reported by cpsoares on 2009-07-03 14:27:38

achaloyan commented 9 years ago
cpsoares,
What kind of issues have you experienced concerning shared connections?
The only issue I'm aware of is: mem pool of shared connection is cleared only when
there are no more sessions/channels left in that connection.

Reported by achaloyan on 2009-07-03 14:35:43

achaloyan commented 9 years ago
  Hi Arsen,

  When I first tried this project (a couple of months ago) to do my integration I
used a default xml that had shared connections configured. The result was that, due
to race conditions I suppose, some connections were disconnected with pending
sessions (typically sessions that were about to be created). This resulted in some
sessions not to convey properly the MRCP commands to the MRCP server and stalled. I
inclusively referenced this problem in Issue 28 (I think) without however detailing
it because I discovered a workaround by modifying the xml so that it disabled the
MRCP connection sharing which in this case solved completely the problem.

  Now, what I can do is, after I reach the milestone I talked about some messages
ago, I can try to do the same stress tests with MRCP connection sharing enebled and
post here some traces to illustrate the problem (if it yet exists at all, because it
was tried some time ago, in some older revision than the one in trunk right now).

  Regards.

Reported by cpsoares on 2009-07-03 15:03:20

achaloyan commented 9 years ago
Well, cpsoares.
Please retest it later and if the issue you observed still exists, report it as a new
one, as most probably, it is not related to the memory corruption. 
Meantime I guess, issue-28 can be related to issue-29 and it makes sense to retest
it
with the patched APR.
Thanks.

Reported by achaloyan on 2009-07-03 15:56:50

achaloyan commented 9 years ago
Hi Arsen/cpsoares.

I believe the issue on connection sharing has been also resolved by all mofications

done in the issue-7.

Can you remember me how to enable/disable the no connection sharing ?

Note : Arsen, I'm going to create a new subject on the "mail list" about the infinite

increasing memory (APR pool) when connection sharing !!!

Regards.

Reported by amasse.atwork on 2009-07-06 08:24:21

achaloyan commented 9 years ago
  Hi Amasse,

  You can disable MRCP connnection sharing by setting the "offer-new-connection"
parameter to "1" (the value of "0" means share existing connections or create a new
one in case it doesn't exist any connection) on the "connection->agent" context of
the xml configuration file you provide to the unimrcp suite. Here is my xml as an
example.

  Regards.

<!-- UniMRCP client document -->
<unimrcpclient>
  <!-- Client settings (signaling, media, ...) -->
  <settings>
    <!-- SIP, RTSP signaling agents -->
    <signaling>
      <!-- SofiaSIP MRCPv2 signaling agent -->
      <agent name="SIP-Agent-1" class="SofiaSIP">
        <param name="client-ip" value="10.2.200.100"/>
        <!-- <param name="client-ext-ip" value="auto"/> -->
        <param name="client-port" value="13565"/>
        <param name="server-ip" value="10.10.0.101"/>
        <param name="server-port" value="5060"/>
        <!-- <param name="force-destination" value="1"/> -->
        <!-- <param name="sip-transport" value="tcp"/> -->
        <param name="sip-transport" value="udp"/>
        <param name="ua-name" value="UniMRCP SofiaSIP"/>
        <param name="sdp-origin" value="UniMRCPClient"/>
      </agent>
    </signaling>
    <!-- MRCPv2 connection agents -->
    <connection>
      <agent name="MRCPv2-Agent-1">
        <param name="max-connection-count" value="94"/>
        <param name="offer-new-connection" value="1"/>
      </agent>
    </connection>
    <!-- Media engines -->
    <media>
      <engine name="Media-Engine-1"/>
      <rtp name="RTP-Factory-1">
        <param name="rtp-ip" value="auto"/>
        <!-- <param name="rtp-ext-ip" value="auto"/> -->
        <param name="rtp-port-min" value="15000"/>
        <param name="rtp-port-max" value="60000"/>
        <!-- <param name="playout-delay" value="50"/> -->
        <!-- <param name="max-playout-delay" value="200"/> -->
        <!-- <param name="ptime" value="20"/> -->
        <param name="codecs" value="PCMU PCMA"/>
      </rtp>
    </media>
  </settings>
  <!-- Client profiles -->
  <profiles>
    <!-- MRCPv2 default profile -->
    <profile name="MRCPv2-Default">
      <param name="signaling-agent" value="SIP-Agent-1"/>
      <param name="connection-agent" value="MRCPv2-Agent-1"/>
      <param name="media-engine" value="Media-Engine-1"/>
      <param name="rtp-factory" value="RTP-Factory-1"/>
    </profile>
    <!-- more profiles might be added here -->
  </profiles>
</unimrcpclient>

Reported by cpsoares on 2009-07-06 10:24:15

achaloyan commented 9 years ago
Hi,

Anthony, you can force new connection from server config too (unimrcpserver.xml).
<param name="force-new-connection" value="1"/>
In this case unimrcpserver will respond with "new", even if client offers "existing".
Looking forward for your post regarding memory allocation and connection sharing.

Reported by achaloyan on 2009-07-06 11:34:10

achaloyan commented 9 years ago
  Hi Arsen,

  My application made more than 2 million MRCP sessions without a glitch, so I think
this issue is now completely solved. Thanks a lot.
  I'll try to provide you an example for issue 28 at the end of the day.

  Regards.

Reported by cpsoares on 2009-07-07 10:43:46

achaloyan commented 9 years ago
Really good news! Thanks.

Reported by achaloyan on 2009-07-07 13:20:15

achaloyan commented 9 years ago
The patch, which fixes this issues is incorporated into dependency package available
from download area.
See the discussion below for more info
http://groups.google.com/group/unimrcp/browse_frm/thread/68a71c71679e3c24

Reported by achaloyan on 2009-07-23 04:57:44

achaloyan commented 9 years ago

Reported by achaloyan on 2009-08-25 17:43:04

achaloyan commented 9 years ago
Here is an updated patch for apr-1.4.6 as well as an SRPM for Fedora 16. Note that you
seem to have to build and install both the i686 and x86_64 versions to prevent a conflict.

Reported by JerseyChewi on 2012-05-04 11:27:25


achaloyan commented 9 years ago
Hello folks o/~

In trying to build the latest unimrcp on Ubuntu 12.04/amd64 I hit a #78 which references
this bug.

I brought this up on dev@apr.apache.org and this seems to be the conclusio so far:

> The patch adds a mutex to apr_pool_t and locks the mutex in calls to
> apr_palloc, apr_pool_clear and apr_pool_destroy.  That may make those
> individual calls threadsafe but it doesn't really make pool use as a
> whole threadsafe.  If one thread clears a pool every other thread using
> the pool is left with dangling references.  It looks like a lot of
> locking for no real gain.

http://mail-archives.apache.org/mod_mbox/apr-dev/201304.mbox/%3C878v4gjh3u.fsf%40ntlworld.com%3E

Reported by i.galic@brainsware.org on 2013-04-18 11:20:25

achaloyan commented 9 years ago
Hello Igor,

I noticed your post on the APR dev list yesterday and will try to respond there later
today. Thanks for bringing this up.

Reported by achaloyan on 2013-04-18 15:03:33