khaledcollo / rfc5766-turn-server

Automatically exported from code.google.com/p/rfc5766-turn-server

Turn Server Crashing on executing the Clone function - Performance issue #141

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run the TURN server on a public IP.
2. Run the PJSIP SIP client with TURN enabled and all required configuration 
in place.
3. Make a call. Calls work well, and all media flows through the TURN server 
correctly.
4. After some days, or sometimes hours (roughly after 50 calls), the TURN 
server process dumps core.

What is the expected output? What do you see instead?
Expected output: the TURN server should not crash.
Instead, we see an intermittent crash in the TURN server after a number of 
calls have worked fine. This is a performance issue.

What version of the product are you using? On what operating system?
Version of the product: turnserver-3.2.4.3
OS: Linux, 64-bit
cat /proc/version:
Linux version 2.6.32-431.29.2.el6.x86_64 (mockbuild@c6b9.bsys.dev.centos.org) 
(gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Tue Sep 9 21:36:05 
UTC 2014

Please provide any additional information below.
The core file could not be attached to this issue because it is 60 MB. The 
gdb backtrace is below:
(gdb) bt
#0  __pthread_mutex_lock (mutex=0x91660d496b1d02d9) at pthread_mutex_lock.c:50
#1  0x0000000000415d75 in locking_function (mode=9, n=18, file=0x33a1b7215b 
"md_rand.c", line=384)
    at src/apps/relay/mainrelay.c:1918
#2  0x00000033a1ae36e6 in ssleay_rand_bytes (buf=0x7fb7c49be9d0 
"\250\352\233ķ\177", num=12, pseudo=0) at md_rand.c:384
#3  0x000000000042e72d in turn_random32_size (ar=0x7fb7c49be9d0, sz=3) at 
src/client/ns_turn_msg.c:56
#4  0x000000000043079b in stun_tid_generate (id=0x7fb7c49be9d0) at 
src/client/ns_turn_msg.c:875
#5  0x00000000004307c8 in stun_tid_generate_in_message_str (buf=0x7fb7bc01884c 
"", id=0x7fb7c49be9d0)
    at src/client/ns_turn_msg.c:882
#6  0x000000000042f49d in stun_init_command_str (message_type=23, 
buf=0x7fb7bc01884c "", len=0x7fb7c49beaa8)
    at src/client/ns_turn_msg.c:352
#7  0x000000000042f590 in stun_init_indication_str (method=7, 
buf=0x7fb7bc01884c "", len=0x7fb7c49beaa8)
    at src/client/ns_turn_msg.c:369
#8  0x000000000044342e in peer_input_handler (s=0x7fb7bc013ad0, event_type=2, 
in_buffer=0x7fb7c49beb60, arg=0x7fb7bc028850,
    can_resume=1) at src/server/ns_turn_server.c:4248
#9  0x000000000040ea2c in socket_input_worker (s=0x7fb7bc013ad0) at 
src/apps/relay/ns_ioalib_engine_impl.c:2737
#10 0x000000000040ec3c in socket_input_handler (fd=68, what=2, 
arg=0x7fb7bc013ad0)
    at src/apps/relay/ns_ioalib_engine_impl.c:2799
#11 0x00007fb7c589677c in event_process_active_single_queue 
(base=0x7fb7bc0008f0, flags=0) at event.c:1350
#12 event_process_active (base=0x7fb7bc0008f0, flags=0) at event.c:1420
#13 event_base_loop (base=0x7fb7bc0008f0, flags=0) at event.c:1621
#14 0x000000000041b0a6 in run_events (eb=0x7fb7bc0008f0) at 
src/apps/relay/netengine.c:1414
#15 0x000000000041b55c in run_general_relay_thread (arg=0x7fb7c49c0010) at 
src/apps/relay/netengine.c:1521
#16 0x000000339a6079d1 in start_thread (arg=0x7fb7c49bf700) at 
pthread_create.c:301
#17 0x000000339a2e886d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:115
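
Two details in frames #0 and #1 can be read off mechanically. The sketch below is not part of turnserver: the helper names are made up for illustration, the lock-mode bit values are the ones I believe OpenSSL 1.0.x defines in crypto.h, and the address check assumes 48-bit x86-64 virtual addressing. Under those assumptions, mode=9 decodes to a write-lock request, and the mutex address handed to pthread_mutex_lock is not a canonical address at all, which suggests the lock pointer itself was garbage rather than the mutex merely being contended.

```python
# Illustrative helpers only; the names are hypothetical and not from turnserver.

# Lock-mode bits as (assumed) defined in OpenSSL 1.0.x <openssl/crypto.h>.
LOCK_MODE_BITS = (
    ("CRYPTO_LOCK", 1),
    ("CRYPTO_UNLOCK", 2),
    ("CRYPTO_READ", 4),
    ("CRYPTO_WRITE", 8),
)


def decode_lock_mode(mode):
    """Return the names of the mode bits set in a locking-callback call."""
    return [name for name, bit in LOCK_MODE_BITS if mode & bit]


def is_canonical_x86_64(addr):
    """True if bits 63..47 are all equal (assumes 48-bit virtual addresses)."""
    top = (addr >> 47) & 0x1FFFF
    return top == 0 or top == 0x1FFFF


if __name__ == "__main__":
    print(decode_lock_mode(9))                       # mode from frame #1
    print(is_canonical_x86_64(0x91660D496B1D02D9))   # mutex from frame #0
    print(is_canonical_x86_64(0x7FB7C49BE9D0))       # buf from frame #2
```

If the numbering in OpenSSL 1.0.1 is as I recall it, n=18 would be the lock used for the random number generator, which matches the md_rand.c caller in frame #2.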

Thanks and Regards
Varun

Original issue reported on code.google.com by varunps2...@gmail.com on 14 Nov 2014 at 3:55

GoogleCodeExporter commented 9 years ago
After 3.2.4.3, we fixed a lot of crashes (especially in version 3.2.4.5). 
Please upgrade to the latest version and re-test. I am fairly confident that 
the problem will go away. Let me know the test results.

Original comment by mom040...@gmail.com on 14 Nov 2014 at 6:49

GoogleCodeExporter commented 9 years ago

Original comment by mom040...@gmail.com on 14 Nov 2014 at 6:59

GoogleCodeExporter commented 9 years ago
Thanks for the information. I have downloaded and installed the latest 
version. I will let you know if the same crash happens again in the coming 
weeks, and I will post an update here weekly.

Thanks and Regards
Varun

Original comment by varunps2...@gmail.com on 14 Nov 2014 at 3:12

GoogleCodeExporter commented 9 years ago
Hi

I installed the latest version of turnserver as you recommended; my 
turnserver version is now 3.2.4.6. The server did not dump core for around 
two weeks, but then it crashed again. Here is the gdb backtrace from the 
core file:

#0  __pthread_mutex_lock (mutex=0x789467ca7f51b20b) at pthread_mutex_lock.c:50
#1  0x0000000000415d6d in locking_function (mode=9, n=18, file=0x33a1b7215b 
"md_rand.c", line=384)
    at src/apps/relay/mainrelay.c:1870
#2  0x00000033a1ae36e6 in ssleay_rand_bytes (buf=0x7f32ffdfd9d0 
"\250\332\337\377\062\177", num=12, pseudo=0)
    at md_rand.c:384
#3  0x000000000042e90a in turn_random32_size (ar=0x7f32ffdfd9d0, sz=3) at 
src/client/ns_turn_msg.c:56
#4  0x00000000004309c2 in stun_tid_generate (id=0x7f32ffdfd9d0) at 
src/client/ns_turn_msg.c:881
#5  0x00000000004309ef in stun_tid_generate_in_message_str (buf=0x7f32f0008c2c 
"", id=0x7f32ffdfd9d0)
    at src/client/ns_turn_msg.c:888
#6  0x000000000042f6b2 in stun_init_command_str (message_type=23, 
buf=0x7f32f0008c2c "", len=0x7f32ffdfdaa8)
    at src/client/ns_turn_msg.c:356
#7  0x000000000042f7a5 in stun_init_indication_str (method=7, 
buf=0x7f32f0008c2c "", len=0x7f32ffdfdaa8)
    at src/client/ns_turn_msg.c:373
#8  0x0000000000443894 in peer_input_handler (s=0x7f32f0008810, event_type=2, 
in_buffer=0x7f32ffdfdb60, arg=0x7f32f00037a0,
    can_resume=1) at src/server/ns_turn_server.c:4278
#9  0x000000000040ecad in socket_input_worker (s=0x7f32f0008810) at 
src/apps/relay/ns_ioalib_engine_impl.c:2790
#10 0x000000000040eebd in socket_input_handler (fd=67, what=2, 
arg=0x7f32f0008810)
    at src/apps/relay/ns_ioalib_engine_impl.c:2852
#11 0x00007f33063a077c in event_process_active_single_queue 
(base=0x7f32f00008f0, flags=0) at event.c:1350
#12 event_process_active (base=0x7f32f00008f0, flags=0) at event.c:1420
#13 event_base_loop (base=0x7f32f00008f0, flags=0) at event.c:1621
#14 0x000000000041b0e1 in run_events (eb=0x7f32f00008f0) at 
src/apps/relay/netengine.c:1427
#15 0x000000000041b58c in run_general_relay_thread (arg=0x7f32ffdff010) at 
src/apps/relay/netengine.c:1531
#16 0x000000339a6079d1 in start_thread (arg=0x7f32ffdfe700) at 
pthread_create.c:301
#17 0x000000339a2e886d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:115
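
To check that this is the same crash path as in the 3.2.4.3 report, the innermost frames of the two backtraces can be compared by function name, ignoring addresses and line numbers (which shift between builds). A rough sketch, with the frames abridged and re-joined onto single lines, the buffer contents elided, and a hypothetical helper name:

```python
import re

# Matches a gdb frame line such as "#1  0x... in locking_function (mode=9, ..."
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?([A-Za-z_]\w*) \(")


def frame_functions(backtrace):
    """Return the function name of each gdb frame, innermost first."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(2))
    return names


# Frames #0-#3 of the 3.2.4.3 backtrace (abridged).
TRACE_3_2_4_3 = """\
#0  __pthread_mutex_lock (mutex=0x91660d496b1d02d9) at pthread_mutex_lock.c:50
#1  0x0000000000415d75 in locking_function (mode=9, n=18, line=384)
#2  0x00000033a1ae36e6 in ssleay_rand_bytes (num=12, pseudo=0) at md_rand.c:384
#3  0x000000000042e72d in turn_random32_size (ar=0x7fb7c49be9d0, sz=3)
"""

# Frames #0-#3 of the 3.2.4.6 backtrace (abridged).
TRACE_3_2_4_6 = """\
#0  __pthread_mutex_lock (mutex=0x789467ca7f51b20b) at pthread_mutex_lock.c:50
#1  0x0000000000415d6d in locking_function (mode=9, n=18, line=384)
#2  0x00000033a1ae36e6 in ssleay_rand_bytes (num=12, pseudo=0) at md_rand.c:384
#3  0x000000000042e90a in turn_random32_size (ar=0x7f32ffdfd9d0, sz=3)
"""

if __name__ == "__main__":
    same_path = frame_functions(TRACE_3_2_4_3) == frame_functions(TRACE_3_2_4_6)
    print("same call path:", same_path)
```

Both traces end in the same four functions, so upgrading turnserver alone did not change where the crash happens.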

Thanks and regards
Varun

Original comment by varunps2...@gmail.com on 29 Nov 2014 at 10:50

GoogleCodeExporter commented 9 years ago
This looks OpenSSL-related. What OpenSSL version are you using? Is it the 
version that came with your OS, or did you install it separately?

I'll be looking into it.

Original comment by mom040...@gmail.com on 29 Nov 2014 at 6:08

GoogleCodeExporter commented 9 years ago
Could you please try running this with OpenSSL 1.0.1? I see that they have 
fixed the locks since the older versions (like 0.9.8).

Original comment by mom040...@gmail.com on 29 Nov 2014 at 6:21

GoogleCodeExporter commented 9 years ago
Specifically, they fixed line 384 in md_rand.c: they made that lock 
optional.

Original comment by mom040...@gmail.com on 29 Nov 2014 at 6:23

GoogleCodeExporter commented 9 years ago
Please take the latest code directly from SVN, compile it and try it. I made 
some related fixes. Please let me know how it works for you.

Original comment by mom040...@gmail.com on 30 Nov 2014 at 3:33

GoogleCodeExporter commented 9 years ago
Sure, let me try your suggestions and I will update you soon.

Thanks
Varun

Original comment by varunps2...@gmail.com on 30 Nov 2014 at 12:56

GoogleCodeExporter commented 9 years ago
One more update: I checked the OpenSSL version on my CentOS machine, and it 
is "OpenSSL 1.0.1e-fips 11 Feb 2013".

Original comment by varunps2...@gmail.com on 30 Nov 2014 at 1:41

GoogleCodeExporter commented 9 years ago
OK, I see that they changed the locking-related code between 1.0.1e and 1.0.1j. 
I am not sure that this is the problem, but you may want to try 1.0.1j. Also, 
take the very latest code from this project's SVN, and let me know the result.
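
For comparing a reported version string against 1.0.1j, a small sketch follows. It assumes the pre-3.0 OpenSSL naming scheme (three numbers plus an optional letter suffix), and the helper name is made up. Note that distribution builds such as the CentOS "1.0.1e-fips" package may carry backported patches, so the letter alone does not necessarily tell you which fixes are present.

```python
import re

# Three dotted numbers followed by an optional lowercase letter suffix.
VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)([a-z]*)")


def parse_openssl_version(text):
    """Extract (major, minor, fix, letter) from an OpenSSL version string."""
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError("no OpenSSL-style version found in %r" % text)
    major, minor, fix, letter = match.groups()
    return (int(major), int(minor), int(fix), letter)


installed = parse_openssl_version("OpenSSL 1.0.1e-fips 11 Feb 2013")
suggested = parse_openssl_version("1.0.1j")

if __name__ == "__main__":
    print("installed:", installed)
    print("older than suggested:", installed < suggested)
```

Tuples compare element by element, so the letter suffix only matters when the three numbers are equal.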

I ran the very latest pjnath TURN client against this TURN server, and I found 
nothing unusual in the pjnath-generated traffic that might affect the TURN 
server. I remember that pjnath formerly had some irregularities in its TURN 
protocol implementation, but I did not find anything wrong this time.

Original comment by mom040...@gmail.com on 30 Nov 2014 at 6:16

GoogleCodeExporter commented 9 years ago
OK, I have now upgraded OpenSSL to 1.0.1j and am using the very latest code 
from this project's SVN. No core has been reported in the last two days, but I 
would like to wait longer before drawing a conclusion, since this crash only 
shows up after some days. I will keep updating this thread.
Thanks
Varun

Original comment by varunps2...@gmail.com on 3 Dec 2014 at 8:01

GoogleCodeExporter commented 9 years ago
No crash has been observed so far, but let me watch it for one more week.

Original comment by varunps2...@gmail.com on 8 Dec 2014 at 8:03

GoogleCodeExporter commented 9 years ago
Hi, I think we can close this issue. The crash is no longer observed. It was 
fixed by upgrading OpenSSL to 1.0.1j and using the very latest code from this 
project's SVN.

Thanks
Varun 

Original comment by varunps2...@gmail.com on 19 Dec 2014 at 4:47

GoogleCodeExporter commented 9 years ago
Varun,

thank you for reporting.

Regards,
Oleg

Original comment by mom040...@gmail.com on 19 Dec 2014 at 6:20