paulhsu / csipsimple

Automatically exported from code.google.com/p/csipsimple
1 stars 1 forks source link

Phone doesn't ring #1793

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Keep phone idle for a while (half of an hour or may be more)
2. Make a call from any other SIP client to SCC
3. You hear ringing tone in caller's phone, but CSS is silent. When you heng up 
- CSS suddenly wakes up, makes short ringing sound and then indicates that it's 
missed the call. 

What is the expected output? What do you see instead?
Expected: CSS starts ringing when someone calls it.
Instead: after a while CSS quite often misses the first incoming call. Next 
calls arrive just fine.

What version of the product are you using? On what operating system?
r1647
Android 2.2

Please provide any additional information below.
Unfortunately it's not easily reproducible on demand. But it happens quite 
often and bothers me a lot (I can miss incoming calls). I've noticed that 
behavior a long ago. So it's not a problem just with the latest releases.

Making these tests:
1 To make it more representative I use two different SIP providers: FreeeSWITCH 
and remote VoIP service provider. In both cases it happens in similar way, 
meaning that there is nothing wrong with SIP providers, but rather it's the 
problem with CSS itself.

2. I watch registration requests sent by CSS to both providers. They are sent 
on a regular basis. Moreover, in FreeSWITCH log I can see KA packets sent by 
CSS regularly right before I make test calls to it. It leaves out a potential 
problem with NAT, closing its hole.

3. Next (as all followed) call is always rings the phone. It makes impression, 
that CSS comes into a sleep state after a while. The part, that sends KA and 
registration requests, still work fine, but the part, that processes the first 
incoming INVITE request, is in some kind of a sleep. It's just a wild guess, of 
course, but it may explain the logic that I see in such scenario.

Original issue reported on code.google.com by yok...@gmail.com on 26 Jun 2012 at 3:29

GoogleCodeExporter commented 9 years ago
Module that is responsible for processing SIP messages from SIP switch may take 
a nap. I've observed followed scenarios:
1. CSS doesn't react to incoming INVITE messages in real time. Then it may 
suddenly wake up and process the stale INVITE message and start ringing. It 
will ring indefinitely or until I cancel the obsolete incoming call. Then it 
sends 10 call terminating messages to SIP server (which doesn't send anything 
back in response, because there is no such call under its watch anymore).

2. From SIP server's log I see a proper processing of SIP message (including 
sending back "100 Trying", "180 Ringing"), but there is no any ringing tone or 
any other activity observed in the phone...

3. CSS doesn't receive incoming INVITE messages at all. All those incoming 
requests are just dropped and when it finally wakes up - it doesn't remember 
anything about those INVITE messages at all.

The most important thing to note is that in all cases CSS sends KA 2-bytes 
packets on a regular basis. It means:
1. It's TCP/UDP layer (receiving / sending UDP packets) is still working all 
the time.
2. There is no any problems with NAT hole. It's opened all the time (making 3rd 
case above looking weird).

Please look at the part, responsible for processing messages, and check why it 
can enter a sleep mode, while it should always be ready to process all incoming 
messages.

Original comment by yok...@gmail.com on 28 Jun 2012 at 8:22

GoogleCodeExporter commented 9 years ago
Hi, thanks for the detailed tests and scenarios. 

I will try to reproduce on my side.

Some side comments too, the fact that the app goes in suspend mode is not so 
bad. It means that battery management is done properly. Normally event on 
socket should wake up the application. Normally then as soon as I can I take a 
CPU lock the time the sip message is processed. 
There is some recent change in pjsip that flag socket with MSG_NOSIGNAL (it was 
made 10 days ago : https://trac.pjsip.org/repos/changeset/4170). I wonder if 
could not be the root cause. Did you observed the same behavior in version 
older than 10 days ago?
Normally it should affect only send process. But maybe it has deeper impact on 
android and tell the socket to not wake up the app...

Also some question : in your tests are you using TCP or UDP? Does the little 
notification icon of csipsimple appears in status bar (it has impact on how the 
app is considered by the system)? What is the version of android you are using 
(I heard that some 4.0 ROM are buggy with network TCP)? Also are you connected 
over wifi or 3g? 
3g radio module has some complex state machine (that also depends on carrier), 
so it could explain why I don't reproduce.
If over wifi, how do you ensure the wifi is kept active? Is it using csipsimple 
wake lock or using the android dedicated setting (in android network setting). 
Even if I have no doubt on the fact the app is still able to make outgoing 
connections, these settings could have impact on the radio state. 
For example, after a while the wifi and the 3g goes in a "low power" mode where 
latency becomes significantly bigger. It should not be big enough to be 
problematic in our case, but if ROM is buggy it could be.

For example android docs provide this state machine for AT&T 3G network : 
http://developer.android.com/images/efficient-downloads/mobile_radio_state_machi
ne.png 
and they tell that values can vary depending on mobile technology and carrier 
settings. AFAIK, the wifi state machine is similar. It can go faster to a PSP 
mode too if screen goes of on some devices.

Original comment by r3gis...@gmail.com on 29 Jun 2012 at 8:18

GoogleCodeExporter commented 9 years ago
Answering your questions:
> Did you observed the same behavior in version older than 10 days ago?
Yes, I did. May be some details were changed lately (I did not catch the 
difference). But I certainly missed some first calls coming to CSS. It woke up, 
rung for a bit of a sec and then indicated, that the call was missing. It was 
frustrating experience all the times (before and after the latest change in 
pjsip).

> in your tests are you using TCP or UDP?
UDP only, so far.

> Does the little notification icon of csipsimple appears in status bar (it has 
impact on how the app is considered by the system)? 
Yes, it does.

> What is the version of android you are using 
Lately I've moved to Android v2.2.

> Also are you connected over wifi or 3g?
I use WiFi only.

> how do you ensure the wifi is kept active? Is it using csipsimple wake lock 
or using the android dedicated setting (in android network setting).
In the sytem: 
WiFi sleep policy = Never
In CSS: 
Network | Performances | Lock WiFi (Prevent WiFi idle when SIP is connected) - 
Yes 
Network | Connection keep alive (Interval in sec for KA packets) | WiFi UDP 
keep alive = 120 
User Interface | Advanced tweaks | Use partial wake lock - No 
User Interface | Advanced tweaks | Keep awake while on call  - Yes 

The most important thing is when phone sleeps CSS sends KA (2-bytes) packets on 
a regular basis (evey 2 min). Thus it could be active while phone is in Deep 
Sleep mode. I watch those packets in SIP switch log. It means that there is no 
need to set "Use partial wake lock" to Yes (drains battery and CPU can't go 
into Deep Sleep mode). The problem is not with CSS keeping sending those 
packets. CSS is active when it needs to sent them. It with CSS receiving (and 
processing) of incoming messages.

> Normally event on socket should wake up the application.
Therefore, one of the problems is right here. Not all times an event on socket 
wakes up the tread, responsible for processing of incoming SIP messages.

Original comment by yok...@gmail.com on 29 Jun 2012 at 9:32

GoogleCodeExporter commented 9 years ago
During this night CSS (r1662) was sending KA and REGISTER packers on a regular 
basis. But when I tried to call CSS, it did not respond to INVITE messages at 
all. After that it has stopped sending even KA and REGISTER requests...

In an hour I've finally woke up the phone and CSS did not show that it's 
registered (obviously), but did show indication, that I've left message in 
voicemail to its account. It means, that it still getting notifications from 
the SIP switch, while it did not send any KA and REGISTER requests anymore.

Something is definitely wrong with thread, that process INVITE requests after 
phone was in sleep mode for a while.

Original comment by yok...@gmail.com on 2 Jul 2012 at 7:17

GoogleCodeExporter commented 9 years ago
It could be very interesting to have long running logs in this case. 
I introduce an option in "interface and integration" section (while in expert 
mode).
To log inside a file. It could be very useful to see how the app behaves on 
long term (while regular logs are rotated, this one is put in a file).

When you reproduce and go out of logs you can go in the sdcard in 
csipsimple/logs folder and find the log of the day. I would be very interested 
if you can send it just after you get this problem.
Maybe there is something that occurs in logs in this particular case : I mean 
when it stops sending KA... maybe some socket error it's not able to recover.
I think that in other case logs will not be so helpful because I guess that 
things occurs but just too late because the app is woken up by system only 
later when phone cpu is globally woken up.

For now I didn't reproduced yet here with the settings, but as not the same 
device and the same network maybe it occurs less frequently. So I keep it in 
the same configuration than your so that I can try to see what happens on 
android suspend state when it occurs.

Original comment by r3gis...@gmail.com on 2 Jul 2012 at 9:25

GoogleCodeExporter commented 9 years ago
Downloaded the latest r1667. Then I set: "Settings | User interface | Advanced 
Tweaks | Log directly to a file" to "Yes". Run it in sleep mode for next 1.5h. 
Then I made a test call.

Log from SIP switch (FreeSWITCH or FS for short) indicates:
1. Before the call KA and REGISTER are sent regularly
2. CSS ignores bunch of INVITE requests. In 30 sec FS redirects call to VM and 
after VM IVR starts its prompts, it sends the last INVITE message to CSS (FS's 
bug, it should not do that at this time)
3. CSS suddenly wakes up and sends "100 Trying", for which FS sends "CANCEL" 
immediately back to CSS.
4. CSS ignores that last message too and sends 6 "180 Ringing" messages in a 
row to FS.
5. Then finally FS gets "200 OK/CANCEL" message from CSS, followed by "487 
Request Terminated" message.
6. After the call CSS sends REGISTER request to FS...

To collect the log I attached USB, changed it mode to Drive and tried to see 
the log file. There is ".\CSipSimple\logs\logs_07-02-12.txt" file, but it has 
size 0 bytes. Unmounted and disconnected from USB. Waited for SD to settle down 
and opened CSS. Turned off its log option (see beginning of this post) in a 
hope, that it could flush collected data from memory into the file and checked 
the file again. It still empty...

BTW, could you please use ISO standard for date stamps in the log file names? 
So, if I collect them in a while, they would be sorted correctly... ISO 
standard uses '-' delimiters (that delimiter distinguishes it from other date 
formats, that could use dots - EU, slashes - US, etc). First comes year, 
followed by month and then day. (for example "logs_12-07-02.txt" instead or 
"logs_07-02-12.txt") It's easier to maintain files named this way in organized 
fashion.

Original comment by yok...@gmail.com on 2 Jul 2012 at 11:33

GoogleCodeExporter commented 9 years ago
BTW, to reduce some potential ambiguity with old log files collected before the 
proposed date format change you may consider to use short date format (I use it 
all the time). For example: "logs_120702.txt"
It works perfectly in cases if there is a lot of files in one folder and files 
have date stamped names. They are always in a proper order and it's easy to 
find what you need even if there is many of them gathered in one place...

Original comment by yok...@gmail.com on 2 Jul 2012 at 11:47

GoogleCodeExporter commented 9 years ago
Updated CSS to r1671, then rebooted the phone and repeated the test. Now I was 
lucky and log file was created. I've attached it to the mail, I send you 
directly.

Again, during the last 1 hour of phone being in a deep sleep, CSS was sending 
KA (every 2 min) and REGISTER requests (every 10 min) on a regular basis (it's 
from FS log, CSS log doesn't show that at all!)

I made the test call. CSS did not ring. Then, when VM has answered, it suddenly 
woke up and start ringing. It rung until I click on call end button (otherwise 
it would ring forever, please fix it, because it looks like a bug).

From FS's log:
19:58:34, 58:35, 58:37, 58:41, 58:49 - FS sends 5 INVITE messages. CSS drops 
all of them.
19:59:05 - Call is answered by VM (voicemail)
19:59:34 - after I left message, call with VM is ended (FS sends BYE message)
19:59:36, 59:37, 59:39, 59:43 - FS sends 4 NOTIFY messages to CSS
19:59:43 - CSS sends first respond with "100 Trying" message
19:59:44 - CSS sends 6 "180 Ringing" messages
19:59:44 - CSS sends 4 "200 OK/NOTIFY"
19:00:44 - CSS sends "180 Ringing"
19:00:54 - After I clicked on end clall, CSS sends 11 "603 Decline" messages

I'm sorry to mention that, but log, generated by CSS, is close to useless. 
Date/time stamps are messed up (what is the "Dec 31" date, for example?) There 
is no any indications of CSS sending KA and REGISTER messages with 
corresponding timestamps. But anyway, I hope it will somehow help you to nail 
down the problem...

Hint: at 19:59:43 first respond message, coming from CSS, was "100 Trying" 
message, followed by 6 "Ringing" messages. It was sent at the moment, when it's 
time to send reply for "NOTIFY" message. It tells me, that the old and now 
obsolete "INVITE" message was somehow stuck in some queue and was actually 
processed only now...

Original comment by yok...@gmail.com on 3 Jul 2012 at 4:19

GoogleCodeExporter commented 9 years ago
The 31 dec logs are from android system !! Not from csipsimple. It's just 
because the rom logs some wrong thing when some add the alarm in relative time 
instead of absolute. If you have a look "org.jraf.android.nolock" that is not 
me also produce the same kind of log.
To activate the logs about timer you have to switch to verbose mode (use log 
level 5 instead of 4 -- or even level 6).

That's weird that logs are not collected in direct file mode. I'll try to have 
a look on that. And you're right good idea to rename and have different naming 
for logs direct from pjsip and logs from logcat. Normally this mode is better 
because it only logs pjsip and has no limits. Using this mode you should not 
try to use the help menu however but only the things in settings. So your way 
to get logs from usb is the good one, and no need to stop from help menu 
before. You only need to go in settings, "user interface/integration" section 
and set log level to 5 and check the "Use log file" (I maybe forgot to tell 
about the change of log level... could be the reason for the 0 octet file log).

About everything that comes after the droping of the invite message it sounds 
to me somehow logical. Would be great that FS send some error when it sees that 
the app is talking about a SIP INVITE that no longer exists, rather than just 
replying nothing and let the app resend things.

Original comment by r3gis...@gmail.com on 3 Jul 2012 at 8:34

GoogleCodeExporter commented 9 years ago
log files name will be now in yy-MM-dd_kkmmss after r1674. BTW, I also did the 
change for all other files (such as call recording files and backup/restore 
files).

Original comment by r3gis...@gmail.com on 3 Jul 2012 at 9:42

GoogleCodeExporter commented 9 years ago
Thanks a lot for that!

BTW, I started watching CSS communications with FS using Wireshark. CSS may 
transmit completely unnecessary packets. Here is what I've noticed watching 
just common REGISTER request/reply sessions:
1. Case
CSS-FS - REGISTER request 
FS-CSS - 401 unauthorized 
CSS-FS - REGISTER request 
CSS-FS - REGISTER request  <- why it was sent?
FS-CSS - 200 OK

2. Case
CSS-FS - REGISTER request 
CSS-FS - REGISTER request  <- why it was sent?
FS-CSS - 401 unauthorized 
CSS-FS - REGISTER request 
FS-CSS - 200 OK
CSS-FS - REGISTER request  <- why it was sent
FS-CSS - 200 OK            <- unnecessary induced reply

What is actually expected:
CSS-FS - REGISTER request 
FS-CSS - 401 unauthorized 
CSS-FS - REGISTER request 
FS-CSS - 200 OK

If it's not a problem created by CSS, please convey the message to PJSIP 
developers. 

Original comment by yok...@gmail.com on 3 Jul 2012 at 9:11

GoogleCodeExporter commented 9 years ago
The second register is probably just a retry because server is too long to 
reply, no?

In case 2, sounds the "Allow contact rewrite method" is activated. In this 
case, the first register is probably made with private IP. In the 200 ok, pjsip 
detects IP announced in the first register is private and that a public one is 
detected. So it sends a new register with the actual "public ip" (at least the 
one for your server), so that the registration contact is accurate. This 
registration also unregister the previous one. (the "contact rewrite method" 
option set to legacy mode allows to send an unregister first and a new register 
then).
If you don't want that (because your sip server already manage that on its 
side), you can disable the allow contact rewrite option in expert setting mode.
You could also host a stun server on your sip server so that the "public ip" 
(from your server point of view) is already accurate at first registration.

So not a problem in CSipSimple; but I don't think it's a problem with pjsip 
too. Depending on configuration it can produce that. And the second register is 
probably just a simple retry because of network latency (if it has time to send 
a second one, there is probably delay). This is normally harmless and based on 
SIP specifications about retries.

Original comment by r3gis...@gmail.com on 3 Jul 2012 at 9:21

GoogleCodeExporter commented 9 years ago
No, server replies almost immediately. Delay between the 1st and last messages 
is close to hundreds of milliseconds. Sorry that I forgot to mention it my post 
earlier.

Thank you for the tip to clear check at "Allow contact rewrite" (you're right, 
it was checked before). I'll watch for any drawbacks, associated with cleaning 
of this option... Any hints on what it could break?

Today's morning test did not wake up CSS at all. While it's still sending 
KS/REGISTER requests, it did not react on a bunch of incoming INVITE messages 
at all... I'll try to repeat it tomorrow morning (now with CSS log turned on).

Original comment by yok...@gmail.com on 4 Jul 2012 at 4:46

GoogleCodeExporter commented 9 years ago
After a long time running in deep sleep mode, CSS starts missing some KA 
packets. Look at this picture. It's from Wireshark's log. Missed KA packets are 
pointed with red arrows there:
http://img29.imageshack.us/img29/1710/csska.png
Usually, KA and REGISTER come on a regular basis.

Here is continuation of the log above (beginning from packet #44). At this time 
I made first test call (after keeping phone in sleep mode during the night). 
Call was missed completely.
http://img204.imageshack.us/img204/994/cssmissedcall.png
Watch for timestamps on messages. I hope they may give you some hint what's 
wrong with it.

Original comment by yok...@gmail.com on 4 Jul 2012 at 8:07

GoogleCodeExporter commented 9 years ago
Interesting, the call miss seems to correspond to a missing KA packet. Maybe 
the root cause is something wrong with the KA process.
I should try to rework that to make cleaner ids of android "alarms" that are 
supposed to process KA wake ups. Logs from pjsip would be interesting to see in 
this case if we see some complains about not found timers. Since the ka ids are 
based on memory mapping it could behaves differently on different devices and 
explains why I don't reproduce.

Original comment by r3gis...@gmail.com on 4 Jul 2012 at 8:21

GoogleCodeExporter commented 9 years ago
I misplaced some red arrows on my 1st picture. The missed KA packets are 
actually before packets: #10, #15 and #32.

Original comment by yok...@gmail.com on 4 Jul 2012 at 8:51

GoogleCodeExporter commented 9 years ago
Yes I noticed, what's very interesting is that one KA gets broken, it only 
restart after a register. BTW, it could be explained by the fact KA are started 
after registration, and that one KA starts the next KA timer.

So I think that's a good clue. Maybe there's something wrong with the timer 
implementation I did for android that randomly break timers. This would imply 
that randomly some of the timer is dropped and so in case of KA it can be 
recovered by the next registration but in case of a registration it's never 
recovered and could lead to not be registered after some time.

Original comment by r3gis...@gmail.com on 5 Jul 2012 at 8:31

GoogleCodeExporter commented 9 years ago
KA timers are started by a previous KA request or by REGISTER request. That's 
understandable. It may explain why if KA timer is broken (and we don't know 
exactly why it could happen, but anyway) the next KA request was missing before 
packet #10 and $32. But it did not explain the missed KA request before the 
line #15 though, where KA was somehow restarted again... May be timers are not 
at fault here. May be times were shot at all times, but the missing KA packets 
were not sent (by some other reasons). 

About timers sending for REGISTER requests. One night (actually two nights 
before this one) I've noticed, that REGISTER requests have suddenly stopped 2 
times per night for about 45 min each. Then the sequence was somehow restarted 
again. It means that timers for sending next REGISTER requests are started not 
only by those requests alone, but by some other mechanism too. Or, as in the 
case with KA, perhaps timers work, but the requests were simply not sent...

Here is another case of CSS not picking the call. 
Picture #3: http://img820.imageshack.us/img820/803/cssmissedcall2.png
It suddenly wake up at #75, makes a very brief ring. But it's too late. At that 
time call was already redirected to voicemail. Notice the precise timestamp, 
when CSS woke up this time - it's 12:23:47 (it's not the time for next KA - 
12:23:52). Is it the time for next REGISTER request - 12:23:46? Then, why the 
next register request was slightly (2 sec) delayed this time - #87 @12:23:49?

On the pic#3 you can see second (and unnecessary) REGISTER request at #92 too. 
("Allow contact rewrite" is unchecked at this time)

Original comment by yok...@gmail.com on 5 Jul 2012 at 8:05

GoogleCodeExporter commented 9 years ago
Currently running r1678.

After a long time not touching the phone I made 3 test calls. And after that 
CSS has broken its IP connection for next 1.5 hour...

See this picture #4: http://img833.imageshack.us/img833/3532/cssmissedcall4.png
As usual during whole night CSS sent regular KA, REGISTER requests. You can see 
the last such request sent at 11:34:26.

1st test call was made at 11:41:16 (packet #76). CSS did not respond. But at 
11:42:27 it sent its next (and last) KA packet.

After that there is no KA and/or REGISTER requests at all. In 1.5 hours I 
finally opened it and it started sending REGISTER requests by itself (as 
nothing was happened before). You can see it at 13:18:59.

So, why CSS has stopped receiving INVITE packets, when I made 1st call? And why 
it stopped sending any packets (REGISTER and KA) after I made 2nd call at 
11:42:50 (packet #83)?

Here is log created only at 13:18 time (there is no data in the log about any 
earlier events):
-----
D/WifiService(  213): ACTION_SCREEN_ON
D/WifiService(  213): enable and start wifi due to updateWifiState
V/AlarmManager(  213): Adding Alarm{45735328 type 2 com.csipsimple} Jan 02 
05:00:39 am
V/AlarmManager(  213): Alarm triggering: Alarm{45735328 type 2 com.csipsimple}
V/AlarmManager(  213): Adding Alarm{456a8fb8 type 2 com.csipsimple} Jan 02 
05:00:40 am
V/AlarmManager(  213): Alarm triggering: Alarm{456a8fb8 type 2 com.csipsimple}
V/AlarmManager(  213): Adding Alarm{45651890 type 2 com.csipsimple} Jan 02 
05:00:40 am
V/AlarmManager(  213): Alarm triggering: Alarm{45651890 type 2 com.csipsimple}
--------- beginning of /dev/log/main
D/libpjsip(  470): 13:18:35.194   pjsua_call.c  .Incoming Request msg 
INVITE/cseq=30464021 (rdata0x3517b4)
W/libpjsip(  470): 13:18:35.195 sip_transactio  ..Unable to register INVITE 
transaction (key exists)
D/libpjsip(  470): 13:18:35.195   pjsua_core.c  ..TX 336 bytes Response msg 
500/INVITE/cseq=30464021 (tdta0x27ca40) to UDP IP.ADD.RES.FS:5060:
D/libpjsip(  470): SIP/2.0 500 Internal Server Error
D/libpjsip(  470): Via: SIP/2.0/UDP 
IP.ADD.RES.FS;rport=5060;received=IP.ADD.RES.FS;branch=z9hG4bK6F328atg384UD
D/libpjsip(  470): Call-ID: 3b5d9daf-423d-1230-f798-ff424ea7e6fc
D/libpjsip(  470): From: "SIP" <sip:2005@domain.tld>;tag=2HKpNBBHX5ejr
D/libpjsip(  470): To: <sip:2001@IP.ADD.RES.CSS;ob>;tag=z9hG4bK6F328atg384UD
D/libpjsip(  470): CSeq: 30464021 INVITE
D/libpjsip(  470): Content-Length:  0
D/libpjsip(  470): 
D/libpjsip(  470): 
D/libpjsip(  470): --end msg--
-----
It means, that INVITE message was stuck in the IP stack (for the last 1.5 hour) 
and not processed by CSS on a timely manner (perhaps it was received at 
11:41:16, see picture #4). And what about stopping sending REGISTER requests at 
all after that?

Original comment by yok...@gmail.com on 6 Jul 2012 at 8:58

GoogleCodeExporter commented 9 years ago
Here is case, when CSS missed incoming call and then rung indefinitely...
Picture #5: http://img99.imageshack.us/img99/8412/cssmissedcall5.png

As usual, during the night CSS was sending KA/REGISTER requests regularly, 
until I made incoming call (11:30:39, #59). In the middle of the incoming call 
(at 11:30:57 #64) CSS sends its scheduled KA packet. After call is redirected 
to voicemail SIP server starts sending NOTIFY requests (at 11:31:29 #66). CSS 
wakes up and starts ringing (frame #69). After that it will ring forever, or 
until I stop it by pressing End call button (at 11:37:59 #88). 

BTW, if it happened at night and I simply silenced the phone by pressing VolUp 
or VolDn buttons, it would simply discharge the battery by silent ringing 
(keeping busy SIP server)...

The log collected during the night and this test call 
(logs_12-07-07_114554.txt) has size 120KiB. First 1/3 of it is data gathered 
before this line:
--------- beginning of /dev/log/main
and the first actual SIP message has timestamp at 11:34:30 (frame #79 in the 
picture, sending RINGING packet). So, it's not useful here at all. 

I've noticed that CSS sometimes may produce "pjsiplogs_*" files instead of 
"logs_*". How to force one over another? In other words, how to make it create 
"pjsiplogs_*" files? And is there a way to increase the size of the log 
generated?

Original comment by yok...@gmail.com on 7 Jul 2012 at 7:53

GoogleCodeExporter commented 9 years ago
About logs :

 In interface/integration section you have two options about logs :

 - The log level. This globally set the logging level of the application (both android and native part). To get interesting logs set it to 4 or 5 (5 could be interesting to get all pjsip timers logs.
 - The log to file option : this one introduced recently allow pjsip to log directly into a file. By default, pjsip will log in "logcat" tool that is the official android platform logging utility. As you noticed this one does automatically rotation on logs, and the 
--------- beginning of /dev/log/main
line is the beggining of the main logger buffer. What is before correspond to 
other loggers. 
It rotates automatically. This made it quiet useless if you'd like to get logs 
from long running tasks.
That's why the option to log directly to a file instead is there.
In the case it logs to "logcat", no file are created. It's only created when 
you go in help menu to send the logs by mail. So if you don't plan to log by 
mail, you can also collect these logs using alogcat application (or any other 
application on the android market able to collect android logs).

In the case it logs to a file, all pjsip logs are written to a pjsiplogs_* file.
In your case, it's the preferred way since you are trying to collect something 
that runs in background. So definitely, don't try to use the help menu anymore. 
Instead, go in settings / user interface section and turn log level to 4 or 5, 
and check the box about logging to a file. That's it. Don't play again with 
help / record logs menu because it will change again these expert settings. 
Just leave things like that, and it will produce pjsip logs inside the 
pjsiplogs_* files.

These files are not rotated however, so for long term, keep in mind that you 
have to disable the option to not run out of free space on the sdcard ;).

Original comment by r3gis...@gmail.com on 8 Jul 2012 at 5:22

GoogleCodeExporter commented 9 years ago
OK, I've tried to get long logs (r1682):
"Settings | User interface | Native dialer integration | Call logs integration" 
- yes 
"Settings | User interface | Advanced tweaks | Log level" = 1 (I'll make it 
bigger when it starts working)
"Settings | User interface | Advanced tweaks | Log directly to a file" - yes
Now I see 3 files:
pjsiplogs_12-07-09_165749.txt
pjsiplogs_12-07-09_171652.txt
pjsiplogs_12-07-09_183139.txt

All of them have size 0 bytes. Timestamps indicate connection time (more 
precise, time of first registration after setting change or location change).

Do you flush those logs on a regular basis? (like every 4K or on time manner)

Original comment by yok...@gmail.com on 10 Jul 2012 at 1:57

GoogleCodeExporter commented 9 years ago
>> "Settings | User interface | Advanced tweaks | Log level" = 1 (I'll make it 
bigger when it starts working)

If you don't start bigger it will never start working ! ;). If the log level is 
1, it logs *ONLY* errors ;)... ( 1 = Errors only, 2 = adds warning, 3 = adds 
info, 4 = adds debug, 5 = adds verbose).
So your files will have a 0 byte size. It's absolutely normal ;)
Start with 4 or 5 as I said, you'll see that logs will be significantly more 
interesting :)

Else, the next nightly build will introduce a cleaner way to manage timers. It 
will be interesting to see how it goes with this new approach on this problem. 

Original comment by r3gis...@gmail.com on 11 Jul 2012 at 9:57

GoogleCodeExporter commented 9 years ago
r1688 - logging to file doesn't work. I set Log Level = 4 (debug), waited for 
the whole night, made test call and there is nothing in the log. :( The latest 
log (timestamped yesterday's evening) is 8KiB in size... 

How to make logging to work reliably?

Today's test failed again (as all test calls made past several mornings). It's 
documented as Wireshark's screen shot (similar to the already published above) 
and as a log, made by FreeSWITCH. I'd like to share with you the log, created 
by CSS, but there is no one that has any data about the test at all...

Original comment by yok...@gmail.com on 14 Jul 2012 at 7:01

GoogleCodeExporter commented 9 years ago
Forgot to mention, why CSS's logs make it so hard to read in any text editor 
without preliminary editing (inserting proper 'new line' symbols)? Why log 
files use '0x0A' as a line delimiter? I know for sure that CSS can make '0x0D, 
0x0A' as a next line code. For example, there is: '0D 0A 0D 0A 0A 2D 2D 65 6E 
64 20 6D 73 67 2D 2D 0A', which in plain text looks like '--end msg--'. As you 
see at the beginning it uses conventional '0D 0A' as new line. But in the 
middle and at its end it uses '0A'. Can you actually see CSS's generated logs 
with any plain text editor (e.g. Notepad, EditPlus, etc)? I'm asking, because I 
can't...

Original comment by yok...@gmail.com on 14 Jul 2012 at 7:36

GoogleCodeExporter commented 9 years ago
I'll try to debug why log level set to 4 + log to file option activated doesn't 
work.
I also observe some weird thing : the file logger seems to cache writing and 
only write by block when it has cpu activity given. I guess that's an 
optimization of pjsip but in our case very annoying.

About the CR vs LF vs CRLF, android is a linux system so it adds 0x0A (\n) as 
end of line of logs. It's also how pjsip adds logs, it's not proper to 
csipsimple. 

As I'm running on linux, I never had any problems with the fact it adds only \n.
Anyway, on linux, all editors we have are clever and detects ends of line 
dynamically and are able to support CR (mac), LF (linux) and CRLF (windows) 
seamlessly. 
EditPlus or advanced editors should have an option to autodetect... On linux 
even basic command line programs to edit files like "vim" are able to detect 
end of line dynamically. ;)

Original comment by r3gis...@gmail.com on 14 Jul 2012 at 8:30

GoogleCodeExporter commented 9 years ago
After a closer look on android source code the log file is not flushed and it's 
indeed an optimization (else the app is extremely slow).

It's only flushed at the end of logs which occurs if you go in option and press 
back button (it will finalize the file and produce a new one).

For the root problem I'll do a commit with a try to re-introduce the thread 
count option. By default it's now 0 to reduce as much as possible battery 
impact. But maybe it was not a good idea on some devices.

Original comment by r3gis...@gmail.com on 14 Jul 2012 at 9:32

GoogleCodeExporter commented 9 years ago
The thread count option is now available again. To change it, in expert setting 
mode, in Media section you'll see a new option "Thread count". By default it's 
0. Try to change it to 1. It could maybe help with the root problem.

Original comment by r3gis...@gmail.com on 15 Jul 2012 at 10:00

GoogleCodeExporter commented 9 years ago
r1691 - yesterday I finally got lucky and CSS made its log, containing 
debugging events inside. :)

There is no any error code at the time of the test call (log level - 4).

Here is sequence of events (1005 calls 1002) with timestamps:
14:41:57.903 - 1002 sends REGISTER request. Next REGISTER request is supposed 
to be at [14:51:53].
14:50:32     - 1005 calls and 1002 misses the call.
14:51:22.534 - 1002 wakes up, processes all 6 INVITE requests and sends 6 
Ringing messages to FS. But at this time, it's too late - the call was already 
redirected to VM.
14:51:53.543 - 1002 sends REGISTER request.

Here is my vision of the problem:
CSS has a thread, that (perhaps?) is always active (it sends KA requests every 
2 min and REGISTER requests every 10 min). This thread DOES NOT wake up another 
CSS thread, that can process incoming packets (6 INVITE messages) and they are 
just sitting in the IP stack... Later on the latter CSS thread wakes up and 
starts processing those messages from the stack (sending 6 Ringing responses to 
FS).

I guess the problem is with the fact that packet receiving thread doesn't wake 
up thread that could process INVITE request...

BTW, we have to do something with endless ringing, occurring in such cases. CSS 
makes ringing in behalf of the obsolete call (call is already abandoned by SIP 
server) and never stops. Imagine if you left your phone on your desk or in a 
locker room... It will ring until it will deplete the battery. Not good ;) I 
guess may be some timer (2 min or something) to cancel the abandoned call would 
be appropriate here.

Original comment by yok...@gmail.com on 15 Jul 2012 at 6:19

GoogleCodeExporter commented 9 years ago
I'll set "Thread count" to 1 and check it tomorrow with latest r1696.

Original comment by yok...@gmail.com on 15 Jul 2012 at 6:23

GoogleCodeExporter commented 9 years ago
Running r1696 with "Thread count" = 1 and loglevel = 5.

Test call (after CSS was at sleep during the night):
1) 11:02:05 - made test call
2) 11:02:24 - CSS starts processing all of 5 INVOTE requests received earlier 
(see p.1)
3) 11:02:25 - CSS wakes up and starts ringing
4) 11:02:36 - I pressed Call Answer button, but at the same time call was 
canceled (redirected to VM)
5) 11:02:59 - CSS got notification about voice message in its VM
6) 11:03:00 - CSS sends next REGISTER request as usual

Here is picture of WS log: 
http://img401.imageshack.us/img401/7404/cssmissedcall10.png
Here is cut from CSS log related to the test: http://pastebin.com/6sKm47EC

1. CSS sends REGISTER requests every 10 min
2. CSS sends KA requests every 2 min
3. I did not remove any lines from the log, just added some new empty lines for 
better readability.
4. There is no errors during the tests call.

Question:
CSS still not process 5 INVITE messages at the time, they arrive. It suddenly 
woke up at 11:02:24. Why it woke up at that particular time? Next REGISTER 
request is scheduled at ≈11:03:00 and not at 11:02:24.

Original comment by yok...@gmail.com on 16 Jul 2012 at 8:36

GoogleCodeExporter commented 9 years ago
Thanks for the test. 

Indeed the fact it wakes up at 11:02:24 precisely is very weird. One thing that 
comes in my mind : it's maybe the entire android wifi driver that was just 
sleeping and wakes up only at this time and then notify and push all datas to 
csipsimple.

In wireshark logs do you see some ICMP packets? 
It could be interesting to test with TCP too. As TCP layer provides hacks from 
the very lower layer, it may allow to determine if what is not replying is 
pjsip/csipsimple or android itself that is in some PSP mode (power save 
polling). 
I think that if we see that the TCP layer shows reply from android but that the 
app is not woken up it's probably something within the app. If we see the TCP 
layer replying only when the app wakes up it probably means that we have to 
search something with the android transport and to try to put it in another 
mode where requests are processed immediately or to see if it's not possible to 
reconfigure the device so that PSP mode is more aggressive (I know that it 
depends on devices and manufacturers settings -- normally it should not go more 
than 5 seconds but some manufacturers plays on that to save more battery)

Side point : I don't forget the issue where the app remains in ringing mode. I 
also observed that for other root cause and it's something important I have to 
fix. I'm trying to find the more robust way to do that.

Original comment by r3gis...@gmail.com on 16 Jul 2012 at 9:01

GoogleCodeExporter commented 9 years ago
If I set transport both UDP and TCP, CSS uses UDP. If I try to deny (un-check) 
UDP, CSS doesn't register to SIP server. CSS log indicates:
--------
16:15:47.577    pjsua_acc.c  .Acc 1: setting registration..
16:15:47.577    pjsua_acc.c  ..Unable to generate suitable Contact header for 
registration: Unsupported transport (PJSIP_EUNSUPTRANSPORT) [status=171060]
16:15:47.578    pjsua_acc.c  ..Unable to create registration: Unsupported 
transport (PJSIP_EUNSUPTRANSPORT) [status=171060]
--------
What I'm missing here?

Original comment by yok...@gmail.com on 16 Jul 2012 at 11:38

GoogleCodeExporter commented 9 years ago
Sorry I didn't precise how to configure an account to TCP :)
In fact it's an account setting. The global setting is just to tell whether the 
entire stack should or not start the transport. Then accounts will use the 
transport if necessary.
So in global settings you should leave both TCP and UDP enabled.

To configure your account to use TCP, go in accounts view and transform your 
account into an advanced account (long press the account row, choose wizard and 
choose advanced) (if your account is already expert one, the option is also 
available in expert mode). Then in account settings in advanced mode you'll 
find the option about TCP (it's TCP option if advanced, and Transport > TCP if 
expert).

Original comment by r3gis...@gmail.com on 17 Jul 2012 at 8:33

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Thank you for instructions. I set account to use TCP and it works well.

Meantime I made yet another test call (r1698, UDP). Result is similar:
http://img708.imageshack.us/img708/8400/cssmissedcall11.png
11:01:26 - CSS sends REGISTER request
11:09:59 - test call begins, CSS misses 6 INVITE messages. I left message in VM
11:10:47 - NOTIFY message is sent to CSS (new voicemail)
11:10:48 - CSS wakes up and sends 6 Ringing messages back to FS
11:11:21 - CSS rends next REGISTER request

And we see again, that CSS can start processing INVITE requests [11:10:48] 
before it wakes up for sending next REGISTER request [11:11:21].

Log file doesn't show anything why and how CSS woke up. Here is cut from the 
log. You see the last KA packet sent on schedule, followed by sudden CSS wake 
up to process the old INVITE message:
----------
11:09:26.676 timer_android.  Registered thread timer_thread_640
11:09:26.676 timer_android.  FIRE timer 0 of heap 5
11:09:26.676    pjsua_acc.c  Sending 2 bytes keep-alive packet for acc 2 to 
IP.ADD.RES.FS:5060
11:09:26.677   tdta0x4153b8  Destroying txdata raw
11:09:26.678 timer_android.  Scheduling timer 0 of 5 in 120000 ms
11:09:26.691 timer_android.  FIRE done and released
11:10:47.533 sip_endpoint.c !Processing incoming message: Request msg 
INVITE/cseq=30938235 (rdata0x27cecc)
11:10:47.533   pjsua_core.c  .RX 1273 bytes Request msg INVITE/cseq=30938235 
(rdata0x27cecc) from UDP IP.ADD.RES.FS:5060:
INVITE sip:1002@IP.ADD.RES.CSS:35709;ob SIP/2.0
----------

I'll test CSS using TCP transport later on and let you know results.

Original comment by yok...@gmail.com on 17 Jul 2012 at 7:27

GoogleCodeExporter commented 9 years ago
I can't get incoming calls with CSS running with TCP transport mode. See new 
issue, describing the problem:
http://code.google.com/p/csipsimple/issues/detail?id=1830

Original comment by yok...@gmail.com on 17 Jul 2012 at 11:58

GoogleCodeExporter commented 9 years ago
Well, after setting "Allow contact rewrite" option to "yes" I was able to 
receive calls all the time. And it looks like it works reliably (so far ;) ).

But another day, another story... Now, when I start getting incoming calls 
reliably, I've noticed another problem. CSS indicates that call, that I've just 
answered, was missing... every time I accept the call. Moreover, if I keep call 
active more then 1 min, CSS drops the call and sends BYE message to SIP server. 
Why is that? I've discovered, that now FS refuses to send ACK messages back to 
CSS. And that causes CSS to send 10 more 200 OK messages to FS, followed by 
BYE...

At first, I begun to blame FS. Why it stops sending ACK messages to CSS? 
Googling the problem returned nothing... Then I tried to make very similar 
call, but now in the opposite direction - from CSS to OBi100 ATA device 
(working with TCP transport). All calls were fine. Looking at FS logs I see 
that FS sends to OBi100 ACK messages all the time.

So, what makes FreeSWITCH to stop sending ACK messages back to CSS after 
receiving 200 OK/INVITE? And why if I toggle in CSS transport from TCP to UDP 
everything returns back to normal and FS start to send those ACK messages 
without any hesitation?  Do you have any idea? Have you noticed this problem 
before?

Original comment by yok...@gmail.com on 21 Jul 2012 at 3:57

GoogleCodeExporter commented 9 years ago
Great :) Good news for reliability. So it sounds something with UDP only, and 
it's probably linked more to the lower android network layer than to pjsip or 
something I changed for android (I didn't modify things about TCP/UDP).

About the missing ACK it's very interesting. I raised recently something to 
pjsip guys about tag comparison that is involved in ACK process. They fixed it 
but was a big change. On my side I didn't noticed regression, but not to 
exclude it introduced a regression.

Can you collect some wireshark trace from the FS side? (Or some logs from 
csipsimple). Hopefully there is some clue on why the 200OK of pjsip is not good 
for freeswitch.

Original comment by r3gis...@gmail.com on 21 Jul 2012 at 8:47

GoogleCodeExporter commented 9 years ago
Here is log collected by Wireshark yesterday: 
http://img155.imageshack.us/img155/3686/cssmissedcall12.png
And here is CSS's log: http://pastebin.com/LhnUEBaE
It has only two 200 OK/INVITE messages, but nothing was changed since then...

Original comment by yok...@gmail.com on 21 Jul 2012 at 7:13

GoogleCodeExporter commented 9 years ago
Talking about TCP and reliability. Today's morning I've noticed problem. Before 
that CSS was sending REGISTER requests regularly (last time @11:15:26). 
1. I made incoming call. At 11:15:29 FS sends INVITE message, which CSS 
suddenly ignored at this time. TCP layer dropped staled connection. 
2. 11:18:27 - CSS suddenly starts ringing, sending (it thinks, it does) replies 
to FS
3. 11:20:00 - I stop CSS ringing by pressing "End Call". CSS indicated NO 
MISSED CALLS after that (and why is that?)
4. 11:25:22 - CSS sends REGISTER request (how it's possible, if TCP is broken!)
5. 11:25:58 - same as above
6. 11:27:31 - TCP connection restored and CSS finally sends REGISTER request.

Why broken TCP is not detected earlier? TCP layer should indicate that since 
11:15:29. And upper layer (CSS) should react accordingly! But it still pretend 
as it's working as usual (since p.2, 11:18:27)

Wireshark: http://img26.imageshack.us/img26/6678/cssmissedcall13.png
CSS: http://pastebin.com/TTiR7RT3

Original comment by yok...@gmail.com on 21 Jul 2012 at 8:12

GoogleCodeExporter commented 9 years ago
Excellent logs.

Well that's a little bit what I expected (and also feared).

At point 1, note that CSipSimple the TCP layer is NOT broken yet! 
Two reasons : 
- There is no logs indicating the TCP socket is broken in pjsip (the logs you 
see about termination of the Register transaction are not linked with the 
transport)
 - The line are black, and that the problem is on TCP layer that's trying to retransmit packets ! 
It's not pjsip that ignores the packet, it's the transport layer. The phone is 
unreachable ! You see that the TCP layer of the phone doesn't reply anything to 
FS. And FS (actually the server TCP transport layer, because FS is not aware of 
this transport OSI layer) has to retransmit. It's TCP retransmissions that we 
see here. So on very lower transport layer.
So here, it's not pjsip that lost the packet but it's the lower TCP layer of 
the phone. I was not able to affirm for UDP packets (since we can't see lower 
layer activity on UDP). But here it's pretty clear : it's the phone lower 
transport layer that is just sleeping so deep that it doesn't reply to incoming 
packets.
And against that there is nothing the application can do because it's not at 
application OSI layer, it's at transport layer.
Also, at this point, it's not yet broken. TCP is made to make reliable 
transport of packet. And that's not because of the fact some packet are not 
received on the other side that the link is broken. On a good and fast linux 
machine, if you unplug and replug a cable while downloading you should be able 
to recover the download without needing the application to recover by other 
applicative ways.

Also interesting point is that it's not about CPU, because CPU is activated by 
CSipSimple at 11:16:40 and 11:17:31. So it's really about the network driver 
that is really asleep regardless CPU activity.

Point 2. (or just before point 2). It's very very interesting too here. If you 
have a look to pjsip logs you'll see that at 11:18:26.847 the application sends 
a keep alive packet. It's pjsip that decide to send something on the TCP 
socket. And actually it seems to succeed... the TCP socket is not broken ! 
And not only that but it seems to also wake up the network driver that was 
asleep before. And at this time the driver will be able to proceed the incoming 
TCP at 11:18:27... just when we see the wireshark log about TCP continuation.

It leads to point 2. incoming packet is read properly by pjsip and it tries to 
reply.
So it seems back to normality at this point.
Unfortunately, when it tries to reply at 11:18:27, we can see a RST of the TCP 
socket. In your wireshark logs I don't see the difference between incoming and 
outgoing packets so hard to say what is the side that RST the socket. 
Anyway, here it becomes weird. At this point, normally pjsip should be informed 
of the fact the TCP socket has been broken. Apparently from logs it's not. As 
it doesn't see that the TCP socket is broken now (it was NOT broken before); it 
just retry. Apparently the lower unix socket layer doesn't inform the stack 
that there is a broken pipe. This is very very strange. This could be something 
in the sip stack or something in the network driver. As I've already observed 
broken TCP pipe detected in pjsip and as the network driver seems to already 
have weird behavior in point 1 to 2, I would more think about something in the 
android network driver... but I can't for sure affirm that.

For point 3. It seems normal behavior for me. If you manually decline a call, 
it's not missed. It's incoming not replied but not missed (it should appear in 
call logs however, isn't it?)

Before point 4. Now as we've seen previously TCP socket is broken. (from point 
2 and not from point 1 ! but it's broken now). However, pjsip seems to continue 
trying to send keep alive packets. These packets are not seen on wireshark logs 
which confirms that the TCP socket is not endpoint to endpoint available 
anymore. However still weird that android linux lower layer doesn't report the 
fact the TCP socket is broken. 

For point 4 and 5. 11:25:22 and 11:25:28. We are exactly in the same case than 
in my previous point. Nothing changed, pjsip thinks the socket is still alive 
(and apparently so does the lower linux socket layer). So it just sends the 
packets. It could be very interesting to see tcpdump logs from the phone 
itself. Maybe actually the phone has still a tcp socket connected to the 
router. In which case it's a router in the middle that does bad work.

Before point 6. You can notice some very interesting log :
11:27:04.167   tcpc0x2cec3c  TCP transport destroyed normally
...
This comes from when the linux socket layer saying to pjsip that the TCP socket 
is finished !! It only gets the information now !! 
So now that pjsip knows from the linux socket layer that the socket is down... 
it can fix it and create a new socket !! And at this point it will recover 
things normally.

So, to sum up :
The socket is broken at 11:18:27. When we see RST in wireshark logs. 
The reason for this reset is not clear, but what's sure is that pjsip is 
informed about that only at 11:27:04. Meanwhile, pjsip and apparently the linux 
socket layer both thinks the socket is still alive. Packets can be sent but 
never reach your SIP server. Two possible reason in my opinion : 
* the network (wifi?) driver get broken due to some activity during the time it 
was completely asleep.
or
* there is some network equipment in the middle that is broken due to the same 
previous point. This network equipment leave one socket open on the side 
connected to the phone but has it broken on the side of the sip server. This 
solution is maybe more probable.

In any case, I don't think it's something with application since everything 
here occurs at transport layer level. Besides the application detects some TCP 
termination (very late but it does). It's not an error termination but some 
natural ending of the TCP socket. Pjsip doesn't invent this event and it's not 
on writing something. So it comes from outside... either the network driver or 
the router in the middle that finally decided to cut this TCP socket.

What do you think?

Side note and possible solution. Apparently you reach this case because of the 
deep sleep mode of the phone that is recovered by the fact pjsip sends the keep 
alive. So maybe (even if crappy and battery draining), it could be a solution 
to reduce TCP keep alive interval.

Original comment by r3gis...@gmail.com on 21 Jul 2012 at 9:17

GoogleCodeExporter commented 9 years ago
Here is yet another log, indicating failure of CSipSimple (r1736) to take 
incoming call.
http://img696.imageshack.us/img696/9238/cssmissedcall14.png

Note:
1. 10:43:01 - call arrives
2. 10:43:49 - CSS starts ringing (call was already transferred to VM) and then 
rings endlessly
3. 10:43:51 - CSS sends REGISTER request. "Contact" contains WAN IP of CSS
4. 10:43:55 - CSS sends REGISTER request. "Contact" contains LAN IP of CSS
5. 10:43:55 - CSS sends REGISTER request. "Contact" contains WAN IP of CSS

Problems:
1. CSS was too late to take incoming call again. TCP was not reset and 
connection was re-used in next communications.
2. Why CSS sends REGISTER request (p.4 above), pointing to its LAN address?

First bunch of black requests (7) on the Wireshark's log are coming form FS to 
CSS, while the second bunch (7) are coming back from CSS to FS.  

CSS has not produced any log at this time (log option is always set to ON). I 
have logs made before and after the event, but not at the time, the event has 
happened...

To reproduce the problem I suggest:
1. Set registration time to 10 min (or more), keep CSS running over night. 
Don't touch phone.
2. Run Wireshark at SIP server side. Watch for couple of REGISTER requests 
coming form CSS to SIP server on regular basis. Between REGISTER requests see 3 
"Continuation" requests coming every 3 min from CSS.
3. Calculate the time CSS is going to send next REGISTER request (it's easy, if 
you set expiration time to 10 min). Make test call to CSS 1 minute earlier then 
that time and watch...

Original comment by yok...@gmail.com on 26 Jul 2012 at 10:56

GoogleCodeExporter commented 9 years ago
About problems :
1. Indeed, it seems like in previous trace be something on transport layer. TCP 
packets are mark with retransmission packets as if it's the lower layer that 
doesn't respond. I thought again about the problem after latest analysis and 
thought that it was not totally sure it could not be something with pjsip that 
is not in select mode on the socket. Something that could be interesting in 
this state is to try to run a command line "ping" from the sip server to the 
device IP address. Just to see it's alive to ping request. (also ensure before 
that ping works when the phone is not sleeping to be sure ping are not blocked 
by something in the middle ;) ).

2. Are you sure the p.4 is not a Register with "Expires : 0"? It could be an 
unregister of a previous registration. Even if it's very weird that it does 
that not at startup. Unless the sip stack is restarting... but it should not 
since normally it's with a current call ongoing here.

I hope that's not a crash of the sip stack. It's the only thing that could 
logically explain that if it doesn't have an Expires : 0. 
Do you have corresponding csipsimple logs? (You can send it by mail if you 
prefer for your privacy). Or alternatively, did you observed a crash / 
unexpected close of the call screen ?

Original comment by r3gis...@gmail.com on 27 Jul 2012 at 8:52

GoogleCodeExporter commented 9 years ago
> Something that could be interesting in this state is to try to run a command 
line "ping" from the sip server to the device IP address. Just to see it's 
alive to ping request. (also ensure before that ping works when the phone is 
not sleeping to be sure ping are not blocked by something in the middle ;) ).

The problem here is - CSS is behind NAT router. Ping may reach only WAN address 
of the router, not CSS.

> 2. Are you sure the p.4 is not a Register with "Expires : 0"? It could be an 
unregister of a previous registration. Even if it's very weird that it does 
that not at startup. Unless the sip stack is restarting... but it should not 
since normally it's with a current call ongoing here.

I'm pretty sure that p.4 was a common sequence of two REGISTER exchanges (each 
containing 4 packets, not counting TCP ACK). First provides LAN IP:port in 
"Contact" data, while the second (p.5) contains everything the same, except 
"Contact" now contains correct WAN IP:port. At this time I'm not sure about p.3 
though (sorry, my fault). It could have "Expires : 0". But why? And, talking 
about potential sip stack crash - keep in mind, that TCP port (59187) was 
exactly the same before and after that time. In other words, TCP connection was 
not dropped, or otherwise, the port would be changed.

While currently I'm still waiting for the "Contact" data fix 
(http://code.google.com/p/csipsimple/issues/detail?id=1830 - it prevents me 
from using CSS with TCP transport, because all incoming calls are terminated by 
CSS in 1 min), I'm trying to use Sipdroid and watch, how it runs under the very 
same conditions. Today morning I tried to make similar test call and Sipdroid 
took the call just fine. There is no any conclusion from this test, of course. 
And I'll try to watch it tomorrow and see if it exhibits similar problems as 
CSS. Sipdroid is a way less configurable, than CSS. The difference (that I 
can't change) here is - with UDP transport it has 3600 registration time and it 
sends KA (2 bytes packages, similar to CSS) every 60 sec. But I don't think it 
could make any difference to the tests...

Original comment by yok...@gmail.com on 28 Jul 2012 at 3:34

GoogleCodeExporter commented 9 years ago
At this time I can't use TCP transport, because all incoming calls are 
terminated by CSS in 1 min (see post above). Thus, I switched back to UDP 
transport.

Here is one more example of CSS (r1765) missing incoming call. Look at this 
Wireshark log:
http://img705.imageshack.us/img705/3338/cssmissedcall15.png

1. 21:31:24 - normal registration
2. 21:33:26 and 21:35:26 - two KA packets, sent by CSS
3. 21:35:49 - new incoming call
4. 21:37:26 and 21:39:27 - two KA packets, sent by CSS
5. 21:41:20 - REGISTER request sent, 401 Unauthorized message received
6. 21:41:21 - CSS starts ringing...
7. 21:41:35 - incoming call (obsolete at this time) is set to Ignore and CSS 
sends multiple (11) 605 Decline messages

As  you can see CSS starts ringing almost 6 minutes after the call was 
initiated. It's too late at this time...

Note:
KA packets are sent on a regular basis, even after CSS was ignoring new 
incoming INVITE messages. It looks like CSS was not waken up by incoming IP 
packets. They were received by phone's IP stack, but it did not call CSS to 
process them... Here is the problem!

You can reproduce it when:
1. Allow smartphone run without connection to PC or any other device for 20 
min. It should be in sleep mode at this time.
2. Watch communications with SIP server (FreeSWITCH) with Wireshark and 
initiate next call in couple of minutes after nest registration occurs.
3. Cancel the call and wait until followed registration process will start. It 
will make CSS ring...

As you can see, at this time CSS can't be used as reliable phone replacement. 
It could miss important incoming calls :(

Original comment by yok...@gmail.com on 11 Aug 2012 at 5:19

GoogleCodeExporter commented 9 years ago
Hi r3gis, I encountered similar problems. After some debugging, I suspect that 
the CPU does wake up upon receiving INVITE. This wake-up was triggered by OS. 
However, the OS quickly puts the device back to sleep mode again, thus CSS 
doesn't have enough time to retrieve the INVITE message from the IP stack and 
process it accordingly. At a later time when CPU wakes up again (long enough), 
the INVITE is then processed.

I made the above conclusion by doing the following testing:
1. I install a simple app of mine. It simply vibrate(500) every 1 sec in a 
while loop. This way, I know when the CPU really sleeps. Testing indicates that 
all devices randomly wakes up every second or minutes (even if you don't touch 
it for hours)
2. I then start CSS and make calls to it from another SIP client
3.1 Because the OS wakes up device randomly, sometimes the INVITEs are picked 
up immediately if coincidently the device is awake
3.2 In other cases I ALWAYS observe brief vibration when a call is made, but 
the call is not processed timely because device falls to sleep again (no 
vibration). The call would show up if I turn on device screen (meaning force 
CPU out of sleep).

Hope this helps.

Robin

Original comment by jingjing...@gmail.com on 6 Sep 2012 at 10:05

GoogleCodeExporter commented 9 years ago
Hi Robin,

Many many thanks for this test very very very interesting ! I didn't thought 
about an app that vibrates to indicate sleep mode. Very clever :)

So would mean that the cpu lock made in the app as soon as I receive the 
incoming call from the sip stack is made too late. At least on some devices. 
(It could explain why I don't reproduce the problem with recent devices).

Maybe could be interesting to see if CPU lock can be done elsewhere before sip 
parsing in pjsip directly. I'll try to have a look.

Original comment by r3gis...@gmail.com on 6 Sep 2012 at 10:23

GoogleCodeExporter commented 9 years ago
It seems to me that device ALWAYS wakes up when receiving an incoming packet 
(e.g. INVITE). But the waking period could be very brief, in which case the 
INVITE is queued in low layer of the stack and doesn't have time to reach app 
layer (thus lock CPU as required).

Do you have the same assumption (device ALWAYS wakes up when receiving an 
incoming packet)?

Original comment by jingjing...@gmail.com on 7 Sep 2012 at 3:09

GoogleCodeExporter commented 9 years ago
Yes that's the assumption I did.
However I thought it keep the phone wake up for a bigger delay.
In fact the app already take a wake lock as soon as it receive an invite from 
the server.
However it's the application layer that do that. And between network and app 
layer there is the native sip stack parsing. Maybe this parsing time is too 
long for some devices.

Original comment by r3gis...@gmail.com on 7 Sep 2012 at 3:19