Closed GoogleCodeExporter closed 9 years ago
Module that is responsible for processing SIP messages from SIP switch may take
a nap. I've observed followed scenarios:
1. CSS doesn't react to incoming INVITE messages in real time. Then it may
suddenly wake up and process the stale INVITE message and start ringing. It
will ring indefinitely or until I cancel the obsolete incoming call. Then it
sends 10 call terminating messages to SIP server (which doesn't send anything
back in response, because there is no such call under its watch anymore).
2. From SIP server's log I see a proper processing of SIP message (including
sending back "100 Trying", "180 Ringing"), but there is no any ringing tone or
any other activity observed in the phone...
3. CSS doesn't receive incoming INVITE messages at all. All those incoming
requests are just dropped and when it finally wakes up - it doesn't remember
anything about those INVITE messages at all.
The most important thing to note is that in all cases CSS sends KA 2-bytes
packets on a regular basis. It means:
1. It's TCP/UDP layer (receiving / sending UDP packets) is still working all
the time.
2. There is no any problems with NAT hole. It's opened all the time (making 3rd
case above looking weird).
Please look at the part, responsible for processing messages, and check why it
can enter a sleep mode, while it should always be ready to process all incoming
messages.
Original comment by yok...@gmail.com
on 28 Jun 2012 at 8:22
Hi, thanks for the detailed tests and scenarios.
I will try to reproduce on my side.
Some side comments too, the fact that the app goes in suspend mode is not so
bad. It means that battery management is done properly. Normally event on
socket should wake up the application. Normally then as soon as I can I take a
CPU lock the time the sip message is processed.
There is some recent change in pjsip that flag socket with MSG_NOSIGNAL (it was
made 10 days ago : https://trac.pjsip.org/repos/changeset/4170). I wonder if
could not be the root cause. Did you observed the same behavior in version
older than 10 days ago?
Normally it should affect only send process. But maybe it has deeper impact on
android and tell the socket to not wake up the app...
Also some question : in your tests are you using TCP or UDP? Does the little
notification icon of csipsimple appears in status bar (it has impact on how the
app is considered by the system)? What is the version of android you are using
(I heard that some 4.0 ROM are buggy with network TCP)? Also are you connected
over wifi or 3g?
3g radio module has some complex state machine (that also depends on carrier),
so it could explain why I don't reproduce.
If over wifi, how do you ensure the wifi is kept active? Is it using csipsimple
wake lock or using the android dedicated setting (in android network setting).
Even if I have no doubt on the fact the app is still able to make outgoing
connections, these settings could have impact on the radio state.
For example, after a while the wifi and the 3g goes in a "low power" mode where
latency becomes significantly bigger. It should not be big enough to be
problematic in our case, but if ROM is buggy it could be.
For example android docs provide this state machine for AT&T 3G network :
http://developer.android.com/images/efficient-downloads/mobile_radio_state_machi
ne.png
and they tell that values can vary depending on mobile technology and carrier
settings. AFAIK, the wifi state machine is similar. It can go faster to a PSP
mode too if screen goes of on some devices.
Original comment by r3gis...@gmail.com
on 29 Jun 2012 at 8:18
Answering your questions:
> Did you observed the same behavior in version older than 10 days ago?
Yes, I did. May be some details were changed lately (I did not catch the
difference). But I certainly missed some first calls coming to CSS. It woke up,
rung for a bit of a sec and then indicated, that the call was missing. It was
frustrating experience all the times (before and after the latest change in
pjsip).
> in your tests are you using TCP or UDP?
UDP only, so far.
> Does the little notification icon of csipsimple appears in status bar (it has
impact on how the app is considered by the system)?
Yes, it does.
> What is the version of android you are using
Lately I've moved to Android v2.2.
> Also are you connected over wifi or 3g?
I use WiFi only.
> how do you ensure the wifi is kept active? Is it using csipsimple wake lock
or using the android dedicated setting (in android network setting).
In the sytem:
WiFi sleep policy = Never
In CSS:
Network | Performances | Lock WiFi (Prevent WiFi idle when SIP is connected) -
Yes
Network | Connection keep alive (Interval in sec for KA packets) | WiFi UDP
keep alive = 120
User Interface | Advanced tweaks | Use partial wake lock - No
User Interface | Advanced tweaks | Keep awake while on call - Yes
The most important thing is when phone sleeps CSS sends KA (2-bytes) packets on
a regular basis (evey 2 min). Thus it could be active while phone is in Deep
Sleep mode. I watch those packets in SIP switch log. It means that there is no
need to set "Use partial wake lock" to Yes (drains battery and CPU can't go
into Deep Sleep mode). The problem is not with CSS keeping sending those
packets. CSS is active when it needs to sent them. It with CSS receiving (and
processing) of incoming messages.
> Normally event on socket should wake up the application.
Therefore, one of the problems is right here. Not all times an event on socket
wakes up the tread, responsible for processing of incoming SIP messages.
Original comment by yok...@gmail.com
on 29 Jun 2012 at 9:32
During this night CSS (r1662) was sending KA and REGISTER packers on a regular
basis. But when I tried to call CSS, it did not respond to INVITE messages at
all. After that it has stopped sending even KA and REGISTER requests...
In an hour I've finally woke up the phone and CSS did not show that it's
registered (obviously), but did show indication, that I've left message in
voicemail to its account. It means, that it still getting notifications from
the SIP switch, while it did not send any KA and REGISTER requests anymore.
Something is definitely wrong with thread, that process INVITE requests after
phone was in sleep mode for a while.
Original comment by yok...@gmail.com
on 2 Jul 2012 at 7:17
It could be very interesting to have long running logs in this case.
I introduce an option in "interface and integration" section (while in expert
mode).
To log inside a file. It could be very useful to see how the app behaves on
long term (while regular logs are rotated, this one is put in a file).
When you reproduce and go out of logs you can go in the sdcard in
csipsimple/logs folder and find the log of the day. I would be very interested
if you can send it just after you get this problem.
Maybe there is something that occurs in logs in this particular case : I mean
when it stops sending KA... maybe some socket error it's not able to recover.
I think that in other case logs will not be so helpful because I guess that
things occurs but just too late because the app is woken up by system only
later when phone cpu is globally woken up.
For now I didn't reproduced yet here with the settings, but as not the same
device and the same network maybe it occurs less frequently. So I keep it in
the same configuration than your so that I can try to see what happens on
android suspend state when it occurs.
Original comment by r3gis...@gmail.com
on 2 Jul 2012 at 9:25
Downloaded the latest r1667. Then I set: "Settings | User interface | Advanced
Tweaks | Log directly to a file" to "Yes". Run it in sleep mode for next 1.5h.
Then I made a test call.
Log from SIP switch (FreeSWITCH or FS for short) indicates:
1. Before the call KA and REGISTER are sent regularly
2. CSS ignores bunch of INVITE requests. In 30 sec FS redirects call to VM and
after VM IVR starts its prompts, it sends the last INVITE message to CSS (FS's
bug, it should not do that at this time)
3. CSS suddenly wakes up and sends "100 Trying", for which FS sends "CANCEL"
immediately back to CSS.
4. CSS ignores that last message too and sends 6 "180 Ringing" messages in a
row to FS.
5. Then finally FS gets "200 OK/CANCEL" message from CSS, followed by "487
Request Terminated" message.
6. After the call CSS sends REGISTER request to FS...
To collect the log I attached USB, changed it mode to Drive and tried to see
the log file. There is ".\CSipSimple\logs\logs_07-02-12.txt" file, but it has
size 0 bytes. Unmounted and disconnected from USB. Waited for SD to settle down
and opened CSS. Turned off its log option (see beginning of this post) in a
hope, that it could flush collected data from memory into the file and checked
the file again. It still empty...
BTW, could you please use ISO standard for date stamps in the log file names?
So, if I collect them in a while, they would be sorted correctly... ISO
standard uses '-' delimiters (that delimiter distinguishes it from other date
formats, that could use dots - EU, slashes - US, etc). First comes year,
followed by month and then day. (for example "logs_12-07-02.txt" instead or
"logs_07-02-12.txt") It's easier to maintain files named this way in organized
fashion.
Original comment by yok...@gmail.com
on 2 Jul 2012 at 11:33
BTW, to reduce some potential ambiguity with old log files collected before the
proposed date format change you may consider to use short date format (I use it
all the time). For example: "logs_120702.txt"
It works perfectly in cases if there is a lot of files in one folder and files
have date stamped names. They are always in a proper order and it's easy to
find what you need even if there is many of them gathered in one place...
Original comment by yok...@gmail.com
on 2 Jul 2012 at 11:47
Updated CSS to r1671, then rebooted the phone and repeated the test. Now I was
lucky and log file was created. I've attached it to the mail, I send you
directly.
Again, during the last 1 hour of phone being in a deep sleep, CSS was sending
KA (every 2 min) and REGISTER requests (every 10 min) on a regular basis (it's
from FS log, CSS log doesn't show that at all!)
I made the test call. CSS did not ring. Then, when VM has answered, it suddenly
woke up and start ringing. It rung until I click on call end button (otherwise
it would ring forever, please fix it, because it looks like a bug).
From FS's log:
19:58:34, 58:35, 58:37, 58:41, 58:49 - FS sends 5 INVITE messages. CSS drops
all of them.
19:59:05 - Call is answered by VM (voicemail)
19:59:34 - after I left message, call with VM is ended (FS sends BYE message)
19:59:36, 59:37, 59:39, 59:43 - FS sends 4 NOTIFY messages to CSS
19:59:43 - CSS sends first respond with "100 Trying" message
19:59:44 - CSS sends 6 "180 Ringing" messages
19:59:44 - CSS sends 4 "200 OK/NOTIFY"
19:00:44 - CSS sends "180 Ringing"
19:00:54 - After I clicked on end clall, CSS sends 11 "603 Decline" messages
I'm sorry to mention that, but log, generated by CSS, is close to useless.
Date/time stamps are messed up (what is the "Dec 31" date, for example?) There
is no any indications of CSS sending KA and REGISTER messages with
corresponding timestamps. But anyway, I hope it will somehow help you to nail
down the problem...
Hint: at 19:59:43 first respond message, coming from CSS, was "100 Trying"
message, followed by 6 "Ringing" messages. It was sent at the moment, when it's
time to send reply for "NOTIFY" message. It tells me, that the old and now
obsolete "INVITE" message was somehow stuck in some queue and was actually
processed only now...
Original comment by yok...@gmail.com
on 3 Jul 2012 at 4:19
The 31 dec logs are from android system !! Not from csipsimple. It's just
because the rom logs some wrong thing when some add the alarm in relative time
instead of absolute. If you have a look "org.jraf.android.nolock" that is not
me also produce the same kind of log.
To activate the logs about timer you have to switch to verbose mode (use log
level 5 instead of 4 -- or even level 6).
That's weird that logs are not collected in direct file mode. I'll try to have
a look on that. And you're right good idea to rename and have different naming
for logs direct from pjsip and logs from logcat. Normally this mode is better
because it only logs pjsip and has no limits. Using this mode you should not
try to use the help menu however but only the things in settings. So your way
to get logs from usb is the good one, and no need to stop from help menu
before. You only need to go in settings, "user interface/integration" section
and set log level to 5 and check the "Use log file" (I maybe forgot to tell
about the change of log level... could be the reason for the 0 octet file log).
About everything that comes after the droping of the invite message it sounds
to me somehow logical. Would be great that FS send some error when it sees that
the app is talking about a SIP INVITE that no longer exists, rather than just
replying nothing and let the app resend things.
Original comment by r3gis...@gmail.com
on 3 Jul 2012 at 8:34
log files name will be now in yy-MM-dd_kkmmss after r1674. BTW, I also did the
change for all other files (such as call recording files and backup/restore
files).
Original comment by r3gis...@gmail.com
on 3 Jul 2012 at 9:42
Thanks a lot for that!
BTW, I started watching CSS communications with FS using Wireshark. CSS may
transmit completely unnecessary packets. Here is what I've noticed watching
just common REGISTER request/reply sessions:
1. Case
CSS-FS - REGISTER request
FS-CSS - 401 unauthorized
CSS-FS - REGISTER request
CSS-FS - REGISTER request <- why it was sent?
FS-CSS - 200 OK
2. Case
CSS-FS - REGISTER request
CSS-FS - REGISTER request <- why it was sent?
FS-CSS - 401 unauthorized
CSS-FS - REGISTER request
FS-CSS - 200 OK
CSS-FS - REGISTER request <- why it was sent
FS-CSS - 200 OK <- unnecessary induced reply
What is actually expected:
CSS-FS - REGISTER request
FS-CSS - 401 unauthorized
CSS-FS - REGISTER request
FS-CSS - 200 OK
If it's not a problem created by CSS, please convey the message to PJSIP
developers.
Original comment by yok...@gmail.com
on 3 Jul 2012 at 9:11
The second register is probably just a retry because server is too long to
reply, no?
In case 2, sounds the "Allow contact rewrite method" is activated. In this
case, the first register is probably made with private IP. In the 200 ok, pjsip
detects IP announced in the first register is private and that a public one is
detected. So it sends a new register with the actual "public ip" (at least the
one for your server), so that the registration contact is accurate. This
registration also unregister the previous one. (the "contact rewrite method"
option set to legacy mode allows to send an unregister first and a new register
then).
If you don't want that (because your sip server already manage that on its
side), you can disable the allow contact rewrite option in expert setting mode.
You could also host a stun server on your sip server so that the "public ip"
(from your server point of view) is already accurate at first registration.
So not a problem in CSipSimple; but I don't think it's a problem with pjsip
too. Depending on configuration it can produce that. And the second register is
probably just a simple retry because of network latency (if it has time to send
a second one, there is probably delay). This is normally harmless and based on
SIP specifications about retries.
Original comment by r3gis...@gmail.com
on 3 Jul 2012 at 9:21
No, server replies almost immediately. Delay between the 1st and last messages
is close to hundreds of milliseconds. Sorry that I forgot to mention it my post
earlier.
Thank you for the tip to clear check at "Allow contact rewrite" (you're right,
it was checked before). I'll watch for any drawbacks, associated with cleaning
of this option... Any hints on what it could break?
Today's morning test did not wake up CSS at all. While it's still sending
KS/REGISTER requests, it did not react on a bunch of incoming INVITE messages
at all... I'll try to repeat it tomorrow morning (now with CSS log turned on).
Original comment by yok...@gmail.com
on 4 Jul 2012 at 4:46
After a long time running in deep sleep mode, CSS starts missing some KA
packets. Look at this picture. It's from Wireshark's log. Missed KA packets are
pointed with red arrows there:
http://img29.imageshack.us/img29/1710/csska.png
Usually, KA and REGISTER come on a regular basis.
Here is continuation of the log above (beginning from packet #44). At this time
I made first test call (after keeping phone in sleep mode during the night).
Call was missed completely.
http://img204.imageshack.us/img204/994/cssmissedcall.png
Watch for timestamps on messages. I hope they may give you some hint what's
wrong with it.
Original comment by yok...@gmail.com
on 4 Jul 2012 at 8:07
Interesting, the call miss seems to correspond to a missing KA packet. Maybe
the root cause is something wrong with the KA process.
I should try to rework that to make cleaner ids of android "alarms" that are
supposed to process KA wake ups. Logs from pjsip would be interesting to see in
this case if we see some complains about not found timers. Since the ka ids are
based on memory mapping it could behaves differently on different devices and
explains why I don't reproduce.
Original comment by r3gis...@gmail.com
on 4 Jul 2012 at 8:21
I misplaced some red arrows on my 1st picture. The missed KA packets are
actually before packets: #10, #15 and #32.
Original comment by yok...@gmail.com
on 4 Jul 2012 at 8:51
Yes I noticed, what's very interesting is that one KA gets broken, it only
restart after a register. BTW, it could be explained by the fact KA are started
after registration, and that one KA starts the next KA timer.
So I think that's a good clue. Maybe there's something wrong with the timer
implementation I did for android that randomly break timers. This would imply
that randomly some of the timer is dropped and so in case of KA it can be
recovered by the next registration but in case of a registration it's never
recovered and could lead to not be registered after some time.
Original comment by r3gis...@gmail.com
on 5 Jul 2012 at 8:31
KA timers are started by a previous KA request or by REGISTER request. That's
understandable. It may explain why if KA timer is broken (and we don't know
exactly why it could happen, but anyway) the next KA request was missing before
packet #10 and $32. But it did not explain the missed KA request before the
line #15 though, where KA was somehow restarted again... May be timers are not
at fault here. May be times were shot at all times, but the missing KA packets
were not sent (by some other reasons).
About timers sending for REGISTER requests. One night (actually two nights
before this one) I've noticed, that REGISTER requests have suddenly stopped 2
times per night for about 45 min each. Then the sequence was somehow restarted
again. It means that timers for sending next REGISTER requests are started not
only by those requests alone, but by some other mechanism too. Or, as in the
case with KA, perhaps timers work, but the requests were simply not sent...
Here is another case of CSS not picking the call.
Picture #3: http://img820.imageshack.us/img820/803/cssmissedcall2.png
It suddenly wake up at #75, makes a very brief ring. But it's too late. At that
time call was already redirected to voicemail. Notice the precise timestamp,
when CSS woke up this time - it's 12:23:47 (it's not the time for next KA -
12:23:52). Is it the time for next REGISTER request - 12:23:46? Then, why the
next register request was slightly (2 sec) delayed this time - #87 @12:23:49?
On the pic#3 you can see second (and unnecessary) REGISTER request at #92 too.
("Allow contact rewrite" is unchecked at this time)
Original comment by yok...@gmail.com
on 5 Jul 2012 at 8:05
Currently running r1678.
After a long time not touching the phone I made 3 test calls. And after that
CSS has broken its IP connection for next 1.5 hour...
See this picture #4: http://img833.imageshack.us/img833/3532/cssmissedcall4.png
As usual during whole night CSS sent regular KA, REGISTER requests. You can see
the last such request sent at 11:34:26.
1st test call was made at 11:41:16 (packet #76). CSS did not respond. But at
11:42:27 it sent its next (and last) KA packet.
After that there is no KA and/or REGISTER requests at all. In 1.5 hours I
finally opened it and it started sending REGISTER requests by itself (as
nothing was happened before). You can see it at 13:18:59.
So, why CSS has stopped receiving INVITE packets, when I made 1st call? And why
it stopped sending any packets (REGISTER and KA) after I made 2nd call at
11:42:50 (packet #83)?
Here is log created only at 13:18 time (there is no data in the log about any
earlier events):
-----
D/WifiService( 213): ACTION_SCREEN_ON
D/WifiService( 213): enable and start wifi due to updateWifiState
V/AlarmManager( 213): Adding Alarm{45735328 type 2 com.csipsimple} Jan 02
05:00:39 am
V/AlarmManager( 213): Alarm triggering: Alarm{45735328 type 2 com.csipsimple}
V/AlarmManager( 213): Adding Alarm{456a8fb8 type 2 com.csipsimple} Jan 02
05:00:40 am
V/AlarmManager( 213): Alarm triggering: Alarm{456a8fb8 type 2 com.csipsimple}
V/AlarmManager( 213): Adding Alarm{45651890 type 2 com.csipsimple} Jan 02
05:00:40 am
V/AlarmManager( 213): Alarm triggering: Alarm{45651890 type 2 com.csipsimple}
--------- beginning of /dev/log/main
D/libpjsip( 470): 13:18:35.194 pjsua_call.c .Incoming Request msg
INVITE/cseq=30464021 (rdata0x3517b4)
W/libpjsip( 470): 13:18:35.195 sip_transactio ..Unable to register INVITE
transaction (key exists)
D/libpjsip( 470): 13:18:35.195 pjsua_core.c ..TX 336 bytes Response msg
500/INVITE/cseq=30464021 (tdta0x27ca40) to UDP IP.ADD.RES.FS:5060:
D/libpjsip( 470): SIP/2.0 500 Internal Server Error
D/libpjsip( 470): Via: SIP/2.0/UDP
IP.ADD.RES.FS;rport=5060;received=IP.ADD.RES.FS;branch=z9hG4bK6F328atg384UD
D/libpjsip( 470): Call-ID: 3b5d9daf-423d-1230-f798-ff424ea7e6fc
D/libpjsip( 470): From: "SIP" <sip:2005@domain.tld>;tag=2HKpNBBHX5ejr
D/libpjsip( 470): To: <sip:2001@IP.ADD.RES.CSS;ob>;tag=z9hG4bK6F328atg384UD
D/libpjsip( 470): CSeq: 30464021 INVITE
D/libpjsip( 470): Content-Length: 0
D/libpjsip( 470):
D/libpjsip( 470):
D/libpjsip( 470): --end msg--
-----
It means, that INVITE message was stuck in the IP stack (for the last 1.5 hour)
and not processed by CSS on a timely manner (perhaps it was received at
11:41:16, see picture #4). And what about stopping sending REGISTER requests at
all after that?
Original comment by yok...@gmail.com
on 6 Jul 2012 at 8:58
Here is case, when CSS missed incoming call and then rung indefinitely...
Picture #5: http://img99.imageshack.us/img99/8412/cssmissedcall5.png
As usual, during the night CSS was sending KA/REGISTER requests regularly,
until I made incoming call (11:30:39, #59). In the middle of the incoming call
(at 11:30:57 #64) CSS sends its scheduled KA packet. After call is redirected
to voicemail SIP server starts sending NOTIFY requests (at 11:31:29 #66). CSS
wakes up and starts ringing (frame #69). After that it will ring forever, or
until I stop it by pressing End call button (at 11:37:59 #88).
BTW, if it happened at night and I simply silenced the phone by pressing VolUp
or VolDn buttons, it would simply discharge the battery by silent ringing
(keeping busy SIP server)...
The log collected during the night and this test call
(logs_12-07-07_114554.txt) has size 120KiB. First 1/3 of it is data gathered
before this line:
--------- beginning of /dev/log/main
and the first actual SIP message has timestamp at 11:34:30 (frame #79 in the
picture, sending RINGING packet). So, it's not useful here at all.
I've noticed that CSS sometimes may produce "pjsiplogs_*" files instead of
"logs_*". How to force one over another? In other words, how to make it create
"pjsiplogs_*" files? And is there a way to increase the size of the log
generated?
Original comment by yok...@gmail.com
on 7 Jul 2012 at 7:53
About logs :
In interface/integration section you have two options about logs :
- The log level. This globally set the logging level of the application (both android and native part). To get interesting logs set it to 4 or 5 (5 could be interesting to get all pjsip timers logs.
- The log to file option : this one introduced recently allow pjsip to log directly into a file. By default, pjsip will log in "logcat" tool that is the official android platform logging utility. As you noticed this one does automatically rotation on logs, and the
--------- beginning of /dev/log/main
line is the beggining of the main logger buffer. What is before correspond to
other loggers.
It rotates automatically. This made it quiet useless if you'd like to get logs
from long running tasks.
That's why the option to log directly to a file instead is there.
In the case it logs to "logcat", no file are created. It's only created when
you go in help menu to send the logs by mail. So if you don't plan to log by
mail, you can also collect these logs using alogcat application (or any other
application on the android market able to collect android logs).
In the case it logs to a file, all pjsip logs are written to a pjsiplogs_* file.
In your case, it's the preferred way since you are trying to collect something
that runs in background. So definitely, don't try to use the help menu anymore.
Instead, go in settings / user interface section and turn log level to 4 or 5,
and check the box about logging to a file. That's it. Don't play again with
help / record logs menu because it will change again these expert settings.
Just leave things like that, and it will produce pjsip logs inside the
pjsiplogs_* files.
These files are not rotated however, so for long term, keep in mind that you
have to disable the option to not run out of free space on the sdcard ;).
Original comment by r3gis...@gmail.com
on 8 Jul 2012 at 5:22
OK, I've tried to get long logs (r1682):
"Settings | User interface | Native dialer integration | Call logs integration"
- yes
"Settings | User interface | Advanced tweaks | Log level" = 1 (I'll make it
bigger when it starts working)
"Settings | User interface | Advanced tweaks | Log directly to a file" - yes
Now I see 3 files:
pjsiplogs_12-07-09_165749.txt
pjsiplogs_12-07-09_171652.txt
pjsiplogs_12-07-09_183139.txt
All of them have size 0 bytes. Timestamps indicate connection time (more
precise, time of first registration after setting change or location change).
Do you flush those logs on a regular basis? (like every 4K or on time manner)
Original comment by yok...@gmail.com
on 10 Jul 2012 at 1:57
>> "Settings | User interface | Advanced tweaks | Log level" = 1 (I'll make it
bigger when it starts working)
If you don't start bigger it will never start working ! ;). If the log level is
1, it logs *ONLY* errors ;)... ( 1 = Errors only, 2 = adds warning, 3 = adds
info, 4 = adds debug, 5 = adds verbose).
So your files will have a 0 byte size. It's absolutely normal ;)
Start with 4 or 5 as I said, you'll see that logs will be significantly more
interesting :)
Else, the next nightly build will introduce a cleaner way to manage timers. It
will be interesting to see how it goes with this new approach on this problem.
Original comment by r3gis...@gmail.com
on 11 Jul 2012 at 9:57
r1688 - logging to file doesn't work. I set Log Level = 4 (debug), waited for
the whole night, made test call and there is nothing in the log. :( The latest
log (timestamped yesterday's evening) is 8KiB in size...
How to make logging to work reliably?
Today's test failed again (as all test calls made past several mornings). It's
documented as Wireshark's screen shot (similar to the already published above)
and as a log, made by FreeSWITCH. I'd like to share with you the log, created
by CSS, but there is no one that has any data about the test at all...
Original comment by yok...@gmail.com
on 14 Jul 2012 at 7:01
Forgot to mention, why CSS's logs make it so hard to read in any text editor
without preliminary editing (inserting proper 'new line' symbols)? Why log
files use '0x0A' as a line delimiter? I know for sure that CSS can make '0x0D,
0x0A' as a next line code. For example, there is: '0D 0A 0D 0A 0A 2D 2D 65 6E
64 20 6D 73 67 2D 2D 0A', which in plain text looks like '--end msg--'. As you
see at the beginning it uses conventional '0D 0A' as new line. But in the
middle and at its end it uses '0A'. Can you actually see CSS's generated logs
with any plain text editor (e.g. Notepad, EditPlus, etc)? I'm asking, because I
can't...
Original comment by yok...@gmail.com
on 14 Jul 2012 at 7:36
I'll try to debug why log level set to 4 + log to file option activated doesn't
work.
I also observe some weird thing : the file logger seems to cache writing and
only write by block when it has cpu activity given. I guess that's an
optimization of pjsip but in our case very annoying.
About the CR vs LF vs CRLF, android is a linux system so it adds 0x0A (\n) as
end of line of logs. It's also how pjsip adds logs, it's not proper to
csipsimple.
As I'm running on linux, I never had any problems with the fact it adds only \n.
Anyway, on linux, all editors we have are clever and detects ends of line
dynamically and are able to support CR (mac), LF (linux) and CRLF (windows)
seamlessly.
EditPlus or advanced editors should have an option to autodetect... On linux
even basic command line programs to edit files like "vim" are able to detect
end of line dynamically. ;)
Original comment by r3gis...@gmail.com
on 14 Jul 2012 at 8:30
After a closer look on android source code the log file is not flushed and it's
indeed an optimization (else the app is extremely slow).
It's only flushed at the end of logs which occurs if you go in option and press
back button (it will finalize the file and produce a new one).
For the root problem I'll do a commit with a try to re-introduce the thread
count option. By default it's now 0 to reduce as much as possible battery
impact. But maybe it was not a good idea on some devices.
Original comment by r3gis...@gmail.com
on 14 Jul 2012 at 9:32
The thread count option is now available again. To change it, in expert setting
mode, in Media section you'll see a new option "Thread count". By default it's
0. Try to change it to 1. It could maybe help with the root problem.
Original comment by r3gis...@gmail.com
on 15 Jul 2012 at 10:00
r1691 - yesterday I finally got lucky and CSS made its log, containing
debugging events inside. :)
There is no any error code at the time of the test call (log level - 4).
Here is sequence of events (1005 calls 1002) with timestamps:
14:41:57.903 - 1002 sends REGISTER request. Next REGISTER request is supposed
to be at [14:51:53].
14:50:32 - 1005 calls and 1002 misses the call.
14:51:22.534 - 1002 wakes up, processes all 6 INVITE requests and sends 6
Ringing messages to FS. But at this time, it's too late - the call was already
redirected to VM.
14:51:53.543 - 1002 sends REGISTER request.
Here is my vision of the problem:
CSS has a thread, that (perhaps?) is always active (it sends KA requests every
2 min and REGISTER requests every 10 min). This thread DOES NOT wake up another
CSS thread, that can process incoming packets (6 INVITE messages) and they are
just sitting in the IP stack... Later on the latter CSS thread wakes up and
starts processing those messages from the stack (sending 6 Ringing responses to
FS).
I guess the problem is with the fact that packet receiving thread doesn't wake
up thread that could process INVITE request...
BTW, we have to do something with endless ringing, occurring in such cases. CSS
makes ringing in behalf of the obsolete call (call is already abandoned by SIP
server) and never stops. Imagine if you left your phone on your desk or in a
locker room... It will ring until it will deplete the battery. Not good ;) I
guess may be some timer (2 min or something) to cancel the abandoned call would
be appropriate here.
Original comment by yok...@gmail.com
on 15 Jul 2012 at 6:19
I'll set "Thread count" to 1 and check it tomorrow with latest r1696.
Original comment by yok...@gmail.com
on 15 Jul 2012 at 6:23
Running r1696 with "Thread count" = 1 and loglevel = 5.
Test call (after CSS was at sleep during the night):
1) 11:02:05 - made test call
2) 11:02:24 - CSS starts processing all of 5 INVOTE requests received earlier
(see p.1)
3) 11:02:25 - CSS wakes up and starts ringing
4) 11:02:36 - I pressed Call Answer button, but at the same time call was
canceled (redirected to VM)
5) 11:02:59 - CSS got notification about voice message in its VM
6) 11:03:00 - CSS sends next REGISTER request as usual
Here is picture of WS log:
http://img401.imageshack.us/img401/7404/cssmissedcall10.png
Here is cut from CSS log related to the test: http://pastebin.com/6sKm47EC
1. CSS sends REGISTER requests every 10 min
2. CSS sends KA requests every 2 min
3. I did not remove any lines from the log, just added some new empty lines for
better readability.
4. There is no errors during the tests call.
Question:
CSS still not process 5 INVITE messages at the time, they arrive. It suddenly
woke up at 11:02:24. Why it woke up at that particular time? Next REGISTER
request is scheduled at ≈11:03:00 and not at 11:02:24.
Original comment by yok...@gmail.com
on 16 Jul 2012 at 8:36
Thanks for the test.
Indeed the fact it wakes up at 11:02:24 precisely is very weird. One thing that
comes in my mind : it's maybe the entire android wifi driver that was just
sleeping and wakes up only at this time and then notify and push all datas to
csipsimple.
In wireshark logs do you see some ICMP packets?
It could be interesting to test with TCP too. As TCP layer provides hacks from
the very lower layer, it may allow to determine if what is not replying is
pjsip/csipsimple or android itself that is in some PSP mode (power save
polling).
I think that if we see that the TCP layer shows reply from android but that the
app is not woken up it's probably something within the app. If we see the TCP
layer replying only when the app wakes up it probably means that we have to
search something with the android transport and to try to put it in another
mode where requests are processed immediately or to see if it's not possible to
reconfigure the device so that PSP mode is more aggressive (I know that it
depends on devices and manufacturers settings -- normally it should not go more
than 5 seconds but some manufacturers plays on that to save more battery)
Side point : I don't forget the issue where the app remains in ringing mode. I
also observed that for other root cause and it's something important I have to
fix. I'm trying to find the more robust way to do that.
Original comment by r3gis...@gmail.com
on 16 Jul 2012 at 9:01
If I set transport both UDP and TCP, CSS uses UDP. If I try to deny (un-check)
UDP, CSS doesn't register to SIP server. CSS log indicates:
--------
16:15:47.577 pjsua_acc.c .Acc 1: setting registration..
16:15:47.577 pjsua_acc.c ..Unable to generate suitable Contact header for
registration: Unsupported transport (PJSIP_EUNSUPTRANSPORT) [status=171060]
16:15:47.578 pjsua_acc.c ..Unable to create registration: Unsupported
transport (PJSIP_EUNSUPTRANSPORT) [status=171060]
--------
What I'm missing here?
Original comment by yok...@gmail.com
on 16 Jul 2012 at 11:38
Sorry I didn't precise how to configure an account to TCP :)
In fact it's an account setting. The global setting is just to tell whether the
entire stack should or not start the transport. Then accounts will use the
transport if necessary.
So in global settings you should leave both TCP and UDP enabled.
To configure your account to use TCP, go in accounts view and transform your
account into an advanced account (long press the account row, choose wizard and
choose advanced) (if your account is already expert one, the option is also
available in expert mode). Then in account settings in advanced mode you'll
find the option about TCP (it's TCP option if advanced, and Transport > TCP if
expert).
Original comment by r3gis...@gmail.com
on 17 Jul 2012 at 8:33
[deleted comment]
Thank you for instructions. I set account to use TCP and it works well.
Meantime I made yet another test call (r1698, UDP). Result is similar:
http://img708.imageshack.us/img708/8400/cssmissedcall11.png
11:01:26 - CSS sends REGISTER request
11:09:59 - test call begins, CSS misses 6 INVITE messages. I left message in VM
11:10:47 - NOTIFY message is sent to CSS (new voicemail)
11:10:48 - CSS wakes up and sends 6 Ringing messages back to FS
11:11:21 - CSS rends next REGISTER request
And we see again, that CSS can start processing INVITE requests [11:10:48]
before it wakes up for sending next REGISTER request [11:11:21].
Log file doesn't show anything why and how CSS woke up. Here is cut from the
log. You see the last KA packet sent on schedule, followed by sudden CSS wake
up to process the old INVITE message:
----------
11:09:26.676 timer_android. Registered thread timer_thread_640
11:09:26.676 timer_android. FIRE timer 0 of heap 5
11:09:26.676 pjsua_acc.c Sending 2 bytes keep-alive packet for acc 2 to
IP.ADD.RES.FS:5060
11:09:26.677 tdta0x4153b8 Destroying txdata raw
11:09:26.678 timer_android. Scheduling timer 0 of 5 in 120000 ms
11:09:26.691 timer_android. FIRE done and released
11:10:47.533 sip_endpoint.c !Processing incoming message: Request msg
INVITE/cseq=30938235 (rdata0x27cecc)
11:10:47.533 pjsua_core.c .RX 1273 bytes Request msg INVITE/cseq=30938235
(rdata0x27cecc) from UDP IP.ADD.RES.FS:5060:
INVITE sip:1002@IP.ADD.RES.CSS:35709;ob SIP/2.0
----------
I'll test CSS using TCP transport later on and let you know results.
Original comment by yok...@gmail.com
on 17 Jul 2012 at 7:27
I can't get incoming calls with CSS running with TCP transport mode. See new
issue, describing the problem:
http://code.google.com/p/csipsimple/issues/detail?id=1830
Original comment by yok...@gmail.com
on 17 Jul 2012 at 11:58
Well, after setting "Allow contact rewrite" option to "yes" I was able to
receive calls all the time. And it looks like it works reliably (so far ;) ).
But another day, another story... Now, when I start getting incoming calls
reliably, I've noticed another problem. CSS indicates that call, that I've just
answered, was missing... every time I accept the call. Moreover, if I keep call
active more then 1 min, CSS drops the call and sends BYE message to SIP server.
Why is that? I've discovered, that now FS refuses to send ACK messages back to
CSS. And that causes CSS to send 10 more 200 OK messages to FS, followed by
BYE...
At first, I begun to blame FS. Why it stops sending ACK messages to CSS?
Googling the problem returned nothing... Then I tried to make very similar
call, but now in the opposite direction - from CSS to OBi100 ATA device
(working with TCP transport). All calls were fine. Looking at FS logs I see
that FS sends to OBi100 ACK messages all the time.
So, what makes FreeSWITCH to stop sending ACK messages back to CSS after
receiving 200 OK/INVITE? And why if I toggle in CSS transport from TCP to UDP
everything returns back to normal and FS start to send those ACK messages
without any hesitation? Do you have any idea? Have you noticed this problem
before?
Original comment by yok...@gmail.com
on 21 Jul 2012 at 3:57
Great :) Good news for reliability. So it sounds something with UDP only, and
it's probably linked more to the lower android network layer than to pjsip or
something I changed for android (I didn't modify things about TCP/UDP).
About the missing ACK it's very interesting. I raised recently something to
pjsip guys about tag comparison that is involved in ACK process. They fixed it
but was a big change. On my side I didn't noticed regression, but not to
exclude it introduced a regression.
Can you collect some wireshark trace from the FS side? (Or some logs from
csipsimple). Hopefully there is some clue on why the 200OK of pjsip is not good
for freeswitch.
Original comment by r3gis...@gmail.com
on 21 Jul 2012 at 8:47
Here is log collected by Wireshark yesterday:
http://img155.imageshack.us/img155/3686/cssmissedcall12.png
And here is CSS's log: http://pastebin.com/LhnUEBaE
It has only two 200 OK/INVITE messages, but nothing was changed since then...
Original comment by yok...@gmail.com
on 21 Jul 2012 at 7:13
Talking about TCP and reliability. Today's morning I've noticed problem. Before
that CSS was sending REGISTER requests regularly (last time @11:15:26).
1. I made incoming call. At 11:15:29 FS sends INVITE message, which CSS
suddenly ignored at this time. TCP layer dropped staled connection.
2. 11:18:27 - CSS suddenly starts ringing, sending (it thinks, it does) replies
to FS
3. 11:20:00 - I stop CSS ringing by pressing "End Call". CSS indicated NO
MISSED CALLS after that (and why is that?)
4. 11:25:22 - CSS sends REGISTER request (how it's possible, if TCP is broken!)
5. 11:25:58 - same as above
6. 11:27:31 - TCP connection restored and CSS finally sends REGISTER request.
Why broken TCP is not detected earlier? TCP layer should indicate that since
11:15:29. And upper layer (CSS) should react accordingly! But it still pretend
as it's working as usual (since p.2, 11:18:27)
Wireshark: http://img26.imageshack.us/img26/6678/cssmissedcall13.png
CSS: http://pastebin.com/TTiR7RT3
Original comment by yok...@gmail.com
on 21 Jul 2012 at 8:12
Excellent logs.
Well that's a little bit what I expected (and also feared).
At point 1, note that CSipSimple the TCP layer is NOT broken yet!
Two reasons :
- There is no logs indicating the TCP socket is broken in pjsip (the logs you
see about termination of the Register transaction are not linked with the
transport)
- The line are black, and that the problem is on TCP layer that's trying to retransmit packets !
It's not pjsip that ignores the packet, it's the transport layer. The phone is
unreachable ! You see that the TCP layer of the phone doesn't reply anything to
FS. And FS (actually the server TCP transport layer, because FS is not aware of
this transport OSI layer) has to retransmit. It's TCP retransmissions that we
see here. So on very lower transport layer.
So here, it's not pjsip that lost the packet but it's the lower TCP layer of
the phone. I was not able to affirm for UDP packets (since we can't see lower
layer activity on UDP). But here it's pretty clear : it's the phone lower
transport layer that is just sleeping so deep that it doesn't reply to incoming
packets.
And against that there is nothing the application can do because it's not at
application OSI layer, it's at transport layer.
Also, at this point, it's not yet broken. TCP is made to make reliable
transport of packet. And that's not because of the fact some packet are not
received on the other side that the link is broken. On a good and fast linux
machine, if you unplug and replug a cable while downloading you should be able
to recover the download without needing the application to recover by other
applicative ways.
Also interesting point is that it's not about CPU, because CPU is activated by
CSipSimple at 11:16:40 and 11:17:31. So it's really about the network driver
that is really asleep regardless CPU activity.
Point 2. (or just before point 2). It's very very interesting too here. If you
have a look to pjsip logs you'll see that at 11:18:26.847 the application sends
a keep alive packet. It's pjsip that decide to send something on the TCP
socket. And actually it seems to succeed... the TCP socket is not broken !
And not only that but it seems to also wake up the network driver that was
asleep before. And at this time the driver will be able to proceed the incoming
TCP at 11:18:27... just when we see the wireshark log about TCP continuation.
It leads to point 2. incoming packet is read properly by pjsip and it tries to
reply.
So it seems back to normality at this point.
Unfortunately, when it tries to reply at 11:18:27, we can see a RST of the TCP
socket. In your wireshark logs I don't see the difference between incoming and
outgoing packets so hard to say what is the side that RST the socket.
Anyway, here it becomes weird. At this point, normally pjsip should be informed
of the fact the TCP socket has been broken. Apparently from logs it's not. As
it doesn't see that the TCP socket is broken now (it was NOT broken before); it
just retry. Apparently the lower unix socket layer doesn't inform the stack
that there is a broken pipe. This is very very strange. This could be something
in the sip stack or something in the network driver. As I've already observed
broken TCP pipe detected in pjsip and as the network driver seems to already
have weird behavior in point 1 to 2, I would more think about something in the
android network driver... but I can't for sure affirm that.
For point 3. It seems normal behavior for me. If you manually decline a call,
it's not missed. It's incoming not replied but not missed (it should appear in
call logs however, isn't it?)
Before point 4. Now as we've seen previously TCP socket is broken. (from point
2 and not from point 1 ! but it's broken now). However, pjsip seems to continue
trying to send keep alive packets. These packets are not seen on wireshark logs
which confirms that the TCP socket is not endpoint to endpoint available
anymore. However still weird that android linux lower layer doesn't report the
fact the TCP socket is broken.
For point 4 and 5. 11:25:22 and 11:25:28. We are exactly in the same case than
in my previous point. Nothing changed, pjsip thinks the socket is still alive
(and apparently so does the lower linux socket layer). So it just sends the
packets. It could be very interesting to see tcpdump logs from the phone
itself. Maybe actually the phone has still a tcp socket connected to the
router. In which case it's a router in the middle that does bad work.
Before point 6. You can notice some very interesting log :
11:27:04.167 tcpc0x2cec3c TCP transport destroyed normally
...
This comes from when the linux socket layer saying to pjsip that the TCP socket
is finished !! It only gets the information now !!
So now that pjsip knows from the linux socket layer that the socket is down...
it can fix it and create a new socket !! And at this point it will recover
things normally.
So, to sum up :
The socket is broken at 11:18:27. When we see RST in wireshark logs.
The reason for this reset is not clear, but what's sure is that pjsip is
informed about that only at 11:27:04. Meanwhile, pjsip and apparently the linux
socket layer both thinks the socket is still alive. Packets can be sent but
never reach your SIP server. Two possible reason in my opinion :
* the network (wifi?) driver get broken due to some activity during the time it
was completely asleep.
or
* there is some network equipment in the middle that is broken due to the same
previous point. This network equipment leave one socket open on the side
connected to the phone but has it broken on the side of the sip server. This
solution is maybe more probable.
In any case, I don't think it's something with application since everything
here occurs at transport layer level. Besides the application detects some TCP
termination (very late but it does). It's not an error termination but some
natural ending of the TCP socket. Pjsip doesn't invent this event and it's not
on writing something. So it comes from outside... either the network driver or
the router in the middle that finally decided to cut this TCP socket.
What do you think?
Side note and possible solution. Apparently you reach this case because of the
deep sleep mode of the phone that is recovered by the fact pjsip sends the keep
alive. So maybe (even if crappy and battery draining), it could be a solution
to reduce TCP keep alive interval.
Original comment by r3gis...@gmail.com
on 21 Jul 2012 at 9:17
Here is yet another log, indicating failure of CSipSimple (r1736) to take
incoming call.
http://img696.imageshack.us/img696/9238/cssmissedcall14.png
Note:
1. 10:43:01 - call arrives
2. 10:43:49 - CSS starts ringing (call was already transferred to VM) and then
rings endlessly
3. 10:43:51 - CSS sends REGISTER request. "Contact" contains WAN IP of CSS
4. 10:43:55 - CSS sends REGISTER request. "Contact" contains LAN IP of CSS
5. 10:43:55 - CSS sends REGISTER request. "Contact" contains WAN IP of CSS
Problems:
1. CSS was too late to take incoming call again. TCP was not reset and
connection was re-used in next communications.
2. Why CSS sends REGISTER request (p.4 above), pointing to its LAN address?
First bunch of black requests (7) on the Wireshark's log are coming form FS to
CSS, while the second bunch (7) are coming back from CSS to FS.
CSS has not produced any log at this time (log option is always set to ON). I
have logs made before and after the event, but not at the time, the event has
happened...
To reproduce the problem I suggest:
1. Set registration time to 10 min (or more), keep CSS running over night.
Don't touch phone.
2. Run Wireshark at SIP server side. Watch for couple of REGISTER requests
coming form CSS to SIP server on regular basis. Between REGISTER requests see 3
"Continuation" requests coming every 3 min from CSS.
3. Calculate the time CSS is going to send next REGISTER request (it's easy, if
you set expiration time to 10 min). Make test call to CSS 1 minute earlier then
that time and watch...
Original comment by yok...@gmail.com
on 26 Jul 2012 at 10:56
About problems :
1. Indeed, it seems like in previous trace be something on transport layer. TCP
packets are mark with retransmission packets as if it's the lower layer that
doesn't respond. I thought again about the problem after latest analysis and
thought that it was not totally sure it could not be something with pjsip that
is not in select mode on the socket. Something that could be interesting in
this state is to try to run a command line "ping" from the sip server to the
device IP address. Just to see it's alive to ping request. (also ensure before
that ping works when the phone is not sleeping to be sure ping are not blocked
by something in the middle ;) ).
2. Are you sure the p.4 is not a Register with "Expires : 0"? It could be an
unregister of a previous registration. Even if it's very weird that it does
that not at startup. Unless the sip stack is restarting... but it should not
since normally it's with a current call ongoing here.
I hope that's not a crash of the sip stack. It's the only thing that could
logically explain that if it doesn't have an Expires : 0.
Do you have corresponding csipsimple logs? (You can send it by mail if you
prefer for your privacy). Or alternatively, did you observed a crash /
unexpected close of the call screen ?
Original comment by r3gis...@gmail.com
on 27 Jul 2012 at 8:52
> Something that could be interesting in this state is to try to run a command
line "ping" from the sip server to the device IP address. Just to see it's
alive to ping request. (also ensure before that ping works when the phone is
not sleeping to be sure ping are not blocked by something in the middle ;) ).
The problem here is - CSS is behind NAT router. Ping may reach only WAN address
of the router, not CSS.
> 2. Are you sure the p.4 is not a Register with "Expires : 0"? It could be an
unregister of a previous registration. Even if it's very weird that it does
that not at startup. Unless the sip stack is restarting... but it should not
since normally it's with a current call ongoing here.
I'm pretty sure that p.4 was a common sequence of two REGISTER exchanges (each
containing 4 packets, not counting TCP ACK). First provides LAN IP:port in
"Contact" data, while the second (p.5) contains everything the same, except
"Contact" now contains correct WAN IP:port. At this time I'm not sure about p.3
though (sorry, my fault). It could have "Expires : 0". But why? And, talking
about potential sip stack crash - keep in mind, that TCP port (59187) was
exactly the same before and after that time. In other words, TCP connection was
not dropped, or otherwise, the port would be changed.
While currently I'm still waiting for the "Contact" data fix
(http://code.google.com/p/csipsimple/issues/detail?id=1830 - it prevents me
from using CSS with TCP transport, because all incoming calls are terminated by
CSS in 1 min), I'm trying to use Sipdroid and watch, how it runs under the very
same conditions. Today morning I tried to make similar test call and Sipdroid
took the call just fine. There is no any conclusion from this test, of course.
And I'll try to watch it tomorrow and see if it exhibits similar problems as
CSS. Sipdroid is a way less configurable, than CSS. The difference (that I
can't change) here is - with UDP transport it has 3600 registration time and it
sends KA (2 bytes packages, similar to CSS) every 60 sec. But I don't think it
could make any difference to the tests...
Original comment by yok...@gmail.com
on 28 Jul 2012 at 3:34
At this time I can't use TCP transport, because all incoming calls are
terminated by CSS in 1 min (see post above). Thus, I switched back to UDP
transport.
Here is one more example of CSS (r1765) missing incoming call. Look at this
Wireshark log:
http://img705.imageshack.us/img705/3338/cssmissedcall15.png
1. 21:31:24 - normal registration
2. 21:33:26 and 21:35:26 - two KA packets, sent by CSS
3. 21:35:49 - new incoming call
4. 21:37:26 and 21:39:27 - two KA packets, sent by CSS
5. 21:41:20 - REGISTER request sent, 401 Unauthorized message received
6. 21:41:21 - CSS starts ringing...
7. 21:41:35 - incoming call (obsolete at this time) is set to Ignore and CSS
sends multiple (11) 605 Decline messages
As you can see CSS starts ringing almost 6 minutes after the call was
initiated. It's too late at this time...
Note:
KA packets are sent on a regular basis, even after CSS was ignoring new
incoming INVITE messages. It looks like CSS was not waken up by incoming IP
packets. They were received by phone's IP stack, but it did not call CSS to
process them... Here is the problem!
You can reproduce it when:
1. Allow smartphone run without connection to PC or any other device for 20
min. It should be in sleep mode at this time.
2. Watch communications with SIP server (FreeSWITCH) with Wireshark and
initiate next call in couple of minutes after nest registration occurs.
3. Cancel the call and wait until followed registration process will start. It
will make CSS ring...
As you can see, at this time CSS can't be used as reliable phone replacement.
It could miss important incoming calls :(
Original comment by yok...@gmail.com
on 11 Aug 2012 at 5:19
Hi r3gis, I encountered similar problems. After some debugging, I suspect that
the CPU does wake up upon receiving INVITE. This wake-up was triggered by OS.
However, the OS quickly puts the device back to sleep mode again, thus CSS
doesn't have enough time to retrieve the INVITE message from the IP stack and
process it accordingly. At a later time when CPU wakes up again (long enough),
the INVITE is then processed.
I made the above conclusion by doing the following testing:
1. I install a simple app of mine. It simply vibrate(500) every 1 sec in a
while loop. This way, I know when the CPU really sleeps. Testing indicates that
all devices randomly wakes up every second or minutes (even if you don't touch
it for hours)
2. I then start CSS and make calls to it from another SIP client
3.1 Because the OS wakes up device randomly, sometimes the INVITEs are picked
up immediately if coincidently the device is awake
3.2 In other cases I ALWAYS observe brief vibration when a call is made, but
the call is not processed timely because device falls to sleep again (no
vibration). The call would show up if I turn on device screen (meaning force
CPU out of sleep).
Hope this helps.
Robin
Original comment by jingjing...@gmail.com
on 6 Sep 2012 at 10:05
Hi Robin,
Many many thanks for this test very very very interesting ! I didn't thought
about an app that vibrates to indicate sleep mode. Very clever :)
So would mean that the cpu lock made in the app as soon as I receive the
incoming call from the sip stack is made too late. At least on some devices.
(It could explain why I don't reproduce the problem with recent devices).
Maybe could be interesting to see if CPU lock can be done elsewhere before sip
parsing in pjsip directly. I'll try to have a look.
Original comment by r3gis...@gmail.com
on 6 Sep 2012 at 10:23
It seems to me that device ALWAYS wakes up when receiving an incoming packet
(e.g. INVITE). But the waking period could be very brief, in which case the
INVITE is queued in low layer of the stack and doesn't have time to reach app
layer (thus lock CPU as required).
Do you have the same assumption (device ALWAYS wakes up when receiving an
incoming packet)?
Original comment by jingjing...@gmail.com
on 7 Sep 2012 at 3:09
Yes that's the assumption I did.
However I thought it keep the phone wake up for a bigger delay.
In fact the app already take a wake lock as soon as it receive an invite from
the server.
However it's the application layer that do that. And between network and app
layer there is the native sip stack parsing. Maybe this parsing time is too
long for some devices.
Original comment by r3gis...@gmail.com
on 7 Sep 2012 at 3:19
Original issue reported on code.google.com by
yok...@gmail.com
on 26 Jun 2012 at 3:29