Open GoogleCodeExporter opened 9 years ago
Hi,
I noticed the same deadlock in a unit test and it was due to a synchronous
method call from a SessionStateListener registered to my sms receiver.
I didn't apply the patch yet because I didn't really need a
sessionStateListener.
The problem I have now is that in my JBoss-Spring environment (where I have 2
message driven beans using respectively an sms sender-only and an sms
receiver-only, both built on top of jsmpp). I am not sure whether the problem
is conductible to this deadlock or if it's some kind of infinite loop in the
SMPPSession.
The result is that after more or less 15h of activity (not systematically, but
it happens at least once per day) the server CPU goes to 100% and it stops
logging. The last visible lines of log are always from the SMPPSession class:
2011-10-13 08:55:39,435 INFO [org.jsmpp.session.SMPPSession] Thread-85954
PDUReaderWorker stop
2011-10-13 08:55:39,468 INFO [org.jsmpp.session.SMPPSession] WorkManager(2)-3
Connected
2011-10-13 08:55:39,471 INFO [org.jsmpp.session.SMPPSession] Thread-87055
Starting PDUReaderWorker with processor degree:3 ...
Basically it happens during an operation of rebind (session.unbindAndClose(),
session.connectAndBind()), which I do after every 5 minutes of idle time (idle
time is computed from the last received message).
The part of the code that looks a bit "smelly" is the run() method in the
PDUReaderWorker:
@Override
public void run() {
logger.info("Starting PDUReaderWorker with processor degree:{} ...",
getPduProcessorDegree());
while (isReadPdu()) {
readPDU();
}
close();
executorService.shutdown();
logger.info("PDUReaderWorker stop");
}
In the readPDU() method, SocketTimeoutExceptions are caught without being
logged, but the notifyNoActivity() method is called. In case the session is in
a state != bound, nothing really happens and the readPDU() method is called
again.
I don't know if that scenario is possible, but in case the session - for any
reason - caused all the time a SocketTimeoutException and it's state was !=
bound we would have an infinite loop here. What do you think? Would that be
possible?
Is actually anybody maintaining the code of this project anymore?
Cheers and thanks for any help!
Andrea
Original comment by rapow...@gmail.com
on 14 Oct 2011 at 10:14
The problem really boils down to that due to that every method in the
AbstractSessionContext and its children are marked synchronized, it is not safe
to call anything from the session object from another thread (including
unbinding or getSessionState itself !).
This method of global locking is unnecessary. Try the patch, it is applicable
to the library in general, not to my own specific use of it.
Original comment by cpmdxe...@gmail.com
on 14 Oct 2011 at 11:12
Ok I will try your patch and let you know if that worked! Thanks for the quick
answer :)
Original comment by rapow...@gmail.com
on 14 Oct 2011 at 11:14
Hi, sorry was trying to apply the patch and I noticed that it doesn't refer to
the latest release.. did you apply it to trunk? Is it a stable version?
Original comment by rapow...@gmail.com
on 14 Oct 2011 at 11:55
Apply to trunk (r267)
Original comment by cpmdxe...@gmail.com
on 14 Oct 2011 at 3:41
Hi,
I applied patch to trunk and deployed, but the problem does not seem to be
gone..
As explained in my first message, we are restarting the 2 receivers (that
connect to separate SMSCs) every 5 minutes (by calling
session.unbindAndClose(), session.connectAndBind() on both). The 2 senders are
up the whole time (no connection rebind there).
So, technically, on the log we are supposed to see, every 5 minutes, both
receivers starting and stopping their EnquireLinkSender and PDUReaderWorker.
So, something like:
Thread-210674 PDUReaderWorker stop
Thread-211781 Starting PDUReaderWorker with processor degree:3 ...
Thread-211783 Starting EnquireLinkSender
Thread-210676 EnquireLinkSender stop
Thread-210840 PDUReaderWorker stop
Thread-211962 Starting PDUReaderWorker with processor degree:3 ...
Thread-211963 Starting EnquireLinkSender
Thread-210843 EnquireLinkSender stop
I made a lookup within the log to check that every thread-id is visible twice
(start and stop). What I noticed, in the last lines of log (before the server
went to 100% CPU and stopped logging) is that only one PDUReaderWorker is
stopped, while the other PDUReaderWorker and the 2 EnquireLinkSender are still
running (or just didn't log the stop message, which I doubt).
Do you still think this is related to the deadlock you patched?
Cheers and thanks for the help
Andrea
Original comment by rapow...@gmail.com
on 18 Oct 2011 at 9:10
It might be another deadlock, or perhaps my patch was not enough.
Are you unbinding absolutely every 5 minutes (ie: even when there is incoming
traffic ?), if so, unbindAndClose may be deadlocking with something that is
calling a method on the session object from within your attached
ServerMessageReceiverListener.
Try to duplicate the issue by using SMPPSim and a high volume of deliver_sm to
your application, and then keep stopping/starting the client, when it deadlocks
look at the stack on all threads.
In my case, one thread was waiting on getSessionState (which is called from
within my message receiver listener) and another was waiting on unbindAndClose
(called from the main program thread trying to shutdown).
It has to happen at the right moment, it is reproducible 60% of the time.
Original comment by cpmdxe...@gmail.com
on 18 Oct 2011 at 9:20
I see what you mean.. The problem is that the NORMAL BEHAVIOR it would be that
the receivers are stopped only if nothing is received in the last 5 minutes. So
ideally the would almost never be stopped.
However currently in the test environment there is no incoming activity yet, so
we restart systematically every 5 minutes.
On the other side there is some sending activity going on, but the sender has
its own connection and is bound in TX mode. Basically there is no transceiver
in the system.
Original comment by rapow...@gmail.com
on 18 Oct 2011 at 9:29
I encounter the same problem too when using the library to implement a smpp
server. The client cannot bind as receivers again after suddently stopped with
the following logging
2012-07-12 09:44:15,058 [Thread-169] INFO org.jsmpp.session.SMPPServerSession
- EnquireLinkSender stop
2012-07-12 09:44:16,056 [main] INFO jifu.client.Smmc - Wait for SMMC to
connect and listening on port=5017 listenTimeout=10000
2012-07-12 09:44:16,056 [main] INFO jifu.client.Smmc - Accepting SMMC
connection for session 4e5e50ef
2012-07-12 09:44:16,056 [Thread-172] INFO org.jsmpp.session.SMPPServerSession
- Starting PDUReaderWorker with processor degree:10 ...
2012-07-12 09:44:16,057 [Thread-172] INFO org.jsmpp.session.SMPPServerSession
- PDUReaderWorker stop
2012-07-12 09:44:16,058 [main] ERROR jifu.client.Smmc -
java.net.SocketException: Socket closed
2012-07-12 09:44:16,058 [Thread-171] INFO org.jsmpp.session.SMPPServerSession
- Starting EnquireLinkSender
Original comment by qiuji...@gmail.com
on 13 Jul 2012 at 3:18
Original issue reported on code.google.com by
cpmdxe...@gmail.com
on 27 Jun 2011 at 12:40Attachments: