IWSN on portal server won't shut down in time

danbim commented 12 years ago

From the log files

TERM trapped.  Shutting down.
2012-07-10 13:42:40,735 | rService-MessageReaderThread 7 | MessageTools$MessageReader     | DEBUG | Received uninitialized message. That usually happens when the connection was broken.
2012-07-10 13:42:40,735 | rService-MessageReaderThread 5 | MessageTools$MessageReader     | DEBUG | Received uninitialized message. That usually happens when the connection was broken.
2012-07-10 13:42:40,736 | Service-MessageReaderThread 14 | MessageTools$MessageReader     | DEBUG | Received uninitialized message. That usually happens when the connection was broken.
2012-07-10 13:42:40,739 | rService-MessageReaderThread 8 | MessageTools$MessageReader     | DEBUG | Received uninitialized message. That usually happens when the connection was broken.
2012-07-10 13:42:40,740 | Service-MessageReaderThread 10 | MessageTools$MessageReader     | DEBUG | Received uninitialized message. That usually happens when the connection was broken.
2012-07-10 13:42:40,753 | rService-MessageReaderThread 0 | MessageTools$MessageReader     | DEBUG | Received uninitialized message. That usually happens when the connection was broken.
2012-07-10 13:42:40,764 | Service-MessageReaderThread 15 | MessageTools$MessageReader     | DEBUG | Received uninitialized message. That usually happens when the connection was broken.
2012-07-10 13:42:40,777 | rService-MessageReaderThread 1 | MessageTools$MessageReader     | DEBUG | Received uninitialized message. That usually happens when the connection was broken.
2012-07-10 13:42:40,806 | Shutdown-Thread                | IWSN                           | INFO  | Stopping iWSN...
2012-07-10 13:42:40,806 | Shutdown-Thread                | IWSN                           | DEBUG | Stopping DOM observer...
2012-07-10 13:42:40,807 | Shutdown-Thread                | IWSN                           | DEBUG | Stopping application manager...
2012-07-10 13:42:40,807 | Shutdown-Thread                | WSNService                     | INFO  | Stopping WSN service...
2012-07-10 13:42:40,811 | DeliveryWorker 0               | DeliveryWorker                 | DEBUG | http://wisebed.itm.uni-luebeck.de:8080/soap/controller/aHR0cDovL3dpc2ViZWQuaXRtLnVuaS1sdWViZWNrLmRlOjg4ODgvd3NuL0RDRUFBMTJBNUJBNkM4QzY2RTJBNDIwRkNDNDhBMEM3 => Calling experimentEnded() as experiment ended and all messages have been delivered.
2012-07-10 13:42:40,812 | Shutdown-Thread                | DeliveryManager                | DEBUG | Stopping delivery manager (asynchronously)...
2012-07-10 13:42:40,815 | DeliveryWorker 0               | DeliveryManager                | DEBUG | http://wisebed.itm.uni-luebeck.de:8080/soap/controller/aHR0cDovL3dpc2ViZWQuaXRtLnVuaS1sdWViZWNrLmRlOjg4ODgvd3NuL0RDRUFBMTJBNUJBNkM4QzY2RTJBNDIwRkNDNDhBMEM3 => Removing controller endpoint from the set of controllers.
2012-07-10 13:42:40,815 | rService-MessageReaderThread 9 | MessageTools$MessageReader     | DEBUG | Received uninitialized message. That usually happens when the connection was broken.
2012-07-10 13:42:40,816 | Shutdown-Thread                | WSNApp                         | INFO  | Stopping WSNApp...
2012-07-10 13:42:40,816 | Shutdown-Thread                | WSNApp                         | INFO  | Setting ChannelPipeline to default configuration for all nodes...
2012-07-10 13:42:40,817 | DeliveryManager-ShutdownThread | DeliveryManager                | DEBUG | Stopped delivery manager!
2012-07-10 13:42:41,045 | rService-MessageReaderThread 3 | MessageTools$MessageReader     | DEBUG | Received uninitialized message. That usually happens when the connection was broken.
2012-07-10 13:42:41,076 | Service-MessageReaderThread 12 | MessageTools$MessageReader     | DEBUG | Received uninitialized message. That usually happens when the connection was broken.
2012-07-10 13:42:41,077 | Service-MessageReaderThread 17 | MessageTools$MessageReader     | DEBUG | Received uninitialized message. That usually happens when the connection was broken.
2012-07-10 13:42:41,081 | Service-MessageReaderThread 13 | MessageTools$MessageReader     | DEBUG | Received uninitialized message. That usually happens when the connection was broken.
2012-07-10 13:42:41,094 | Service-MessageReaderThread 16 | MessageTools$MessageReader     | DEBUG | Received uninitialized message. That usually happens when the connection was broken.
2012-07-10 13:42:41,112 | rService-MessageReaderThread 6 | MessageTools$MessageReader     | DEBUG | Received uninitialized message. That usually happens when the connection was broken.
2012-07-10 13:42:41,114 | Service-MessageReaderThread 11 | MessageTools$MessageReader     | DEBUG | Received uninitialized message. That usually happens when the connection was broken.
2012-07-10 13:42:41,117 | rService-MessageReaderThread 4 | MessageTools$MessageReader     | DEBUG | Received uninitialized message. That usually happens when the connection was broken.
2012-07-10 13:42:41,785 | rService-MessageReaderThread 2 | MessageTools$MessageReader     | DEBUG | Received uninitialized message. That usually happens when the connection was broken.
2012-07-10 13:42:54,565 | gingService-DispatcherThread 0 | ConnectionService              | INFO  | Exception while trying to establish connection to testbeduzl01.wisebed.itm.uni-luebeck.de:8880. Cause: null
2012-07-10 13:42:54,569 | gingService-DispatcherThread 0 | UnreliableMessagingService     | WARN  | No connection to urn:wisebed:uzl1:0x2202. Dropping message to: "urn:wisebed:uzl1:0x2202"
from: "wisebed.itm.uni-luebeck.de"
msgType: "de.uniluebeck.itm.tr.runtime.wsnapp.WSNApp/LISTENER_MANAGEMENT"
priority: 2
validUntil: 1341920634560
payload: "\b\001\022\032wisebed.itm.uni-luebeck.de"

------------------------------------------------------------------------
Shutdown: Timed out waiting for the JVM to terminate.
The JVM was launched with debug options so this may be because the JVM
is currently suspended by a debugger.  Any future timeouts during this
JVM invocation will be silently ignored.
------------------------------------------------------------------------

2012-07-10 13:43:22,894 | gingService-DispatcherThread 0 | ConnectionService              | INFO  | Exception while trying to establish connection to testbeduzl16.wisebed.itm.uni-luebeck.de:8880. Cause: null
2012-07-10 13:43:22,895 | gingService-DispatcherThread 0 | UnreliableMessagingService     | WARN  | No connection to urn:wisebed:uzl1:0x213c. Dropping message to: "urn:wisebed:uzl1:0x213c"
from: "wisebed.itm.uni-luebeck.de"
msgType: "de.uniluebeck.itm.tr.runtime.wsnapp.WSNApp/LISTENER_MANAGEMENT"
priority: 2
validUntil: 1341920662892
payload: "\b\001\022\032wisebed.itm.uni-luebeck.de"

danbim commented 12 years ago

This only happens if a user was already connected to the testbed, i.e. if there's an active connection to an experiment.

danbim commented 12 years ago

Still an issue it seems...

danbim commented 12 years ago

The error seems to be linked to the fact that WSNApp tries to reconfigure the default pipeline on the gateway hosts when they are already down, resulting in a timeout. From the logs...

TERM trapped.  Shutting down.
2012-07-24 14:51:43,629 | Shutdown-Thread                | IWSN                           | INFO  | Stopping IWSN...
2012-07-24 14:51:43,631 | Shutdown-Thread                | SessionManagementSoapService   | INFO  | Stopped Session Management service on http://wisebed.itm.uni-luebeck.de:8888/sessions
2012-07-24 14:51:43,631 | Shutdown-Thread                | WSNSoapService                 | INFO  | Stopping WSN service SOAP API...
2012-07-24 14:51:43,635 | Shutdown-Thread                | WSNSoapService                 | INFO  | Stopped WSN service SOAP API on http://wisebed.itm.uni-luebeck.de:8888/wsn/DC32D9E2352DC4B248C620AEC63FD530
2012-07-24 14:51:43,635 | Shutdown-Thread                | WSNServiceImpl                 | INFO  | Stopping WSN service...
2012-07-24 14:51:43,638 | Shutdown-Thread                | WSNServiceImpl                 | INFO  | Stopped WSN service!
2012-07-24 14:51:43,638 | Shutdown-Thread                | WSNApp                         | INFO  | Stopping WSNApp...
2012-07-24 14:51:43,638 | Shutdown-Thread                | WSNApp                         | INFO  | Setting ChannelPipeline to default configuration for all nodes...
2012-07-24 14:51:53,640 | Shutdown-Thread                | WSNApp                         | ERROR | {}
java.util.concurrent.ExecutionException: de.uniluebeck.itm.tr.iwsn.overlay.messaging.reliable.ReliableMessagingTimeoutException
    at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:289)
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:276)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:111)
    at de.uniluebeck.itm.tr.runtime.wsnapp.WSNAppImpl.setDefaultPipelineOnReservedNodes(WSNAppImpl.java:1266)
    at de.uniluebeck.itm.tr.runtime.wsnapp.WSNAppImpl.doStop(WSNAppImpl.java:527)
    at com.google.common.util.concurrent.AbstractService.stop(AbstractService.java:115)
    at com.google.common.util.concurrent.AbstractService.stopAndWait(AbstractService.java:134)
    at de.uniluebeck.itm.tr.runtime.portalapp.WSNServiceHandle.stop(WSNServiceHandle.java:98)
    at de.uniluebeck.itm.tr.runtime.portalapp.SessionManagementServiceImpl.freeInternal(SessionManagementServiceImpl.java:493)
    at de.uniluebeck.itm.tr.runtime.portalapp.SessionManagementServiceImpl.stop(SessionManagementServiceImpl.java:195)
    at de.uniluebeck.itm.tr.runtime.portalapp.PortalServerApplication.doStop(PortalServerApplication.java:119)
    at com.google.common.util.concurrent.AbstractService.stop(AbstractService.java:115)
    at com.google.common.util.concurrent.AbstractService.stopAndWait(AbstractService.java:134)
    at de.uniluebeck.itm.tr.iwsn.IWSNApplicationManager.stop(IWSNApplicationManager.java:273)
    at de.uniluebeck.itm.tr.iwsn.IWSNImpl.stop(IWSNImpl.java:76)
    at de.uniluebeck.itm.tr.iwsn.IWSNServer$1.run(IWSNServer.java:190)
    at java.lang.Thread.run(Thread.java:662)

danbim commented 12 years ago

The issue is not really an issue. The deployment scripts assume a certain timeout until they consider the stopping of IWSN failed. The actual timeouts that are hit when shutting down the IWSN portal after or concurrently with the IWSN gateways seem to be higher. Therefore only the deployment scripts show an error, the IWSN portal are finely stopped.

In order to find a cleaner solution I'll not close this ticket but move it to the next milestone...

itm / testbed-runtime

IWSN on portal server won't shut down in time #185