tobof / openhab-addons

The next-generation open Home Automation Bus (openHAB)
Eclipse Public License 2.0
39 stars 30 forks source link

Binding not reconnecting to ethernet gateway #90

Closed FredericMa closed 6 years ago

FredericMa commented 7 years ago

Expected Behavior

If the gateway goes offline it should automatically reconnect when it is back online.

Current Behavior

If I want to program my arduino with new software it gets disconnected from the network (obvious) but when I reconnect it to the network, the MySensors binding doesn't initiate a reconnect to the gateway. It looks like it doesn't even know that the gateway went offline since its status remains online. This is the configuration of the gateway: image

Note: all my sensors are connected to the gateway itself but I don't think this could cause the issue.

Greetings, Frederic

andreacioni commented 7 years ago

@FredericMa thanks for reporting the issue here on GitHub. This problem seems related to the NetworkSanityChecker that I've implemented last year. Personally I've never experienced issues with my serial gateway probably this behaviour affects only Ethernet gateway. I'll build one of it an try to reproduce your problem 👍

andreacioni commented 7 years ago

Found one possible cause of this issue. As I see in the configuration you have attached, Disable network sanity checker is not checked (so it is false). Unfortunately inside thing-types.xml that parameter is defined as:

<parameter name="networkSanCheckEnabled" type="boolean" required="false">
<label>Disable network sanity check</label>
<description>Network sanity check periodically ensure that gateway is up and running</description>
<default>false</default>
</parameter>

You see that its name is networkSanCheckEnabled but the description say something different. So I think that the problem is coming from an ambiguity. To be sure I look for its usage in the code and I found this:

/**
     * Startup the gateway
     */
    public void startup() {

        myCon.initialize();

        myEventRegister.addEventListener(this);

        if (myConf.getEnableNetworkSanCheck()) { <- HERE THE AMBIGUITY
            myNetSanCheck = new MySensorsNetworkSanityChecker(this, myEventRegister, myCon);
        }
    }

Temporary solution for @FredericMa is to enable the Disable network sanity checker and see if something changes. Keep in mind that NetworkSanityChecker (in your configuration) start every 1 minute and wait for 2 attempts before disconnection and try to re-connecting.

About the concrete solution we have to wait @tobof opinion.

FredericMa commented 6 years ago

Hi @andreacioni,

Thanks for the suggestion! I can confirm that this fixed the problem in my case!

Thanks for the quick response!

Greetings, Frederic

nofoxgit commented 6 years ago

Hi @andreacioni, I've tried to set NetworSanityChecker to "false" in my .things file and nothing happened. Sanity Checker stopps but gateway did not reconnect after disconnecting and connecting network cable again.

andreacioni commented 6 years ago

Hi @nofoxgit. Thanks for your test. Have you wait all attempts of the NetworkSanityChecker (accordingly to your gateway configuration inside .things file)? I ask you that because disconnections are not immediately recognized by the binding. So you have to wait since all remaining attempts expires. Could you please,also post the log (DEBUG level) of your situation?

nofoxgit commented 6 years ago

Ok, I have networkSanCheckConnectionFailAttempts set to 100, it seams that I missunderstood what the author have on his mind ? Now I set it to 2. After disconnecting network cable sanityChecker check for gateway 2 times and than i have this messages:

2017-08-22 16:52:00.538 [INFO ] [ateway.MySensorsNetworkSanityChecker] - Network Sanity Checker thread stopped 2017-08-22 16:52:00.544 [DEBUG] [ensors.factory.MySensorsCacheFactory] - Writing on cache given_ids, content: [0,1,2,3,4,5,6,7,8] 2017-08-22 16:52:00.547 [ERROR] [al.protocol.ip.MySensorsIpConnection] - Failed connecting to bridge...next retry in 10 seconds (Retry No.:1) 2017-08-22 16:52:00.549 [DEBUG] [al.protocol.ip.MySensorsIpConnection] - Disconnecting from IP bridge ... 2017-08-22 16:52:00.552 [INFO ] [ateway.MySensorsNetworkSanityChecker] - Network Sanity Checker thread stopped 2017-08-22 16:52:00.560 [DEBUG] [ensors.factory.MySensorsCacheFactory] - Writing on cache given_ids, content: [0,1,2,3,4,5,6,7,8] 2017-08-22 16:52:10.570 [DEBUG] [al.protocol.ip.MySensorsIpConnection] - Connecting to IP bridge [192.168.0.71:5003] 2017-08-22 16:52:13.732 [ERROR] [al.protocol.ip.MySensorsIpConnection] - Error while trying to connect InputStreamReader java.net.NoRouteToHostException: No route to host (Host unreachable) at java.net.PlainSocketImpl.socketConnect(Native Method)[:1.8.0_121] at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)[:1.8.0_121] at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)[:1.8.0_121] at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)[:1.8.0_121] at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)[:1.8.0_121] at java.net.Socket.connect(Socket.java:589)[:1.8.0_121] at java.net.Socket.connect(Socket.java:538)[:1.8.0_121] at java.net.Socket.<init>(Socket.java:434)[:1.8.0_121] at java.net.Socket.<init>(Socket.java:211)[:1.8.0_121] at org.openhab.binding.mysensors.internal.protocol.ip.MySensorsIpConnection.establishConnection(MySensorsIpConnection.java:46)[181:org.openhab.binding.mysensors:2.2.0.201707241115] at org.openhab.binding.mysensors.internal.protocol.MySensorsAbstractConnection.connect(MySensorsAbstractConnection.java:150)[181:org.openhab.binding.mysensors:2.2.0.201707241115] at org.openhab.binding.mysensors.internal.protocol.MySensorsAbstractConnection.run(MySensorsAbstractConnection.java:127)[181:org.openhab.binding.mysensors:2.2.0.201707241115] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)[:1.8.0_121] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)[:1.8.0_121] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)[:1.8.0_121] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)[:1.8.0_121] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)[:1.8.0_121] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)[:1.8.0_121] at java.lang.Thread.run(Thread.java:745)[:1.8.0_121] 2017-08-22 16:52:13.738 [INFO ] [ateway.MySensorsNetworkSanityChecker] - Network Sanity Checker thread stopped 2017-08-22 16:52:13.742 [DEBUG] [ensors.factory.MySensorsCacheFactory] - Writing on cache given_ids, content: [0,1,2,3,4,5,6,7,8] 2017-08-22 16:52:13.744 [ERROR] [al.protocol.ip.MySensorsIpConnection] - Failed connecting to bridge...next retry in 10 seconds (Retry No.:2) 2017-08-22 16:52:13.746 [DEBUG] [al.protocol.ip.MySensorsIpConnection] - Disconnecting from IP bridge ... 2017-08-22 16:52:13.747 [INFO ] [ateway.MySensorsNetworkSanityChecker] - Network Sanity Checker thread stopped 2017-08-22 16:52:13.752 [DEBUG] [ensors.factory.MySensorsCacheFactory] - Writing on cache given_ids, content: [0,1,2,3,4,5,6,7,8] 2017-08-22 16:52:23.755 [DEBUG] [al.protocol.ip.MySensorsIpConnection] - Connecting to IP bridge [192.168.0.71:5003] 2017-08-22 16:52:26.852 [ERROR] [al.protocol.ip.MySensorsIpConnection] - Error while trying to connect InputStreamReader java.net.NoRouteToHostException: No route to host (Host unreachable)

Every 10 seconds Binding tries to reconnect with the bridge and when I connect network cable it reconnects!! :)

I have my gateway configured in the .things file and I have networkSanCheckEnabled="true" there. When I have this configured by PaperUI discovery function, SanityCheck didn't work in both states of DisableNetworkSanityCheck switch!
And now the most funny part! As I wrote earlier I have networkSanCheckEnabled="true" in my configuration file. And I have checked what is the state of DisableNetworkSanityCheck switch.. Guess what ? It is set to ON! So according to PaperUI now I have sanityCheck disabled but according to .thing sfile I have it enabled.

So my solution is to configure OpenHab2 binding with the .things file and set SanityCheck for true and not much amount of retries. I.e. Bridge mysensors:bridge-eth:gateway [ ipAddress="192.168.0.71", tcpPort=5003, sendDelay=200, networkSanCheckEnabled=true, networkSanCheckInterval=2, networkSanCheckConnectionFailAttempts=2 ]

It seams that the biding must first set the bridge to OFFLINE state and than it attempts to reconnect the gateway.

tobof commented 6 years ago

I've changed this configuration parameter to remove the double negative. Looks like I've forgotten to change the label. networkSanCheckEnabled=true enables the network sanity check. :)

Additionally:

@andreacioni : Have you played with setKeepAlive(true) on the Socket? Maybe this also would improve the chance to detect a lost connection.

tobof commented 6 years ago

@andreacioni Does not really belong to this issue, but while looking at setKeepAlive(true) I found setTcpNoDelay(true) for small content. Maybe we're able to tune the communication with that parameter.

andreacioni commented 6 years ago

Hi @tobof, I took a look inside documentation of Socket class in Java 8. Unfortunately I' think your suggested options doesn't help us to better detect disconnections 😞

Maybe I don't understand why you're suggesting them. Did you have an idea on how to combine them to achieve a faster feedback on connection failure?

tobof commented 6 years ago

Hi @andreacioni !

The primary and best way to detect a lost connection is the sanity checker. Maybe the interval should be changed to seconds instead of minutes and with that would allow a heartbeat every 20s or so?

andreacioni commented 6 years ago

Yes, I had the same idea! But, we have a backward compatibility problem. If we change this settings now (and we maintain the same parameter name in things-type.xml) users that set 1 minute in current binding will have the NetworkSanityChecker that starts every seconds... 😨

In my opinion the benefit of changing minutes to seconds covers the risk (especially now that the binding is going to be a part of the official OH distribution).

Another solution is to add a parameter in things-type.xml to allow the user to select the time unit, but not sure it is the best way.

jechiasson commented 6 years ago

I am seeing similar issues using the serial gateway with OH2. When OH2 restarts, it never queries the gateway for the current sensors states. Is the intent of the heartbeat and/or network sanity check to periodically request node and sensor states in case of restart?

tobof commented 6 years ago

@jechiasson Requesting the current state of a sensor is only possible, if the node is programmed that way, which is not default in MySensors. So the binding will not request status but will listen for updates.

@andreacioni An additional idea: add an (for a while) additional parameter networkSanCheckIntervalSec and mark the old one deprecated. We may then use some logic on initialization of the gateway like:

if networkSanCheckInterval is set (not null) => use it otherwise check for networkSanCheckIntervalSec if set => use it (in seconds) otherwise => use default

Some time in the feature we may remove networkSanCheckInterval and the addiational logic.

jechiasson commented 6 years ago

@tobof If the default is changed will the OH2 binding then do the query on startup? What about the nodes - do they send their child states on startup to the gateway?

jechiasson commented 6 years ago

I cycled the SanityCheck off, and captured the log. It appears that the locally attached sensors do send their state: ==> /var/log/openhab2/events.log <== 2017-10-19 12:13:15.198 [ThingUpdatedEvent ] - Thing 'mysensors:bridge-ser:51656f7d' has been updated.

==> /var/log/openhab2/openhab.log <== 2017-10-19 12:13:15.212 [DEBUG] [ensors.factory.MySensorsCacheFactory] - Cache file: given_ids.cached exist. 2017-10-19 12:13:15.217 [DEBUG] [ensors.factory.MySensorsCacheFactory] - Cache (given_ids) content: [0] 2017-10-19 12:13:15.219 [DEBUG] [col.serial.MySensorsSerialConnection] - Enabling connection watchdog 2017-10-19 12:13:15.222 [DEBUG] [col.serial.MySensorsSerialConnection] - Connecting to /dev/ttyArduino [baudRate:115200] 2017-10-19 12:13:15.226 [DEBUG] [org.openhab.binding.mysensors ] - ServiceEvent REGISTERED - {org.eclipse.smarthome.config.discovery.DiscoveryService}={service.id=303, service.bundleid=186, service.scope=singleton} - org.openhab.binding.mysensors 2017-10-19 12:13:15.230 [DEBUG] [.discovery.MySensorsDiscoveryService] - Starting MySensors discovery scan 2017-10-19 12:13:15.230 [DEBUG] [nsors.handler.MySensorsBridgeHandler] - Initialization of the MySensors bridge DONE! 2017-10-19 12:13:15.236 [DEBUG] [col.serial.MySensorsSerialConnection] - Final port list: /dev/ttyArduino 2017-10-19 12:13:15.255 [DEBUG] [col.serial.MySensorsSerialConnection] - Successfully connected to serial port. 2017-10-19 12:13:15.255 [DEBUG] [col.serial.MySensorsSerialConnection] - Waiting 3 seconds to allow correct reset trigger on serial connection opening 2017-10-19 12:13:18.257 [DEBUG] [rsAbstractConnection$MySensorsWriter] - Sending to MySensors: 0;0;3;0;2; 2017-10-19 12:13:18.257 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;255;3;0;9;MCO:BGN:INIT GW,CP=R-NGA--,VER=2.1.1 2017-10-19 12:13:18.258 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;255;3;0;14;Gateway startup complete. 2017-10-19 12:13:18.259 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;255;0;0;17;2.1.1 2017-10-19 12:13:18.259 [DEBUG] [rs.internal.gateway.MySensorsGateway] - Presentation Message received 2017-10-19 12:13:18.261 [DEBUG] [ensors.factory.MySensorsCacheFactory] - Writing on cache given_ids, content: [0] 2017-10-19 12:13:18.262 [WARN ] [.discovery.MySensorsDiscoveryService] - Cannot automatic discover thing node: 0, child: 255 please insert it manually 2017-10-19 12:13:18.262 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;1;0;0;0; 2017-10-19 12:13:18.262 [DEBUG] [rs.internal.gateway.MySensorsGateway] - Presentation Message received 2017-10-19 12:13:18.262 [DEBUG] [ensors.factory.MySensorsCacheFactory] - Writing on cache given_ids, content: [0] 2017-10-19 12:13:18.262 [DEBUG] [.discovery.MySensorsDiscoveryService] - Preparing new thing for inbox: mysensors:door 2017-10-19 12:13:18.265 [DEBUG] [.discovery.MySensorsDiscoveryService] - Discovered device submitted 2017-10-19 12:13:18.265 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;3;0;0;0; 2017-10-19 12:13:18.265 [DEBUG] [rs.internal.gateway.MySensorsGateway] - Presentation Message received 2017-10-19 12:13:18.265 [DEBUG] [ensors.factory.MySensorsCacheFactory] - Writing on cache given_ids, content: [0] 2017-10-19 12:13:18.266 [DEBUG] [.discovery.MySensorsDiscoveryService] - Preparing new thing for inbox: mysensors:door 2017-10-19 12:13:18.266 [DEBUG] [.discovery.MySensorsDiscoveryService] - Discovered device submitted 2017-10-19 12:13:18.266 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;255;3;0;9;MCO:BGN:STP 2017-10-19 12:13:18.266 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;255;3;0;9;MCO:REG:NOT NEEDED 2017-10-19 12:13:18.266 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;255;3;0;9;MCO:BGN:INIT OK,TSP=NA 2017-10-19 12:13:18.266 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;1;1;0;16;0 2017-10-19 12:13:18.266 [DEBUG] [rs.internal.gateway.MySensorsGateway] - Node 0 found in gateway 2017-10-19 12:13:18.266 [DEBUG] [rs.internal.gateway.MySensorsGateway] - Child 1 found in node 0 2017-10-19 12:13:18.268 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;3;1;0;16;1 2017-10-19 12:13:18.268 [DEBUG] [rs.internal.gateway.MySensorsGateway] - Node 0 found in gateway 2017-10-19 12:13:18.268 [DEBUG] [rs.internal.gateway.MySensorsGateway] - Child 3 found in node 0 2017-10-19 12:13:18.271 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;255;3;0;2;2.1.1 2017-10-19 12:13:18.272 [DEBUG] [rsAbstractConnection$MySensorsReader] - Good,Gateway is up and running! (Ver:2.1.1) 2017-10-19 12:13:18.272 [DEBUG] [ensors.factory.MySensorsCacheFactory] - Writing on cache given_ids, content: [0] 2017-10-19 12:13:18.273 [INFO ] [col.serial.MySensorsSerialConnection] - Successfully connected to MySensors Bridge.

Are these the initial state messages? 2017-10-19 12:13:18.266 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;1;1;0;16;0 2017-10-19 12:13:18.268 [DEBUG] [rsAbstractConnection$MySensorsReader] - Message from gateway received: 0;3;1;0;16;1