openhab / openhab-addons

Add-ons for openHAB
https://www.openhab.org/
Eclipse Public License 2.0
1.88k stars 3.59k forks source link

[boschshc] Discovery Frequently Fails with ConnectExceptions and SocketTimeoutExceptions #16174

Closed david-pace closed 9 months ago

david-pace commented 10 months ago

Expected Behavior

Current Behavior

When setting the log level to DEBUG for the boschshc binding, we see a lot of errors like the following in the log:

Discovering failed with exception java.net.ConnectException: Connection refused
Discovering failed with exception java.net.SocketTimeoutException: Connect Timeout

The exceptions as such are to be expected, since the discovery tries to call the URL https://[IP]:8443/smarthome/public/information on every device that advertises a service with the mDNS pattern _http._tcp.local.. Since the pattern is very generic, it can be expected that some devices will not accept a connection with that URL on port 8443 and thus the discovery gets a connection refused response or runs into a timeout.

However, the following things are not expected:

For analysis, I enhanced the logging on my system and I see "blocks" of ConnectExceptions and SocketTimeoutExceptions like this:

00:19:35.434 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 failed with exception java.net.ConnectException: Connection refused
00:19:35.435 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 failed with exception java.net.ConnectException: Connection refused
00:19:35.434 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 failed with exception java.net.ConnectException: Connection refused
00:19:35.440 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 failed with exception java.net.ConnectException: Connection refused
00:19:35.440 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 failed with exception java.net.ConnectException: Connection refused
00:19:35.440 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 failed with exception java.net.ConnectException: Connection refused
00:19:35.444 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 failed with exception java.net.ConnectException: Connection refused
00:19:35.444 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 failed with exception java.net.ConnectException: Connection refused
00:19:35.444 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 failed with exception java.net.ConnectException: Connection refused

When setting a breakpoint in the debugger, I see that multiple threads try to query the same IP address concurrently. In this case the device is concurrently queried in 9 threads. I don't know if this is relevant but the device with the .48 address is a printer in my case.

Issues with the discovery were also reported by our users @xxthunder and @mike-bike in https://github.com/openhab/openhab-addons/pull/16002 and https://github.com/openhab/openhab-addons/issues/15912.

Possible Solution

Maybe @GerdZanker could take a look at this because he initially implemented the discovery and might know some important details?

Steps to Reproduce (for Bugs)

Prerequisites: at least one device in the network apart from the Bosch Smart Home Controller that advertises a HTTP server (e.g. a printer).

  1. Set the log level of the boschshc binding to DEBUG or TRACE
  2. Start openHAB
  3. Analyze the content of the openhab.log file

Context

Discovery should run regularly in a stable manner and provide appropriate log entries

Your Environment

david-pace commented 10 months ago

Here is some additional debug information:

I enhanced the logging to print the IP address (which is always the same in the following log entries, an IP of a printer), the thread's toString() representation, the thread ID and the exception. Here are the relevant log entries grouped by concurrency (log entries of different threads happening in the same second are grouped together):

16:28:00.754 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused
16:28:00.754 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:28:00.754 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused
16:28:00.757 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused
16:28:00.758 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:28:00.757 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused
16:28:00.760 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:28:00.760 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused
16:28:00.760 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused

16:28:22.681 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused
16:28:22.681 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:28:22.681 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[OH-discovery-2,5,main] (thread ID 93) failed with exception java.net.ConnectException: Connection refused
16:28:22.681 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused
16:28:22.681 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[OH-discovery-1,5,main] (thread ID 92) failed with exception java.net.ConnectException: Connection refused
16:28:22.686 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[OH-discovery-2,5,main] (thread ID 93) failed with exception java.net.ConnectException: Connection refused
16:28:22.686 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[OH-discovery-1,5,main] (thread ID 92) failed with exception java.net.ConnectException: Connection refused
16:28:22.691 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[OH-discovery-1,5,main] (thread ID 92) failed with exception java.net.ConnectException: Connection refused
16:28:22.691 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[OH-discovery-2,5,main] (thread ID 93) failed with exception java.net.ConnectException: Connection refused

16:29:08.864 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.SocketTimeoutException: Connect Timeout
16:29:08.864 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.SocketTimeoutException: Connect Timeout
16:29:08.864 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.SocketTimeoutException: Connect Timeout

16:29:52.350 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused
16:29:52.350 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused
16:29:52.350 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:29:52.587 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused
16:29:52.676 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:29:52.676 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused

16:30:02.349 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused
16:30:02.349 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused
16:30:02.349 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused

16:30:07.597 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused
16:30:07.597 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:30:07.598 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused

16:30:13.358 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused
16:30:13.358 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:30:13.358 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused

16:30:58.889 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused
16:30:58.889 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:30:58.889 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused

16:31:31.391 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused
16:31:31.391 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:31:31.391 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused

16:31:32.178 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused
16:31:32.178 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused
16:31:32.178 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused

16:31:42.396 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-6-thread-1,5,main] (thread ID 114) failed with exception java.net.ConnectException: Connection refused
16:31:42.396 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-8-thread-1,5,main] (thread ID 115) failed with exception java.net.ConnectException: Connection refused
16:31:42.396 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovery for IP xxx.xxx.xxx.48 in thread Thread[JmDNS pool-7-thread-1,5,main] (thread ID 116) failed with exception java.net.ConnectException: Connection refused

Here we can gather some interesting information:

@jlaur do you know who implemented the discovery infrastructure and could help us to explain some of these observations?

GerdZanker commented 10 months ago

Hey, I used other simple "_http._tcp.local." mDNS Discovery code as examples for my pull request. To be honest - I not made up my mind regard threading, because the BridgeDiscoveryParticipant instance is called by the openhab framework and the framework is a back box.

It makes no sense to me to do so many discovery tries in a short amount of time - every 30 seconds is already very fast from my point of view. I agree that many requests will put a lot of stress on the smart home controller, because it has not a huge processing power. If necessary the SHC code could "cache" the last discovery result and return the cache value for a couple of discovery calls from the framework.

mike-bike commented 10 months ago

I do not know the details of the framework, but could imagine that bindings could register themselves to the mDNS discovery process. On every call they would need to check how to handle the call and what to action. In case the device is already registered/known, why not simply discarding the call? This would effectively minimize the number of arbitration and registration processes. Anything else would not make sense to me as the device has already been identified and properly processed. Why to repeat the same over and over again?

Just my 2cents from an uneducated user.

Am 03.01.2024 um 20:01 schrieb Gerd Zanker @.***>:

Hey, I used other simple "_http._tcp.local." mDNS Discovery code as examples for my pull request. To be honest - I not made up my mind regard threading, because the BridgeDiscoveryParticipant instance is called by the openhab framework and the framework is a back box.

It makes no sense to me to do so many discovery tries in a short amount of time - every 30 seconds is already very fast from my point of view. I agree that many requests will put a lot of stress on the smart home controller, because it has not a huge processing power. If necessary the SHC code could "cache" the last discovery result and return the cache value for a couple of discovery calls from the framework.

— Reply to this email directly, view it on GitHub https://github.com/openhab/openhab-addons/issues/16174#issuecomment-1875828004, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHVH7GHTUEGEVWAKYIJEVATYMWTIBAVCNFSM6AAAAABBJMOROGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNZVHAZDQMBQGQ. You are receiving this because you were mentioned.

david-pace commented 10 months ago

Possible relation to https://github.com/openhab/openhab-core/pull/3906

I will also test the latest SNAPSHOT state because there was another change in the ThingHandlerService which also affected our boschshc ThingDiscoveryService: https://github.com/openhab/openhab-core/pull/3957

david-pace commented 10 months ago

The latest main state also shows the same issues.

I found out that openHAB indeed receives quite many events from the javax.jmdns service. All of these events are forwarded to the discovery participants and this is done is individual threads.

I have an experimental implementation running on my system which caches the discovery results using the IP address as key. This seems to work fine so far. In the whole log I only see the Discovery failed with exception java.net.ConnectException: Connection refused four times, which seems adequate. After that the result is cached and no physical connection attempts are made afterwards.

I also removed the participant-specific object bridgeInformation which was shared among all the different threads, so this might also have been a problem.

@GerdZanker do you think this approach is viable in general or do you have a better idea? Are IP addresses ok as keys for the cache? Do we need advanced features like evicting entries from the cache after n minutes or something like that?

mike-bike commented 10 months ago

However, I have just checked my PI2 test instance with the new test version of the binding and it is running fine for 11 hours now. This looks very promising and indicates that the issues on the PI2 were caused by the massive mDNS events.

Interestingly the many events do not harm BSHC on my PI4. Maybe because the PI4 has sufficient processing power to cope with the many requests. Nevertheless this looks like a design flaw - either a regression or by purpose and we do not understand the rationale. I’d suggest raising an issue for javax.jmdns though I assume this is part of the underlying framework and not openHAB specific and causing a broader impact. Independent on the framework issue, a good strategy of the individual binding would be to minimize load and traffic supporting a defensive strategy.

If you don’t mind, I’d like to share my 2cents on a caching strategy:

In my PoV the (caching) strategy depends on: What information comes with the discovery event? Could it be e.g. a changed service indicator? What information is returned as discovery results from the Bosch binding and what is the purpose? What are the benefits to initiate another discovery for a device which has already been successfully discovered and registered -> i.e. the binding has already created a thing what would another discovery add? a device which has been identified as not matching -> like the printer: why bothering the printer with BSHC specific requests all the time? It is unlikely, that it morphs into a Bosch bridge What are the use cases to re-discovery on an existing device?

An IP address as a key seems to be easy to implement. However, we then need to cater for IP address changes. In theory a Printer and the Bosch bridge could swap addresses e.g. after a (local) power failure and restart or a user manually re-arranges IP addresses… These may be rare cases which are also not high frequency. Why not following a pragmatic approach? add a user adjustable parameter to the binding setting a TTL in seconds use IP address as key and store the time stamp - if on next request current time > timestamp + TTL, discard the entry and initiate rediscover clear cache on restart of a thing - may not 100% reliable but hey how many users are having more than a handful of bosch bridges? cache should be transient anyway and cleared on binding restart

Thanks for all your good support and effort!

Am 04.01.2024 um 18:20 schrieb David Pace @.***>:

The latest main state also shows the same issues. I found out that openHAB indeed receives quite many events from the javax.jmdns service. All of these events are forwarded to the discovery participants and this is done is individual threads. I have an experimental implementation running on my system which caches the discovery results using the IP address as key. This seems to work fine so far. In the whole log I only see the Discovery failed with exception java.net.ConnectException: Connection refused four times, which seems adequate. I also removed the participant-specific object bridgeInformation which was shared among all the different threads, so this might also have been a problem. @GerdZanker do you think this approach is viable in general or do you have a better idea? Are IP addresses ok as keys for the cache? Do we need advanced features like evicting entries from the cache after n minutes or something like that? — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

GerdZanker commented 10 months ago

@GerdZanker do you think this (cache) approach is viable in general or do you have a better idea? Are IP addresses ok as keys for the cache? Do we need advanced features like evicting entries from the cache after n minutes or something like that?

Yes, caching or other way to finish the discovery as early and as cheap as possible is good. Have you evaluate to use the complete ServiceInfo instance as a key? Using the IP address is also fine, because I like the documented trade-off from @mike-bike for a pragmatic approach.

GerdZanker commented 10 months ago

I'm still wonder why we run now into the discovery trouble:

david-pace commented 10 months ago

Thank you for your input :+1: Let's discuss further ideas for enhancements in the pull request I just created.