openhab / openhab-addons

Add-ons for openHAB
https://www.openhab.org/
Eclipse Public License 2.0
1.86k stars 3.57k forks source link

[boschshc] Long Polling crashes regularly every one or two days #15912

Closed xxthunder closed 7 months ago

xxthunder commented 9 months ago

Expected Behavior

Items reflecting Bosch Smart Home Things' read channels shall be updated after every change.

Current Behavior

After one or two days the items (e.g. temperature, humidity, open/close states) are not updated anymore (values are frozen to the last updated value). A view in the openhab.log reveals this exception:

2023-11-17 05:58:17.372 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception: 
com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $
    at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:270) ~[?:?]
    at com.google.gson.Gson.fromJson(Gson.java:1058) ~[?:?]
    at com.google.gson.Gson.fromJson(Gson.java:1016) ~[?:?]
    at com.google.gson.Gson.fromJson(Gson.java:959) ~[?:?]
    at com.google.gson.Gson.fromJson(Gson.java:927) ~[?:?]
    at org.openhab.binding.boschshc.internal.devices.bridge.LongPolling.onLongPollComplete(LongPolling.java:195) ~[?:?]
    at org.openhab.binding.boschshc.internal.devices.bridge.LongPolling$1.lambda$0(LongPolling.java:164) ~[?:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.lang.Thread.run(Thread.java:833) ~[?:?]
Caused by: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $
    at com.google.gson.stream.JsonReader.beginObject(JsonReader.java:395) ~[?:?]
    at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:259) ~[?:?]
    ... 12 more

Possible Solution

No idea. I could dig deeper into this problem but I don't know where to begin. Any advice is welcome.

Steps to Reproduce (for Bugs)

Let it go, let it fail. I just restart the binding or openhab itself. It happens regularly after one or two days.

Your Environment

david-pace commented 9 months ago

Thanks for the detailed bug report. I can confirm that I found the same exception in my logs, approximately 14 hours after openHAB was started, on November 13th.

When this happens the Bosch Smart Home Controller seems to send a response that does not contain valid JSON, at least not a syntactically correct JSON object.

I will try to enhance the logging on my system for this specific case in order to find out what the actual response from the controller looks like when this happens.

david-pace commented 9 months ago

I was able to capture a long polling response when this happens. It looks like this:

<HTML><HEAD><TITLE>400</TITLE></HEAD><BODY><H1>400 No Host header in request to URI /remote/json-rpc with headers {api-version3.2=api-version3.2, content-typeapplication/json=content-typeapplication/json, content-length70=content-length70, user-agentjetty/9.4.52.v20230823=user-agentjetty/9.4.52.v20230823, host192.xxx.xxx.xxx:8444=host192.xxx.xxx.xxx:8444, accept-encodinggzip=accept-encodinggzip}</H1></BODY></HTML>

The first thing that is strange is that the key-value pairs are combined, which is probably not intended. The second strange thing is that the keys are the same as the values.

I will try to find out why this happens only sporadically, and why it happens in the first place. If you have any ideas, please let me know.

xxthunder commented 8 months ago

Thanks for looking into it. I have no idea why this happens. Might be related to controller updates. I think a day after I raised this issue there was an shc update from Bosch. I installed it (of course ;-) ) and since then no JsonSyntaxException. I don't know if it is worth mentioning: I am still using the first controller version. Let me know, if you need more details. I could also enable some debugging or similar if this would help.

david-pace commented 8 months ago

According to this page there were Smart Home Controller updates on November 8th and November 20th.

Could you please tell me your Smart Home Controller Software Version under Settings -> System -> Smart Home Controller? Mine is 10.18.3249-29469.

I found the JsonSyntaxException in my logs on the following days:

So it seems like for me the latest update did not fix the problem. Could you please look for JsonSyntaxException in your logs as well and check whether it occurred before and/or after November 20th? Note that logs before the current day are compressed as *.gz and have to be unzipped first.

xxthunder commented 8 months ago

Hi, I am using the same version. And it happened again today and according to my logs before and after the 20th of November.

xxthunder commented 8 months ago

I had a closer look into my logs, but I cannot find any correlation between some state changes of my devices and the Json exception. Just a proposal: wouldn't it be good anyway if the binding handles invalid Rest API replies? That the shc sends invalid answers is not nice, sure, but that the binding crashes because of this is worse.

david-pace commented 8 months ago

Indeed we should handle invalid responses gracefully. We just have to be careful, because if we are in a situation where we constantly get responses (valid or invalid on the JSON level) we could end up in an endless request - response loop. I'm thinking of delaying the next long poll response in this case, but I'm not sure how long. Do you think 30 seconds would be appropriate? Or do you have any other ideas?

xxthunder commented 8 months ago

You mean 30 seconds delay after an invalid reply from the controller? Sounds okay for me.

david-pace commented 8 months ago

I created a work in progress pull request. I use the existing mechanism to recover from long polling errors, which means that a new long polling subscription attempt is made 15 seconds after the faulty response.

It would be great if you could download the JAR and replace the JAR on your system, and then report what happens if the bridge responds with invalid content again. The best would be if you could share the relevant portion of the log file when it happens.

xxthunder commented 8 months ago

Hi David, tried to replace the binding's jar, but it looks like some deps have changed between 4.0.4 and 4.1.0?

2023-12-04 22:54:42.670 [WARN ] [org.apache.felix.fileinstall        ] - Error while starting bundle: file:/openhab/addons/org.openhab.binding.boschshc-4.1.0-SNAPSHOT.jar
org.osgi.framework.BundleException: Could not resolve module: org.openhab.binding.boschshc [262]
  Unresolved requirement: Import-Package: com.google.gson; version="[2.10.0,3.0.0)"
    at org.eclipse.osgi.container.Module.start(Module.java:463) ~[org.eclipse.osgi-3.18.0.jar:?]
david-pace commented 8 months ago

OpenHAB 4.0.4 comes with GSON 2.9.1 and current snapshots require at least 2.10. This is enforced by the openHAB core compile BOM so I can't change it easily.

However, you could try the following hack:

Let me know if this works for you.

xxthunder commented 8 months ago

Mmmmh, I checked three times now. I edited the manifest.mf as requested but still get the error. Is the dependency mentioned somewhere else? Otherwise I could also update to 4.1.0.

2023-12-07 19:39:23.374 [WARN ] [org.apache.felix.fileinstall        ] - Error while starting bundle: file:/openhab/addons/org.openhab.binding.boschshc-4.1.0-SNAPSHOT.jar
org.osgi.framework.BundleException: Could not resolve module: org.openhab.binding.boschshc [264]
  Unresolved requirement: Import-Package: com.google.gson; version="[2.10.0,3.0.0)"
    at org.eclipse.osgi.container.Module.start(Module.java:463) ~[org.eclipse.osgi-3.18.0.jar:?]
    at org.eclipse.osgi.internal.framework.EquinoxBundle.start(EquinoxBundle.java:445) ~[org.eclipse.osgi-3.18.0.jar:?]
    at org.apache.felix.fileinstall.internal.DirectoryWatcher.startBundle(DirectoryWatcher.java:1260) ~[?:?]
    at org.apache.felix.fileinstall.internal.DirectoryWatcher.startBundles(DirectoryWatcher.java:1233) ~[?:?]
    at org.apache.felix.fileinstall.internal.DirectoryWatcher.doProcess(DirectoryWatcher.java:520) ~[?:?]
    at org.apache.felix.fileinstall.internal.DirectoryWatcher.process(DirectoryWatcher.java:365) ~[?:?]
    at org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:316) ~[?:?]
david-pace commented 8 months ago

How do you replace the JAR exactly? Do you put it in a specific folder? Do you overwrite existing JARs? Or do you use the bundle:update command on the openHAB console?

I would recommend using the bundle:update command (and maybe make a backup copy of the original JAR before that).

xxthunder commented 8 months ago

Hi David, sorry for ignoring you but I was sick and now on a business trip.

I tried putting it into the add-ons folder, but of course the bundle:update is the way.

I will try tomorrow.

Regards, Karsten

xxthunder commented 8 months ago

Hi,

tried again today. I had to do search and replace in several MF files for gson deps. After that, the next dependency is not correct:

openhab> bundle:update 250 file:///openhab/userdata/org.openhab.binding.boschshc-4.1.0-SNAPSHOT.jar
openhab> bundle:restart 250
Error executing command: Error restarting bundles:
        Unable to start bundle 250: Could not resolve module: org.openhab.binding.boschshc [250]
  Unresolved requirement: Import-Package: javax.measure; version="[2.2.0,3.0.0)"

openhab>  

Is it not possible to build your fixes against the current stable release? This is a general pity somehow, as if users are willing to test fixes on their systems it should be as easy as possible for them to install the updated jars.

But anyway. I have no problem updating to 4.1.0. It will be the next stable anyway. Will your jar work with M5?

Regards, Karsten

david-pace commented 8 months ago

Hi @xxthunder, no worries, thank you for trying again. As far as I know fixes can only be merged down to lower releases once they are merged to the newest version, so unfortunately it's not so easy to test them on older releases. I haven't tried it out myself but I'm quite sure the JAR will wok on M5.

Another friendly user tried to test the fix (see https://github.com/openhab/openhab-addons/pull/16002) but unfortunately it looks like my code was not even active. Are you still experiencing the issue at the moment (without my JAR)? Could you look into your log files of the past days to check whether the issue occurred?

Thanks for trying to help testing the fix :+1:

xxthunder commented 8 months ago

Hi again @david-pace ,

actually there was another update from Bosch a week ago. Since then no exception. I will further investigate it and update to M5 as soon as possible.

Good night!

david-pace commented 8 months ago

I just upgraded to M5 and I can confirm that the bundle:update works with the provided JAR. I have also set the log level to DEBUG for boschshc via

log:set DEBUG org.openhab.binding.boschshc

Next time the issue occurs we should know more. If you want you can do the same because the chances are higher that we hit the issue when the fix is installed on two machines.

For the record: the last time the issue occurred on my machine was on December 10th and since then I did not install an update (at least not one that I had to confirm actively).

xxthunder commented 8 months ago

Hi, okay, so today is update day for me.

In the meantime I had another json hiccup yesterday evening with the latest controller software: 10.18.3249-29655

2023-12-18 19:22:56.128 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception: 
com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $
    at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:270) ~[?:?]
    at com.google.gson.Gson.fromJson(Gson.java:1058) ~[?:?]
    at com.google.gson.Gson.fromJson(Gson.java:1016) ~[?:?]
    at com.google.gson.Gson.fromJson(Gson.java:959) ~[?:?]
    at com.google.gson.Gson.fromJson(Gson.java:927) ~[?:?]
    at org.openhab.binding.boschshc.internal.devices.bridge.LongPolling.onLongPollComplete(LongPolling.java:195) ~[?:?]
    at org.openhab.binding.boschshc.internal.devices.bridge.LongPolling$1.lambda$0(LongPolling.java:164) ~[?:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.lang.Thread.run(Thread.java:833) ~[?:?]
Caused by: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $
    at com.google.gson.stream.JsonReader.beginObject(JsonReader.java:395) ~[?:?]
    at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:259) ~[?:?]
    ... 12 more
xxthunder commented 8 months ago

I could cry ... I thought update via docker is easy ... sorry, this might not belong here ...

2023-12-19 10:01:10.765 [ERROR] [ternal.service.BootFeaturesInstaller] - Error installing boot feature repository mvn:org.openhab.distro/openhab-addons/4.1.0.M5/xml/features

java.lang.RuntimeException: Error resolving artifact org.openhab.distro:openhab-addons:xml:features:4.1.0.M5: [Could not find artifact org.openhab.distro:openhab-addons:xml:features:4.1.0.M5 in central (https://repo1.maven.org/maven2/)] : mvn:org.openhab.distro/openhab-addons/4.1.0.M5/xml/features
david-pace commented 8 months ago

Unfortunately I don't know a solution to the feature resolution error.

But please keep the log file from yesterday with the JsonSyntaxException. Did you set the DEBUG log level for the boschshc binding? If so, please look in for the following log entries directly before and after the JsonSyntaxException and post them here:

xxthunder commented 8 months ago

@david-pace

Finally, M5 is running, installed your JAR and set log level to debug.

Stay tuned!

xxthunder commented 8 months ago

@david-pace

First thing I observe:

2023-12-19 10:31:03.347 [DEBUG] [nternal.devices.bridge.BridgeHandler] - Registered device: hdm:ZigBee:18fc260000069383 - looking for roomClimateControl_hz_6

2023-12-19 10:31:03.348 [DEBUG] [nternal.devices.bridge.BridgeHandler] - Registered device: hdm:HomeMaticIP:3014F711A00018DA1859150B - looking for roomClimateControl_hz_6

2023-12-19 10:31:03.348 [DEBUG] [.internal.devices.bridge.LongPolling] - Sending long poll request

2023-12-19 10:31:03.550 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:31:03.564 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:31:23.375 [DEBUG] [.internal.devices.bridge.LongPolling] - Long poll response: {"result":[],"jsonrpc":"2.0"}

2023-12-19 10:31:23.377 [DEBUG] [.internal.devices.bridge.LongPolling] - Sending long poll request

2023-12-19 10:31:43.435 [DEBUG] [.internal.devices.bridge.LongPolling] - Long poll response: {"result":[],"jsonrpc":"2.0"}

2023-12-19 10:31:43.436 [DEBUG] [.internal.devices.bridge.LongPolling] - Sending long poll request

2023-12-19 10:31:50.826 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:31:50.826 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:31:50.826 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:31:53.544 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:31:53.548 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:31:58.242 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:31:58.257 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:32:00.568 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:32:00.824 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:32:03.487 [DEBUG] [.internal.devices.bridge.LongPolling] - Long poll response: {"result":[],"jsonrpc":"2.0"}

2023-12-19 10:32:03.488 [DEBUG] [.internal.devices.bridge.LongPolling] - Sending long poll request

2023-12-19 10:32:03.552 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:32:03.564 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:32:03.825 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:32:03.826 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:32:10.560 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:32:13.543 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:32:13.548 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:32:23.545 [DEBUG] [.internal.devices.bridge.LongPolling] - Long poll response: {"result":[],"jsonrpc":"2.0"}

2023-12-19 10:32:23.546 [DEBUG] [.internal.devices.bridge.LongPolling] - Sending long poll request

2023-12-19 10:32:43.605 [DEBUG] [.internal.devices.bridge.LongPolling] - Long poll response: {"result":[],"jsonrpc":"2.0"}

2023-12-19 10:32:43.606 [DEBUG] [.internal.devices.bridge.LongPolling] - Sending long poll request

2023-12-19 10:32:58.102 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:32:58.192 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:32:58.205 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused

2023-12-19 10:33:00.569 [DEBUG] [discovery.BridgeDiscoveryParticipant] - Discovering failed with exception java.net.ConnectException: Connection refused
david-pace commented 8 months ago

Hi @xxthunder, thanks for the report. Let's handle the ConnectException in a separate issue / pull request.

However, I noticed that you got the wrong version of the JAR. Please refer to my last post in https://github.com/openhab/openhab-addons/pull/16002 to get the correct version of the JAR. Let's continue our discussion in the pull request so that all testers can follow the same thread and that I don't have to write instructions multiple times.

Thank you for your patience and your help :+1:

david-pace commented 8 months ago

Update: since a recent Smart Home Controller update the issue still occurs, but the invalid response slightly changed. It now reads:

<HTML><HEAD><TITLE>400</TITLE></HEAD><BODY><H1>400 Unsupported HTTP Protocol Version: /remote/json-rpcHTTP/1.1</H1></BODY></HTML>

Thanks to some debug logging the fix for this specific issue developed in https://github.com/openhab/openhab-addons/pull/16002 can now be finalized 👍

xxthunder commented 7 months ago

@david-pace Happy New Year!

Did you open another issue/pull request regarding the "Discovery Failed / Connection Refused" issue? I still have debugging turned on and this error message appears almost every second. I would like to support solving this issue, too.

Regards, Karsten

david-pace commented 7 months ago

Happy new year @xxthunder and everyone 🍀

I did not have time to create the issue yet, but I did some analysis already. I will soon open the issue.

Meanwhile, after the JsonSyntaxException did not appear on my system anymore for 10 days, it now occurred again, with a slightly different response:

<HTML><HEAD><TITLE>400</TITLE></HEAD><BODY><H1>400 Unsupported HTTP Protocol Version: POSTHTTP/1.1</H1></BODY></HTML>

For the record: this happened on openHAB 4.1.0 on December 31th, 2023.

openhab-bot commented 3 months ago

This issue has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/bosch-smart-home-binding-does-not-update-item-states/138073/6