openhab / openhab-addons

Add-ons for openHAB
https://www.openhab.org/
Eclipse Public License 2.0
1.86k stars 3.57k forks source link

[boschshc] 5000ms timeout is probably to short #15729

Closed grizzlyfred closed 5 months ago

grizzlyfred commented 10 months ago

When starting up openhab it takes a while and tons of things are happening simultaneously. Maybe too many?

With the latest updates on Bosch Smart Home Controller HW-Version 1 about half of my devices get not initialized, because they do not respond within 5000ms.

Expected: they do respond or there is a possibility to increase timeout.

Example: Initializing handler for thing 'boschshc:wall-thermostat:192-168-yy-zz:hdm_HomeMaticIP_3014...' takes more than 5000ms.

This results in them being "unitialized" in openhab "things". They are all reacheable just fine in the bosch app.

Before digging further, it wold be nice to be able set non-default timeout and see if after e.g. 10.000ms they are reachable. Or to have the possibility to set a "retry every x min to reconnect" or so.

Unfortunately, the binding does not seem to have a field for longer timeout or "re-init on-loss" strategy.

To reproduce: It maybe that it concerns those devices that need the smartplug-made zigbee mesh because otherwise they would be too far from the controller, so in a smaller setup this may not be observable.

Impact: For now, monitoring of the house temperature is not entirely possible (Bosch App does the control, not yet openhab) so it is not critical when winter is coming if the bosch app remains functioning...

Running openhab 4.0.3. on manjaro/arch in a snap on a low-power ryzen (so compute power should not be a concern)

jlaur commented 10 months ago

@lsiepel - something to have in mind for #15505.

grizzlyfred commented 10 months ago

I can illuminate further: ever so often the dsl-router likes a restart and in parallel, the bosch shc in a larger setup depends on the (in my case otherwise unused) zigbee mesh to reach further or badly placed (which cannot be helped) devices. It seems to have helped to re-initialize the zigbee mesh by switching plugs to different sockets because now also the distant heating valves are online again after a reboot and the twinguard "own" chain seems also to have been rebuild (like 1 Twingaurd in each storey, like controller -> TG Ground floor -> TG firstflor -> TG attic. But I stil think 5 seconds is a very short time, previous bosch apps also had the problem to have to "wait" for about 1 min sometimes to get meshed devices status back...

Concerning @jlaur's mention of overall http request timeout:

I had a secondary issue because PV plant modbus stopped reporting which is bad when you try to "excess" charge the car (bud around noon on no clouds, so lucky)

So I had to basically reset all systems and it will be impossible to find the glitch...

Even the odd pre-daylight savings time change update of locales was an issue for the openhab box, so I was really busy.

Now close to sunset and all seems normal again.

Still the binding should offer to change timeout for individual devices (if e.g. it is known they need the zigbee mesh) that may be slowish sometimes in responding.

So maybe this should be converted to feature request?

lsiepel commented 7 months ago

So if I create a PR that introduces an adjustable timeout for the http-calls for boschsch binding this would be fixed?

jlaur commented 7 months ago

So if I create a PR that introduces an adjustable timeout for the http-calls for boschsch binding this would be fixed?

Maybe @david-pace can respond to this. Bosch SHC uses so-called "long-polling", i.e. a HTTP request that blocks until new data is available or some max waiting time has been reached. This specific kind of request should probably not be user-configurable, I would imagine? There is currently some ongoing work for improving error handling for long polling requests: #16002

lsiepel commented 7 months ago

In that case I don’t understand the 5000ms timeout in the title. As that does not apply to the long polling request.

jlaur commented 7 months ago

In that case I don’t understand the 5000ms timeout in the title. As that does not apply to the long polling request.

It doesn't apply to anything specific, so just making sure the proposed change won't affect long-polling.

Somewhat unrelated, I was struggling with long-polling some time ago: https://github.com/openhab/openhab-addons/issues/13182#issuecomment-1211202076

david-pace commented 7 months ago

It would be good to find out which exact HTTP call times out after 5 seconds. @grizzlyfred did you already try to set the log level for the boschshc binding to DEBUG and see which request(s) take longer than 5 seconds?

In general this can not be an issue of the long polling mechanism because it is designed in such a way that the response can take up to 20 seconds if no changes are registered. Currently the request timeout is even set to 30 seconds to compensate for any potential network issues/delays. So if the call times out after 5 seconds the problem is not the long polling.

Background information: the binding uses a combination of short polling and long polling to obtain the data from the Smart Home Controller. Short polling is required in some cases because some states/changes are simply not included in the long polling responses. I assume the 5 seconds timeout happens during one of the short polling operations.

david-pace commented 6 months ago

I think the timeout that is relevant here is SafeCaller.DEFAULT_TIMEOUT. It is used when initialize() is called on a handler. In our case probably in org.openhab.core.thing.internal.ThingManagerImpl.doInitializeHandler(ThingHandler).

This could still be indirectly caused by REST/HTTP requests that take too long. We need to investigate further and try to find out which calls exactly take too long during initialize(), and whether we can do something about it on the openHAB side.

EDIT: @grizzlyfred, you could try setting the log level to DEBUG with the following command in the openHAB console:

log:set DEBUG org.openhab.binding.boschshc

and then look which HTTP calls are made for the wall-thermostat before you get the log entry Initializing handler for thing 'boschshc:wall-thermostat:192-168-yy-zz:hdm_HomeMaticIP_3014...' takes more than 5000ms.

If this does not bring any new insights, you can even go to TRACE level:

log:set TRACE org.openhab.binding.boschshc

Don't forget to reset to DEFAULT after the analysis.

grizzlyfred commented 6 months ago

I appreciate the attention the issue gets. But too many things have changed in the meantime:

david-pace commented 6 months ago

Thanks for your response @grizzlyfred :+1:

I just had a look at the code again and found out that the log entry Initializing handler for thing ... takes more than 5000ms. is on WARNING level anyway so there is no need to go to DEBUG level if you want to check whether it still occurs.

I would propose the following: check your log files for this kind of entry. If we don't find it, I would propose to close this issue because if no one can reproduce it anymore, it will be hard or even impossible to fix it. It is still possible to re-open this issue should it occur again in the future.

david-pace commented 5 months ago

It seems like the issue is not reproducible anymore. Also there was no further response for about 4 weeks regarding any log entries that point to this particular issue.

I therefore decided to close this issue for now. @grizzlyfred, please feel free to reopen it anytime in case you encounter it again.