ksya / ha-nefiteasy

Nefit Easy connection for Home Assistant
74 stars 30 forks source link

Frequent disconnects #243

Closed pathia closed 5 months ago

pathia commented 1 year ago

I am experiencing frequent disconnects with the nefit (bosch) cloud. Sometimes every few hours. I've read a fix for during boot up but not for this specific. My home automation is a mess with tests and such to make it work reliable. Is there anything I can do to fix this issue? I don't understand why it is doing this. My internet is super stable and also ha is hard wired. Could it have something to do with the fact that I also have the nefit easy app on my phone doing its updates in the background and I do too much updates per specific period?

RobBie1221 commented 1 year ago

In my opinion, having disconnects is something natural that could happen due to any reason. The infrastructure was not designed to have an open connection 24/7 as the app only makes a connection once opened. I think the biggest issue is that the integration deals with it as if something fatal. I started a branch that handles disconnects silently by just reconnecting. Only when the server is not available for reconnecting it’s considered as fatal. Never had the time though to finish that branch…

pathia commented 1 year ago

Hi Robbie, That would indeed be the cleanest fix. I sometimes have multiple disconnects per hour. The worst part for me is that my commands to turn up the temperature in the morning does not always arrive at my Nefit heater so the house is cold in the morning. I guess buffering these commands is also not easy to implement.... afbeelding

RobBie1221 commented 1 year ago

I have no clue what influences the behavior. Multiple times an hour seems a lot. How is your WiFi thermostat connection? I think it actually tries to reach that endpoint to get data.

You could try this branch: https://github.com/RobBie1221/ha-nefiteasy/tree/refactor_init

I think as to the normal flow it’s pretty much done. The thing not done is to start a reauth flow when you change credentials, but as long as you don’t change them you shouldn’t trigger this.

pathia commented 1 year ago

I could always try, indeed. I tried enabling the beta version in HACS but unfortunately I cannot see this branch, I guess. Or do I need to select 'dev' ?

pathia commented 1 year ago

And regarding the Wi-Fi connection for the thermostat I think it should be stable. It is on 2.4 GHz and my laptop is on 5 GHz only. But I can add some monitoring where I ping the thermostat. I wasn't aware that the Bosch cloud is only redirecting to my local thermostat.

RobBie1221 commented 1 year ago

It won’t work through HACS. For you to be able to pick it up as beta in HACS I first need to merge it here and create a beta release. I don’t want to do that yet, never know how many people have beta enabled and pick it up automatically.

Best way is to download from GitHub after following above link and to ssh to ha and copy it manually.

pathia commented 1 year ago

I got: AttributeError: module 'collections' has no attribute 'MutableSet' And tried to revert back to the old version but it kept saying that. In the end I reverted to my automatic nightly snapshot and all is fine - that is back to where I started. Pfiew.

RobBie1221 commented 1 year ago

I think it’s not compatible with current HA version…

Has been quite some time since I worked on that.

pathia commented 1 year ago

Could just as well be the case yes. I think I will try to write an automation to try and reload the integration whenever it fails. And hopefully one day I can upgrade to the fixed version. Thanks a lot for your help so far!

pathia commented 1 year ago

I think it actually tries to reach that endpoint to get data.

Just as an FYI: Yesterday the Nefit integration was unavailable from about 22u until 4:37. Using local monitoring I found out that my physical thermostat was offline from Wi-Fi for exactly that period. So yes I can confirm it does seem to actually reach out to my thermostat. I see it is connecting to the wrong access point throughout my house. I think this issue can be closed.

RobBie1221 commented 1 year ago

I dusted off the code that you tried and made it work (it needed rebasing on the current branch). I published a pre-release, you should now be able to pick it up from HACS when looking for beta.

pathia commented 1 year ago

Awesome! I installed this 0.4.0beta version and well at least it works for now. Let's see how it keeps up. Thanks.

pathia commented 1 year ago

Unfortunately I have my nefit connection down for 11 hours now but a local ping to the device does work. Is there anything I can do to troubleshoot before I reload the plugin?

holtkamp commented 10 months ago

Hey all, I have been using the dev-branch https://github.com/ksya/ha-nefiteasy/tree/dev since 06-07-2023 (so it includes the changes mentioned by https://github.com/ksya/ha-nefiteasy/issues/243#issuecomment-1502218785) on a new installation of Home Assistant and still the integration becomes unavailable / disconnects every few days:

Screenshot 2023-08-16 at 10 15 05

Reloading the integration resolves the issue.

Is there anything I can do to help to find the cause of this issue?

Already got "Debug logging" enabled on the integration and the following occurs when the integrations disconnects:

2023-08-15 14:08:27.038 DEBUG (MainThread) [custom_components.nefiteasy] Finished fetching nefiteasy data in 1.634 seconds (success: True)
2023-08-15 14:09:29.029 DEBUG (MainThread) [custom_components.nefiteasy] Finished fetching nefiteasy data in 1.622 seconds (success: True)
2023-08-15 14:10:31.041 DEBUG (MainThread) [custom_components.nefiteasy] Finished fetching nefiteasy data in 1.628 seconds (success: True)
2023-08-15 14:11:41.581 ERROR (MainThread) [custom_components.nefiteasy] Timeout fetching nefiteasy data
2023-08-15 14:11:41.583 DEBUG (MainThread) [custom_components.nefiteasy] Finished fetching nefiteasy data in 10.200 seconds (success: False)
2023-08-15 14:12:30.603 DEBUG (MainThread) [custom_components.nefiteasy] Unexpected disconnect of SERIAL_ID with Bosch server. Try to reconnect..
2023-08-15 14:12:41.405 DEBUG (MainThread) [custom_components.nefiteasy] Starting reconnect procedure.
2023-08-15 14:12:41.405 DEBUG (MainThread) [custom_components.nefiteasy] Start connecting.
2023-08-15 14:12:41.406 DEBUG (MainThread) [custom_components.nefiteasy] Waiting for connected event.
2023-08-15 14:12:41.410 DEBUG (MainThread) [custom_components.nefiteasy] Connected successfully.
2023-08-15 14:12:41.411 ERROR (MainThread) [custom_components.nefiteasy] Unexpected error fetching nefiteasy data: 
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 283, in _async_refresh
    self.data = await self._async_update_data()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/nefiteasy/__init__.py", line 262, in _async_update_data
    await self.connect()
  File "/config/custom_components/nefiteasy/__init__.py", line 148, in connect
    self.nefit.get("/gateway/brandID")
  File "/usr/local/lib/python3.11/site-packages/aionefit/__init__.py", line 107, in get
    self.xmppclient.send_message(mto=self._to, mbody=msg)
  File "/usr/local/lib/python3.11/site-packages/aionefit/provider/slixmpp_impl.py", line 93, in send_message
    message.stream.send_raw(str_data)
  File "/usr/local/lib/python3.11/site-packages/slixmpp/xmlstream/xmlstream.py", line 1354, in send_raw
    raise NotConnectedError()
slixmpp.xmlstream.xmlstream.NotConnectedError
2023-08-15 14:12:41.432 DEBUG (MainThread) [custom_components.nefiteasy] Finished fetching nefiteasy data in 0.027 seconds (success: False)
2023-08-15 14:13:41.408 DEBUG (MainThread) [custom_components.nefiteasy] Starting reconnect procedure.
2023-08-15 14:13:41.408 DEBUG (MainThread) [custom_components.nefiteasy] Start connecting.
2023-08-15 14:13:41.408 DEBUG (MainThread) [custom_components.nefiteasy] Connection procedure is already running.
2023-08-15 14:13:41.408 DEBUG (MainThread) [custom_components.nefiteasy] Finished fetching nefiteasy data in 0.001 seconds (success: False)
2023-08-15 14:14:41.406 DEBUG (MainThread) [custom_components.nefiteasy] Starting reconnect procedure.
2023-08-15 14:14:41.407 DEBUG (MainThread) [custom_components.nefiteasy] Start connecting.
2023-08-15 14:14:41.407 DEBUG (MainThread) [custom_components.nefiteasy] Connection procedure is already running.
2023-08-15 14:14:41.407 DEBUG (MainThread) [custom_components.nefiteasy] Finished fetching nefiteasy data in 0.001 seconds (success: False)
2023-08-15 14:15:41.404 DEBUG (MainThread) [custom_components.nefiteasy] Starting reconnect procedure.
2023-08-15 14:15:41.404 DEBUG (MainThread) [custom_components.nefiteasy] Start connecting.
2023-08-15 14:15:41.404 DEBUG (MainThread) [custom_components.nefiteasy] Connection procedure is already running.
2023-08-15 14:15:41.404 DEBUG (MainThread) [custom_components.nefiteasy] Finished fetching nefiteasy data in 0.000 seconds (success: False)
2023-08-15 14:16:41.403 DEBUG (MainThread) [custom_components.nefiteasy] Starting reconnect procedure.
2023-08-15 14:16:41.404 DEBUG (MainThread) [custom_components.nefiteasy] Start connecting.
2023-08-15 14:16:41.404 DEBUG (MainThread) [custom_components.nefiteasy] Connection procedure is already running.
2023-08-15 14:16:41.404 DEBUG (MainThread) [custom_components.nefiteasy] Finished fetching nefiteasy data in 0.000 seconds (success: False)
2023-08-15 14:17:41.404 DEBUG (MainThread) [custom_components.nefiteasy] Starting reconnect procedure.
2023-08-15 14:17:41.404 DEBUG (MainThread) [custom_components.nefiteasy] Start connecting.
2023-08-15 14:17:41.404 DEBUG (MainThread) [custom_components.nefiteasy] Connection procedure is already running.
2023-08-15 14:17:41.405 DEBUG (MainThread) [custom_components.nefiteasy] Finished fetching nefiteasy data in 0.001 seconds (success: False)
.....

Reconnecting seems successful at 2023-08-15 14:12:41.410, but loading the data after reconnecting seems to fail and will keep failing every minute after that...

I know very little Python, but the documentation at https://slixmpp.readthedocs.io/en/latest/_modules/slixmpp/xmlstream/xmlstream.html describes the slixmpp.xmlstream.xmlstream.NotConnectedError as:

Raised when we try to send something over the wire but we are not connected.

So maybe reconnecting is not done "completely"?

RobBie1221 commented 10 months ago

I'm not sure what happens after the exception, but it's not good that there is an uncaught exception. It has been a while since I did something in this code, but apparently in the data update coordinator an exception is thrown which was not foreseen during development. That needs improvement.

RobBie1221 commented 10 months ago

I just looked, this is a very strange place to get an exception. The connected event is raised, and immediately after it does a self.nefit.get("/gateway/brandID") which raises a NotConnectedError so apparently it is able to connect however the connection is dropped immediately after that 🤷

RobBie1221 commented 10 months ago

try changing this block in __init.py__:

if self.connected_state == STATE_CONNECTED:
    self.nefit.get("/gateway/brandID")
    try:
        await asyncio.wait_for(
            self.nefit.xmppclient.message_event.wait(), timeout=29.0
        )
    except asyncio.TimeoutError:
        _LOGGER.debug(
            "Did not get a response in time for testing connection."
        )
    except:  # noqa: E722 pylint: disable=bare-except
        _LOGGER.debug("No connection while testing connection.")
    else:
        self.nefit.xmppclient.message_event.clear()

        # No exception and no auth error
        if self.connected_state == STATE_CONNECTED:
            self.connected_state = STATE_CONNECTION_VERIFIED

to

if self.connected_state == STATE_CONNECTED:
    try: 
        self.nefit.get("/gateway/brandID")
    except slixmpp.xmlstream.xmlstream.NotConnectedError:
        selct.connected_state == STATE_INIT
        return
    try:
        await asyncio.wait_for(
            self.nefit.xmppclient.message_event.wait(), timeout=29.0
        )
    except asyncio.TimeoutError:
        _LOGGER.debug(
            "Did not get a response in time for testing connection."
        )
    except:  # noqa: E722 pylint: disable=bare-except
        _LOGGER.debug("No connection while testing connection.")
    else:
        self.nefit.xmppclient.message_event.clear()

        # No exception and no auth error
        if self.connected_state == STATE_CONNECTED:
            self.connected_state = STATE_CONNECTION_VERIFIED
holtkamp commented 10 months ago

Thanks for the swift response!

I added the additional try statement.

Note you made a typo:

selct.connected_state == STATE_INIT => self.connected_state == STATE_INIT

Will monitor it and report back

holtkamp commented 10 months ago

Got some new logs:

2023-08-17 09:32:30.992 DEBUG (MainThread) [custom_components.nefiteasy] Finished fetching nefiteasy data in 1.770 seconds (success: True)
2023-08-17 09:33:31.961 DEBUG (MainThread) [custom_components.nefiteasy] Finished fetching nefiteasy data in 1.739 seconds (success: True)
2023-08-17 09:34:31.889 DEBUG (MainThread) [custom_components.nefiteasy] Unexpected disconnect of 558912211 with Bosch server. Try to reconnect..
2023-08-17 09:34:40.233 ERROR (MainThread) [custom_components.nefiteasy] Timeout fetching nefiteasy data
2023-08-17 09:34:40.236 DEBUG (MainThread) [custom_components.nefiteasy] Finished fetching nefiteasy data in 9.013 seconds (success: False)
2023-08-17 09:35:40.223 DEBUG (MainThread) [custom_components.nefiteasy] Starting reconnect procedure.
2023-08-17 09:35:40.224 DEBUG (MainThread) [custom_components.nefiteasy] Start connecting.
2023-08-17 09:35:40.224 DEBUG (MainThread) [custom_components.nefiteasy] Waiting for connected event.
2023-08-17 09:35:40.235 DEBUG (MainThread) [custom_components.nefiteasy] Connected successfully.
2023-08-17 09:35:40.235 ERROR (MainThread) [custom_components.nefiteasy] Unexpected error fetching nefiteasy data: 
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 283, in _async_refresh
    self.data = await self._async_update_data()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/nefiteasy/__init__.py", line 262, in _async_update_data
    """Update data via library."""
        ^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/nefiteasy/__init__.py", line 148, in connect
    try:
  File "/usr/local/lib/python3.11/site-packages/aionefit/__init__.py", line 107, in get
    self.xmppclient.send_message(mto=self._to, mbody=msg)
  File "/usr/local/lib/python3.11/site-packages/aionefit/provider/slixmpp_impl.py", line 93, in send_message
    message.stream.send_raw(str_data)
  File "/usr/local/lib/python3.11/site-packages/slixmpp/xmlstream/xmlstream.py", line 1354, in send_raw
    raise NotConnectedError()
slixmpp.xmlstream.xmlstream.NotConnectedError
2023-08-17 09:35:40.266 DEBUG (MainThread) [custom_components.nefiteasy] Finished fetching nefiteasy data in 0.043 seconds (success: False)
2023-08-17 09:36:40.223 DEBUG (MainThread) [custom_components.nefiteasy] Starting reconnect procedure.
2023-08-17 09:36:40.224 DEBUG (MainThread) [custom_components.nefiteasy] Start connecting.
2023-08-17 09:36:40.224 DEBUG (MainThread) [custom_components.nefiteasy] Connection procedure is already running.
2023-08-17 09:36:40.224 DEBUG (MainThread) [custom_components.nefiteasy] Finished fetching nefiteasy data in 0.001 seconds (success: False)
2023-08-17 09:37:40.228 DEBUG (MainThread) [custom_components.nefiteasy] Starting reconnect procedure.
2023-08-17 09:37:40.228 DEBUG (MainThread) [custom_components.nefiteasy] Start connecting.
2023-08-17 09:37:40.229 DEBUG (MainThread) [custom_components.nefiteasy] Connection procedure is already running.
2023-08-17 09:37:40.229 DEBUG (MainThread) [custom_components.nefiteasy] Finished fetching nefiteasy data in 0.000 seconds (success: False)
2023-08-17 09:38:40.223 DEBUG (MainThread) [custom_components.nefiteasy] Starting reconnect procedure.
2023-08-17 09:38:40.224 DEBUG (MainThread) [custom_components.nefiteasy] Start connecting.
2023-08-17 09:38:40.224 DEBUG (MainThread) [custom_components.nefiteasy] Connection procedure is already running.
2023-08-17 09:38:40.224 DEBUG (MainThread) [custom_components.nefiteasy] Finished fetching nefiteasy data in 0.001 seconds (success: False)

3 lines were added by the extra try clause.

The line numbers in the error message are the same, but the message has changed a bit, exactly those 3 line numbers...:

old:

  File "/config/custom_components/nefiteasy/__init__.py", line 262, in _async_update_data
    await self.connect()

new:

  File "/config/custom_components/nefiteasy/__init__.py", line 262, in _async_update_data
    """Update data via library."""

Not sure if that makes sense.