Connection issues causing broker to disconnect

Nick-Adams-AU commented 1 year ago

Hey,

Thanks so much for this integration! On first connect, it works a treat and is a handy inclusion into HA. Thank you!

I appreciate that this integration is new and still a bit raw. I seem to be having some connection timeout issues. If I reload the integration, the connection works fine at first, but it eventually dies some hours later (~5+ hours). The logged errors are below.

Do we need either a keepalive or a reconnect-on-failure check?

Logger: custom_components.mydolphin_plus.component.api.aws_iot_websocket
Source: custom_components/mydolphin_plus/component/api/aws_iot_websocket.py:176
Integration: MyDolphin Plus ([documentation](https://github.com/sh00t2kill/dolphin-robot), [issues](https://github.com/sh00t2kill/dolphin-robot/issues))
First occurred: October 28, 2022 at 10:00:19 AM (2 occurrences)
Last logged: October 28, 2022 at 5:00:20 PM
Rejected message for $aws/things/XXXXXXX/shadow/get/rejected, Message: {"code":429,"message":"TOO_MANY_REQUESTS"}

Logger: custom_components.mydolphin_plus.component.api.aws_iot_websocket
Source: custom_components/mydolphin_plus/component/api/aws_iot_websocket.py:347
Integration: MyDolphin Plus ([documentation](https://github.com/sh00t2kill/dolphin-robot), [issues](https://github.com/sh00t2kill/dolphin-robot/issues))
First occurred: October 30, 2022 at 6:42:09 PM (1 occurrences)
Last logged: October 30, 2022 at 6:42:09 PM
Failed to publish message: {'state': {'desired': {'systemState': {'pwsState': 'on'}}}} to $aws/things/D3665TQM/shadow/update, Broker is not connected

Logger: custom_components.mydolphin_plus.component.api.mydolphin_plus_api
Source: custom_components/mydolphin_plus/component/api/mydolphin_plus_api.py:274
Integration: MyDolphin Plus ([documentation](https://github.com/sh00t2kill/dolphin-robot), [issues](https://github.com/sh00t2kill/dolphin-robot/issues))
First occurred: October 31, 2022 at 11:39:01 AM (1 occurrences)
Last logged: October 31, 2022 at 11:39:01 AM
Failed to post JSON to https://mbapp18.maytronics.com/api/serialnumbers/getrobotdetailsbymusn/, Error: Cannot connect to host mbapp18.maytronics.com:443 ssl:False [Try again], Line: 97

Logger: custom_components.mydolphin_plus.component.api.mydolphin_plus_api
Source: custom_components/mydolphin_plus/component/api/mydolphin_plus_api.py:166
Integration: MyDolphin Plus ([documentation](https://github.com/sh00t2kill/dolphin-robot), [issues](https://github.com/sh00t2kill/dolphin-robot/issues))
First occurred: October 28, 2022 at 7:57:20 PM (8419 occurrences)
Last logged: 10:58:00 AM

    Failed to post JSON to https://mbapp18.maytronics.com/api/serialnumbers/getrobotdetailsbymusn/, HTTP Status: Unauthorized (401)
    Failed to retrieve Robot Details, Error: 'NoneType' object has no attribute 'get', Line: 276

sh00t2kill commented 1 year ago

@Nick-Adams-AU are you having connectivity issues? The "cannot connect to host" errors are weird. This is sort of handled, but not in the initial authentication part: it either works or it doesn't, and if it fails to connect it won't keep retrying.

Once it auths, there is retry and reconnect logic.
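For illustration only, here is a minimal sketch of what retrying the initial connection could look like, using capped exponential backoff. It is not the integration's actual code: `connect_with_retry`, its parameters, and the `connect` callable are all hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable


async def connect_with_retry(
    connect: Callable[[], Awaitable[None]],  # hypothetical: performs the initial auth/connect
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 120.0,
) -> int:
    """Call `connect` until it succeeds; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            await connect()
            return attempt
        except (ConnectionError, asyncio.TimeoutError):
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Double the wait after each failure, capped at max_delay.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            await asyncio.sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```

Backing off between attempts also avoids hammering the remote side during an outage, which matters when the service is already answering with errors such as `TOO_MANY_REQUESTS`.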

Does the AWS Broker sensor get set to off? In the interim you could use this value to restart the integration automatically, using a service call to homeassistant.reload_config_entry.
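As a rough sketch of that workaround from outside HA, the reload can also be requested through Home Assistant's REST API, where the service sits under the `homeassistant` domain. The helper name `build_reload_request` and all argument values are placeholders, and passing `entry_id` as the service data is an assumption to check against your HA version. The function only builds the request and does not send it.

```python
import json
import urllib.request


def build_reload_request(base_url: str, token: str, entry_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking Home Assistant to reload one config entry."""
    url = f"{base_url.rstrip('/')}/api/services/homeassistant/reload_config_entry"
    # Assumption: the service accepts the config entry ID as `entry_id`.
    body = json.dumps({"entry_id": entry_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # long-lived access token
            "Content-Type": "application/json",
        },
    )


# To actually send it (placeholder values, replace with your own):
#   request = build_reload_request("http://homeassistant.local:8123", "YOUR_TOKEN", "YOUR_ENTRY_ID")
#   urllib.request.urlopen(request, timeout=10)
```

Inside HA itself, the same service call would more naturally live in an automation triggered by the broker sensor changing state.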

Nick-Adams-AU commented 1 year ago

So I can load the integration and it will stay connected for many hours without issue. When it is "healthy", the "AWS Broker" sensor shows connected. After some indeterminate time, it will move to "disconnected" and it will never reconnect until I either reload the integration or restart HA. If I reload the integration, as you suggested, it comes back up immediately and is fine again for a few hours.

I have a (very?) reliable internet connection and don't have issues with any other web polling integrations. It is possible that my firewall is killing long connections or the odd packet goes missing here or there but generally, my HA integrations are rock solid.

Looking at the errors, the integration doesn't seem to be reconnecting?

elad-bar commented 1 year ago

I had the same error around that time. It seems the servers went into maintenance mode or something like that. I will work over the weekend to improve recovery from this kind of error.

elad-bar commented 1 year ago

Actually, the recovery worked pretty well and it reconnected once the issue was resolved: it began at 6:56:31 AM IL TZ, tried to reconnect every 2 minutes, and finally connected at 7:18:32 AM.
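To put those timestamps in perspective, a quick back-of-the-envelope calculation gives the outage length and roughly how many retry intervals fit into it. It assumes retries at exactly 120-second spacing, which the real logic may not guarantee.

```python
from datetime import datetime

RETRY_INTERVAL_SECONDS = 120  # "every 2 minutes"; exact spacing is an assumption

outage_start = datetime.strptime("06:56:31", "%H:%M:%S")
reconnected = datetime.strptime("07:18:32", "%H:%M:%S")

outage_seconds = int((reconnected - outage_start).total_seconds())
intervals = outage_seconds // RETRY_INTERVAL_SECONDS

print(outage_seconds)  # 1321 seconds, i.e. about 22 minutes
print(intervals)       # 11 full retry intervals
```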

sh00t2kill commented 1 year ago

Issue caused by vendor outage

Connection issues causing broker to disconnect #75