home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
73.52k stars 30.71k forks source link

HA is not reconnecting to ESPHome devices #65563

Closed maxim-le closed 2 years ago

maxim-le commented 2 years ago

The problem

Over the past few months I have been experiencing an issue where HA would disconnect from an ESPHome device and not try to reconnect. I included below some of the logs I am seeing. This issue is very statistical, I saw it happen once a month and I saw it happen a couple of times a week (even once a day).

Even more strange is the fact that none of these logs are at the time of disconnection (in the example below - the ESPHome device is disconnected from HA since 1PM and these logs are from around 10AM and there are no logs regarding ESPHome from around 1PM)

The ESPHome device itself is connected to the WiFi network and is accessible via the web interface. I am also using an API password and encryption key, a static IP, power save mode NONE and other recommended features enabled that should have fixed this issue. (see YAML example below).

I tried to reset the ESPHome device and update its ESPHome version - but it did not solve the issue (HA is still disconnected). However, when I reload the ESPHome integration - it all works (until it disconnects again). This is what led me to believe that the issue is probably in HA and not in ESPHome.

If there's anything else I can provide to help debug this issue, please let me know. Thank you in advance!

What version of Home Assistant Core has the issue?

core-2022.2.0

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

ESPHome

Link to integration documentation on our website

https://www.home-assistant.io/integrations/esphome/

Example YAML snippet

esphome:
  name: "esp2"
  platform: "ESP8266"
  board: "esp01_1m"

logger:

api:
  password: !secret api_password
  encryption:
    key: !secret api_key

ota:
  password: !secret ota_password

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  power_save_mode: NONE
  fast_connect: on
  reboot_timeout: 5min
  manual_ip:
    static_ip: "****"
    gateway: !secret wifi_gateway
    subnet: !secret wifi_subnet
    dns1: !secret wifi_dns1
    dns2: !secret wifi_dns2
  ap:
    ssid: !secret fallback_wifi_ssid
    password: !secret fallback_wifi_password
    ap_timeout: 2min

captive_portal:

web_server:
  port: 80
  include_internal: true

sensor:
  - platform: wifi_signal
    name: "WiFi Signal Sensor"
    internal: true
    update_interval: 60s

output:
  - platform: esp8266_pwm
    pin: 12
    frequency: 1000 Hz
    id: pwm_b
  - platform: esp8266_pwm
    pin: 13
    frequency: 1000 Hz
    id: pwm_g
  - platform: esp8266_pwm
    pin: 15
    frequency: 1000 Hz
    id: pwm_r

light:
  - platform: rgb
    name: "Light"
    id: esp2_light
    red: pwm_r
    green: pwm_g
    blue: pwm_b

Anything in the logs that might be useful for us?

2022-02-03 10:18:25 WARNING (MainThread) [aioesphomeapi.reconnect_logic] Can't connect to ESPHome API for esp1 @ ***: Error connecting to ('***', 6053): [Errno 113] Connect call failed ('***', 6053)
2022-02-03 10:19:09 WARNING (MainThread) [aioesphomeapi.reconnect_logic] Can't connect to ESPHome API for esp2 @ ***: Timeout waiting for response for <class 'api_pb2.ConnectRequest'>
2022-02-03 10:24:12 WARNING (MainThread) [homeassistant.components.websocket_api.http.connection] [548034806976] Disconnected: Did not receive auth message within 10 seconds
2022-02-03 10:24:32 WARNING (MainThread) [homeassistant.components.websocket_api.http.connection] [548271912416] Disconnected: Did not receive auth message within 10 seconds

Additional information

ESPHome integration Diagnostics

probot-home-assistant[bot] commented 2 years ago

esphome documentation esphome source (message by IssueLinks)

probot-home-assistant[bot] commented 2 years ago

Hey there @ottowinter, @jesserockz, mind taking a look at this issue as it has been labeled with an integration (esphome) you are listed as a code owner for? Thanks! (message by CodeOwnersMention)

OttoWinter commented 2 years ago

Hi, yeah I've seen this before too, but it's really hard to reproduce unfortunately (for me happens only once a month or so)

If you have a setup that reproduces this relatively frequently it would be great to get some verbose logs to try to see what might be happening.

In your Home Assistant (!!) configuration.yaml, put

logger:
  default: info
  logs:
    homeassistant.components.esphome: debug
    aioesphomeapi.reconnect_logic: debug
    aioesphomeapi._frame_helper: debug
    aioesphomeapi.connection: debug
    aioesphomeapi.host_resolver: debug

And post the debug logs of around the time of the last connect attempt

Note: those settings make HA debug all esphome traffic, so might be good to first check if there's something sensitive in there and if so filter it out.

Thanks!

maxim-le commented 2 years ago

Sure thing! I'll set this up and post the logs here when it'll happen again :)

cvwillegen commented 2 years ago

I see these disconnects way more often than once a month or week, so I will also provide logs.

Like the OP said, when I look at the logs via the web interface, the device connects without any problems, but HA still shows entities as "Unavailable"...

RobinMarsollier commented 2 years ago

I got those as well every 12-14 hours or so. Setting up the debug of esphome in HA too.

For what I could observe so far, it affect only esp boards where noise encryption is set up (half of my fleet).

cvwillegen commented 2 years ago

esphome.log

Here's a bit of a log file showing disconnects. I also have (all my) nodes on Noise, like @RobinMarsollier.

RobinMarsollier commented 2 years ago

esp_filtered.log

Some more logs (10 min, including the moment of the disconnect), disconnects happens at 21:03:42 for bedroom_sensors (192.168.0.28), and from this point on, no trace of it on the logs except for this line a few minutes later : 2022-02-08 21:14:39 DEBUG (MainThread) [aioesphomeapi.reconnect_logic] bedroom_sensors @ 192.168.0.28: Triggering reconnect because of received mDNS record record[ptr,in,_esphomelib._tcp.local.]=4500/4499,bedroom_sensors._esphomelib._tcp.local.

OttoWinter commented 2 years ago

Thanks for the logs guys/gals!

Using the logs I was able to identify one real problem, when it's merged perhaps you can check if https://github.com/home-assistant/core/pull/66189 fixes the issue for you.

Specifically (from the logs from @RobinMarsollier) here's an example.

First the device disconnects, and a reconnect is immediately started

2022-02-08 21:03:42 INFO (MainThread) [aioesphomeapi.reconnect_logic] Disconnected from ESPHome API for bedroom_sensors @ 192.168.0.28
2022-02-08 21:03:42 DEBUG (MainThread) [aioesphomeapi.connection] bedroom_sensors @ 192.168.0.28: Socket closed, stopping read loop
2022-02-08 21:03:42 DEBUG (MainThread) [aioesphomeapi.connection] bedroom_sensors @ 192.168.0.28: Connecting to 192.168.0.28:6053 (AddrInfo(family=<AddressFamily.AF_INET: 2>, type=<SocketKind.SOCK_STREAM: 1>, proto=6, sockaddr=IPv4Sockaddr(address='192.168.0.28', port=6053)))

That socket is opened successfully and a TCP connection established, as we can see by this later message

2022-02-08 21:03:49 DEBUG (MainThread) [aioesphomeapi.connection] 192.168.0.28: Opened socket for

https://github.com/esphome/aioesphomeapi/blob/v10.8.1/aioesphomeapi/connection.py#L167

Next is initializing the frame helper and performing the handshake, and indeed we can see an (empty) ClientHello handshake frame being transmitted in the next log line

2022-02-08 21:03:49 DEBUG (MainThread) [aioesphomeapi._frame_helper] Sending frame 

https://github.com/esphome/aioesphomeapi/blob/v10.8.1/aioesphomeapi/_frame_helper.py#L148

Now the logging here isn't great (unable to see which Received frame corresponds to which connection), but I think what's happening is the connection is silently closed (for example by the device dropping off the network at just the right time), and so the read_frame to get the server's response just hangs forever.

That is a bug and is fixed by https://github.com/home-assistant/core/pull/66189, but I can't say with certainty if it's the issue you're seeing.

RobinMarsollier commented 2 years ago

I'll wait for it to be released to test it (winter here, I rely on HA quite a lot, i can't to hack too much right now). If you need to expand the logging to track it down more precisely, i'd be happy to follow-up on it here tho.

cvwillegen commented 2 years ago

If I make those changes in the 3 files listed in the commit, will the updated component automatically be downloaded, or do I need to download this manually?

cvwillegen commented 2 years ago

Ok, found the changes in the https://github.com/esphome/aioesphomeapi/pull/176/files PR, applied those in root@a0d7b954-esphome:/usr/local/lib/python3.9/dist-packages/aioesphomeapi, cleared the __pycache__ for those 2 files, restarted HA.

In the ESPHome dashboard, all devices I expect to be online are online, and I'll check if any of them fall out. I have a dashboard showinf exactly the 4 sensors I need for that :-)

RobinMarsollier commented 2 years ago

Looks like it did the trick on my end, no more disconnected board on the past days.

cvwillegen commented 2 years ago

I have seen sporadic disconnects, but they almost always reconnected within some dozens (tens? Not sure if that's a correct sentence...) of seconds.

maxim-le commented 2 years ago

Updated to 2022.2.6 a few days ago and no problems since! :) Thanks for all the help! I'll keep you posted if I experience any more disconnects

OttoWinter commented 2 years ago

Ok, as this issue appears to be fixed I'll close it. Note: with "the issue" I mean HA stopping reconnecting at all after a while (not sporadic disconnects, these can be for any number of reasons, like wifi coverage etc)