home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
71.13k stars 29.81k forks source link

ESPHome unable to connect after esp changes IP #45116

Closed frog32 closed 3 years ago

frog32 commented 3 years ago

The problem

When an ESP is changing its IP address home assistant loses connectivity and needs a restart to connect again.

Environment

Problem-relevant configuration.yaml

None

Traceback/Error logs

None

Additional information

probot-home-assistant[bot] commented 3 years ago

esphome documentation esphome source (message by IssueLinks)

probot-home-assistant[bot] commented 3 years ago

Hey there @ottowinter, mind taking a look at this issue as its been labeled with an integration (esphome) you are listed as a codeowner for? Thanks! (message by CodeOwnersMention)

WizBangCrash commented 3 years ago

Hi @frog32, FYI I am having the same issue and just commented on an existing issue in the ESPHome repo: https://github.com/esphome/issues/issues/1435

CarlosGS commented 3 years ago

From time to time, I also need to restart hassos to bring back some esphome devices.

Come to think of it, my hassos is running on a virtualbox VM on a wifi-connected computer - what's your setup?

WizBangCrash commented 3 years ago

I'm running in a docker container on a Synology NAS connected via Ethernet.

logan893 commented 3 years ago

I just ran into this same issue after a router restart, which caused my ESPHome device (only have one so far) to changed its IP. The ESPHome integration (homeassistant.components.esphome) thinks my ESPHome device should still be reachable at its old IP, and is unable to reconnect.

2021-02-23 15:22:13 INFO (MainThread) [homeassistant.components.esphome] Can't connect to ESPHome API for esp-climate-display-and-probe-1 (x.x.x.x): Error connecting to ('x.x.x.x', 6053): [Errno 113] Connect call failed ('x.x.x.x', 6053) 2021-02-23 15:22:13 INFO (MainThread) [homeassistant.components.esphome] Trying to reconnect to x.x.x.x in 60 seconds

The ESPHome add-on has no problem with the new IP, and I can connect to it from that GUI to fetch logs.

I'm running HassOS 5.11 and HA core-2021.2.3 on a Raspberry Pi 3B.

github-actions[bot] commented 3 years ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

Fabi0San commented 3 years ago

The issue is still active on the latest build.

bdraco commented 3 years ago

@Fabi0San Can you post the contents of your .storage/core.config_entries for the esphome device in question you are having trouble with (remove any sensitive data)

Fabi0San commented 3 years ago

Sure, this happens to any of my devices whenever they change IP addresses for whatever reason (typically router restart).

{ "entry_id": "164af54c339fec0c0f1cb71497918f92", "version": 1, "domain": "esphome", "title": "main_nook_ls", "data": { "host": "192.168.50.42", "port": 6053, "password": "shhhhh" }, "options": {}, "system_options": { "disable_new_entities": false }, "source": "user", "connection_class": "local_push", "unique_id": "main_nook_ls", "disabled_by": null },

bdraco commented 3 years ago

Is the device discoverable via zeroconf?

Turn on debug logging for homeassistant.components.zeroconf

If not, it wouldn't be able to update the IP

Fabi0San commented 3 years ago

Yes all my esphome devices get discovered as soon as I add them to the network. I thought the host property in config_entries should show the hostname instead of ip, but you are saying zeroconf will find the same device with a different ip and update that record? What is the key? unique_id? is that advertised from esphome?

I'll need some time to turn on the logging and try to reproduce the issue.

bdraco commented 3 years ago

The key is the unique id (main_nook_ls) The hostname will be main_nook_ls.local. under _esphomelib._tcp.local.

WizBangCrash commented 3 years ago

Hi @bdraco, I reproduced the issue this morning. My core.config_entries entry before I started was:

{
  "entry_id": "8590c713d756830ccb180e1e532a9735",
  "version": 1,
  "domain": "esphome",
  "title": "office_nodemcu_sensor",
  "data": {
      "host": "192.168.123.163",
      "port": 6053,
      "password": "shhhh"
  },
  "options": {},
  "system_options": {
      "disable_new_entities": false
  },
  "source": "zeroconf",
  "connection_class": "local_push",
  "unique_id": "office_nodemcu_sensor",
  "disabled_by": null
}

I rebooted HA with zeroconf debug turned on and have the following entry:

306: 2021-05-25 08:19:20 DEBUG (zeroconf-ServiceBrowser__kizbox._tcp.local.-_leap._tcp.local.-_miio._udp.local.-_printer._tcp.local.-_wled._tcp.local.-_mediaremotetv._tcp.local.-_plugwise._tcp.local.-_elg._tcp.local.-_esphomelib._tcp.local.-_touch-able._tcp.local.-_Volumio._tcp.local.-_xbmc-jsonrpc-h._tcp.local.-_bond._tcp.local.-_daap._tcp.local.-_nut._tcp.local.-_axis-video._tcp.local.-_hap._tcp.local.-_ipp._tcp.local.-_dvl-deviceapi._tcp.local.-_hap._udp.local.-_http._tcp.local.-_dkapi._tcp.local.-_enphase-envoy._tcp.local.-_homekit._tcp.local.-_fbx-api._tcp.local.-_ssh._tcp.local.-_api._udp.local.-_ipps._tcp.local.-_googlecast._tcp.local.-_viziocast._tcp.local.-_spotify-connect._tcp.local._15671) [homeassistant.components.zeroconf] Discovered new device office_nodemcu_sensor._esphomelib._tcp.local. {'host': '192.168.123.163', 'port': 6053, 'hostname': 'office_nodemcu_sensor.local.', 'type': '_esphomelib._tcp.local.', 'name': 'office_nodemcu_sensor._esphomelib._tcp.local.', 'properties': {'_raw': {'mac': b'840d8e8d55b0', 'address': b'office_nodemcu_sensor.lan', 'version': b'1.15.3'}, 'mac': '840d8e8d55b0', 'address': 'office_nodemcu_sensor.lan', 'version': '1.15.3'}}

I then changed my DHCP server to assign a static address of 192.168.123.199 to the device and rebooted it. The device restarted and got an address of .199, but I did not see a zeroconf entry in the logs and the core.config_entries are unchanged. The following warning was in the logs:

2021-05-25 08:24:33 WARNING (MainThread) [homeassistant.components.esphome] Can't connect to ESPHome API for office_nodemcu_sensor (192.168.123.163): Timeout while connecting to ('192.168.123.163', 6053)

I also upgraded the device to the latest version of esphome that I'm running (1.17.1) and after the reboot there were still no zeroconf entries.

WizBangCrash commented 3 years ago

One thing to note about my setup. Not sure if it is relevant or not. I have the following set in my device esphome config:

wifi:
  ssid: !secret devices_ssid
  password: !secret devices_wifi_password
  fast_connect: True
  domain: .lan
  reboot_timeout: 10min

i.e. I set the domain to .lan

bdraco commented 3 years ago

Only .local is supported. You should remove the domain setting and let it use the default as the mdns/zeroconf stack will only accept .local

WizBangCrash commented 3 years ago

Understood, but how does zeroconf recognise it and add it in the first place? If you look at my log entry in the previous post the device name is office_nodemcu_sensor.lan. If it found it when it was first added, why doesn't it recognise it when its IP address changes?

WizBangCrash commented 3 years ago

So I changed my domain to .local, reprogrammed the device and rebooted. It is reporting its name as office_nodemcu_sensor.local and its IP address as .199

wifi:
  ssid: !secret devices_ssid
  password: !secret devices_wifi_password
  fast_connect: True
  domain: .local
  reboot_timeout: 10min
[13:41:58][C][homeassistant.time:011]:   Timezone: 'GMT0BST-1,M3.4.0/1,M10.5.0/2'
[13:41:58][C][version.text_sensor:021]: Version Text Sensor 'office_nodemcu_sensor_version'
[13:41:58][C][version.text_sensor:021]:   Icon: 'mdi:new-box'
[13:41:58][C][web_server:136]: Web Server:
[13:41:58][C][web_server:137]:   Address: office_nodemcu_sensor.local:80
[13:41:58][C][ota:029]: Over-The-Air Updates:
[13:41:58][C][ota:030]:   Address: office_nodemcu_sensor.local:8266
[13:41:58][C][api:095]: API Server:
[13:41:58][C][api:096]:   Address: office_nodemcu_sensor.local:6053
[13:41:58][C][wifi_info:009]: WifiInfo IPAddress 'office_nodemcu_sensor_ip'
[13:41:58][C][wifi_info:009]:   Icon: 'mdi:ip-network-outline'
[13:41:58][C][wifi_signal.sensor:009]: WiFi Signal 'office_nodemcu_sensor_wifi_signal'

Zeroconf is still not picking it up or reporting the new IP address.

WizBangCrash commented 3 years ago

@bdraco Having done a lot more debugging. I think the .lan statement is wrong. Regardless of the domain you set in the esphome config, the device will always send out an mDNS message using the .local domain. I have been using the Discovery tool from http://www.tildesoft.com on my Mac to see whether the esphome device correctly advertises its name and address when I change its IP. In the screenshot you can see that I have set office_nodemcu_sensor to the .local domain and moes_lightswitch_1 to the .lan domain and both are advertising a .local address too. In all cases the esphome device is broadcasting new mDNS messages giving out the correct information when its IP address changes, so it would appear that HomeAssistant is either ignoring new messages from the same device name, or catching the message but ignoring the IP address change. Screenshot 2021-05-26 at 08 29 58

bdraco commented 3 years ago

It is still broadcasting with a .local name but using a .lan in the address field 🤷‍♂️ , but that seems like a red herring here.

bdraco commented 3 years ago

The esphome reconnect listener does not check for IP changes: https://github.com/home-assistant/core/blob/dev/homeassistant/components/esphome/__init__.py#L457

The zeroconf config flow does https://github.com/home-assistant/core/blob/dev/homeassistant/components/esphome/config_flow.py#L95

bdraco commented 3 years ago

Looks like we never fire off another config flow announcement when the IP changes.

In theory #47683 should have added support for this, but it seems like we need to do something else as well to get it to work

bdraco commented 3 years ago

Thats because we never get a service updated callback with the new IP from zeroconf.

bdraco commented 3 years ago

The record update is seen by the service browser

2021-05-26 13:57:01 DEBUG (zeroconf-Engine-242) [homeassistant.components.zeroconf.models] update_record: record[a,in-unique,testesphome.local.]=120/119,192.168.208.127 1622037421062.2422

bdraco commented 3 years ago

I was able to get it to work .. sometimes....

It looks like a race condition in zeroconf so we sometimes don't see the ip change.

bdraco commented 3 years ago

I've spent a few hours on this and come to the following conclusions:

Screen Shot 2021-05-26 at 6 20 42 PM

This results in lots of update churn each time it sees an IPv6 address as it thinks its new since the cache fetch is always returning the latest one.

probot-home-assistant[bot] commented 3 years ago

zeroconf documentation zeroconf source (message by IssueLinks)

bdraco commented 3 years ago

Should be fixed with zeroconf 0.32

https://github.com/jstasiak/python-zeroconf/issues/643