home-assistant / operating-system

:beginner: Home Assistant Operating System
Apache License 2.0
5.01k stars 981 forks source link

Regularly the hostname is lost and replaced with something 'weird' #1673

Closed charraeus closed 2 years ago

charraeus commented 2 years ago

Describe the issue you are experiencing

From time to time in the network my HA-hostname, defined in the web-gui as "smarthome" is replaced by a weird name like "265a4a7c22df48da84278374023ff59b". In the gui the originally defined hostname "smarthome" is shown. But if you look in the gui of my internet router you find the weird name with the ip of my machine.

See 2 screenshots of my router with the machine information of the home assistant machine in topic "additional information".

What operating system image do you use?

rpi3-64 (Raspberry Pi 3 64-bit OS)

What version of Home Assistant Operating System is installed?

7.0

Did you upgrade the Operating System.

Yes

Steps to reproduce the issue

  1. Install HA on a raspberry pi 3 B+ as described here.
  2. Set hostname "smarthome", ipv4- and ipv6-addresses in the web-gui
  3. wait and try pinging "smarthome"... After some times the ping will fail

My network environment:

Anything in the Supervisor logs that might be useful for us?

nothing unusual. The first line of the log below is when I set the hostname again. 

21-12-13 19:48:05 INFO (MainThread) [supervisor.host.control] Set hostname smarthome
21-12-13 19:49:08 INFO (MainThread) [supervisor.host.network] Updating local network information
21-12-13 19:49:19 INFO (MainThread) [supervisor.resolution.check] Starting system checks with state CoreState.RUNNING
21-12-13 19:49:19 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.TRUST/ContextType.PLUGIN
21-12-13 19:49:20 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.TRUST/ContextType.CORE
21-12-13 19:49:20 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.PWNED/ContextType.ADDON
21-12-13 19:49:20 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.TRUST/ContextType.SUPERVISOR
21-12-13 19:49:20 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.SECURITY/ContextType.CORE
21-12-13 19:49:20 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.FREE_SPACE/ContextType.SYSTEM
21-12-13 19:49:20 INFO (MainThread) [supervisor.resolution.check] System checks complete
21-12-13 19:49:20 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state CoreState.RUNNING
21-12-13 19:49:20 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
21-12-13 19:49:20 INFO (MainThread) [supervisor.resolution.fixup] Starting system autofix at state CoreState.RUNNING
21-12-13 19:49:20 INFO (MainThread) [supervisor.resolution.fixup] System autofix complete
21-12-13 19:52:00 INFO (MainThread) [supervisor.homeassistant.api] Updated Home Assistant API token
21-12-13 20:16:21 INFO (MainThread) [supervisor.api.middleware.security] /network/info access from core_ssh
21-12-13 20:16:21 INFO (MainThread) [supervisor.api.middleware.security] /network/info access from core_ssh
21-12-13 20:16:21 INFO (MainThread) [supervisor.api.middleware.security] /host/info access from core_ssh
21-12-13 20:16:21 INFO (MainThread) [supervisor.api.middleware.security] /core/info access from core_ssh
21-12-13 20:22:00 INFO (MainThread) [supervisor.homeassistant.api] Updated Home Assistant API token
21-12-13 20:24:15 INFO (MainThread) [supervisor.host.info] Updating local host information
21-12-13 20:24:18 INFO (MainThread) [supervisor.host.services] Updating service information
21-12-13 20:24:18 INFO (MainThread) [supervisor.host.network] Updating local network information
21-12-13 20:24:19 INFO (MainThread) [supervisor.host.sound] Updating PulseAudio information
21-12-13 20:24:19 INFO (MainThread) [supervisor.host.manager] Host information reload completed

Anything in the Host logs that might be useful for us?

#### Many lines like this at the beginning of the host-log:
[   89.075373] eth0: renamed from veth65baee8
[   89.086085] IPv6: ADDRCONF(NETDEV_CHANGE): veth9be8ca2: link becomes ready
[   89.086285] hassio: port 6(veth9be8ca2) entered blocking state
[   89.086298] hassio: port 6(veth9be8ca2) entered forwarding state
[   96.130927] kauditd_printk_skb: 13 callbacks suppressed
[   96.130938] audit: type=1334 audit(1639316956.718:148): prog-id=12 op=UNLOAD
[   96.130970] audit: type=1334 audit(1639316956.718:149): prog-id=11 op=UNLOAD
[   96.144279] audit: type=1334 audit(1639316956.731:150): prog-id=14 op=UNLOAD
[   96.144321] audit: type=1334 audit(1639316956.731:151): prog-id=13 op=UNLOAD
[   96.318041] hassio: port 7(veth0c41dab) entered blocking state
[   96.318062] hassio: port 7(veth0c41dab) entered disabled state
[   96.318930] device veth0c41dab entered promiscuous mode
[   96.319088] audit: type=1700 audit(1639316956.905:152): dev=veth0c41dab prom=256 old_prom=0 auid=4294967295 uid=0 gid=0 ses=4294967295
[   96.434723] audit: type=1325 audit(1639316957.021:153): table=nat family=2 entries=23 op=xt_replace pid=3630 subj==unconfined comm="iptables"
[   96.460800] audit: type=1325 audit(1639316957.047:154): table=filter family=2 entries=34 op=xt_replace pid=3634 subj==unconfined comm="iptables"
[   96.483386] audit: type=1325 audit(1639316957.070:155): table=nat family=2 entries=24 op=xt_replace pid=3637 subj==unconfined comm="iptables"
[   97.622366] audit: type=1325 audit(1639316958.209:156): table=nat family=2 entries=0 op=xt_register pid=3729 subj==unconfined comm="iptables"
[   97.641598] audit: type=1325 audit(1639316958.228:157): table=filter family=2 entries=0 op=xt_register pid=3730 subj==unconfined comm="iptables"
[   97.761108] eth0: renamed from veth81ab77b
[   97.772856] IPv6: ADDRCONF(NETDEV_CHANGE): veth0c41dab: link becomes ready
[   97.773041] hassio: port 7(veth0c41dab) entered blocking state
[   97.773052] hassio: port 7(veth0c41dab) entered forwarding state
[  141.875557] hassio: port 8(veth3590add) entered blocking state
[  141.875583] hassio: port 8(veth3590add) entered disabled state
[  141.877110] device veth3590add entered promiscuous mode
[  141.877224] kauditd_printk_skb: 8 callbacks suppressed
[  141.877230] audit: type=1700 audit(1639317002.463:166): dev=veth3590add prom=256 old_prom=0 auid=4294967295 uid=0 gid=0 ses=4294967295
[  143.160878] audit: type=1325 audit(1639317003.748:167): table=nat family=2 entries=0 op=xt_register pid=4682 subj==unconfined comm="iptables"
[  143.195843] audit: type=1325 audit(1639317003.783:168): table=filter family=2 entries=0 op=xt_register pid=4683 subj==unconfined comm="iptables"
[  143.218393] audit: type=1325 audit(1639317003.806:169): table=nat family=2 entries=5 op=xt_replace pid=4686 subj==unconfined comm="iptables"
[  143.228112] audit: type=1325 audit(1639317003.816:170): table=nat family=2 entries=7 op=xt_replace pid=4687 subj==unconfined comm="iptables"
[  143.240850] audit: type=1325 audit(1639317003.828:171): table=nat family=2 entries=8 op=xt_replace pid=4689 subj==unconfined comm="iptables"
[  143.248264] audit: type=1325 audit(1639317003.836:172): table=nat family=2 entries=10 op=xt_replace pid=4690 subj==unconfined comm="iptables"
[  143.263643] audit: type=1325 audit(1639317003.851:173): table=nat family=2 entries=11 op=xt_replace pid=4691 subj==unconfined comm="iptables"
[  143.273994] audit: type=1325 audit(1639317003.861:174): table=nat family=2 entries=12 op=xt_replace pid=4692 subj==unconfined comm="iptables"
[  143.286282] audit: type=1325 audit(1639317003.874:175): table=nat family=2 entries=13 op=xt_replace pid=4693 subj==unconfined comm="iptables"
[  143.333875] eth0: renamed from veth1d9329e
[  143.340025] IPv6: ADDRCONF(NETDEV_CHANGE): veth3590add: link becomes ready
[  143.340212] hassio: port 8(veth3590add) entered blocking state
[  143.340222] hassio: port 8(veth3590add) entered forwarding state
(...)

And Many lines like this:

[106597.132977] audit: type=1334 audit(1639423457.178:295): prog-id=75 op=LOAD
[106597.133114] audit: type=1334 audit(1639423457.178:296): prog-id=76 op=LOAD
[106597.836044] ext4 filesystem being remounted at /run/systemd/unit-root/etc/udev/rules.d supports timestamps until 2038 (0x7fffffff)
[106597.836162] ext4 filesystem being remounted at /run/systemd/unit-root/etc/docker supports timestamps until 2038 (0x7fffffff)
[106597.837040] ext4 filesystem being remounted at /run/systemd/unit-root/etc/hostname supports timestamps until 2038 (0x7fffffff)
[106597.838290] ext4 filesystem being remounted at /run/systemd/unit-root/etc/dropbear supports timestamps until 2038 (0x7fffffff)
[106597.838367] ext4 filesystem being remounted at /run/systemd/unit-root/etc/systemd/timesyncd.conf supports timestamps until 2038 (0x7fffffff)
[106597.838909] ext4 filesystem being remounted at /run/systemd/unit-root/etc/modprobe.d supports timestamps until 2038 (0x7fffffff)
[106597.839419] ext4 filesystem being remounted at /run/systemd/unit-root/etc/NetworkManager/system-connections supports timestamps until 2038 (0x7fffffff)
[106597.841604] ext4 filesystem being remounted at /run/systemd/unit-root/etc/modules-load.d supports timestamps until 2038 (0x7fffffff)
[106597.841721] ext4 filesystem being remounted at /run/systemd/unit-root/etc/hosts supports timestamps until 2038 (0x7fffffff)
[106597.887647] ext4 filesystem being remounted at /run/systemd/unit-root/etc/hostname supports timestamps until 2038 (0x7fffffff)
[106597.887732] ext4 filesystem being remounted at /run/systemd/unit-root/etc/modprobe.d supports timestamps until 2038 (0x7fffffff)
[106597.887794] ext4 filesystem being remounted at /run/systemd/unit-root/etc/docker supports timestamps until 2038 (0x7fffffff)
[106597.887856] ext4 filesystem being remounted at /run/systemd/unit-root/etc/modules-load.d supports timestamps until 2038 (0x7fffffff)
[106597.887932] ext4 filesystem being remounted at /run/systemd/unit-root/etc/udev/rules.d supports timestamps until 2038 (0x7fffffff)
[106597.887996] ext4 filesystem being remounted at /run/systemd/unit-root/etc/systemd/timesyncd.conf supports timestamps until 2038 (0x7fffffff)
[106597.888060] ext4 filesystem being remounted at /run/systemd/unit-root/etc/dropbear supports timestamps until 2038 (0x7fffffff)
[106597.888170] ext4 filesystem being remounted at /run/systemd/unit-root/etc/hosts supports timestamps until 2038 (0x7fffffff)
[106597.888239] ext4 filesystem being remounted at /run/systemd/unit-root/etc/NetworkManager/system-connections supports timestamps until 2038 (0x7fffffff)
[106627.164275] audit: type=1334 audit(1639423487.209:297): prog-id=74 op=UNLOAD
[106627.164315] audit: type=1334 audit(1639423487.209:298): prog-id=73 op=UNLOAD
[106628.376888] audit: type=1334 audit(1639423488.422:299): prog-id=76 op=UNLOAD
[106628.376929] audit: type=1334 audit(1639423488.422:300): prog-id=75 op=UNLOAD

System Health information

System Health

version core-2021.12.1
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.9.7
os_name Linux
os_version 5.10.63-v8
arch aarch64
timezone Europe/Berlin
Home Assistant Cloud logged_in | false -- | -- can_reach_cert_server | ok can_reach_cloud_auth | ok can_reach_cloud | ok
Home Assistant Supervisor host_os | Home Assistant OS 7.0 -- | -- update_channel | stable supervisor_version | supervisor-2021.12.1 docker_version | 20.10.9 disk_total | 27.7 GB disk_used | 3.9 GB healthy | true supported | true board | rpi3-64 supervisor_api | ok version_api | ok installed_addons | File editor (5.3.3), Mosquitto broker (6.0.1), Zigbee2mqtt (1.22.1-1), Samba share (9.5.1), Terminal & SSH (9.2.1), Glances (0.14.0)
Lovelace dashboards | 2 -- | -- resources | 0 views | 1 mode | storage

Additional information

Bildschirmfoto 2021-12-13 um 21 00 55 Bildschirmfoto 2021-12-13 um 21 02 44
charraeus commented 2 years ago

Added a log of DNS in the logs section in post #1

edit: removed the DNS-log again because it is not related to the bug.

agners commented 2 years ago

Not aware of such hostname issues. Do you have access to the OS shell (e.g. via USB keyboard/monitor connected to your Pi running HAOS?) If so, you can check the hostname using hostnamectl. Maybe the NetworkManager logs contain hints: journalctl -b 0 -u NetworkManager

charraeus commented 2 years ago

hostnamectl result:

# hostnamectl
 Static hostname: smarthome
       Icon name: computer-embedded
         Chassis: embedded
      Deployment: production
      Machine ID: 03026ff02ca84fd08cb80a45f1d0a33d
         Boot ID: 828a541bd51a4b40acd4ce2fb4284907
Operating System: Home Assistant OS 7.0
     CPE OS Name: cpe:2.3:o:home-assistant:haos:7.0:*:production:*:*:*:rpi3-64:*
          Kernel: Linux 5.10.63-v8
    Architecture: arm64

The NetworkManager logs do not contain any hints. So I shut down the machine completely and restarted it at 16:00:00. Within the first approx. 5 minutes the hostname changed from "smarthome" to "265a4a7c22df48da84278374023ff59b". I had a look in the complete log and saw in line 1991 that the hostnamed-serivce is finally deactivated after approx. 3 minutes:

Dec 15 16:03:41 smarthome systemd[1]: systemd-hostnamed.service: Deactivated successfully.

Find enclosed the complete log. If you make a search for "hostname" you will see that the "systemd-hostnamed.service" is started and stopped several times and remains finally deactivated. The hostnamectl- and nmcli general hostname nevertheless show "smarthome" as hostname. Very weird...

Gesicherte Terminal-Ausgabe.log NetworManagerProtocol.log

agners commented 2 years ago
Dec 15 16:03:41 smarthome systemd[1]: systemd-hostnamed.service: Deactivated successfully.

This is the expected behavior. The systemd service is only really required to change or request the hostname and shuts itself down if not required to safe memory.

Not sure what is going on, I don't think that this is something triggered by HAOS. To be sure, you could sniff traffic using Wireshark and filtering for DHCP packets. It's a bit a hack, but the Home Assistant Core container has full access to the network, so you can do:

docker exec -it homeassistant /bin/bash
bash-5.1# apk add tshark
bash-5.1# tshark -i eth0 dhcp
charraeus commented 2 years ago

Unfortunately I have absolutely no experience with Wireshark. I've tried your command and got an error message:

bash-5.1# tshark -i eth0 dhcp
Capturing on 'eth0'
tshark: Invalid capture filter "dhcp" for interface 'eth0'.

That string looks like a valid display filter; however, it isn't a valid
capture filter (can't parse filter expression: syntax error).

Note that display filters and capture filters don't have the same syntax,
so you can't use most display filter expressions as capture filters.

See the User's Guide for a description of the capture filter syntax.
0 packets captured
bash-5.1#

I've tried a fresh install on a fresh sd-card without changing anything but creating the initial admin-user. The problem still occurs. Then I've tried a standard installation of Raspberry Pi OS (32bit, no desktop environment) without any changes. The issue does not occur.

agners commented 2 years ago

Hm, not sure what went wrong with the filter, but this one should work:

bash-5.1# tshark -i eth0 -f "port 67 or port 68"
charraeus commented 2 years ago

The filter worked. But there were no packets caught.

agners commented 2 years ago

That is weird. That suggests the new hostname is not caused/sent from HAOS, but "generated" for some reason by the router, or maybe by another device (IP address conflict?)

charraeus commented 2 years ago

I've tried a standard installation of Raspberry Pi OS (32bit, no desktop environment) with the same static IP (192.168.1.16) and hostname (smarthome). The issue does not occur. In the router only "smarthome" is shown, no weird name.

Another weird behavior with the HAOS: pinging without domain fails, but with ".local" as domain it succeeds, despite the HA-Host is shown as "" in the router. See below. I added the ping result of another host, called "magicmirror1" where the issue does not occur. For me, it is like HAOS does not recognize, the domain is "fritz.box". Maybe an issue in the NetManager configuration?

My network configuration:

~ % ping smarthome
PING smarthome.fritz.box (192.168.1.185): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
^C
--- smarthome.fritz.box ping statistics ---
5 packets transmitted, 0 packets received, 100.0% packet loss

~ % ping smarthome.local
PING smarthome.local (192.168.1.16): 56 data bytes
64 bytes from 192.168.1.16: icmp_seq=0 ttl=64 time=2.778 ms
64 bytes from 192.168.1.16: icmp_seq=1 ttl=64 time=1.403 ms
^C
--- smarthome.local ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.364/1.670/2.778/0.554 ms

~ % ping magicmirror1
PING magicmirror1.fritz.box (192.168.1.15): 56 data bytes
64 bytes from 192.168.1.15: icmp_seq=0 ttl=64 time=0.996 ms
64 bytes from 192.168.1.15: icmp_seq=1 ttl=64 time=1.174 ms
^C
--- magicmirror1.fritz.box ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.996/1.171/1.342/0.141 ms
~ % 
github-actions[bot] commented 2 years ago

There hasn't been any activity on this issue recently. To keep our backlog manageable we have to clean old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant OS version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.