Closed chielbos closed 1 year ago
@lukassadovsky it seems that Network Manager has issues getting a DHCP lease:
Jun 21 20:18:18 homeassistant NetworkManager[558]: <info> [1687378698.0321] device (eth0): carrier: link connected
Jun 21 20:18:18 homeassistant NetworkManager[558]: <info> [1687378698.0329] device (eth0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Jun 21 20:18:18 homeassistant NetworkManager[558]: <info> [1687378698.0349] policy: auto-activating connection 'Wired connection 1' (2ed5f039-a578-34e7-98c5-2ddcf44db399)
Jun 21 20:18:18 homeassistant NetworkManager[558]: <info> [1687378698.0364] device (eth0): Activation: starting connection 'Wired connection 1' (2ed5f039-a578-34e7-98c5-2ddcf44db399)
Jun 21 20:18:18 homeassistant NetworkManager[558]: <info> [1687378698.0367] device (eth0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Jun 21 20:18:18 homeassistant NetworkManager[558]: <info> [1687378698.0375] manager: NetworkManager state is now CONNECTING
Jun 21 20:18:18 homeassistant NetworkManager[558]: <info> [1687378698.0381] device (eth0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Jun 21 20:18:18 homeassistant NetworkManager[558]: <info> [1687378698.0396] device (eth0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Jun 21 20:18:18 homeassistant NetworkManager[558]: <info> [1687378698.0412] dhcp4 (eth0): activation: beginning transaction (timeout in 45 seconds)
Jun 21 20:18:18 homeassistant systemd-resolved[556]: eth0: Bus client set MulticastDNS setting: yes
This is a very suspicious error:
Jun 21 20:18:18 homeassistant kernel: RTL8211F Gigabit Ethernet stmmac-0:00: Downshift occurred from negotiated speed 1Gbps to actual speed 100Mbps, check cabling!
Jun 21 20:18:18 homeassistant kernel: rk_gmac-dwmac fe2a0000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
Can you try a different Ethernet cable?
It kinda supports what I've said previously in https://github.com/home-assistant/operating-system/issues/2578#issuecomment-1597185152: This issue is board dependent. A not correctly working Ethernet port would explain the problem.
I wonder if the OS Hardkernel provides (Debian) also shows such problems :thinking:
@agners I'll try another cable. But if I connect a laptop to the same cable, it works at 1Gbps.
I can give it a go with usb to RJ45 adaptor. I have couple of them lying around. Will HAOS recognize them out of the box?
I can give it a go with usb to RJ45 adaptor. I have couple of them lying around. Will HAOS recognize them out of the box?
Depends on the exact model, I remember I've enabled some in the past. YMMV, definitely wroth trying.
I can give it a go with usb to RJ45 adaptor. I have couple of them lying around. Will HAOS recognize them out of the box?
Depends on the exact model, I remember I've enabled some in the past. YMMV, definitely wroth trying.
Still the same bug. The dongle works, but that is it.
Can you try a different Ethernet cable?
Different Ethernet cable works at 1Gbps
Any theories why the mdns plugin is preventing coredns from starting properly?
Version: haos_odroid-m1-10.3.img.xz I found interesting behaviour. If I boot using nvme (with sd card to start kernel only) (check this method https://community.home-assistant.io/t/add-support-for-odroid-m1/408257/63?u=gordio) bug with dns happened rarely. If use just sd card - always. Is it something related to race? like running dns container before dbus/mounting fs or something like that…? Log of "broken boot": https://dpaste.org/MewXA
Sigh..... 😮💨 - The saga appears to continue over here.
Today, our electrician came over to apply some change's (which required poweroff).
Upon restarting 🪄 the issue came back.
Home Assistant 2023.6.3 Supervisor 2023.06.4 Operating System 11.0.dev20230613 Frontend-versie: 20230608.0 - latest
Any actions i can perform to resolve the issue on my end, or help identifying the problem?
Update; After rebooting 3 times (via ui) out of pure desperation, it works again. It feels like black magic 🧙🏽♀️
I've made some progress here, this gets a bit technical and is more meant as a note to myself for the records :sweat_smile:
Currently I am booting into an emergency shell with cmdline.txt
set to systemd.log_level=debug systemd.unit=emergency.target
. Up to this shell my system and the system with problems boot identically. From the logs I've found that udisks2.service
was slow to start. Further investigation show that it is dbus-broker.service
which initially seems to fail. A very minimal start sequence emulates the problematic start:
systemctl start systemd-journald.socket
systemctl start dbus-broker.service
The second command takes 90s+ to respond on a problematic board, which triggers a systemd timeout:
Jul 04 14:53:53 homeassistant systemd[1]: dbus-broker.service: Main process exited, code=killed, status=9/KILL
Jul 04 14:53:53 homeassistant systemd[1]: dbus-broker.service: Failed with result 'timeout'.
Checking the current syscall of the offending process reveals:
~ # ps -p $(pidof dbus-broker-launch) -ocmd,stat,wchan
CMD STAT WCHAN
/usr/bin/dbus-broker-launch Ss wait_for_random_bytes
The dbus-broker.service
gets restarted again and succeeds, but from what I understand this causes udisks2.service to fail and subsequently other services to not correctly start.
I am not clear why my board doesn't show the same problem. My first thought was that #2575 is the culprit, but since this was already a problem with 10.2, and PR #2575 came in with 10.3, this can't really be (unless we trace two independent problems here :thinking: ).
Meanwhile I've proven that the entropy pool is the problem indeed. On the problematic board I can see that the entropy pool as reported by the kernel starts around ~50 and grows very slowly:
watch -n 1 cat /proc/sys/kernel/random/entropy_avail
On the working board the entropy pool ramps up much quicker. It still takes about 5-10s to reach 256. At 256 the service dbus-broker.service
then launches successful.
Currently it is not clear to me why that is the case. The kernel adds various identifiers (e.g. of SD cards and USB devices) to the entropy pool via add_device_randomness
. It could be that the SD card's identifier used doesn't/does not provide as much entropy. However, this is a rather wild guess.
In any case, I am currently working on integrating a driver for the hardware random number generator. With that there will be enough entropy available very quickly, which seems to solve the boot problem.
It might also be sensible to improve the service ordering: In particular, udisks2.service
should depend on the dbus.socket
. However, it seems that the socket and the dbus-broker.service
are not directly linked, so this alone won't resolve the issue.
I was wondering why dbus-broker requires randomness. It seem that the XML library libexpat is at fault. I am not entirly sure why quality entropy is need for an XML parser, but it seems the library really insists on it (see https://github.com/libexpat/libexpat/blob/R_2_5_0/expat/lib/xmlparse.c#L128).
For the ordering problem I've opened an issue in the dbus-broker project https://github.com/bus1/dbus-broker/issues/319. Maybe there is a better way to express ordering for this issue.
With #2578 and #2633 there are now two fixes for this issue. They will be part of the next 10.4 release. The latest dev version 11.0.dev20230705
already has the first fix (see https://os-builds.home-assistant.io/).
Hi Angers,
Running the new version for a while now.
Works like a charm! Thanks for putting in the effort!
Describe the issue you are experiencing
As already described in https://github.com/home-assistant/core/issues/92429 Petro on discord suggested opening a issue in here;
After fresh install with https://github.com/home-assistant/operating-system/releases/download/10.2/haos_odroid-m1-10.2.img.xz all works fine, until i reboot the system.
After the first reboot i get stuck with "Unsupported system - Systemd-Resolved issues"
What operating system image do you use?
odroid-m1 (Hardkernel ODROID-M1)
What version of Home Assistant Operating System is installed?
core-2023.5.4
Did you upgrade the Operating System.
No
Steps to reproduce the issue
-
Anything in the Supervisor logs that might be useful for us?
Anything in the Host logs that might be useful for us?
System information
System Information
Home Assistant Cloud
logged_in | false -- | -- can_reach_cert_server | failed to load: unreachable can_reach_cloud_auth | failed to load: unreachable can_reach_cloud | failed to load: unreachableHome Assistant Supervisor
host_os | Home Assistant OS 10.2 -- | -- update_channel | stable supervisor_version | supervisor-2023.06.1 agent_version | 1.5.1 docker_version | 23.0.6 disk_total | 13.8 GB disk_used | 4.5 GB healthy | true supported | failed to load: Unsupported board | odroid-m1 supervisor_api | ok version_api | failed to load: unreachable installed_addons | Z-Wave JS (0.1.83), Advanced SSH & Web Terminal (15.0.2)Dashboards
dashboards | 1 -- | -- resources | 0 mode | auto-genRecorder
oldest_recorder_run | 5 juni 2023 om 17:21 -- | -- current_recorder_run | 6 juni 2023 om 20:48 estimated_db_size | 0.77 MiB database_engine | sqlite database_version | 3.40.1Additional information
No response