home-assistant / operating-system

:beginner: Home Assistant Operating System

Unsupported system - Systemd-Resolved issues on a fresh OdroidM1 install #2578

Closed chielbos closed 1 year ago

chielbos commented 1 year ago

Describe the issue you are experiencing

As already described in https://github.com/home-assistant/core/issues/92429, Petro on Discord suggested opening an issue here.

After a fresh install with https://github.com/home-assistant/operating-system/releases/download/10.2/haos_odroid-m1-10.2.img.xz everything works fine until I reboot the system.

After the first reboot I get stuck with "Unsupported system - Systemd-Resolved issues".

What operating system image do you use?

odroid-m1 (Hardkernel ODROID-M1)

What version of Home Assistant Operating System is installed?

core-2023.5.4

Did you upgrade the Operating System.

No

Steps to reproduce the issue

-

Anything in the Supervisor logs that might be useful for us?

See other post in the mentioned link above.

Anything in the Host logs that might be useful for us?

.

System information

System Information

version core-2023.5.4
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.10.11
os_name Linux
os_version 6.1.29
arch aarch64
timezone Europe/Amsterdam
config_dir /config
Home Assistant Cloud
logged_in false
can_reach_cert_server failed to load: unreachable
can_reach_cloud_auth failed to load: unreachable
can_reach_cloud failed to load: unreachable

Home Assistant Supervisor
host_os Home Assistant OS 10.2
update_channel stable
supervisor_version supervisor-2023.06.1
agent_version 1.5.1
docker_version 23.0.6
disk_total 13.8 GB
disk_used 4.5 GB
healthy true
supported failed to load: Unsupported
board odroid-m1
supervisor_api ok
version_api failed to load: unreachable
installed_addons Z-Wave JS (0.1.83), Advanced SSH & Web Terminal (15.0.2)

Dashboards
dashboards 1
resources 0
mode auto-gen

Recorder
oldest_recorder_run 5 June 2023 at 17:21
current_recorder_run 6 June 2023 at 20:48
estimated_db_size 0.77 MiB
database_engine sqlite
database_version 3.40.1

Additional information

No response

agners commented 1 year ago

After the first reboot I get stuck with "Unsupported system - Systemd-Resolved issues"

Where is that showing?

My test instance on 10.2 runs fine here, even after reboot. Can you share the host logs? E.g. by checking System -> Logs -> Host or by using the following command on the terminal:

ha host logs -b -1 -n 10000

regan-a commented 1 year ago

I am having the same issue and can share my logs if it helps. I haven't had this system online for a little while so logs are old but hopefully they show what you are looking for.

host.log

chielbos commented 1 year ago

The log file as received via SSH with the mentioned command: ha-logs.log

The log file as reported by the UI: home-assistant_2023-06-07T16-21-20.188Z.log

Another one: home-assistant_2023-06-07T16-24-06.505Z.log

And a screenshot of the error:

image

lukassadovsky commented 1 year ago

My logs host_2023-06-07T20-15-38.480Z.log dns_2023-06-07T20-15-44.667Z.log

chielbos commented 1 year ago

@agners Any other actions/interventions we can try?

Could we, for example, SSH in and run a command on the OS as a temporary fix?

agners commented 1 year ago

@chielbos nothing concerning in the host logs. However, the Home Assistant Core logs seem to indicate that there is a general DNS resolving issue on your system, even when using the DNS resolver provided by Supervised :thinking:

Can you share the Supervisor logs?

agners commented 1 year ago

@lukassadovsky do you have the same error? What board are you using? Can you share the Supervisor logs as well?

lukassadovsky commented 1 year ago

@agners Yes, I have the same error. I have a new clean installation on an Odroid M1 (8GB, 64GB eMMC, SSD Samsung 970 EVO PLUS 500GB). My issue is described here. I started a thread on it earlier than @chielbos. I will send the Supervisor log later.

agners commented 1 year ago

If someone has HAOS debug SSH access (on port 22222) the log output of this command would be interesting too:

journalctl -b 0 -u systemd-resolved.service

I've rebooted my M1 test instance but wasn't able to reproduce this issue so far.

lukassadovsky commented 1 year ago

@agners here is my Supervisor log supervisor_2023-06-12T18-16-35.876Z.log

And the result of the command journalctl -b 0 -u systemd-resolved.service: result.txt

Jun 12 16:51:29 homeassistant systemd[1]: Starting Network Name Resolution...
Jun 12 16:51:29 homeassistant systemd-resolved[468]: Positive Trust Anchors:
Jun 12 16:51:29 homeassistant systemd-resolved[468]: . IN DS 20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d
Jun 12 16:51:29 homeassistant systemd-resolved[468]: Negative trust anchors: home.arpa 10.in-addr.arpa 16.172.in-addr.arpa 17.172.in-addr.arpa 18.172.in-addr.arpa 19.172.in-
Jun 12 16:52:25 homeassistant systemd-resolved[468]: Using system hostname 'homeassistant'.
Jun 12 16:52:25 homeassistant systemd[1]: Started Network Name Resolution.
Jun 12 16:52:59 homeassistant-odroidm1 systemd-resolved[468]: System hostname changed to 'homeassistant-odroidm1'.
Jun 12 16:53:05 homeassistant-odroidm1 systemd-resolved[468]: Switching to fallback DNS server 1.1.1.1#cloudflare-dns.com.
Jun 12 16:53:18 homeassistant-odroidm1 systemd-resolved[468]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 192.168.2.1.
Jun 12 16:53:28 homeassistant-odroidm1 systemd-resolved[468]: Clock change detected. Flushing caches.
Jun 12 16:54:05 homeassistant-odroidm1 systemd-resolved[468]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 192.168.2.1.
Jun 12 17:11:08 homeassistant-odroidm1 systemd-resolved[468]: Grace period over, resuming full feature set (UDP+EDNS0) for DNS server 192.168.2.1.
Jun 12 17:11:08 homeassistant-odroidm1 systemd-resolved[468]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 192.168.2.1.
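A journal excerpt like the one above can be summarized by counting how often systemd-resolved downgrades each DNS server. The following is a minimal Python sketch for use on a workstation, not part of HAOS; the function name count_degraded and the regular expression are assumptions derived only from the message format visible in this log.

```python
import re
from collections import Counter

# Message format as it appears in the systemd-resolved journal excerpt above.
DEGRADED = re.compile(
    r"Using degraded feature set (\S+) instead of (\S+) for DNS server (\S+?)\.$"
)


def count_degraded(journal_text: str) -> Counter:
    """Count 'degraded feature set' messages per DNS server address."""
    counts = Counter()
    for line in journal_text.splitlines():
        match = DEGRADED.search(line.strip())
        if match:
            counts[match.group(3)] += 1
    return counts


# Lines copied from the journal excerpt above.
SAMPLE = """\
Jun 12 16:53:18 homeassistant-odroidm1 systemd-resolved[468]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 192.168.2.1.
Jun 12 16:53:28 homeassistant-odroidm1 systemd-resolved[468]: Clock change detected. Flushing caches.
Jun 12 16:54:05 homeassistant-odroidm1 systemd-resolved[468]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 192.168.2.1.
Jun 12 17:11:08 homeassistant-odroidm1 systemd-resolved[468]: Grace period over, resuming full feature set (UDP+EDNS0) for DNS server 192.168.2.1.
Jun 12 17:11:08 homeassistant-odroidm1 systemd-resolved[468]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 192.168.2.1.
"""

if __name__ == "__main__":
    for server, count in count_degraded(SAMPLE).items():
        print(server, count)
```

For this excerpt the sketch counts three downgrades for 192.168.2.1. That may point at the router's resolver, but it does not by itself explain the "Unsupported system" message.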
JirikP commented 1 year ago

Hi, same here. For now I am running on an RPI4 with SSD boot.

On the M1 (fresh install, SD card only) I get this. Log here: home-assistant_2023-06-12T18-24-35.454Z.log

And the result of the command journalctl -b 0 -u systemd-resolved.service:

haos_debug_journal

It has been like this for a while. For me this install is acting up: the Terminal add-on is not starting and Settings - Add-ons is not working.

Spyro7x commented 1 year ago

Hi, same bug.

Petitboot updated, SD card 32GB and SSD WD Blue SN570 500GB (data is moved on SSD).

System Info. sys_info_Spyro.txt

Logs from Settings > System > Logs home_assistant_core_Spyro.log supervisor_Spyro.log host_Spyro.log dns_Spyro.log multicast_Spyro.log

Host log HAOS_log_terminal_Spyro.log

Logs from HAOS debug SSH journalctl_Spyro.log docker_logs_hassio_supervisor_Spyro.log docker_logs_homeassistant_Spyro.log

For me, every time the USB drive "CONFIG" was connected, the system booted correctly.

In Petitboot the IP is 192.168.9.99, but in HAOS it is 192.168.9.6. Is that OK?

regan-a commented 1 year ago

I did a little more exploration and found that the coredns service running in the hassio_dns container doesn't seem to be opening the usual DNS ports. I compared "corefile" on both systems and they are identical.

Broken HA install on ODROID M1:

bash-5.1# ps aux
PID   USER     TIME  COMMAND
    1 root      0:00 s6-svscan -t0 /var/run/s6/services
   36 root      0:00 s6-supervise s6-fdholderd
  206 root      0:00 s6-supervise coredns
  210 root      0:00 coredns -conf /etc/corefile
  260 root      0:00 bash
  321 root      0:00 ps aux

bash-5.1# netstat -pa
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.11:36601        0.0.0.0:*               LISTEN      -
udp        0      0 127.0.0.11:43899        0.0.0.0:*                           -

Working HA install on RPi4b:

bash-5.1# ps aux
PID   USER     TIME  COMMAND
    1 root      0:00 s6-svscan -t0 /var/run/s6/services
   35 root      0:00 s6-supervise s6-fdholderd
  208 root      0:00 s6-supervise coredns
  211 root     24:02 coredns -conf /etc/corefile
  260 root      0:00 bash
  280 root      0:00 ps aux

bash-5.1# netstat -pa
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.11:41291        0.0.0.0:*               LISTEN      -
tcp        0      0 :::domain               :::*                    LISTEN      211/coredns
tcp        0      0 :::5553                 :::*                    LISTEN      211/coredns
udp        0      0 127.0.0.11:60860        0.0.0.0:*                           -
udp        0      0 :::5553                 :::*                                211/coredns
udp        0      0 :::domain               :::*                                211/coredns

This coredns configuration file is identical on both systems:

bash-5.1# cat /etc/corefile
.:53 {
    log {
        class error
    }
    errors
    loop

    hosts /config/hosts {
        fallthrough
    }
    template ANY AAAA local.hass.io hassio {
        rcode NOERROR
    }
    template ANY A local.hass.io hassio {
        rcode NXDOMAIN
    }
    mdns
    forward .  dns://10.1.40.5 {
        except local.hass.io
        policy sequential
        health_check 1m
        max_fails 5
    }
    fallback REFUSED,SERVFAIL,NXDOMAIN . dns://127.0.0.1:5553
    cache 600
}

.:5553 {
    log {
        class error
    }
    errors

    forward . tls://1.1.1.1 tls://1.0.0.1 {
        tls_servername cloudflare-dns.com
        max_fails 0
        except local.hass.io
    }
    cache 600
}
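The comparison of the two netstat outputs above can be automated. The following Python sketch is only an illustration under the assumption that the output has the column layout shown in this comment; the function name coredns_listeners is hypothetical. It lists the sockets whose owning program is coredns, so an empty result corresponds to the broken case.

```python
def coredns_listeners(netstat_output: str) -> list[str]:
    """Return 'proto local-address' for every socket owned by coredns."""
    listeners = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with the protocol; header rows are skipped.
        if not fields or fields[0] not in ("tcp", "udp"):
            continue
        # The last column is "PID/Program name", or "-" when unknown.
        if fields[-1].endswith("/coredns"):
            listeners.append(f"{fields[0]} {fields[3]}")
    return listeners


# Rows copied from the netstat outputs above.
BROKEN = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.11:36601        0.0.0.0:*               LISTEN      -
udp        0      0 127.0.0.11:43899        0.0.0.0:*                           -
"""

WORKING = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.11:41291        0.0.0.0:*               LISTEN      -
tcp        0      0 :::domain               :::*                    LISTEN      211/coredns
tcp        0      0 :::5553                 :::*                    LISTEN      211/coredns
udp        0      0 127.0.0.11:60860        0.0.0.0:*                           -
udp        0      0 :::5553                 :::*                                211/coredns
udp        0      0 :::domain               :::*                                211/coredns
"""

if __name__ == "__main__":
    print("broken: ", coredns_listeners(BROKEN))
    print("working:", coredns_listeners(WORKING))
```

On the broken ODROID M1 output the list is empty even though the coredns process is running, which matches the observation that the ports from the corefile (53 and 5553) are never opened.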
regan-a commented 1 year ago

I also tried restarting the container on both systems and this is what I saw in the logs. On the working system you can see some additional log entries for the DNS ports and CoreDNS service. Those are missing on my ODROID M1. I'm not sure where to go from here.

Broken HA install on ODROID M1 (hassio_dns container restarted):

2023-06-12T20:49:50.663355000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [INFO] SIGTERM: Shutting down servers then terminating  
2023-06-12T20:49:50.680333000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [cont-finish.d] executing container finish scripts...  
2023-06-12T20:49:50.683666000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [cont-finish.d] done.  
2023-06-12T20:49:50.684950000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [s6-finish] waiting for services.  
2023-06-12T20:49:50.899160000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [s6-finish] sending all processes the TERM signal.  
2023-06-12T20:49:53.908999000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [s6-finish] sending all processes the KILL signal and exiting.  
2023-06-12T20:49:55.792844000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [s6-init] making user provided files available at /var/run/s6/etc...exited 0.  
2023-06-12T20:49:55.904141000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [s6-init] ensuring user provided files have correct perms...exited 0.  
2023-06-12T20:49:55.908262000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [fix-attrs.d] applying ownership & permissions fixes...  
2023-06-12T20:49:55.912251000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [fix-attrs.d] done.  
2023-06-12T20:49:55.914955000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [cont-init.d] executing container initialization scripts...  
2023-06-12T20:49:55.919412000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [cont-init.d] corefile.sh: executing...  
2023-06-12T20:49:56.009597000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [cont-init.d] corefile.sh: exited 0.  
2023-06-12T20:49:56.012387000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [cont-init.d] done.  
2023-06-12T20:49:56.015350000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [services.d] starting services  
2023-06-12T20:49:56.040978000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [services.d] done.  

Working HA install on RPi4b (hassio_dns container restarted):

2023-06-12T20:51:57.994670000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [INFO] SIGTERM: Shutting down servers then terminating  
2023-06-12T20:51:58.088887000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [cont-finish.d] executing container finish scripts...  
2023-06-12T20:51:58.103617000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [cont-finish.d] done.  
2023-06-12T20:51:58.105567000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [s6-finish] waiting for services.  
2023-06-12T20:51:58.347219000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [s6-finish] sending all processes the TERM signal.  
2023-06-12T20:52:01.369681000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [s6-finish] sending all processes the KILL signal and exiting.  
2023-06-12T20:52:03.257946000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [s6-init] making user provided files available at /var/run/s6/etc...exited 0.  
2023-06-12T20:52:03.503083000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [s6-init] ensuring user provided files have correct perms...exited 0.  
2023-06-12T20:52:03.510385000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [fix-attrs.d] applying ownership & permissions fixes...  
2023-06-12T20:52:03.516249000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [fix-attrs.d] done.  
2023-06-12T20:52:03.521283000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [cont-init.d] executing container initialization scripts...  
2023-06-12T20:52:03.529852000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [cont-init.d] corefile.sh: executing...  
2023-06-12T20:52:04.048997000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [cont-init.d] corefile.sh: exited 0.  
2023-06-12T20:52:04.052970000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [cont-init.d] done.  
2023-06-12T20:52:04.056585000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [services.d] starting services  
2023-06-12T20:52:04.103166000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [services.d] done.  
2023-06-12T20:52:04.867387000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 .:53  
2023-06-12T20:52:04.869480000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 .:5553  
2023-06-12T20:52:04.870003000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 CoreDNS-1.8.4  
2023-06-12T20:52:04.870284000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 linux/arm64, go1.15.15, 053c4d5-dirty  
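To see at a glance which messages the working container prints and the broken one does not, the two logs can be compared after stripping the timestamp and IMAGE_NAME prefix. This is a rough Python sketch, assuming each log line has the three-part layout shown above; the helper names messages and only_in_working are hypothetical.

```python
def messages(log_text: str) -> list[str]:
    """Drop the timestamp and IMAGE_NAME=... fields, keep the message text."""
    out = []
    for line in log_text.splitlines():
        parts = line.strip().split(maxsplit=2)
        if len(parts) == 3:
            out.append(parts[2])
    return out


def only_in_working(broken_log: str, working_log: str) -> list[str]:
    """Messages present in the working log but absent from the broken one."""
    seen = set(messages(broken_log))
    return [m for m in messages(working_log) if m not in seen]


# Last lines of each restart, copied from the logs above.
BROKEN_LOG = """\
2023-06-12T20:49:56.015350000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [services.d] starting services
2023-06-12T20:49:56.040978000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [services.d] done.
"""

WORKING_LOG = """\
2023-06-12T20:52:04.056585000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [services.d] starting services
2023-06-12T20:52:04.103166000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 [services.d] done.
2023-06-12T20:52:04.867387000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 .:53
2023-06-12T20:52:04.869480000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 .:5553
2023-06-12T20:52:04.870003000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 CoreDNS-1.8.4
2023-06-12T20:52:04.870284000Z IMAGE_NAME=ghcr.io%2Fhome-assistant%2Faarch64-hassio-dns%3A2022.04.1 linux/arm64, go1.15.15, 053c4d5-dirty
"""

if __name__ == "__main__":
    for message in only_in_working(BROKEN_LOG, WORKING_LOG):
        print(message)
```

The difference is exactly the CoreDNS startup banner: the two server blocks (.:53 and .:5553), the version and the build line.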
chielbos commented 1 year ago

@regan-a, this is pure speculation, but Docker sounds like the right place to look.

My Z-Wave JS add-on stopped finding the Z-Wave JS server because of a network/DNS issue at the same time. Since I understand that Z-Wave JS is hosted at the Docker level, this could correlate with the problem, suggesting that these issues have the same root cause.

I'll try to get a bunch of logs tonight.

agners commented 1 year ago

After a fresh install with https://github.com/home-assistant/operating-system/releases/download/10.2/haos_odroid-m1-10.2.img.xz everything works fine until I reboot the system.

After the first reboot I get stuck with "Unsupported system - Systemd-Resolved issues"

I've tried to reproduce this with a fresh 10.2 installation on a SD card, but wasn't able to: Things worked after first boot, and I didn't get that "Unsupported system - Systemd-Resolved issues" on next reboot.

However, I did see some problems at first boot: systemd-time-wait-sync.service timed out, which might cause the system to not wait long enough before starting certain services. PR #2594 addresses this. It could be that this, together with some weird interaction during setup, then leads to the Systemd-Resolved issue, but I can't really see how.

I'll do a nightly build tonight, it would be interesting to see if you can reproduce the problem with that nightly build.

agners commented 1 year ago

The nightly build is available here: https://os-builds.home-assistant.io/11.0.dev20230613/

Can somebody give this a try to see if it makes a difference?

JirikP commented 1 year ago

The nightly build is available here: https://os-builds.home-assistant.io/11.0.dev20230613/

Can somebody give this a try to see if it makes a difference?

I might have time today. I will give it a go.

chielbos commented 1 year ago

I've tried the nightly build. The initial boot was successful. I've restored my backup, which also worked.

Unfortunately, now after the 2nd reboot, it's stuck on "Waiting for the Home Assistant CLI to be ready...". So no easy logging/debugging now, right?

chielbos commented 1 year ago

New update: another fresh install, this time without restoring a backup. After two reboots so far, it appears to work just fine.

Problem described in my earlier comment is probably due to multiple partitions on the data disk. Feel free to ignore ;)

I'm gonna put it to the test in the upcoming days. It's looking promising!

lukassadovsky commented 1 year ago

New fresh install of 11.0.dev20230613. After reboot, same problem. After restoring a backup from the RPI3B (installation OK), same problem.

My logs: home-assistant_2023-06-14T18-17-29.747Z.log dns_2023-06-14T18-17-22.768Z.log host_2023-06-14T18-17-17.369Z.log supervisor_2023-06-14T18-17-11.755Z.log

JirikP commented 1 year ago

Same here, fresh install of 11.0.dev20230613 and right after first reboot. home-assistant_2023-06-14T19-01-44.414Z.log

agners commented 1 year ago

@lukassadovsky @JirikP did both of you restore a backup? What add-ons are you using?

lukassadovsky commented 1 year ago

@agners First step: new fresh install of 11.0.dev20230613. After reboot, same problem. I don't have logs. Second step: restored a backup from the RPI3B. Same problem. Logs attached.

lukassadovsky commented 1 year ago

@agners Now a new fresh install. Everything OK. Logs attached audio_2023-06-14T21-33-46.717Z.log dns_2023-06-14T21-33-43.031Z.log home-assistant_2023-06-14T21-33-30.281Z.log host_2023-06-14T21-33-37.493Z.log multicast_2023-06-14T21-33-50.714Z.log supervisor_2023-06-14T21-33-34.652Z.log

After the first reboot same error "Unsupported system - Systemd-Resolved issues". Logs attached. audio_2023-06-14T21-41-02.179Z_AfterReboot.log dns_2023-06-14T21-40-59.505Z_AfterReboot.log home-assistant_2023-06-14T21-40-48.357Z_AfterReboot.log host_2023-06-14T21-40-56.436Z_AfterReboot.log multicast_2023-06-14T21-50-29.727Z_AfterReboot.log supervisor_2023-06-14T21-40-53.443Z_AfterReboot.log

Only a reboot was performed. No recovery. Nothing. Fresh install only

JirikP commented 1 year ago

@lukassadovsky @JirikP did both of you restore a backup? What add-ons are you using?

Only fresh install, init config and installed file editor. Reboot and again Unsupported system. Only SD card, nothing else, no USB devices no integrations. No other homeassistant on network. DNS on router set to 1.1.1.1

agners commented 1 year ago

@lukassadovsky @JirikP can one of you run journalctl -b 0 -u systemd-resolved.service on the host shell (via port 22222, see https://developers.home-assistant.io/docs/operating-system/debugging#ssh-access-to-the-host).

agners commented 1 year ago

Ideally, also a full log of the failing reboot (journalctl -b 0).

Unfortunately, I just can't reproduce the problem here :cry: I reinstalled the SD now 4 times, went through onboarding, rebooted (through UI) and it just doesn't show the problem you are seeing. So it seems to be some interaction of your (network) environment and HAOS.

Maybe comparing systems gives us a clue? My setup:

JirikP commented 1 year ago

@agners My setup:

This is exactly the same setup for my HAOS on RPI4 with boot from AXAGON EEM2-GTR and 250 GB NVME and that works fine.

Is there something I can update on the ODROID, or is there a revision of some component on the board?

lukassadovsky commented 1 year ago

@agners

This is exactly the same setup for my HAOS on RPI3B with boot from SD card and that works fine. When I test the Odroid M1 (IP 192.168.2.26), the RPI3B is powered off, disconnected from the network and its IP (192.168.2.25) is disabled on the router. The router is restarted. I will send the log (journalctl -b 0) later.

I tried having both devices (RPI3B and OdroidM1) powered on with different hostnames. Same error. Same behavior. If everything works fine for you, then I suspect the number of HAOSs on the same network (RPI3B and OdroidM1)

agners commented 1 year ago

I tried having both devices (RPI3B and OdroidM1) powered on with different hostnames. Same error. Same behavior. If everything works fine for you, then I suspect the number of HAOSs on the same network (RPI3B and OdroidM1)

Interesting idea, but it is then a bit weird that it only happens on ODROID-M1. Does renaming the instance name/hostname help then?

agners commented 1 year ago

Interesting idea, but it is then a bit weird that it only happens on ODROID-M1. Does renaming the instance name/hostname help then?

I've tried with two Home Assistant installations in the same network (RPi 4 + ODROID-M1), both with the same hostname etc. Still, I am not able to reproduce.

@regan-a's analysis seems to point to the best culprit we have currently: somehow CoreDNS cannot bind to ports 53 and 5553. The reason for that is unclear to me, as is why it is not complaining more in the logs. It is just that these two lines seem to be missing in the failing case:

.:53
.:5553

But CoreDNS is in its independent container, so no idea why it would fail :confused:

lukassadovsky commented 1 year ago

Interesting idea, but it is then a bit weird that it only happens on ODROID-M1. Does renaming the instance name/hostname help then?

I have a different instance name/hostname, but the error is still there.

HA is running fine on the RPI3B, which is why I am testing on the OdroidM1. I'm trying different variants, including one with the RPI3B turned off.

The target state is that I migrate from the RPI3B to the OdroidM1.

agners commented 1 year ago

Could you try to access the system on port 22222? (You have to enable access via port 22222 first, see https://developers.home-assistant.io/docs/operating-system/debugging#ssh-access-to-the-host.)

In particular, the full output of these commands after a boot which led to the Systemd-Resolved issue would be interesting:

journalctl -b 0 -u systemd-resolved.service
journalctl -b 0

lukassadovsky commented 1 year ago

@agners Here are the required logs. journalctl -b 0 -u systemd-resolved.service.txt journalctl -b 0.txt

agners commented 1 year ago

@agners Here are the required logs. journalctl -b 0 -u systemd-resolved.service.txt journalctl -b 0.txt

Hm, the journalctl -b 0 output isn't complete. Can you maybe store it in a file and then upload that (e.g. in the Home Assistant Core /config directory, by redirecting the output of journalctl -b 0 into a file under /mnt/data/supervisor/homeassistant/)?

lukassadovsky commented 1 year ago

@agners journalctl -b 0.txt

I am sorry. I don't know Linux. But I'm getting better :)

JirikP commented 1 year ago

Tried again with a fresh install on a completely different network, in a different place and with a different network provider. Still the same issue. Is there any other reasonably easy way to set up hassio on the M1? I don't feel like going with Debian 10 and Docker. I do not have that much skill with Linux and do not have time to learn it right now.

agners commented 1 year ago

@lukassadovsky thanks for the full log, that helped me, I think I see the problem now:

Jun 17 19:11:04 homeassistant udisksd[256]: udisks daemon version 2.9.2 starting
...
Jun 17 19:12:34 homeassistant systemd[1]: udisks2.service: start operation timed out. Terminating.
Jun 17 19:12:34 homeassistant udisksd[256]: udisks daemon version 2.9.2 exiting
Jun 17 19:12:34 homeassistant systemd[1]: udisks2.service: Failed with result 'timeout'.
...
Jun 17 19:12:36 homeassistant systemd[1]: haos-swapfile.service: Deactivated successfully.

It seems that the udisks2 service gets started really early, and then times out because creation of the swapfile takes much longer than anticipated. With that I should be able to reproduce the problem and fix it so that it won't happen in the future.
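The gap between the daemon start and the timeout in the excerpt above can be measured directly from the journal text. The following Python sketch is an illustration only; the function name seconds_until_timeout and its parameters are hypothetical, and it assumes the default syslog-style timestamps shown here (no year, C locale month names).

```python
from datetime import datetime


def parse_ts(line: str) -> datetime:
    """Parse the leading 'Jun 17 19:11:04' timestamp of a journal line."""
    return datetime.strptime(line[:15], "%b %d %H:%M:%S")


def seconds_until_timeout(journal_text, unit="udisks2.service", daemon="udisksd"):
    """Seconds between the daemon's first 'starting' line and the unit's
    start timeout, or None if either line is missing."""
    start = timeout = None
    for raw in journal_text.splitlines():
        line = raw.strip()
        if start is None and f" {daemon}[" in line and line.endswith("starting"):
            start = parse_ts(line)
        elif f"{unit}: start operation timed out" in line:
            timeout = parse_ts(line)
            break
    if start is None or timeout is None:
        return None
    return (timeout - start).total_seconds()


# Lines copied from the journal excerpt above.
SAMPLE = """\
Jun 17 19:11:04 homeassistant udisksd[256]: udisks daemon version 2.9.2 starting
...
Jun 17 19:12:34 homeassistant systemd[1]: udisks2.service: start operation timed out. Terminating.
Jun 17 19:12:34 homeassistant udisksd[256]: udisks daemon version 2.9.2 exiting
Jun 17 19:12:34 homeassistant systemd[1]: udisks2.service: Failed with result 'timeout'.
"""

if __name__ == "__main__":
    print(seconds_until_timeout(SAMPLE))
```

For the excerpt this gives 90 seconds, which would be consistent with systemd's default start timeout of 90 seconds, assuming the unit does not override it.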

lukassadovsky commented 1 year ago

@agners Thanks. I will wait for a fix.

agners commented 1 year ago

Actually, I still can't reproduce. I inserted artificial delays in some services, but it doesn't end up in the issues you are seeing.

At this point I must assume this is a hardware(-related) issue. Would someone be willing to send me a unit so I can reproduce it locally? :cold_sweat:

lukassadovsky commented 1 year ago

@agners Okay. Try removing NVME and eMMC. Try installing from the microSD card only. But I am concerned that there were users in another thread who had OdroidM1 only with microSD

agners commented 1 year ago

I've tried with eMMC here as well, I can't reproduce. I really think it is device dependent.

JirikP commented 1 year ago

Actually, I still can't reproduce. I inserted artificial delays in some services, but it doesn't end up in the issues you are seeing.

At this point I must assume this is a hardware(-related) issue. Would someone be willing to send me a unit so I can reproduce it locally? 😰

Sure, it is useless for me now. Where are you from? I really hope from Europe, otherwise it can get expensive really fast. Or I might be able to set up some kind of remote access machine and the M1 on the same network, but it might be more work than just shipping it.

agners commented 1 year ago

Sure, it is useless for me now. Where are you from? I really hope from Europe, otherwise it can get expensive really fast.

@JirikP I am, sort of :sweat_smile: . Maybe it is easiest if we go chat, are you on Discord? I am falstaff321 on the Home Assistant Discord server.

regan-a commented 1 year ago

@agners, thanks for helping us get to the bottom of this. It looks like you suspect a hardware related issue but I just wanted to check and see if there is anything I can do that would help as far as installing a specific build & collecting logs? I finally have access to my M1 again 😀

agners commented 1 year ago

@regan-a what I see from the logs is that something delays the startup massively, but the main problem I have currently is that I can't tell from the logs what service that exactly is. It seems to be related to udisks2, but it could also be the D-Bus broker which is somehow stalling.

A complete journalctl -b 0 of a first boot would be interesting, as well as maybe a second journalctl -b 0 of a second boot (where things fail).

Another interesting log might be one with systemd debugging on (set systemd.log_level=debug in /mnt/boot/cmdline.txt).
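Since a kernel command line is a single line of space-separated arguments, adding the debug option amounts to appending one token. The sketch below shows that edit on a string in Python; the function name with_kernel_arg and the sample command line are hypothetical, and the real contents of /mnt/boot/cmdline.txt on a given system will differ.

```python
def with_kernel_arg(cmdline: str, arg: str) -> str:
    """Return the command line with `arg` set, replacing any existing
    argument that has the same key (the part before '=')."""
    key = arg.split("=", 1)[0]
    tokens = [t for t in cmdline.split() if t.split("=", 1)[0] != key]
    tokens.append(arg)
    return " ".join(tokens)


if __name__ == "__main__":
    # Hypothetical example content, not taken from a real HAOS image.
    original = "console=tty1 rootwait"
    print(with_kernel_arg(original, "systemd.log_level=debug"))
```

Keeping everything on one line matters here: the function joins the tokens with single spaces and never introduces a newline.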

regan-a commented 1 year ago

Okay, I will upload those later today.

regan-a commented 1 year ago

Apologies for the delay, I had to procure a new USB stick. I was able to reproduce the issue on my first attempt and have uploaded the logs. I can confirm that again, the DNS ports don't seem to be opening in the hassio_dns container on second boot.

After a lot of testing, manipulating the coredns configuration file (/etc/corefile), and manually starting coredns I found that the service appears to be getting hung up on the mdns external plugin. If I comment out mdns in my corefile, coredns starts without a problem. Not sure what the difference is between my RPi4 and my M1 but that single line, which is present on both systems, seems to negatively affect my M1.

For example, if I do this on my M1, coredns runs properly:

/etc/corefile:

.:53 {
    log {
        class error
    }
    errors
    loop

    hosts /config/hosts {
        fallthrough
    }
    template ANY AAAA local.hass.io hassio {
        rcode NOERROR
    }
    template ANY A local.hass.io hassio {
        rcode NXDOMAIN
    }
#    mdns
    forward .  dns://10.2.10.1 {
        except local.hass.io
        policy sequential
        health_check 1m
        max_fails 5
    }
    fallback REFUSED,SERVFAIL,NXDOMAIN . dns://127.0.0.1:5553
    cache 600
}

.:5553 {
    log {
        class error
    }
    errors

    forward . tls://1.1.1.1 tls://1.0.0.1 {
        tls_servername cloudflare-dns.com
        max_fails 0
        except local.hass.io
    }
    cache 600
}  
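The manual edit above (commenting out the mdns line) can be expressed as a small text transformation. This Python sketch is only an illustration of that workaround; the function name disable_plugin is hypothetical, and because the container log earlier in this thread shows corefile.sh running at container start, a change made this way may be regenerated on the next restart.

```python
def disable_plugin(corefile: str, plugin: str = "mdns") -> str:
    """Comment out lines that consist only of the given plugin name,
    keeping their indentation. Other lines are returned unchanged."""
    out = []
    for line in corefile.splitlines():
        if line.strip() == plugin:
            indent = line[: len(line) - len(line.lstrip())]
            out.append(f"{indent}# {plugin}")
        else:
            out.append(line)
    return "\n".join(out)


# Fragment of the corefile shown above.
SNIPPET = """\
    template ANY A local.hass.io hassio {
        rcode NXDOMAIN
    }
    mdns
    forward .  dns://10.2.10.1 {"""

if __name__ == "__main__":
    print(disable_plugin(SNIPPET))
```

Only a line whose entire content is the plugin name is touched, so block plugins such as forward or hosts are left alone.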

I'm not really sure where to take it from here but hopefully that helps get us closer to a solution.

First boot: first_boot_log.txt first_boot_docker_dns.txt

Second boot: second_boot_log.txt second_boot_docker_dns.txt

lukassadovsky commented 1 year ago

New fresh install 11.0.dev20230619. Same error. Logs.

First boot: first_boot_log.txt

Second boot: second_boot_log.txt