home-assistant / supervisor

:house_with_garden: Home Assistant Supervisor
https://home-assistant.io/hassio/
Apache License 2.0
1.75k stars 639 forks source link

On restart, doesn't restart. #1529

Closed rlpowell closed 4 years ago

rlpowell commented 4 years ago

Home Assistant release with the issue: This isn't an issue with home assistant as such; it's an issue with the hassio supervisor docker running on Linux. The supervisor image version is 201

Operating environment (HassOS/Generic): Linux (Fedora 31, specifically) using Docker. The initial setup was done with https://github.com/home-assistant/hassio-installer

Description of problem:

I had hassio running OK, but due to some problems I was having I wanted to restart everything. So I did a docker kill / docker rm of all the containers, and restarted. Specifically, I'm restarting with "sudo systemctl restart hassio-supervisor"; this uses the systemd configs installed by https://github.com/home-assistant/hassio-installer , but what it actually ends up running is /usr/sbin/hassio-supervisor , which looks like so:

set -e

# Load configs
CONFIG_FILE=/etc/hassio.json

SUPERVISOR="$(jq --raw-output '.supervisor' ${CONFIG_FILE})"
HOMEASSISTANT="$(jq --raw-output '.homeassistant' ${CONFIG_FILE})"
DATA="$(jq --raw-output '.data // "/usr/share/hassio"' ${CONFIG_FILE})"

# AppArmor Support
if command -v apparmor_parser > /dev/null 2>&1 && grep hassio-supervisor /sys/kernel/security/apparmor/profiles > /dev/null 2>&1; then
    APPARMOR="--security-opt apparmor=hassio-supervisor"
else
    APPARMOR="--security-opt apparmor:unconfined"
fi

# Init supervisor
HASSIO_DATA=${DATA}
HASSIO_IMAGE_ID=$(docker inspect --format='{{.Id}}' "${SUPERVISOR}")
HASSIO_CONTAINER_ID=$(docker inspect --format='{{.Image}}' hassio_supervisor || echo "")

runSupervisor() {
    docker rm --force hassio_supervisor || true

    # shellcheck disable=SC2086
    docker run --name hassio_supervisor \
        --privileged \
        $APPARMOR \
        --security-opt seccomp=unconfined \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -v /var/run/dbus:/var/run/dbus \
        -v "${HASSIO_DATA}":/data \
        -e SUPERVISOR_SHARE="${HASSIO_DATA}" \
        -e SUPERVISOR_NAME=hassio_supervisor \
        -e HOMEASSISTANT_REPOSITORY="${HOMEASSISTANT}" \
        "${SUPERVISOR}"
}

# Run supervisor
mkdir -p "${HASSIO_DATA}"
([ "${HASSIO_IMAGE_ID}" = "${HASSIO_CONTAINER_ID}" ] && docker start --attach hassio_supervisor) || runSupervisor

This works fine. But. Here's the logs, and I'll show you the problem:

20-02-27 07:34:03 INFO (MainThread) [__main__] Initialize Hass.io setup
20-02-27 07:34:03 INFO (SyncWorker_0) [hassio.docker.supervisor] Attach to Supervisor homeassistant/amd64-hassio-supervisor with version 201
20-02-27 07:34:03 INFO (SyncWorker_0) [hassio.docker.supervisor] Connect Supervisor to Hass.io Network
20-02-27 07:34:04 INFO (MainThread) [hassio.bootstrap] Initialize Hass.io debugger
20-02-27 07:34:04 INFO (MainThread) [__main__] Setup HassIO
20-02-27 07:34:04 INFO (MainThread) [hassio.utils.gdbus] Connect to dbus: org.freedesktop.systemd1 - /org/freedesktop/systemd1
20-02-27 07:34:05 INFO (MainThread) [hassio.utils.gdbus] Connect to dbus: org.freedesktop.hostname1 - /org/freedesktop/hostname1
20-02-27 07:34:05 WARNING (MainThread) [hassio.dbus.rauc] Host has no rauc support. OTA updates have been disabled.
20-02-27 07:34:05 WARNING (MainThread) [hassio.dbus.nmi_dns] No DnsManager support on the host. Local DNS functions have been disabled.
20-02-27 07:34:05 INFO (MainThread) [hassio.host.info] Update local host information
20-02-27 07:34:05 INFO (MainThread) [hassio.utils.gdbus] Call org.freedesktop.DBus.Properties.GetAll on /org/freedesktop/hostname1
20-02-27 07:34:05 INFO (MainThread) [hassio.host.services] Update service information
20-02-27 07:34:05 INFO (MainThread) [hassio.utils.gdbus] Call org.freedesktop.systemd1.Manager.ListUnits on /org/freedesktop/systemd1
20-02-27 07:34:05 INFO (MainThread) [hassio.host.apparmor] Load AppArmor Profiles: set()
20-02-27 07:34:05 INFO (MainThread) [hassio.host.apparmor] AppArmor is not enabled on host
20-02-27 07:34:05 INFO (SyncWorker_0) [hassio.docker.interface] Attach to homeassistant/amd64-hassio-dns with version 1
20-02-27 07:34:05 INFO (MainThread) [hassio.misc.forwarder] Start DNS port forwarding to 172.30.32.3
20-02-27 07:34:05 INFO (MainThread) [hassio.dns] Start CoreDNS plugin
20-02-27 07:34:05 INFO (SyncWorker_0) [hassio.docker.interface] Clean hassio_dns application
20-02-27 07:34:06 INFO (SyncWorker_0) [hassio.docker.dns] Start DNS homeassistant/amd64-hassio-dns with version 1 - 172.30.32.3
20-02-27 07:34:06 INFO (SyncWorker_1) [hassio.docker.interface] Attach to homeassistant/qemux86-64-homeassistant with version 0.106.0
20-02-27 07:34:06 INFO (MainThread) [hassio.store.git] Load add-on /data/addons/git/a0d7b954 repository
20-02-27 07:34:06 INFO (MainThread) [hassio.store.git] Load add-on /data/addons/core repository
20-02-27 07:34:06 INFO (MainThread) [hassio.store] Load add-ons from store: 63 all - 63 new - 0 remove
20-02-27 07:34:06 INFO (MainThread) [hassio.addons] Found 1 installed add-ons
20-02-27 07:34:07 INFO (SyncWorker_8) [hassio.docker.interface] Attach to hassioaddons/node-red-amd64 with version 6.1.0
20-02-27 07:34:07 INFO (MainThread) [hassio.updater] Fetch update data from https://version.home-assistant.io/stable.json
20-02-27 07:34:08 INFO (MainThread) [hassio.snapshots] Found 0 snapshot files
20-02-27 07:34:08 INFO (MainThread) [hassio.discovery] Load 0 messages
20-02-27 07:34:08 INFO (MainThread) [hassio.ingress] Load 0 ingress session
20-02-27 07:34:08 INFO (MainThread) [hassio.secrets] Load Home Assistant secrets: 1
20-02-27 07:34:08 INFO (MainThread) [__main__] Run Hass.io
20-02-27 07:34:08 INFO (MainThread) [hassio.api] Start API on 172.30.32.2
20-02-27 07:34:08 INFO (MainThread) [hassio.addons] Phase 'initialize' start 0 add-ons
20-02-27 07:34:08 INFO (MainThread) [hassio.core] Hass.io reboot detected
20-02-27 07:34:08 INFO (MainThread) [hassio.tasks] All core tasks are scheduled
20-02-27 07:34:08 INFO (MainThread) [hassio.core] Hass.io is up and running
20-02-27 07:37:14 INFO (MainThread) [__main__] Stopping Hass.io
20-02-27 07:37:14 INFO (MainThread) [hassio.api] Stop API on 172.30.32.2
20-02-27 07:37:14 INFO (MainThread) [hassio.misc.forwarder] Stop DNS forwarding
20-02-27 07:37:14 INFO (MainThread) [hassio.core] Hass.io is down
20-02-27 07:37:14 INFO (MainThread) [__main__] Close Hass.io
20-02-27 07:37:17 INFO (MainThread) [__main__] Initialize Hass.io setup
20-02-27 07:37:17 INFO (SyncWorker_0) [hassio.docker.supervisor] Attach to Supervisor homeassistant/amd64-hassio-supervisor with version 201
20-02-27 07:37:17 INFO (MainThread) [hassio.bootstrap] Initialize Hass.io debugger
20-02-27 07:37:17 INFO (MainThread) [__main__] Setup HassIO
20-02-27 07:37:17 INFO (MainThread) [hassio.utils.gdbus] Connect to dbus: org.freedesktop.systemd1 - /org/freedesktop/systemd1
20-02-27 07:37:18 INFO (MainThread) [hassio.utils.gdbus] Connect to dbus: org.freedesktop.hostname1 - /org/freedesktop/hostname1
20-02-27 07:37:18 WARNING (MainThread) [hassio.dbus.rauc] Host has no rauc support. OTA updates have been disabled.
20-02-27 07:37:18 WARNING (MainThread) [hassio.dbus.nmi_dns] No DnsManager support on the host. Local DNS functions have been disabled.
20-02-27 07:37:18 INFO (MainThread) [hassio.host.info] Update local host information
20-02-27 07:37:18 INFO (MainThread) [hassio.utils.gdbus] Call org.freedesktop.DBus.Properties.GetAll on /org/freedesktop/hostname1
20-02-27 07:37:18 INFO (MainThread) [hassio.host.services] Update service information
20-02-27 07:37:18 INFO (MainThread) [hassio.utils.gdbus] Call org.freedesktop.systemd1.Manager.ListUnits on /org/freedesktop/systemd1
20-02-27 07:37:18 INFO (MainThread) [hassio.host.apparmor] Load AppArmor Profiles: set()
20-02-27 07:37:18 INFO (MainThread) [hassio.host.apparmor] AppArmor is not enabled on host
20-02-27 07:37:18 INFO (SyncWorker_1) [hassio.docker.interface] Attach to homeassistant/amd64-hassio-dns with version 1
20-02-27 07:37:18 INFO (MainThread) [hassio.misc.forwarder] Start DNS port forwarding to 172.30.32.3
20-02-27 07:37:18 INFO (SyncWorker_1) [hassio.docker.interface] Restart homeassistant/amd64-hassio-dns
20-02-27 07:37:19 INFO (SyncWorker_2) [hassio.docker.interface] Attach to homeassistant/qemux86-64-homeassistant with version 0.106.0
20-02-27 07:37:20 INFO (MainThread) [hassio.store.git] Load add-on /data/addons/git/a0d7b954 repository
20-02-27 07:37:20 INFO (MainThread) [hassio.store.git] Load add-on /data/addons/core repository
20-02-27 07:37:20 INFO (MainThread) [hassio.store] Load add-ons from store: 63 all - 63 new - 0 remove
20-02-27 07:37:20 INFO (MainThread) [hassio.addons] Found 1 installed add-ons
20-02-27 07:37:20 INFO (SyncWorker_8) [hassio.docker.interface] Attach to hassioaddons/node-red-amd64 with version 6.1.0
20-02-27 07:37:20 INFO (MainThread) [hassio.updater] Fetch update data from https://version.home-assistant.io/stable.json
20-02-27 07:37:20 INFO (MainThread) [hassio.snapshots] Found 0 snapshot files
20-02-27 07:37:20 INFO (MainThread) [hassio.discovery] Load 0 messages
20-02-27 07:37:20 INFO (MainThread) [hassio.ingress] Load 0 ingress session
20-02-27 07:37:20 INFO (MainThread) [hassio.secrets] Load Home Assistant secrets: 1
20-02-27 07:37:20 INFO (MainThread) [__main__] Run Hass.io
20-02-27 07:37:20 INFO (MainThread) [hassio.api] Start API on 172.30.32.2
20-02-27 07:37:20 INFO (MainThread) [hassio.addons] Phase 'initialize' start 0 add-ons
20-02-27 07:37:20 INFO (MainThread) [hassio.core] Hass.io reboot detected
20-02-27 07:37:20 INFO (MainThread) [hassio.tasks] All core tasks are scheduled
20-02-27 07:37:20 INFO (MainThread) [hassio.core] Hass.io is up and running

The problem is with these lines here:

20-02-27 07:37:19 INFO (SyncWorker_2) [hassio.docker.interface] Attach to homeassistant/qemux86-64-homeassistant with version 0.106.0
20-02-27 07:37:20 INFO (SyncWorker_8) [hassio.docker.interface] Attach to hassioaddons/node-red-amd64 with version 6.1.0

The problem is that those docker instances are not, in fact, running!

There's nothing to attach to!

But the supervisor doesn't seem to notice this.

The only "fix" that I've found is to delete the docker images (not the docker containers, the images); something about having to download the images causes the system to also start the images.

Some things I've noticed in the code:

In /usr/src/hassio/hassio/homeassistant.py we have:

        """Prepare Home Assistant object."""
        try:
            # Evaluate Version if we lost this information
            if not self.version:
                self.version = await self.instance.get_latest_version(
                    key=pkg_version.parse
                )

            await self.instance.attach(tag=self.version)
        except DockerAPIError:
            _LOGGER.info("No Home Assistant Docker image %s found.", self.image)
            await self.install_landingpage()
        else:
            self.version = self.instance.version
            self.save_data()

What I'm noticing is that the only error condition that that code catches is the "no image found" error condition; I suspect this is the source of the problem.

Also, I see that there's a "repair" function in /usr/src/hassio/hassio/homeassistant.py that looks like it forces an install, and the install functions cause the docker start to occur.

Is there a way to trigger the running of this "repair" function in a Linux Docker setup? I've exec'd into my docker instances and I don't seem to have this nifty "hassio" command everybody else has.

rlpowell commented 4 years ago

This seems like such a glaring issue that I'm left wondering if I'm holding it wrong? Like, how do other people running hassio with Docker on Linux perform a clean restart of everything?

rlpowell commented 4 years ago

I grabbed https://github.com/home-assistant/cli , and can thus confirm that "/root/go/bin/cli supervisor repair ", once I figured out my api token, did not fix it, although it did cause docker images to get downloaded; I still have only:

CONTAINER ID        IMAGE                                   COMMAND                  CREATED             STATUS              PORTS               NAMES
a3ce358a1bc1        homeassistant/amd64-hassio-dns:1        "coredns -conf /conf…"   34 minutes ago      Up 30 minutes                           hassio_dns
3297768c41e0        homeassistant/amd64-hassio-supervisor   "/bin/entry.sh pytho…"   34 minutes ago      Up 30 minutes                           hassio_supervisor

reload also did not help.

pvizeli commented 4 years ago

yep, never adjust manual docker container managed by supervisor.

Use somethings like ha host reboot or ha host shutdown. But also the systemd will care that the supervisor is shutdown correctly. You can't shudown the supervisor manual, only if you want remove it from system. The local installed part on your host will prevent that and also care that he shutdown/startup correctly on boot.

This is no issue:

20-02-27 07:37:19 INFO (SyncWorker_2) [hassio.docker.interface] Attach to homeassistant/qemux86-64-homeassistant with version 0.106.0
20-02-27 07:37:20 INFO (SyncWorker_8) [hassio.docker.interface] Attach to hassioaddons/node-red-amd64 with version 6.1.0

He correctly attach to the data on the host of this add-ons. The Supervisor work different as the local docker / compose system.

Like we write on every information channel: The Supervisor is not just a container.

Yes, the only way to fix that is with the ha supervisor repair command

After this is implemented, you have an CLI also with superviserd installations: https://github.com/home-assistant/hassio-installer/issues/54

rlpowell commented 4 years ago

I really must be badly confused about something, because most of what you said doesn't seem relevant to what's going on for me.

yep, never adjust manual docker container managed by supervisor.

Well, "sudo systemctl restart hassio-supervisor" didn't actually restart anything except the supervisor, so my options were limited.

Use somethings like ha host reboot or ha host shutdown.

I have no idea how to do that; there is no program named "ha" on my supervisor instance:

rlpowell@lebna> sudo docker ps -a
CONTAINER ID        IMAGE                                            COMMAND                  CREATED             STATUS              PORTS               NAMES
fed5b83755f7        hassioaddons/node-red-amd64:6.1.3                "/init"                  7 hours ago         Up 7 hours                              addon_a0d7b954_nodered
7c08bc814905        homeassistant/qemux86-64-homeassistant:0.106.0   "/bin/entry.sh pytho…"   2 days ago          Up 11 hours                             homeassistant
a3ce358a1bc1        homeassistant/amd64-hassio-dns:1                 "coredns -conf /conf…"   2 days ago          Up 2 days                               hassio_dns
3297768c41e0        homeassistant/amd64-hassio-supervisor            "/bin/entry.sh pytho…"   2 days ago          Up 2 days                               hassio_supervisor
rlpowell@lebna> sudo docker exec -it 3297768c41e0 bash
bash-5.0# find / -name ha
bash-5.0#

Are you talking about manually installing https://github.com/home-assistant/cli when you refer to the ha command?

...

According to the help for that script, "host reboot" means rebooting my Linux box. No, you don't get to reboot my server, sorry. It's doing several other things besides running hass.io

Why would rebooting my linux box be required just to get the homeassistant instance restarted?

But also: why would I need to download an extra script just to restart hass.io? That seems super weird.

But also the systemd will care that the supervisor is shutdown correctly.

No, it doesn't.

rlpowell@lebna> sudo docker ps -a
CONTAINER ID        IMAGE                                            COMMAND                  CREATED             STATUS              PORTS               NAMES
fed5b83755f7        hassioaddons/node-red-amd64:6.1.3                "/init"                  7 hours ago         Up 7 hours                              addon_a0d7b954_nodered
7c08bc814905        homeassistant/qemux86-64-homeassistant:0.106.0   "/bin/entry.sh pytho…"   2 days ago          Up 11 hours                             homeassistant
a3ce358a1bc1        homeassistant/amd64-hassio-dns:1                 "coredns -conf /conf…"   2 days ago          Up 2 days                               hassio_dns
3297768c41e0        homeassistant/amd64-hassio-supervisor            "/bin/entry.sh pytho…"   2 days ago          Up 2 days                               hassio_supervisor
rlpowell@lebna> sudo systemctl stop hassio-supervisor
[sudo] password for rlpowell:
rlpowell@lebna> sudo docker ps -a
CONTAINER ID        IMAGE                                            COMMAND                  CREATED             STATUS                     PORTS               NAMES
fed5b83755f7        hassioaddons/node-red-amd64:6.1.3                "/init"                  7 hours ago         Up 7 hours                                     addon_a0d7b954_nodered
7c08bc814905        homeassistant/qemux86-64-homeassistant:0.106.0   "/bin/entry.sh pytho…"   2 days ago          Up 11 hours                                    homeassistant
a3ce358a1bc1        homeassistant/amd64-hassio-dns:1                 "coredns -conf /conf…"   2 days ago          Up 2 days                                      hassio_dns
3297768c41e0        homeassistant/amd64-hassio-supervisor            "/bin/entry.sh pytho…"   2 days ago          Exited (0) 2 seconds ago                       hassio_supervisor

hassio-dns is still up, homeassistant is still up, node-red is still up.

If systemd was actually stopping or starting things cleanly, I wouldn't have opened this issue.

You can't shudown the supervisor manual, only if you want remove it from system. The local installed part on your host will prevent that and also care that he shutdown/startup correctly on boot.

This is no issue:

20-02-27 07:37:19 INFO (SyncWorker_2) [hassio.docker.interface] Attach to homeassistant/qemux86-64-homeassistant with version 0.106.0
20-02-27 07:37:20 INFO (SyncWorker_8) [hassio.docker.interface] Attach to hassioaddons/node-red-amd64 with version 6.1.0

He correctly attach to the data on the host of this add-ons. The Supervisor work different as the local docker / compose system.

As I said, I could be confused, but my point here was that it's attaching to an instance that did not exist. There was no homeassistant instance running. But the supervisor was perfectly content to claim to be attaching to an unstarted instance. It seems like that's a bug, and that if the homeassistant instance isn't running, the supervisor should notice that when it tries to attach, and turn it on.

Even if it's not going to self-repair in this case, shouldn't it at least generate an error? "Oh hey I tried to attach to this instance but it's down"?

Like we write on every information channel: The Supervisor is not just a container.

I mean, that was pretty obvious from the fact that it requires dbus and /var/run/docker.sock to function.

I understand that the supervisor is managing the other instances. My point is that it's not doing it very well in the case where an instance is down. And that it doesn't shut down the other instances when systemd tells it to shutdown.

Yes, the only way to fix that is with the ha supervisor repair command

So, again assuming that you're talking about manually installing https://github.com/home-assistant/cli when you refer to the ha command, that didn't actually work.

"/root/go/bin/cli supervisor repair" worked (the command ran, and a bunch of output was produced in the logs, but it didn't actually start the homeassistant instance. Nor did "supervisor reload".

What actually worked was "core start".

After this is implemented, you have an CLI also with superviserd installations: home-assistant/hassio-installer#54

As andygrunwald said, that issue doesn't really explain what the goal is. Is that issue why I don't have the "ha" command?

I'm new to this project, but I'm perfectly willing to help out; if you want to provide (a lot) more detail in that issue, I'd be happy to take a crack at it.

rlpowell commented 4 years ago

I can confirm that "ha host shutdown" does, in fact, shutdown my entire Linux box (I had reason to shut it down anyway so it was worth a try). This is very very much not what I ever want, but I guess hassio really expects to be in charge of the whole machine.

That's kind of exactly the opposite of the point of using docker, though.

yozik04 commented 4 years ago

I can confirm that there is something wrong with supervisor. Yesterday I have installed a new version of docker. That means docker service restart. amd64-hassio-supervisor container started, then started amd64-hassio-audio and amd64-hassio-dns. But homeassistant and plugins did not start.

brnhffmnn commented 4 years ago

What's the workaround here -- reboot the whole system?

rlpowell commented 4 years ago

I certainly never found any other way to do it besides rebooting the entire docker host; I eventually ended up moving from "run hassio on docker on my server" to "run hassio in a VM on my server", because hassio clearly wants to have complete control of the host it's running on, and giving it a VM solves that problem well enough. I actually wrote a bunch of scripts and so on to basically fully automate that; writeup is at https://github.com/rlpowell/hassio-vm

rct commented 4 years ago

I've been struggling with what I think is the same problem with perhaps a few different details.

Doing a cold boot, everything starts fine.

If I have to shut everything down to some maintenance, like upgrade docker, file system maintenance, etc., systemctl start hassio-supervisor does not restart homeassistant or the ssh add-on.

So there is no way to do any ha commands.

The supervisor will restart hassio-audio and hassio-dns, but it doesn't restart homeassistant or the community ssh add-on. (I haven't tested with the official ssh add-on).

I have tried to install the ha CLI on the host, however I don't think the supervisor's API port is exposed to the host by default. If homeassistant is running, I can set SUPERVISOR_ENDPOINT and then I can do ha commands on the host.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.