home-assistant / home-assistant.io

Home Assistant User documentation
https://www.home-assistant.io

mDNS/Zeroconf is not fully supported in HA Core via Docker #14153

Closed: pedrolamas closed this issue 4 years ago

pedrolamas commented 4 years ago

The problem

The current documentation states that one should use the host network to get mDNS/Zeroconf discovery working in Docker; however, I've come to realize that it works with only a few devices (like Chromecast), and most of them actually fail!

With the HA Core container on the host network, the container receives the mDNS entries, and the Zeroconf integration is notified of them and even starts the flow to notify the admin that a new device that can be integrated has been found.

However, the notification never actually appears, as the container is unable to resolve the ".local" mDNS names, so the flow terminates!

This is because the HA images are based on Alpine, and unfortunately Alpine doesn't have the necessary support for mDNS name resolution (normally provided via libnss-mdns).

This is also why HA Supervised installations have an mDNS reflector and an internal DNS resolver (which basically allows regular DNS queries to also resolve mDNS entries).
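A quick way to check whether a given environment can resolve ".local" names at all is to call the same resolver API that integrations rely on. A minimal sketch, assuming Python 3 is available where you run it (it is inside the HA image); the resolves helper is hypothetical, and "foobar.local" stands in for one of your device names:

```python
import socket


def resolves(hostname: str) -> bool:
    """Return True if the system resolver can turn hostname into an address."""
    try:
        # socket.getaddrinfo delegates to the C library's getaddrinfo, so on an
        # Alpine (musl) image the result reflects musl's name resolution.
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False
```

Running resolves("foobar.local") inside the container (for example via docker exec homeassistant python3) would be expected to return False on the musl-based image even when the host itself resolves the same name.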

So I believe that these two documents need to be changed to reflect this limitation:

pedrolamas commented 4 years ago

There are a few tickets around this problem:

frenck commented 4 years ago

That doesn't have to be a problem, to be honest, as the mDNS records contain the actual IP address of the end device, which can be used by integrations.

If this is a problem with a specific integration, this should be raised as an issue for that integration so it can be fixed.

pedrolamas commented 4 years ago

Using IPs can be tricky, as IPs can change... I assume that is also why most integrations use the mDNS FQDN instead of the IP.

Here are some examples:

All of the above use the mDNS name, so there won't be any problem if/when the local IP changes!

Interestingly, the Chromecast one works by taking the IP, resolving it to the local domain FQDN, and then using that!

@frenck I just noticed that you actually own the Elgato integration above! 😅

pedrolamas commented 4 years ago

To be quite honest, this wouldn't even be a problem if Alpine base images were replaced with full Debian or Ubuntu slim ones, but that can bring other unknown issues!

frenck commented 4 years ago

It isn't tricky, because config entries can be updated on the fly based on new discovery data. As for Elgato, yes, that needs a fix.

As for WLED, that is handled here: https://github.com/home-assistant/core/blob/dev/homeassistant/components/wled/config_flow.py#L95

It uses the host instead of the hostname and updates the configuration entry if needed.
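The "update on the fly" pattern can be modeled in plain Python, outside Home Assistant. This is a hypothetical, simplified sketch: the ConfigEntry class, the entries dict, the unique ID "c1ec" and the discovery keys "host" (IP address) and "hostname" (mDNS name) are illustrative assumptions, not Home Assistant's actual API. The idea is to key each entry by a stable unique ID, store the IP address, and overwrite it whenever discovery reports a different one.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigEntry:
    """Hypothetical stand-in for an integration's stored configuration."""
    unique_id: str
    data: dict = field(default_factory=dict)


def handle_discovery(entries: dict, discovery_info: dict, unique_id: str) -> ConfigEntry:
    """Create or update an entry from mDNS discovery data.

    Uses the IP address under "host" and ignores the ".local" name under
    "hostname", since a musl-based container may not be able to resolve it.
    """
    ip_address = discovery_info["host"]
    entry = entries.get(unique_id)
    if entry is None:
        # First discovery: store the IP address, not the .local name.
        entry = ConfigEntry(unique_id=unique_id, data={"host": ip_address})
        entries[unique_id] = entry
    elif entry.data.get("host") != ip_address:
        # The device came back with a new IP: update the existing entry in place.
        entry.data["host"] = ip_address
    return entry


# Example: the same device is rediscovered at a different (made-up) address.
entries = {}
handle_discovery(entries, {"host": "192.168.40.33", "hostname": "elgato-key-light-air-c1ec.local."}, "c1ec")
handle_discovery(entries, {"host": "192.168.40.57", "hostname": "elgato-key-light-air-c1ec.local."}, "c1ec")
print(entries["c1ec"].data["host"])  # 192.168.40.57
```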

pedrolamas commented 4 years ago

I wasn't aware of the "on the fly update" of config entries; that is a lot better!

Not sure why you mentioned WLED, but looking at its code, ~isn't that "host" populated with the mDNS hostname with the trailing "." trimmed?~ but I do see what you mean! That should work fine, yes!

pedrolamas commented 4 years ago

Given that, I'm happy to close this and instead open a few tickets for the integrations that don't have such a fix (like the Elgato one). Does that sound OK to you, @frenck?

frenck commented 4 years ago

Yes, please, issues per integration make it assignable/trackable and workable.

pedrolamas commented 4 years ago

Closing this as discussed above; I will open a separate issue for each integration I find with this problem.

pedrolamas commented 4 years ago

@frenck I just noticed something interesting with my Elgato: the mDNS name is elgato-key-light-air-c1ec.local but the full local domain name is ElgatoKeyLightAirC1EC.my.local.domain.com

So getting the host from the mDNS name might not be trivial... For the Elgato it seems that removing the hyphens is enough, but on other devices this might not be so simple...
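The Elgato heuristic (drop the ".local" suffix and the hyphens, then compare case-insensitively, since DNS names are case-insensitive) can be sketched as below. The function names are hypothetical, and as noted above the rule is specific to this device's naming and is not expected to hold for other devices.

```python
def mdns_label_to_hostname_guess(mdns_name: str) -> str:
    """Guess a device's DNS host label from its mDNS name (Elgato-specific heuristic)."""
    name = mdns_name.rstrip(".")  # mDNS names may carry a trailing dot
    if name.lower().endswith(".local"):
        name = name[: -len(".local")]
    return name.replace("-", "").lower()


def matches_local_domain_name(mdns_name: str, fqdn: str) -> bool:
    """Compare the guessed label with the first label of the local-domain FQDN."""
    first_label = fqdn.split(".", 1)[0]
    # DNS names are case-insensitive, so compare in lower case.
    return mdns_label_to_hostname_guess(mdns_name) == first_label.lower()


print(matches_local_domain_name(
    "elgato-key-light-air-c1ec.local",
    "ElgatoKeyLightAirC1EC.my.local.domain.com",
))  # True
```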

OnFreund commented 4 years ago

Using IP can be tricky as IP's can change... I assume that is also the reason why most integrations are using the mDNS FQDN instead of the IP. Here are some examples: https://github.com/home-assistant/core/blob/45e451271eb8fc7c53e3e70e1b0389b872231584/homeassistant/components/volumio/config_flow.py#L98

That's not entirely accurate. Volumio uses the host field, which is actually the IP address. To overcome changing IPs, it updates the config entry with the new IP. (I just realized I probably need to add an entry update listener...)

I'm not sure there's a reliable way to get a hostname from mDNS that would work both locally and in a container.

pedrolamas commented 4 years ago

@OnFreund I took a quick look at the Chromecast integration and it seems that it will take the IP address reported by mDNS, do a reverse DNS query on that to return the local domain name, and then just use that from that point on!

OnFreund commented 4 years ago

Interesting, I wonder how reliable that is.

pedrolamas commented 4 years ago

It works for the Elgato Key Light (I can do a ping -a x.x.x.x and get the correct local domain name), but I can't say that it would work for other devices (like Volumio)...

OnFreund commented 4 years ago

I think it's more a question of environments than of device types.

OnFreund commented 4 years ago

OK, it might be device-related: in my environment it works with Chromecast but not with Volumio.

pedrolamas commented 4 years ago

This seems to work fine:

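import socket

# This is the IP address of my Elgato Key Light, as reported by mDNS
ip_from_mdns = "192.168.40.33"

# Reverse DNS lookup: gethostbyaddr() returns (hostname, aliases, addresses),
# so index 0 is the local domain hostname for the device
real_hostname = socket.gethostbyaddr(ip_from_mdns)[0]
print(real_hostname)
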
OnFreund commented 4 years ago

I get:

socket.herror: [Errno 1] Unknown host

pedrolamas commented 4 years ago

I'm afraid it doesn't seem to work with all devices... One example is my Hikvision cameras: I get the same error you do when I use their IP addresses.

My Hikvision cameras actually don't seem to have any hostname set or advertised, as the UniFi controller doesn't know about them either!
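Since the reverse lookup succeeds for some devices (the Elgato, Chromecast) and fails with socket.herror for others (the Hikvision cameras), anything following the Chromecast approach needs a fallback. A minimal sketch, assuming that keeping the raw IP address is an acceptable fallback; hostname_or_ip is a hypothetical helper:

```python
import socket


def hostname_or_ip(ip_address: str) -> str:
    """Return the reverse-DNS hostname for ip_address, or the IP itself if there is none."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip_address)
        return hostname
    except OSError:
        # socket.herror ("Unknown host") and socket.gaierror are both OSError
        # subclasses; devices without a reverse (PTR) record end up here.
        return ip_address
```

With this, hostname_or_ip("192.168.40.33") would return the Elgato's local domain name in the environment described above, while a camera without a hostname would simply keep its IP address.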

darek-margas commented 11 months ago

So what is the solution for all these ESPHome devices, which are on dynamic IPs and natively use mDNS? How do I add them by name if (as you say) mDNS in Docker isn't required?

pedrolamas commented 11 months ago

@darek-margas this ticket is more than 3 years old, so I'm not sure the information here is still true/relevant... For what it's worth, mDNS/Zeroconf is working just fine in my HA Core in Docker (just make sure to use the host network and not a bridge one).

darek-margas commented 11 months ago


Well, I use the host network, and in general I am proficient with Docker, so it is not a Docker problem. It has been a while, but mDNS is still missing in the official Core Docker image: it does not work and is unable to resolve anything. I see here that it is always someone else's problem; integrations have to be fixed. So I wonder how they are going to fix ESPHome? It does not work, as it uses mDNS for itself, which makes it incompatible with the Docker Core image. Yes, it is possible to assign static IPs to ESPHome devices, but it should not be required. However, there is NO WAY to operate these by name over mDNS. Funnily enough, it works from the ESPHome Docker container. Thus my question: how is that going to be fixed? It is actually easy to add; I just wonder whether I want to lock myself into rebuilding each version.

DavidSchinazi commented 6 months ago

Hi folks, I think I'm hitting the same issue here. I'm trying to set up a RESTful switch, and due to some aspects of my home network, that device's IP changes quite a bit, so I need to use mDNS (e.g., foobar.local). The REST integration is failing to reach the device. I've tracked it down to an mDNS-in-Docker issue: on my Raspberry Pi, I'm able to reach the device via mDNS directly (both ping -c1 foobar.local and curl foobar.local work), but not inside the HA container (both docker exec homeassistant ping -c1 foobar.local and docker exec homeassistant curl foobar.local fail because they cannot resolve the host). My HA container was started with --network=host per the Docker instructions. Based on this discussion, it sounds like the issue comes from HA using Alpine Linux, so I can see how this wouldn't be trivial to fix, and I totally understand if it can't happen soon, but would it make sense to reopen this issue? (At the end of the day, I'm mainly trying to figure out whether the issue still exists or whether I'm the one holding it wrong, which is always possible.) Thanks!

norburban commented 6 months ago

@DavidSchinazi - I'm struggling with the same issue.

DavidSchinazi commented 6 months ago

I was able to confirm that this issue is caused by the fact that musl doesn't currently support mDNS. musl is the C standard library used by Alpine Linux, which is the OS running in the HA container. I'm discussing mDNS with the musl maintainers on their mailing list. This could lead to them adding mDNS support in musl, but it's not certain that this feature would be enabled by default, so further changes might be required. In the meantime, the Alpine Linux wiki has a workaround for supporting mDNS. Because a container is involved here, it's possible to deploy this workaround in the external OS that Docker is running in: since the HA container runs with --network=host, Docker copies the external OS's /etc/resolv.conf when it starts the container. So if that file points to 127.0.0.1 and the resolver running there has the workaround from the Alpine wiki, mDNS should work inside the container. I haven't tried it myself, though.
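Whether this workaround can apply depends on which nameserver the container actually ends up with. A small sketch (hypothetical helper names, standard library only) that lists the nameserver entries of a resolv.conf, in order. For reference, 127.0.0.11 is the address of Docker's embedded DNS server, which Docker uses for containers on user-defined networks rather than the host network.

```python
from __future__ import annotations

from pathlib import Path


def nameservers(resolv_conf_text: str) -> list[str]:
    """Extract the nameserver addresses from resolv.conf contents, in order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        parts = line.split()
        # Blank lines, comments ('#' or ';') and other options are skipped.
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def current_nameservers(path: str = "/etc/resolv.conf") -> list[str]:
    """Read the resolver configuration seen by this process (e.g. inside the container)."""
    return nameservers(Path(path).read_text())
```

Comparing current_nameservers() on the host and inside the container would show whether the container really inherited the host's 127.0.0.1 entry.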

norburban commented 6 months ago

Thanks @DavidSchinazi, this is way over my head ;) I took a look at /etc/resolv.conf and it's set to 127.0.0.11. I performed a full backup, changed it to 127.0.0.1 and rebooted, but alas it returned to 127.0.0.11. I suspect this is what you mean by 'docker copies the external OS's /etc/resolv.conf when it starts the container'.

rubin110 commented 1 week ago

🤔 TL;DR if your local DNS server doesn't know about mDNS hostnames, your containers most likely won't either. 🤔

I'm sharing my journey trying to fix this in the hope that it helps others who may have a setup similar to mine. If you use a special local DNS server like Adguard, this is really relevant to you. I've been hitting this same issue, where Zeroconf or mDNS of any kind doesn't seem to work for either my HA container or ESPHome. I spent most of yesterday trying to understand what's wrong, how to debug it, and what to fix. I will say everything was working until a few months ago, and I don't recall making any changes back then that would have affected this.

Finally, last night I understood (or at least I think I understood) that these are two very separate processes. I'm pretty certain this was all explained on some page I stumbled across in the sea of tabs I have open now, but it never clicked...

Even though something like avahi-daemon via libnss-mdns can find devices, that doesn't necessarily mean the discovered hostname and IP address become available when a different tool tries to resolve a .local hostname. I'm under the impression that a bunch of stuff in HA that depends on mDNS will find devices just fine, but then fails when trying to do anything else, because whatever other tool is involved doesn't get a proper resolution when it attempts one.

I then learned about /etc/nsswitch.conf, which, when tickled by something like ping, provides a list of sources to use to resolve hostnames on that system. On my host machine the relevant line looks like this:

hosts:          files mdns4_minimal [NOTFOUND=return] dns

I assume mdns4_minimal hits libnss-mdns, which does discovery and then helps resolve .local hostnames when ping asks nicely, before falling through to dns to poke the DNS servers listed in /etc/resolv.conf. So on my host machine, running ping cat-bed-plug.local resolves to an IP address, and does so before poking my DNS server.
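The hosts line can also be checked programmatically, which is handy when comparing the host with a container. A minimal sketch (hypothetical helper names, standard library only) that extracts the lookup sources from a hosts: line, dropping bracketed action clauses such as [NOTFOUND=return], and reports whether an mDNS module is listed. Keep in mind that this file is read by glibc's NSS; musl does not implement NSS, so on an Alpine image its contents do not affect name resolution.

```python
from __future__ import annotations


def hosts_sources(nsswitch_text: str) -> list[str]:
    """Return the lookup sources on the 'hosts:' line, without [ACTION] clauses."""
    for raw_line in nsswitch_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # strip comments
        if line.startswith("hosts:"):
            tokens = line[len("hosts:"):].split()
            # Assumes action clauses are written without spaces, e.g. [NOTFOUND=return].
            return [token for token in tokens if not token.startswith("[")]
    return []


def has_mdns(nsswitch_text: str) -> bool:
    """True if any mdns module (mdns, mdns4, mdns4_minimal, ...) is configured."""
    return any(source.startswith("mdns") for source in hosts_sources(nsswitch_text))
```

For the host line shown above, hosts_sources returns ["files", "mdns4_minimal", "dns"], so has_mdns is True.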

In the HA and ESPHome containers I don't see mdns4_minimal in /etc/nsswitch.conf. Both containers are running with network: host enabled, and in firewalld I've got docker0 tied to my local network zone for posterity. If I install avahi-utils and its dependencies in either container, I can totally run avahi-browse -ar and get results from my local network, but ping cat-bed-plug.local still doesn't work. I don't even think ESPHome cares about discovery, only that it needs to poke .local addresses that it derives from the device names in each of the YAML configs. So it seems like the main issue is that hostname resolution doesn't work.

At this point I don't understand if the containers should or shouldn't have mdns4_minimal inside of /etc/nsswitch.conf, and I wanted some sort of solution that works for all of my containers.

As it turns out, I'm running Adguard in another container on port 53 through network: host, but it has no concept of mDNS. I do remember that dnsmasq on my host OS has some options to include hostnames resolved through mDNS; I had killed off dnsmasq when I started using Adguard. First I made sure that libnss-mdns was up and running on my host OS (on something Debian-based you can test this by grabbing the avahi-utils package and running avahi-browse -ar). I got dnsmasq up and running again, but on a different port, and following this post I added some lines to provide resolution for .local addresses.

# General configuration
# DNS
domain-needed
bogus-priv
port=5340
no-hosts
domain=local
local=/local/
listen-address=::1,127.0.0.1
expand-hosts

Once that service was running again on my host OS, in Adguard's Upstream DNS servers list I provided [/local/]127.0.0.1:5340...
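Before (or after) wiring it into Adguard, the dnsmasq instance on port 5340 can be queried directly to see whether it answers for a .local name. A minimal sketch using only the standard library: it hand-builds a DNS A query in the RFC 1035 wire format instead of relying on a DNS package. The helper names are hypothetical; cat-bed-plug.local, 127.0.0.1 and 5340 come from the setup above.

```python
import random
import socket
import struct


def build_a_query(name: str, query_id: int) -> bytes:
    """Build a minimal DNS query packet asking for the A record of name."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length (max 63), ended by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def answer_count(name: str, server: str = "127.0.0.1", port: int = 5340,
                 timeout: float = 2.0) -> int:
    """Send the query over UDP and return the number of answer records."""
    query_id = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_a_query(name, query_id), (server, port))
        response, _ = sock.recvfrom(512)
    resp_id, flags, _qdcount, ancount = struct.unpack("!HHHH", response[:8])
    if resp_id != query_id:
        raise ValueError("response ID does not match query ID")
    if flags & 0x000F:
        # Non-zero RCODE, e.g. 3 (NXDOMAIN): the server could not resolve the name
        return 0
    return ancount
```

A result greater than zero from answer_count("cat-bed-plug.local") would suggest the dnsmasq side is answering; socket.timeout would suggest nothing is listening on that port.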


I made sure my containers were using 127.0.0.1 for DNS, and now ping cat-bed-plug.local works inside either container! For ESPHome I still needed to add ESPHOME_DASHBOARD_USE_PING=true to its environment.

In the end I imagine what these containers (any containers wanting to use mDNS addresses) are expecting is a local DNS server that can resolve .local addresses. So I don't know if what I've done here is truly a fix, or just a workaround.

If I got any of this wrong, I welcome shared knowledge and wisdom. :)

Linux seldom lives up to their parent's expectations.

DavidSchinazi commented 1 week ago

Hi @rubin110, your analysis is mostly correct; you're just missing one implementation detail. You're right that most programs (e.g. ping) don't know about mDNS: they get a hostname from the user, and then call a system API to resolve it into IP addresses. Most programs use getaddrinfo for that. getaddrinfo is implemented inside libc, which is the lowest-level library that all userspace programs link against. Each Linux distribution picks a libc.

Home Assistant uses getaddrinfo just like ping does, so since it runs inside the Alpine container, it'll end up with what I describe above.
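To see which libc a given Python runtime (for example the one inside the HA container) is linked against, the standard library's platform.libc_ver() can help. It reports ("glibc", version) on glibc systems, but it only recognizes a few libc implementations and will not report musl as glibc. A minimal sketch; the describe_libc helper and its messages are hypothetical:

```python
import platform


def describe_libc() -> str:
    """Return a short description of the C library this Python runtime uses."""
    name, version = platform.libc_ver()
    if name == "glibc":
        # glibc's getaddrinfo consults NSS (/etc/nsswitch.conf), so libnss-mdns can apply.
        return f"glibc {version}: getaddrinfo goes through NSS"
    # Anything else (musl on Alpine included) is not glibc, so glibc's NSS modules
    # such as libnss-mdns are not used by getaddrinfo.
    return f"not glibc ({name or 'unknown'}): glibc NSS modules do not apply"
```

Running it on the host and inside the container side by side would be expected to show the glibc/musl difference discussed in this thread.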

rubin110 commented 1 week ago

@DavidSchinazi Thank you for the clarifications!