home-assistant / plugin-dns

CoreDNS implementation for Home Assistant
Apache License 2.0

Not resolving local host names #20

Closed ssummer closed 2 years ago

ssummer commented 3 years ago

In Home Assistant, hostnames used in configuration.yaml fail to resolve (e.g. with platform snmp). Using the corresponding IP addresses works, but is far from ideal: it means I have to use fixed IP addresses, and the configuration becomes less readable and requires more maintenance.

Trying to work out what the root issue is has led me to hassio-dns, which does not seem to work as I would expect. The homeassistant docker container appears to be using 172.30.32.3 for DNS, which is the hassio-dns container. From within the homeassistant container this local hostname fails to resolve: dig pdu3.netality.co.uk, and I can see from the dig output that it is going out to the Internet to do the DNS lookup.

But dig pdu3.netality.co.uk @192.168.66.1 from home assistant does work (not surprisingly).

The strange thing is that from within the hassio-dns container, fully qualified and unqualified local hostname lookups do work:

dig pdu3.netality.co.uk
dig pdu3

But dig pdu3.netality.co.uk @172.30.32.3 does not work (unsurprisingly).

What I would expect is that both fully qualified and unqualified hostnames resolve, as I have set the search domain to netality.co.uk and the DNS server to 192.168.66.1 in HassOS.

I don't think this has ever worked with hassio-dns.

Versions:
armhf-hassio-dns:2020.11.0
raspberrypi2-homeassistant:2020.12.1
armhf-hassio-supervisor:latest (2020.12.7)
HassOS: 5.9

ssummer commented 3 years ago

Was doing a bit more poking around in hassio-dns. I tried killing the coredns process (which caused a new one to be started), and it then started to work. From within hassio-dns:

dig pdu3.netality.co.uk @172.30.32.3
dig pdu3 @172.30.32.3
dig pdu3.netality.co.uk
dig pdu3

all worked as expected, and the homeassistant container then started to resolve local names (both fully qualified and unqualified):

dig pdu3.netality.co.uk
dig pdu3

But then after a few minutes it stops working again. So seems like something odd going on with coredns.
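To pin down when resolution flips between working and failing, it may help to log lookups over time. The following is a minimal Python sketch, assuming it is run somewhere that uses 172.30.32.3 as its resolver (for example inside the homeassistant container). The helper names, sample count and interval are illustrative choices, not anything provided by Home Assistant.

```python
import socket
import time


def try_resolve(name, resolver=socket.getaddrinfo):
    """Return the sorted addresses for name, or None if the lookup fails."""
    try:
        results = resolver(name, None)
    except socket.gaierror:
        return None
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({entry[4][0] for entry in results})


def watch(name, samples=10, interval=30.0,
          resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve name repeatedly; return a list of (index, addresses_or_None)."""
    history = []
    for index in range(samples):
        addresses = try_resolve(name, resolver)
        # Print a timestamped line so the moment of failure is visible.
        print(time.strftime("%H:%M:%S"), name, addresses or "FAILED")
        history.append((index, addresses))
        if index < samples - 1:
            sleep(interval)
    return history
```

Calling watch("pdu3.netality.co.uk") right after restarting coredns should show at which sample, if any, the lookups start failing.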

ssummer commented 3 years ago

/etc/corefile

.:53 {
    log
    errors
    loop

    hosts /config/hosts {
        fallthrough
    }
    template ANY AAAA local.hass.io hassio {
        rcode NOERROR
    }
    mdns
    forward . dns://192.168.66.1 dns://192.168.66.1 dns://127.0.0.1:5553 {
        except local.hass.io
        policy sequential
        health_check 5s
    }
    fallback REFUSED . dns://127.0.0.1:5553
    fallback SERVFAIL . dns://127.0.0.1:5553
    fallback NXDOMAIN . dns://127.0.0.1:5553
    cache 10
}

.:5553 {
    log
    errors

    forward . tls://1.1.1.1 tls://1.0.0.1 {
        tls_servername cloudflare-dns.com
        except local.hass.io
        health_check 10s
    }
    cache 30
}

Zixim commented 3 years ago

Similar to https://github.com/home-assistant/plugin-dns/issues/6 ? Which is basically: if for some reason CoreDNS doesn't get an answer from your locally hosted server, it will try the fallback. The fallback of course has no clue about your local hosts. CoreDNS will not revert to the DNS server that you configured, but keeps on using the fallback until you do a ha dns restart.

ssummer commented 3 years ago

It didn't feel the same when I raised the issue, but now I can see it could be the same issue. In my case local names do not seem to resolve at all - when Hass starts, all hostname entries in configuration.yaml fail to load with resolve errors. But perhaps something else fails to resolve and it flips to the fallback before processing configuration.yaml?

Coredns shouldn't flip to permanently use the fallback - that is plainly broken behaviour.

#6 says it was fixed some time ago, but as you said, and as I see, it appears it isn't. I don't understand this comment under that ticket:

"You will all time see that because of health check. Please use Home Assistant Container if you have an issue with that" I'm using the standard Home Assistant image (ie HassOS).

It doesn't seem like too much to ask for HA to use the specified local DNS resolver, and it seems like a lot of people have been affected by this for a long time. Also, having hardcoded fallback server IPs that can't be changed doesn't seem right. Perhaps a better solution would at least allow the default fallback server to be overridden in HA and, ideally, optionally disabled completely.

Zixim commented 3 years ago

I'm also using the default HAOS. What I understand from the cryptic "You will all time see that because of health check. Please use Home Assistant Container if you have an issue with that" is that we're better off using any installation method that does not use core-DNS when wanting to have your own DNS setup. Words fail me to describe how wrong this reasoning is...but hey...it's open source, and you can always issue a pull request & other snide remarks...

TLDR : HAOS is the only service that has recurring dns issues in my multi-VM home setup. coreDNS is broken, dev can't/won't fix. end of.

ssummer commented 3 years ago

I don't see the point of even having the option to set the name server in HA, if it doesn't work and it doesn't use it. It's also very misleading to instead use another entirely different, undocumented name server and I don't really want another random company to see my (supposedly internal to my network) dns lookups. I agree this is the only application that has this problem that I am using/have ever used. I am using the most simple HA setup - a single raspberry pi dedicated to running the stock HAOS. I repeat - it doesn't seem unreasonable to expect that HA uses the name server that is specified.

ssummer commented 3 years ago

I think I have found something:

forward . dns://192.168.66.1 dns://192.168.66.1 dns://127.0.0.1:5553

According to the coredns docs this will load balance requests amongst the 'upstreams', so 1 in 3 DNS requests will go to the second coredns instance (port 5553), which uses the Cloudflare servers. That will certainly cause errors for local hostnames. I think the 'dns://127.0.0.1:5553' should be removed from the above line; the fallback plugin entries will ensure that if the local DNS server doesn't have the answer, the query goes to Cloudflare.

ssummer commented 3 years ago

So I tried removing dns://127.0.0.1:5553 from the first forward line in /etc/corefile and restarted coredns. I then changed all my IP addresses to hostnames in configuration.yaml and restarted Home Assistant, and success: all the hostnames now resolve fine.

However, if I restart the host my changes will get overwritten again. So it seems the /usr/share/corefile.tempio file in plugin-dns would need changing to implement this properly:

forward . {{ JoinString .servers " " }} {{ if len .locals | eq 0 }}dns://127.0.0.11{{ else }}{{ JoinString .locals " " }}{{ end }} dns://127.0.0.1:5553 {

There may be another issue with this line, though: if the "servers" and "locals" in /config/coredns.json are different (I'm not sure where each of these comes from), a similar issue will still occur, as DNS lookups will flip between the two. In my case both are the same (dns://192.168.66.1), so it's fine.

It looks to me like it was expected that coredns would try each upstream in turn, but that is not the case: it treats them all as equals and load balances across them. The sequential option has been chosen, so the first DNS query will use the first upstream, the second query the second, the third query the third, and then the fourth query will wrap round to the first upstream again, and so on.
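To see what the quoted template line produces for a given /config/coredns.json, its logic can be mirrored in a few lines of Python. This is only a sketch of the template as quoted in this comment; the function name and the include_fallback switch are hypothetical and exist purely to compare the line with and without the port-5553 upstream.

```python
def render_forward_line(servers, locals_, include_fallback=True):
    """Mimic the quoted template: servers, then locals (or the template's
    dns://127.0.0.11 default when there are none), then the 5553 instance."""
    upstreams = list(servers)
    upstreams += list(locals_) if locals_ else ["dns://127.0.0.11"]
    if include_fallback:
        upstreams.append("dns://127.0.0.1:5553")
    # Note: unlike the real template, this never emits a double space
    # when `servers` is empty.
    return "forward . " + " ".join(upstreams) + " {"
```

With servers and locals both set to dns://192.168.66.1, this reproduces the duplicated upstream seen in the /etc/corefile posted earlier; passing include_fallback=False gives the hand-edited variant.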

craSH commented 3 years ago

Just chiming in here to say that I'm running into the same problems on my home network, and I found this issue once I got down into the coredns container's config (I didn't know exactly where to look for issues before getting that deep in). I agree with everything posted here so far, and am surprised there isn't more activity here. I imagine lots of HA users have their setup communicating with local devices via internal hostnames.

My local network uses a subdomain from a real internet zone, not a fake TLD or the mDNS .local zone (e.g. lan.example.com)

I have not yet looked in to how the main HA config makes its way into the coredns config, but if we could have an option to entirely disable the use of 3rd party DoT that really seems desirable. One of the reasons I use HA vs. another system is that I thought it left me in control of my data, and now this update has started sending DNS requests for my local resources to a third party DoT server without my knowing or wanting that to occur. I run my own local recursors and authoritative DNS for those resources, and would prefer HA use those systems which I control.

@ssummer and @Zixim I'm happy to help debug or provide more info for my setup, and dig into making some test PRs and the like (but I haven't done any dev/contribution to the HA codebases in the past).

Edit: Changed DoH references to DoT, as the coredns config is using tls://, not https://, for its external DNS.

craSH commented 3 years ago

I found another problem which I think may actually be the root cause of this issue. I noticed that I still see all the requests for my local resources hitting my local DNS server, but all responses come back SERVFAIL (DNS-level failure). It seems that coredns is requiring DNSSEC-signed records from local resolvers. I performed a packet capture while running the following queries for my local PurpleAir device from the HassIO host (via the HACP ssh/terminal addon) to demonstrate.

dig purpleair-5df.hamwan.tlr.im. - use system resolver, which ends up routing the request through coredns in the plugin-dns container.

This causes a query to hit my local DNS server like so, as decoded with Wireshark:

Frame 1: 110 bytes on wire (880 bits), 110 bytes captured (880 bits)
Ethernet II, Src: Raspberr_7c:67:22 (dc:a6:32:7c:67:22), Dst: Wibrain_45:44:90 (00:1e:06:45:44:90)
Internet Protocol Version 4, Src: homeassistant.lan.tlr.im (10.9.7.13), Dst: dnsdist.lan.tlr.im (10.9.7.21)
User Datagram Protocol, Src Port: 39937 (39937), Dst Port: domain (53)
Domain Name System (query)
    Transaction ID: 0x6684
    Flags: 0x0120 Standard query
        0... .... .... .... = Response: Message is a query
        .000 0... .... .... = Opcode: Standard query (0)
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...1 .... .... = Recursion desired: Do query recursively
        .... .... .0.. .... = Z: reserved (0)
        .... .... ..1. .... = AD bit: Set
        .... .... ...0 .... = Non-authenticated data: Unacceptable
    Questions: 1
    Answer RRs: 0
    Authority RRs: 0
    Additional RRs: 1
    Queries
        purpleair-5df.hamwan.tlr.im: type A, class IN
            Name: purpleair-5df.hamwan.tlr.im
            [Name Length: 27]
            [Label Count: 4]
            Type: A (Host Address) (1)
            Class: IN (0x0001)
    Additional records
        <Root>: type OPT
            Name: <Root>
            Type: OPT (41)
            UDP payload size: 2048
            Higher bits in extended RCODE: 0x00
            EDNS0 version: 0
            Z: 0x8000
                1... .... .... .... = DO bit: Accepts DNSSEC security RRs
                .000 0000 0000 0000 = Reserved: 0x0000
            Data length: 12
            Option: COOKIE
                Option Code: COOKIE (10)
                Option Length: 8
                Option Data: b770ff1ea6bc60a0
                Client Cookie: b770ff1ea6bc60a0
                Server Cookie: <MISSING>
    [Response In: 2]

And the corresponding response, which in my case is a SERVFAIL because my local DNS server does not have DNSSEC-signed records available for my local zone (which I suspect is the case for most users with local DNS from their home routers etc.!)

Domain Name System (response)
    Transaction ID: 0x6684
    Flags: 0x8582 Standard query response, Server failure
        1... .... .... .... = Response: Message is a response
        .000 0... .... .... = Opcode: Standard query (0)
        .... .1.. .... .... = Authoritative: Server is an authority for domain
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...1 .... .... = Recursion desired: Do query recursively
        .... .... 1... .... = Recursion available: Server can do recursive queries
        .... .... .0.. .... = Z: reserved (0)
        .... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
        .... .... ...0 .... = Non-authenticated data: Unacceptable
        .... .... .... 0010 = Reply code: Server failure (2)
    Questions: 1
    Answer RRs: 0
    Authority RRs: 0
    Additional RRs: 1
    Queries
        purpleair-5df.hamwan.tlr.im: type A, class IN
            Name: purpleair-5df.hamwan.tlr.im
            [Name Length: 27]
            [Label Count: 4]
            Type: A (Host Address) (1)
            Class: IN (0x0001)
    Additional records
        <Root>: type OPT
            Name: <Root>
            Type: OPT (41)
            UDP payload size: 1232
            Higher bits in extended RCODE: 0x00
            EDNS0 version: 0
            Z: 0x8000
                1... .... .... .... = DO bit: Accepts DNSSEC security RRs
                .000 0000 0000 0000 = Reserved: 0x0000
            Data length: 0
    [Request In: 1]
    [Time: 0.006167000 seconds]

Note near the bottom within the OPT record, the DO flag is set "DO bit: Accepts DNSSEC security RRs" - from dig(1):

+[no]dnssec Requests DNSSEC records be sent by setting the DNSSEC OK bit (DO) in the OPT record in the additional section of the query.
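For anyone who wants to check the flag values in the captures by hand, here is a small Python sketch that decodes the 16-bit DNS header flags word and the EDNS Z field. The bit positions follow the standard DNS header layout; the function names are just illustrative.

```python
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def decode_header_flags(flags):
    """Split the 16-bit DNS header flags word into its named fields."""
    return {
        "qr": bool(flags & 0x8000),       # False = query, True = response
        "opcode": (flags >> 11) & 0xF,    # 0 = standard query
        "aa": bool(flags & 0x0400),       # authoritative answer
        "tc": bool(flags & 0x0200),       # truncated
        "rd": bool(flags & 0x0100),       # recursion desired
        "ra": bool(flags & 0x0080),       # recursion available
        "ad": bool(flags & 0x0020),       # authenticated data
        "cd": bool(flags & 0x0010),       # checking disabled
        "rcode": RCODES.get(flags & 0xF, str(flags & 0xF)),
    }


def do_bit_set(edns_z):
    """The DO bit is the top bit of the 16-bit Z field in the OPT record."""
    return bool(edns_z & 0x8000)
```

Feeding in 0x0120 (the query) and 0x8582 (the response) gives the same fields Wireshark shows, and do_bit_set(0x8000) confirms the DO bit in both OPT records.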

I can stimulate the same response if I force the DO flag to be set with dig(1) and specify my local DNS server explicitly:

dig purpleair-5df.hamwan.tlr.im. @10.9.7.21 +dnssec - use my local network resolver directly, with the same DO flag set like the coredns requests.

The same request/response is observed in this case. I'll show the dig output here (the wireshark decoded results are essentially the same):

; <<>> DiG 9.16.6 <<>> purpleair-5df.hamwan.tlr.im. @10.9.7.21 +dnssec
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 65085
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;purpleair-5df.hamwan.tlr.im.   IN  A

;; Query time: 9 msec
;; SERVER: 10.9.7.21#53(10.9.7.21)
;; WHEN: Sun Jan 17 10:20:58 PST 2021
;; MSG SIZE  rcvd: 56

Now, I can verify the same request works without the DO flag set by passing +nodnssec to dig(1).

dig purpleair-5df.hamwan.tlr.im. @10.9.7.21 +nodnssec - use my local network resolver directly, without requesting DNSSEC records:

; <<>> DiG 9.16.6 <<>> purpleair-5df.hamwan.tlr.im. @10.9.7.21 +nodnssec
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41365
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;purpleair-5df.hamwan.tlr.im.   IN  A

;; ANSWER SECTION:
purpleair-5df.hamwan.tlr.im. 900 IN A   192.168.88.35

;; Query time: 9 msec
;; SERVER: 10.9.7.21#53(10.9.7.21)
;; WHEN: Sun Jan 17 10:21:49 PST 2021
;; MSG SIZE  rcvd: 72
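The same comparison can be made without dig by building the query by hand. This is a rough Python sketch using only the standard library; the function names, the fixed query ID and the 1232-byte payload size are assumptions for illustration. Unlike dig, it does not set the AD flag or send a cookie.

```python
import socket
import struct


def build_query(name, dnssec_ok, query_id=0x1234, udp_size=1232):
    """Build a DNS A query with an EDNS0 OPT record, optionally with DO set."""
    # Header: ID, flags (RD only), 1 question, 0 answers, 0 authority, 1 additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # type A, class IN
    z_field = 0x8000 if dnssec_ok else 0x0000    # DO is the top bit of Z
    # OPT pseudo-record: root name, type 41, class = UDP payload size,
    # TTL = extended RCODE (8 bits) + version (8 bits) + Z (16 bits), RDLEN 0.
    opt = b"\x00" + struct.pack("!HHBBHH", 41, udp_size, 0, 0, z_field, 0)
    return header + question + opt


def send_query(server, payload, timeout=3.0):
    """Send payload over UDP port 53 and return the numeric response RCODE."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(payload, (server, 53))
        response, _ = sock.recvfrom(4096)
    return struct.unpack("!H", response[2:4])[0] & 0xF
```

Sending build_query("purpleair-5df.hamwan.tlr.im.", True) and then the same with False to 10.9.7.21 via send_query should, on a setup like the one above, return 2 (SERVFAIL) and 0 (NOERROR) respectively.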

From what I can tell, the CoreDNS configuration as currently defined will cause any local queries which return SERVFAIL to immediately retry at the external DNS over TLS server (e.g. Cloudflare) due to this bit of configuration in the .:53 {} block:

fallback SERVFAIL . dns://127.0.0.1:5553

This is probably a violation of the EDNS0 RFC (a SHOULD statement in it, at least) (https://tools.ietf.org/html/rfc2671#section-5.3):

5.3. Responders who do not understand these protocol extensions are expected to send a response with RCODE NOTIMPL, FORMERR, or SERVFAIL. Therefore use of extensions should be "probed" such that a responder who isn't known to support them be allowed a retry with no extensions if it responds with such an RCODE. If a responder's capability level is cached by a requestor, a new probe should be sent periodically to test for changes to responder capability.

It is not clear to me how one configures CoreDNS to not set the DO flag for its initial requests, but that's probably not the correct approach anyway. I think making the CoreDNS configuration somehow retry without the flag upon receiving an initial response of NOTIMPL, FORMERR, or SERVFAIL is ideal.
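The retry behaviour suggested here can be sketched in Python as follows. This is not something CoreDNS is known to offer; it only illustrates the probing idea from the RFC text. The query callable is a hypothetical stand-in for whatever sends the request, and the sketch simplifies "no extensions" to "no DO bit", which is what the +nodnssec test above did.

```python
# RCODEs after which the RFC text allows a retry (NOTIMPL is written NOTIMP here).
RETRY_RCODES = {"NOTIMP", "FORMERR", "SERVFAIL"}


def resolve_with_probe(query, name):
    """Ask with the DO bit first; retry once without it if the responder objects.

    `query(name, dnssec_ok)` must return a tuple (rcode, answers).
    """
    rcode, answers = query(name, True)
    if rcode in RETRY_RCODES:
        # The responder may not support the extension, so probe again without it.
        rcode, answers = query(name, False)
    return rcode, answers
```

Against a resolver that behaves like the one in the captures (SERVFAIL with DO, NOERROR without), this returns the NOERROR answer instead of handing the query to the fallback.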

To further verify this, I suppose I'll set up DNSSEC for my internal zones and see if that "fixes" it, too.

craSH commented 3 years ago

After upgrading my local zones to support DNSSEC, this issue does go away for me, at least somewhat validating my analysis above. But I still want to emphasize that this is important to fix, as most users will probably not have DNSSEC-signed local DNS in their home networks.

New dig results after enabling DNSSEC for my hamwan.tlr.im. zone:

Direct query to my DNS server demonstrating new DNSSEC support (DO flag set):

dig purpleair-5df.hamwan.tlr.im. @10.9.7.21 +dnssec

; <<>> DiG 9.16.6 <<>> purpleair-5df.hamwan.tlr.im. @10.9.7.21 +dnssec
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42046
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;purpleair-5df.hamwan.tlr.im.   IN  A

;; ANSWER SECTION:
purpleair-5df.hamwan.tlr.im. 900 IN RRSIG   A 13 4 900 20210128000000 20210107000000 55748 hamwan.tlr.im. C2AjuHRgGOAe/etgrasytF5vAXb5dHONUXY81Y2KGdKqGXxb5XsKp2V/ 7quJXooLryvZB6QcQjJMzXUgZaGyYw==
purpleair-5df.hamwan.tlr.im. 900 IN A   192.168.88.35

;; Query time: 9 msec
;; SERVER: 10.9.7.21#53(10.9.7.21)
;; WHEN: Sun Jan 17 11:55:26 PST 2021
;; MSG SIZE  rcvd: 181

Query to local DNS resolver (coredns / plugin-dns) with explicit dnssec requested, and returned properly (DO flag set):

dig purpleair-5df.hamwan.tlr.im. +dnssec

; <<>> DiG 9.16.6 <<>> purpleair-5df.hamwan.tlr.im. +dnssec
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23367
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
; COOKIE: 40e4e84a9c3a9bca (echoed)
;; QUESTION SECTION:
;purpleair-5df.hamwan.tlr.im.   IN  A

;; ANSWER SECTION:
purpleair-5df.hamwan.tlr.im. 10 IN  A   192.168.88.35
purpleair-5df.hamwan.tlr.im. 10 IN  RRSIG   A 13 4 900 20210128000000 20210107000000 55748 hamwan.tlr.im. C2AjuHRgGOAe/etgrasytF5vAXb5dHONUXY81Y2KGdKqGXxb5XsKp2V/ 7quJXooLryvZB6QcQjJMzXUgZaGyYw==

;; Query time: 0 msec
;; SERVER: 172.30.32.3#53(172.30.32.3)
;; WHEN: Sun Jan 17 11:56:18 PST 2021
;; MSG SIZE  rcvd: 247

Query to local DNS resolver (coredns / plugin-dns) without explicit dnssec requested (no DO flag set):

dig purpleair-5df.hamwan.tlr.im.

; <<>> DiG 9.16.6 <<>> purpleair-5df.hamwan.tlr.im.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10039
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: d3a85ca6efd535be (echoed)
;; QUESTION SECTION:
;purpleair-5df.hamwan.tlr.im.   IN  A

;; ANSWER SECTION:
purpleair-5df.hamwan.tlr.im. 8  IN  A   192.168.88.35

;; Query time: 0 msec
;; SERVER: 172.30.32.3#53(172.30.32.3)
;; WHEN: Sun Jan 17 11:57:16 PST 2021
;; MSG SIZE  rcvd: 111

Zixim commented 3 years ago

@craSH

@ssummer and @Zixim I'm happy to help debug or provide more info for my setup, and dig into making some test PRs and the like (but I haven't done any dev/contribution to the HA codebases in the past).

I feel like I mis-explained : the bug-report I opened about this whole coreDNS mess was closed by the dev, after a kind of won't fix remark. Perhaps you could start from scratch with a new bug-report, we might get lucky, with 2020 being behind us now.

I wrote "it's open source, and you can always issue a pull request " because that's a typical answer users get from devs, when the dev feels like the users are asking too much. It hasn't happened (yet) in this case, but the way in which the dev acknowledged the bug & told me to use another installation method sure did taste like that.

craSH commented 3 years ago

@Zixim Agreed, I actually followed you there, but was trying to be as open-minded as possible in my first post about this :D Personally, I'd probably opt for a pull request that undoes adding the coredns plugin at all; I don't quite see the benefit for this project. But that's a strong opinion in and of itself.

Is there a chat server folks hang out on we could discuss this further, without flooding GH issue threads?

danielbrunt57 commented 3 years ago

I've been troubleshooting 1st time setup of a REST sensor with this error off and on for a few weeks now:

Logger: homeassistant.components.rest.data
Source: components/rest/data.py:67
Integration: rest (documentation, issues)
First occurred: 9:23:26 PM (21 occurrences)
Last logged: 10:15:57 PM
Error fetching data: https://[my_domain].duckdns.org:8123/api/config failed with [Errno -2] Name does not resolve 

and I've finally ended up here. None of my router-defined static entries can be resolved by ha dns, so I think I am in the right place. I know TELUS will just say "dns sec?? what's that?"

I've removed my split DNS entry for [my_domain].duckdns.org and let it resolve to the public IP, but the REST sensor now says

Logger: homeassistant.components.rest.data
Source: components/rest/data.py:67
Integration: rest (documentation, issues)
First occurred: 10:37:16 PM (1 occurrences)
Last logged: 10:37:16 PM
Error fetching data: https://[my_domain].duckdns.org:8123/api/config failed with 

since the router does not support loop-back.

I managed to finally get my REST sensor working using the following config:

  - platform: rest
    name: Hassio Configuration
#    resource: !secret resource_hassio_main_config
    resource: https://homeassistant.local.hass.io:8123/api/config
    verify_ssl: false
    authentication: basic
    value_template: >
      {{ value_json.version }}
    json_attributes:
      - components
      - unit_system
      - config_dir
      - version
    headers:
      Content-Type: application/json
      Authorization: !secret api_bearer_token
      User-Agent: Home Assistant REST sensor

AND I now know more about ha dns. Forcing DNSSEC is definitely wrong in my opinion. There should at least be a way to disable it for local DNS...

Zixim commented 3 years ago

@maiko29 your issue absolutely does not sound like any kind of DNS issue. You should seek help on the forum or on discord.

McGiverGim commented 3 years ago

Sorry, I don't understand all of this exactly (my knowledge of networking and Linux is limited), but I think I have the same problem described here.

My local device names are not being resolved. They were some weeks ago, but I don't know when this changed.

My router acts as DNS server and is assigned by DHCP to all the devices, including the Raspberry Pi with Home Assistant. Names like localdevicename.mydomain were working, but now they don't. Other computers in my network continue to resolve the names fine, so it seems to be a problem with the DNS used in Home Assistant.

Is there something I can do to help fix this?

McGiverGim commented 3 years ago

If it helps... the nslookup resolves the name, but it throws an error:

➜  ~ nslookup camara-pasillo.piminet
Server:         172.30.32.3
Address:        172.30.32.3#53

Name:   camara-pasillo.piminet
Address: 192.168.100.20
** server can't find camara-pasillo.piminet: NXDOMAIN

➜  ~ ping camara-pasillo.piminet
ping: bad address 'camara-pasillo.piminet'

As you can see, the lookup finds the correct IP (192.168.100.20 in this case), but an NXDOMAIN error is thrown afterwards, which means the IP is not used. As I said, my knowledge of DNS and Linux is very limited, so I don't know how to interpret that.
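One possible reading of this output (an assumption, not something confirmed in this thread) is that the address comes from the A lookup and the trailing NXDOMAIN from a separate AAAA lookup. A short Python sketch to check each address family on its own; the function name is illustrative.

```python
import socket


def resolve_by_family(name, resolver=socket.getaddrinfo):
    """Look up IPv4 and IPv6 addresses separately and report each outcome."""
    outcome = {}
    for label, family in (("A", socket.AF_INET), ("AAAA", socket.AF_INET6)):
        try:
            results = resolver(name, None, family)
            # sockaddr[0] holds the address string for both families.
            outcome[label] = sorted({entry[4][0] for entry in results})
        except socket.gaierror as error:
            outcome[label] = f"failed: {error}"
    return outcome
```

If resolve_by_family("camara-pasillo.piminet") reports an address for "A" and a failure for "AAAA", that would fit the nslookup output above.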

danielbrunt57 commented 3 years ago

My understanding of this problem is HA requires a DNSSEC lookup. My router does not support that so I had to add & configure the AdGuard Home addon for HA to use for its secure DNS lookups. I could not find any other way...

Zixim commented 3 years ago

HA requires a DNSSEC lookup.

Don't think so. My local dns resolver (Pi-hole) doesn't use DNSSEC for local hostname resolution. All works fine, until Broken CoreDNS decides it needs to stop using my dns server and instead start using a HARDCODED dns server on the internet, which of course has no idea about my local hosts. Worse, CoreDNS then leaks internal hostnames to some server on the internet, and we can't stop it doing that.

No developer wants to fix this glaring security bug, so we should try to get the feature request voted up : https://community.home-assistant.io/t/improve-privacy-stop-using-hardcoded-dns/273496 Please help.

McGiverGim commented 3 years ago

I’ve found a workaround modifying the coredns template… I don’t know if this can have some drawbacks, but it seems to work and I have not found any problem until now.

In the hassio_dns docker, that contains the coredns server, there is a template file with the configuration of the coredns. I’ve modified that:

https://github.com/home-assistant/plugin-dns/blob/b3827bb1c79d84ef72153d7fac91157e17592c73/rootfs/usr/share/tempio/corefile#L11-L13 adding my local domain (piminet) at the end:

template ANY AAAA local.hass.io hassio piminet {
        rcode NOERROR
}

In this way it returns NOERROR in place of NXDOMAIN and now it works and resolves the local domains without problem, at least in my case 😃 .

If this solution is ok, maybe some real Home Assistant developer can add a new option to the ha dns CLI command to add the local domain here automatically.

Zixim commented 3 years ago

on next update, your edits will get wiped out

McGiverGim commented 3 years ago

I know. This is why I'm asking for a real Home Assistant developer to include it in the base, if the solution is acceptable and does not break other things.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Zixim commented 3 years ago

not solved. Need developer input.

jherby2k commented 3 years ago

Bumping so this doesn't go stale. Surprised this isn't getting more traction!

adub08 commented 3 years ago

I'm still getting this issue too, it's been ongoing for quite some time and is things for me now

alexdelprete commented 3 years ago

I really can't understand why devs refuse to take a serious look at this, unfortunately I haven't been able to read any of their explanations or feedback regarding this. It's a real shame, I love HA, but this coredns plugin is absolutely pathetic, a really bad design for an important automation solution like HA.

officiallybob commented 3 years ago

It is getting pretty old to keep playing around with ha dns restart at the right times to get HA to load and connect to the DB server properly. Would love to see some action on this.

Zixim commented 3 years ago

well...keep mentioning these :

@pvizeli for closing this issue with this gem of a comment : https://github.com/home-assistant/plugin-dns/issues/6#issuecomment-727186715 and @balloob , just for good measure & for allowing ivory tower development to happen.

And maybe something might happen. Do not hold your breath though, it's been broken since well before Oct2020.

balloob commented 3 years ago

I've temporarily blocked Zixim because he keeps tagging Home Assistant developers across both GitHub and the community forums. Online harassment campaigns are a violation of our CLA.

If people know of a fix, feel free to work on it and open a PR.

danielbrunt57 commented 3 years ago

I had trouble with this back when I was trying to configure a REST sensor. I managed to get it working via adding trusted_networks to my config, using http://192.168.1.104:8123 and also installing ADGuard to hold my static IP entries. That was a while ago so I think that is all I did to get the REST sensor working. The issue seemed to be with using my TELUS router for DNS where I had static entries defined and I recall there being an error related to DNSSEC which TELUS does not support. The other issue was I needed to use https://xxx.duckdns.org for internal URL and make it resolve to the local IP of HA which I think ADGuard allowed me to pull off.

Last week I managed to deploy PowerDNS on my Debian box which is far more robust and reliable than the router's DNS. I've removed ADGuard and reconfigured HA's DNS via Supervisor->System->Host panel. But I was now again having issues resolving DNS; i.e. Alexa Media Player (and I'm not sure what else) was broken, until I discovered the ha dns command which revealed this:

➜  / ha dns info
host: 172.30.32.3
locals:
- dns://192.168.1.103
servers:
- dns://192.168.1.254
update_available: false
version: 2021.06.0
version_latest: 2021.06.0

The DNS entry I changed in the GUI seems to only change locals: as servers: was still pointing to my router where DNSSEC is not supported. Once I changed servers: to point to my new PowerDNS server w/DNSSEC support, everything is now resolved and working.

➜  / ha dns options --servers dns://192.168.1.103
➜  / ha dns info
host: 172.30.32.3
locals:
- dns://192.168.1.103
servers:
- dns://192.168.1.103

This was my solution but I think HA should allow you to manipulate the locals: AND servers: entries from the GUI (more transparency) and also have a tick box to be able to disable DNSSEC for each so you can use an internal DNS server on your network that does not support DNSSEC.
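A quick way to spot this kind of mismatch is to compare the two lists in the ha dns info output. The Python sketch below parses only the simple key/list layout shown above (it is not a general YAML parser), and the function names are made up for this example.

```python
def parse_dns_info(text):
    """Parse the key/list layout printed by `ha dns info` into a dict."""
    info, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- ") and current is not None:
            # List item belonging to the most recent "key:" line.
            info[current].append(line[2:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            if value:
                info[key] = value
                current = None
            else:
                info[key] = []
                current = key
    return info


def mismatched_servers(info):
    """Return entries listed under servers that are absent from locals."""
    local_entries = info.get("locals", [])
    return [s for s in info.get("servers", []) if s not in local_entries]
```

Run against the first output above, mismatched_servers reports dns://192.168.1.254; after the ha dns options change it returns an empty list.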

danielbrunt57 commented 3 years ago

I’ve found a workaround modifying the coredns template… I don’t know if this can have some drawbacks, but it seems to work and I have not found any problem until now.

In the hassio_dns docker, that contains the coredns server, there is a template file with the configuration of the coredns.

I just had a look-see with portainer and my hassio_dns image is not used...

blalor commented 3 years ago

I've temporarily blocked Zixim because he keeps tagging Home Assistant developers across both GitHub and the community forums. Online harassment campaigns are a violation of our CLA.

If people know of a fix, feel free to work on it and open a PR.

Seems like a bunch of people have been working around the issue by modifying coredns config. How about exploring that as an official fix?

jjvandenberg commented 3 years ago

@danielbrunt57 : it shows "Unused" because it is hidden from the Settings/Hidden Containers menu. Unhide it and it will show as used. Guess it's a bug in Portainer.

jjvandenberg commented 3 years ago

I've temporarily blocked Zixim because he keeps tagging Home Assistant developers across both GitHub and the community forums. Online harassment campaigns are a violation of our CLA.

If people know of a fix, feel free to work on it and open a PR.

Seems like a bunch of people have been working around the issue by modifying coredns config. How about exploring that as an official fix?

I think people who are not so into the technicalities of this DNS stuff trust the developers to fix it. Home Assistant is such an actively maintained project, so why is the DNS part so neglected?

pops106 commented 3 years ago

I don't know if this helps, but I have had this problem for a few days. I wanted to move to DNS names for my entities in case the devices' IP addresses change.

On my router I had set the primary DNS to the HA IP, which is running AdGuard, and the secondary DNS to 8.8.8.8 in case HA was down.

In AdGuard I created DNS rewrites (kitchenlights.home.local, for example). My Windows PCs appeared to resolve the addresses fine, but HA through SSH couldn't.

Running dig kitchenlights.home.local @192.168.5.1 (the router address) worked, but ping kitchenlights.home.local failed with "bad address".

I then realised that on my Windows PC, ping kitchenlights.home.local works, then after a flushdns it fails, and after another flushdns it works again. So it looks like my router is round-robining across the primary and secondary DNS servers, or load balancing, or sending to both; I'm not sure which.

As soon as I took the secondary DNS out, everything started working, including HA. I can now ping kitchenlights.home.local without any problems.

So my guess is that HA DNS is super sensitive to any resolution issue and starts forwarding to the hard-coded servers.

lialosiu commented 2 years ago

image

I just blocked port 853, and what the f... Why does hassio_dns retry DNS queries so many times?

And I can't even change this config? When I change

    forward . dns://10.1.1.1 dns://10.1.1.1 dns://127.0.0.1:5553 {
        except local.hass.io
        policy sequential
        health_check 1m
    }
    fallback REFUSED,SERVFAIL,NXDOMAIN . dns://127.0.0.1:5553

to

    forward . dns://10.1.1.1 {
        except local.hass.io
        policy sequential
        health_check 1m
    }
    fallback REFUSED,SERVFAIL,NXDOMAIN . dns://10.1.1.1

and reboot, everything gets reset. This is SO NOISY.
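For anyone following along, here is a rough model of what the fallback line in the original block appears to do. It is a toy simulation, not CoreDNS code, and it assumes the directive re-sends a query to the listed resolver whenever the first answer carries one of the listed response codes. The resolver functions and the 192.0.2.10 address are made up for illustration.

```python
# Toy model of "fallback REFUSED,SERVFAIL,NXDOMAIN . dns://127.0.0.1:5553".
# Assumption: any answer with one of these rcodes is retried on the fallback.
FALLBACK_RCODES = {"REFUSED", "SERVFAIL", "NXDOMAIN"}

# Hypothetical local zone; 192.0.2.10 is a documentation-only address.
LOCAL_ZONE = {"kitchenlights.home.local": "192.0.2.10"}


def local_ok(name):
    """Pretend local DNS server that is healthy."""
    if name in LOCAL_ZONE:
        return ("NOERROR", LOCAL_ZONE[name])
    return ("NXDOMAIN", None)


def local_failing(name):
    """Pretend local DNS server that is briefly unavailable."""
    return ("SERVFAIL", None)


def public_fallback(name):
    """Pretend public resolver: it has never heard of local names."""
    return ("NXDOMAIN", None)


def resolve(name, primary, fallback):
    """Ask the primary first; retry on the fallback for the listed rcodes."""
    rcode, answer = primary(name)
    if rcode in FALLBACK_RCODES:
        return ("fallback",) + fallback(name)
    return ("primary", rcode, answer)


print(resolve("kitchenlights.home.local", local_ok, public_fallback))
print(resolve("kitchenlights.home.local", local_failing, public_fallback))
```

Under that assumption, a single hiccup on the local server is enough for a local name to be answered by a resolver that cannot know it, which would match the intermittent failures described in this thread.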

gjdoornink commented 2 years ago

Hello,

I have just submitted pull request #55 that should fix the intermittent DNS lookup failures for hosts in the local network. The pull request ensures the DNS requests are only forwarded to dns://127.0.0.1:5553 if no DNS servers were specified through a DHCP server or the local configuration. This of course assumes that the specified DNS servers are able to forward DNS lookups to external servers if needed, which should typically be the case.
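The rule can be sketched roughly as follows. This is only an illustration of the selection logic described above, not the actual template change in the pull request; the helper names select_upstreams and render_forward are hypothetical, and the forward block options are copied from the config quoted earlier in this thread.

```python
# Local fallback forwarder named in this thread.
FALLBACK_UPSTREAM = "dns://127.0.0.1:5553"


def select_upstreams(configured):
    """Return the upstreams for the forward block (hypothetical helper).

    Configured servers (from DHCP or local settings) are used as-is;
    the local fallback forwarder is used only when none are configured.
    """
    servers = [s.strip() for s in configured if s and s.strip()]
    unique = list(dict.fromkeys(servers))  # drop duplicates, keep order
    return unique if unique else [FALLBACK_UPSTREAM]


def render_forward(configured):
    """Render a forward block in the style quoted earlier in the thread."""
    upstreams = " ".join(select_upstreams(configured))
    return (
        f"forward . {upstreams} {{\n"
        "    except local.hass.io\n"
        "    policy sequential\n"
        "    health_check 1m\n"
        "}"
    )


print(render_forward(["dns://10.1.1.1"]))  # only the configured server
print(render_forward([]))                  # only the local fallback
```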

Kind regards,

alexdelprete commented 2 years ago

> The pull request ensures the DNS requests are only forwarded to dns://127.0.0.1:5553 if no DNS servers were specified through a DHCP server or the local configuration.

Thanks a lot for this fix, G.J. I hope it gets reviewed and approved as soon as possible.

fenichelar commented 2 years ago

@gjdoornink 👍

I was planning to test the DNS config that would be generated by your PR, but I can't find the corefile on the filesystem anywhere. Do you know where it gets saved?

alexdelprete commented 2 years ago

> Do you know where it gets saved?

From the PR: image

That would be: /usr/share/tempio/corefile

fenichelar commented 2 years ago

@alexdelprete That is what I was expecting but corefile is not in that directory on my machine:

$ ls -la /usr/share/tempio/
total 16
drwxr-xr-x    2 root     root          4096 May 26 09:23 .
drwxr-xr-x    1 root     root          4096 May 26 09:23 ..
-rw-r--r--    1 root     root            99 May 26 09:23 homeassistant.profile
-rw-r--r--    1 root     root           500 May 26 09:23 sshd_config

image

alexdelprete commented 2 years ago

It's in the dns container (hassio_dns):

docker exec -it hassio_dns /bin/bash

fenichelar commented 2 years ago

@alexdelprete Docker is not in the path and I am not seeing the executable anywhere. What am I missing?

alexdelprete commented 2 years ago

Are you accessing the host server? I access my HassOS host via SSH, and from there I have access to docker.

alexdelprete commented 2 years ago

Or you can use the portainer addon to access the dns container:

image

fenichelar commented 2 years ago

@alexdelprete Yes, I am SSHing to the host:

$ ssh -i ~/.ssh/hassio_rsa root@10.1.0.101

| |  | |                          /\           (_)   | |            | |
| |__| | ___  _ __ ___   ___     /  \   ___ ___ _ ___| |_ __ _ _ __ | |_
|  __  |/ _ \| '_ \ _ \ / _ \   / /\ \ / __/ __| / __| __/ _\ | '_ \| __|
| |  | | (_) | | | | | |  __/  / ____ \\__ \__ \ \__ \ || (_| | | | | |_
|_|  |_|\___/|_| |_| |_|\___| /_/    \_\___/___/_|___/\__\__,_|_| |_|\__|

Welcome to the Home Assistant command line.

System information
  IPv4 addresses for eno1:  10.1.0.101/16
  IPv4 addresses for wlp1s0:

  OS Version:               Home Assistant OS 6.4
  Home Assistant Core:      2021.9.7

  Home Assistant URL:       http://hassio.local:8123
  Observer URL:             http://hassio.local:4357

I installed portainer.io and am able to access the container. Good suggestion!

Where is the docker executable located on your system? which docker

alexdelprete commented 2 years ago

> Yes, I am SSHing to the host:

No, you're accessing the core container. What are you using, the terminal addon?

You need SSH access to the OS, not to one of the containers. I used this document to enable SSH to the OS: https://github.com/home-assistant/operating-system/blob/dev/Documentation/configuration.md

fenichelar commented 2 years ago

@alexdelprete Got it! Thank you!!

alexdelprete commented 2 years ago

> @alexdelprete Got it! Thank you!!

No problem. :)

There's also this guide: https://developers.home-assistant.io/docs/operating-system/debugging/

And there's an addon (but I prefer to use the "official" debugging guide solution): https://community.home-assistant.io/t/add-on-hassos-ssh-port-22222-configurator/264109