m13253 / dns-over-https

High performance DNS over HTTPS client & server
https://developers.google.com/speed/public-dns/docs/dns-over-https
MIT License

DOH Client stops responding after internet access disruption #146

Closed buckaroogeek closed 1 year ago

buckaroogeek commented 1 year ago

Greetings

I am not sure if this is a problem.

My network uses the DOH client along with a pihole for DNS. Client machines use port 53 udp/tcp for DNS from the pihole. The pihole uses the DOH client for DNS requests. The DOH client then uses Google DOH servers. The DOH client runs in a docker container on a macvlan network with its own unique IP address, as does the pihole.

The network has a pfsense router/firewall. I did an OS update for the pfsense which, in turn, stopped internet access for about 2 minutes. After internet access was restored I noticed DNS problems. When I ran dig against the DOH client, I got a "no response from server". I restarted the DOH client and dig returned a normal and successful result.

I applied a second update to the pfsense a day later, again blocking internet access for about 2 minutes. I had the same issue with the DOH client. A simple docker container restart solved the problem.

To be fair I did not wait more than a couple of minutes after internet access was restored before restarting the DOH client container. It is possible that normal function would have resumed at some point but I did not want to wait and find out.

So I am not sure if this is a problem or expected behaviour. My expectation was that the DOH client would resume connectivity with the Google DOH servers without container restart. But perhaps I did not wait long enough. In any case, I thought I would let you know.

best regards

m13253 commented 1 year ago

Yes, this is a known problem.

The issue mainly lies in the connection reuse (multiplexing) mechanism in Go’s HTTP/2 library. Go does not know that the old connection is already unusable, and still tries to reuse it to send requests.

I have tried very hard to work around this issue in various ways, for example by creating a new connection pool whenever a timeout is detected. That only improved things a little; it did not fix the problem completely.

I also added a NetworkManager trigger script to automatically restart doh-client whenever there is a network change. However, this only works if the system is using NetworkManager, and the disconnection is at the NIC level.

Currently, I can’t think of a better solution… I would appreciate it if you could suggest one.

buckaroogeek commented 1 year ago

Thank you for the prompt and very informative response. I will take a look and see if I can contribute, although I am not really a Go programmer per se. I will close this issue and potentially submit a PR.