mirage / ocaml-dns

OCaml implementation of the DNS protocol
BSD 2-Clause "Simplified" License

Bad middle step about DNS resolver on Mirage when we ask a domain-name #255

Closed dinosaure closed 3 years ago

dinosaure commented 3 years ago

For some domain names (which makes this difficult to reproduce), a DNS resolver deployed with MirageOS (hvt) does not give us a response the first time, but it does the second time. Let's deploy a DNS resolver on 10.0.0.2 and query it with this code:

module Random = Mirage_crypto_rng_mirage.Make (OS.Time)(Mclock)
module DNS = Dns_client_mirage.Make (Random)(OS.Time)(Mclock)(Tcpip_stack_socket.V4V6)

let run domain_name =
  let open Lwt.Infix in
  let domain_name = Domain_name.(host_exn (of_string_exn domain_name)) in
  Random.initialize (module Mirage_crypto_rng.Fortuna) >>= fun () ->
  Tcpv4v6_socket.connect ~ipv4_only:false ~ipv6_only:false (Ipaddr.V4.Prefix.of_string_exn "0.0.0.0/0") None >>= fun tcpv4v6 ->
  Udpv4v6_socket.connect ~ipv4_only:false ~ipv6_only:false (Ipaddr.V4.Prefix.of_string_exn "0.0.0.0/0") None >>= fun udpv4v6 ->
  Tcpip_stack_socket.V4V6.connect udpv4v6 tcpv4v6 >>= fun socket ->
  let dns = DNS.create ~nameserver:(`TCP, (Ipaddr.of_string_exn "10.0.0.2", 53)) socket in
  DNS.gethostbyname dns domain_name >>= function
  | Ok ipv4 -> Fmt.pr "%a\n%!" Ipaddr.V4.pp ipv4 ; Lwt.return_unit
  | Error _ -> Fmt.pr "not found\n%!" ; Lwt.return_unit

let () = Lwt_main.run (run Sys.argv.(1))

(*
(executable
 (name gethostbyname)
 (libraries dns-client.mirage mirage-unix mirage-crypto-rng-mirage tcpip.stack-socket mirage-clock-unix))
*)

We can do:

$ dune exec ./gethostbyname -- irc.libera.chat
not found
$ dune exec ./gethostbyname -- irc.libera.chat
103.196.36.38

It seems that the first time, the resolver learns which authority handles irc.libera.chat and puts it into its cache. The second time, it finds the authority in the cache, asks it for the A record, and returns it. I'm not sure whether this is the correct behavior (maybe it is), but I tried a different configuration (the DNS resolver plus a DNS stub resolver that points to our resolver) and the result is worse (the stub resolver responds "not found" in every case).

dinosaure commented 3 years ago

To help a bit with the situation: it seems that we get a "question mismatch" when the first query for irc.libera.chat ends with a response about tia.ns.cloudflare.com. If we ask c.root-servers.net/192.33.4.12 (as the resolver does), we get:

$ dig A irc.libera.chat @192.33.4.12
; <<>> DiG 9.11.5-P4-5.1+deb10u5-Debian <<>> A irc.libera.chat @192.33.4.12
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55596
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 9
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: fbdf1da68f065070e9890a2c60d8597c439809090fb1c01e (good)
;; QUESTION SECTION:
;irc.libera.chat.       IN  A

;; AUTHORITY SECTION:
chat.           172800  IN  NS  demand.alpha.aridns.net.au.
chat.           172800  IN  NS  demand.gamma.aridns.net.au.
chat.           172800  IN  NS  demand.delta.aridns.net.au.
chat.           172800  IN  NS  demand.beta.aridns.net.au.

;; ADDITIONAL SECTION:
demand.beta.aridns.net.au. 172800 IN    A   37.209.194.7
demand.alpha.aridns.net.au. 172800 IN   A   37.209.192.7
demand.delta.aridns.net.au. 172800 IN   A   37.209.198.7
demand.gamma.aridns.net.au. 172800 IN   A   37.209.196.7
demand.beta.aridns.net.au. 172800 IN    AAAA    2001:dcd:2::7
demand.alpha.aridns.net.au. 172800 IN   AAAA    2001:dcd:1::7
demand.delta.aridns.net.au. 172800 IN   AAAA    2001:dcd:4::7
demand.gamma.aridns.net.au. 172800 IN   AAAA    2001:dcd:3::7

;; Query time: 2 msec
;; SERVER: 192.33.4.12#53(192.33.4.12)
;; WHEN: Sun Jun 27 12:57:00 CEST 2021
;; MSG SIZE  rcvd: 368

So, logically, we then ask one of these servers where irc.libera.chat is. For instance, taking demand.delta.aridns.net.au/37.209.198.7:

$ dig A irc.libera.chat @37.209.198.7
; <<>> DiG 9.11.5-P4-5.1+deb10u5-Debian <<>> A irc.libera.chat @37.209.198.7
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35229
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;irc.libera.chat.       IN  A

;; AUTHORITY SECTION:
libera.chat.        86400   IN  NS  ricardo.ns.cloudflare.com.
libera.chat.        86400   IN  NS  tia.ns.cloudflare.com.

;; Query time: 2 msec
;; SERVER: 37.209.198.7#53(37.209.198.7)
;; WHEN: Sun Jun 27 12:59:46 CEST 2021
;; MSG SIZE  rcvd: 101
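
The difference between the two referrals above can be made concrete. The root server's reply carries glue (A records for the chat. name servers in the ADDITIONAL section), while the reply from 37.209.198.7 names ricardo.ns.cloudflare.com and tia.ns.cloudflare.com without any address. Below is a minimal OCaml sketch of how a resolver might classify such replies. The `reply` record and the `next_step` type and function are hypothetical and are not part of ocaml-dns; only A-record glue is modelled, and the sample values are copied from the dig output above.

```ocaml
(* Hypothetical, simplified view of a DNS reply (not the ocaml-dns types). *)
type reply = {
  answers : (string * string) list;    (* owner name, address *)
  authority : (string * string) list;  (* zone, name server *)
  additional : (string * string) list; (* name server, glue address *)
}

type next_step =
  | Done of string list         (* the reply contains the answer *)
  | Ask_server of string        (* referral with glue: ask this address *)
  | Resolve_ns_first of string  (* glueless referral: resolve this name first *)
  | Give_up

let next_step reply =
  match reply.answers, reply.authority with
  | (_ :: _ as answers), _ -> Done (List.map snd answers)
  | [], [] -> Give_up
  | [], ((_, first_ns) :: _ as authority) ->
    (* Look for the first listed name server that has a glue address. *)
    let glue_for (_, ns) = List.assoc_opt ns reply.additional in
    (match List.find_map glue_for authority with
     | Some address -> Ask_server address
     | None -> Resolve_ns_first first_ns)

(* Referral from 192.33.4.12, abbreviated to two name servers. *)
let root_reply = {
  answers = [];
  authority = [ "chat.", "demand.alpha.aridns.net.au.";
                "chat.", "demand.beta.aridns.net.au." ];
  additional = [ "demand.beta.aridns.net.au.", "37.209.194.7";
                 "demand.alpha.aridns.net.au.", "37.209.192.7" ];
}

(* Referral from 37.209.198.7: no glue for the cloudflare name servers. *)
let tld_reply = {
  answers = [];
  authority = [ "libera.chat.", "ricardo.ns.cloudflare.com.";
                "libera.chat.", "tia.ns.cloudflare.com." ];
  additional = [];
}
```

With these values, `next_step root_reply` yields an address to ask next, whereas `next_step tld_reply` signals that a name server name has to be resolved as a separate sub-query before the original question can continue.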

And this is where we find tia.ns.cloudflare.com. I think that, in this situation, the resolver does not yet know where tia.ns.cloudflare.com is and returns that response to our gethostbyname.ml. This is where dig is a bit more clever: it then re-asks the resolver where tia.ns.cloudflare.com is (see, for instance, the first 2 lines):

$ dig A irc.libera.chat @10.0.0.2
;; Question section mismatch: got ricardo.ns.cloudflare.com/A/IN
;; Question section mismatch: got tia.ns.cloudflare.com/A/IN

; <<>> DiG 9.11.5-P4-5.1+deb10u5-Debian <<>> A irc.libera.chat @10.0.0.10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9036
;; flags: qr rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;irc.libera.chat.       IN  A

;; ANSWER SECTION:
irc.libera.chat.    120 IN  A   93.158.239.37
irc.libera.chat.    120 IN  A   163.123.192.192
irc.libera.chat.    120 IN  A   176.58.122.119

;; Query time: 4 msec
;; SERVER: 10.0.0.10#53(10.0.0.10)
;; WHEN: Sun Jun 27 13:02:45 CEST 2021
;; MSG SIZE  rcvd: 92

So I'm not sure whether this is good behavior or not. Maybe the resolver should try to resolve tia.ns.cloudflare.com itself instead of sending it back as the response to our question. Or maybe gethostbyname.ml should handle this case correctly and re-ask the resolver, recursively, whenever the question mismatches. Either way, solving this would help me a lot to safely launch a unikernel for IRC ☺️!
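
The client-side option (re-asking when the question in the reply does not match the one we sent) could be sketched as follows. This is a toy model using only the OCaml standard library: the `question` record, `same_question`, and `query` are hypothetical names, not the dns-client API, and `ask` stands in for one round-trip to the resolver.

```ocaml
(* Hypothetical, simplified model of a DNS question: name and record type. *)
type question = { name : string; rrtype : string }

(* Domain names compare case-insensitively; ignore a trailing dot. *)
let normalize name =
  let name = String.lowercase_ascii name in
  let len = String.length name in
  if len > 0 && name.[len - 1] = '.' then String.sub name 0 (len - 1)
  else name

let same_question a b =
  normalize a.name = normalize b.name && a.rrtype = b.rrtype

(* [ask q] performs one exchange and returns the question echoed in the
   reply together with the answer addresses. A reply whose question does
   not match [q] is discarded and the query is sent again, at most
   [retries] more times. *)
let rec query ?(retries = 2) ~ask q =
  let echoed, answers = ask q in
  if same_question q echoed then Ok answers
  else if retries > 0 then query ~retries:(retries - 1) ~ask q
  else Error (`Question_mismatch echoed)
```

The retry bound is an arbitrary choice for the sketch. It only works around the symptom on the client; a resolver that answers with the wrong question would still need to be fixed on its side.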

hannesm commented 3 years ago

Thanks for your report. Indeed, the resolver implementation is not very useful at the moment (it used to be in better shape), and it uses the wrong query to answer. This will need some further investigation and a test case to fix. I'll hopefully get around to it in the upcoming week.

dinosaure commented 3 years ago

Thanks, I will try the new version on my side and give feedback :+1:.

dinosaure commented 3 years ago

I can confirm that the issue is fixed :+1:.