ostreedev / ostree

Operating system and container binary deployment and upgrades
https://ostreedev.github.io/ostree/

ostree pull seems to do a lot of dns lookups #894

Open alexlarsson opened 7 years ago

alexlarsson commented 7 years ago

I've seen a lot of reports of DNS issues during flatpak install. It seems that sometimes one of the DNS servers for gnome.org has a hiccup, which tends to abort the entire install.

Example user report

• [ironman:~] bratner $ flatpak install --user http://flatpak.pitivi.org/pitivi.flatpakref
This application depends on runtimes from:
  http://sdk.gnome.org/repo/
Configure this as new remote 'gnome' [y/n]: y
Installing: org.pitivi.Pitivi/x86_64/stable
Required runtime for org.pitivi.Pitivi/x86_64/stable (org.gnome.Platform/x86_64/3.22) is not installed, searching...
Found in remote gnome, do you want to install it? [y/n]: y
Installing: org.gnome.Platform/x86_64/3.22 from gnome
[#=                  ] Downloading: 0 bytes/183.1 MB (0 bytes/s)               
error: While pulling runtime/org.gnome.Platform/x86_64/3.22 from remote gnome: Error resolving 'sdk.gnome.org': Name or service not known
• [ironman:~] bratner $ ping sdk.gnome.org
PING sdk.gnome.org (209.132.180.169) 56(84) bytes of data.
64 bytes from sdk.gnome.org (209.132.180.169): icmp_seq=1 ttl=47 time=235 ms
64 bytes from sdk.gnome.org (209.132.180.169): icmp_seq=2 ttl=47 time=236 ms
^C
--- sdk.gnome.org ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 235.710/235.972/236.235/0.552 ms
• [ironman:~] bratner $ dig sdk.gnome.org

; <<>> DiG 9.10.3-P4-Ubuntu <<>> sdk.gnome.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1259
;; flags: qr ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;sdk.gnome.org.         IN  A

;; ANSWER SECTION:
sdk.gnome.org.      772 IN  A   209.132.180.169

;; Query time: 1 msec
;; SERVER: 127.0.1.1#53(127.0.1.1)
;; WHEN: Tue May 30 11:37:50 IDT 2017
;; MSG SIZE  rcvd: 58

• [ironman:~] bratner $ cat /etc/resolv.conf 
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 127.0.1.1
search Home
• [ironman:~] bratner $ sudo vi /etc/resolv.conf 
• [ironman:~] bratner $ flatpak install --user http://flatpak.pitivi.org/pitivi.flatpakref
Installing: org.pitivi.Pitivi/x86_64/stable
Required runtime for org.pitivi.Pitivi/x86_64/stable (org.gnome.Platform/x86_64/3.22) is not installed, searching...
Found in remote gnome, do you want to install it? [y/n]: y
Installing: org.gnome.Platform/x86_64/3.22 from gnome
[####################] 9 delta parts, 73 loose fetched; 178829 KiB transferred in 68 seconds
Installing: org.gnome.Platform.Locale/x86_64/3.22 from gnome
[####################] 3 metadata, 1 content objects fetched; 13 KiB transferred in 4 seconds
Installing: org.pitivi.Pitivi/x86_64/stable from org.pitivi.Pitivi-1-origin
[####################] 6 delta parts, 34 loose fetched; 117805 KiB transferred in 35 seconds
• [ironman:~] bratner $ cat /etc/resolv.conf 
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 8.8.8.8
nameserver 127.0.1.1
search Home

I wonder if we're doing something wrong with respect to name resolution? Shouldn't glibc cache the result of the first lookup and reuse it for the rest of the pull? Or do we need to do the resolve once ourselves and then manually specify the IP for each request in the pull?
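
On the last point: libcurl at least exposes CURLOPT_RESOLVE, which pre-populates its DNS cache so that every request to a given host:port reuses one address. A minimal standalone sketch of that technique, reusing the hostname and address from the report above purely as an illustration (this is not something ostree's fetcher does today):

#include <curl/curl.h>

int
main (void)
{
  CURL *curl = curl_easy_init ();
  /* Pin sdk.gnome.org:80 to one address for the lifetime of this handle;
   * libcurl will skip the system resolver for matching requests. */
  struct curl_slist *pins =
    curl_slist_append (NULL, "sdk.gnome.org:80:209.132.180.169");
  CURLcode rc;

  curl_easy_setopt (curl, CURLOPT_RESOLVE, pins);
  curl_easy_setopt (curl, CURLOPT_URL, "http://sdk.gnome.org/repo/config");
  rc = curl_easy_perform (curl);

  curl_slist_free_all (pins);
  curl_easy_cleanup (curl);
  return rc == CURLE_OK ? 0 : 1;
}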

dustymabe commented 7 years ago

+1 - I've definitely seen issues where a failure to resolve a name in the middle of a pull results in the whole pull being aborted.

cgwalters commented 7 years ago

I don't believe glibc does any caching unless nscd is enabled, and nscd has lots of issues. I suggest NetworkManager with dns=dnsmasq.
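
For reference, that is a one-line configuration change, assuming NetworkManager is managing resolv.conf (stock file path shown):

# /etc/NetworkManager/NetworkManager.conf
[main]
dns=dnsmasq

After restarting NetworkManager, resolv.conf points at a local caching dnsmasq instance, so the repeated lookups during a pull should be answered from the cache.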

cgwalters commented 7 years ago

There's also systemd-resolved.
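
Roughly, on a distribution that ships it without enabling it by default (the stub resolv.conf path varies a bit across systemd versions):

sudo systemctl enable --now systemd-resolved
sudo ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf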

alexlarsson commented 7 years ago

@cgwalters Don't you think it makes sense to only do one DNS resolve per pull operation, though? That way you'd always hit the same mirror instance for the entire operation, and also avoid a lot of DNS requests.

alexlarsson commented 7 years ago

@cgwalters I mean, in the non-static-delta case, are we really doing one DNS resolve operation for each object?

cgwalters commented 7 years ago

Well, HTTP keepalives should really obviate most DNS issues except for initial setup. It looks like pitivi.org does keepalives.

I'm uncertain about caching DNS in ostree explicitly... it feels like that's more libsoup/libcurl's or the system's job. And actually, one thing we likely want to enable is for higher-level software like gnome-software to use GNetworkMonitor to dynamically watch for repos becoming available. Doing something like that would end up implicitly caching DNS at a higher level.
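
For context, a rough sketch of that GNetworkMonitor idea; the hostname is again just the one from the report, and the policy around the check is invented:

#include <gio/gio.h>

static void
on_reachable (GObject *source, GAsyncResult *result, gpointer user_data)
{
  GMainLoop *loop = user_data;
  GError *error = NULL;

  /* TRUE means a connection to the host could most likely be established */
  if (g_network_monitor_can_reach_finish (G_NETWORK_MONITOR (source), result, &error))
    g_print ("repo host looks reachable; a pull could be scheduled now\n");
  else
    {
      g_print ("repo host not reachable: %s\n", error->message);
      g_clear_error (&error);
    }
  g_main_loop_quit (loop);
}

int
main (void)
{
  GMainLoop *loop = g_main_loop_new (NULL, FALSE);
  GNetworkMonitor *monitor = g_network_monitor_get_default ();
  GSocketConnectable *addr = g_network_address_new ("sdk.gnome.org", 80);

  /* Real code would also connect to the "network-changed" signal and
   * re-check whenever connectivity changes. */
  g_network_monitor_can_reach_async (monitor, addr, NULL, on_reachable, loop);
  g_main_loop_run (loop);

  g_object_unref (addr);
  g_main_loop_unref (loop);
  return 0;
}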

alexlarsson commented 7 years ago

That all sounds fine in theory, but people show up all the time with these dns issues.

cgwalters commented 7 years ago

I'm not saying there's no problem, but... oh, hm, interesting: Firefox caches DNS. (I was about to argue that one really wants a system-wide cache for web browsing etc. - AIUI e.g. Windows has a system-wide cache.)

One thing here is that we currently use multiple connections - so even if we're using keepalives, I suspect that (at least with libsoup) a pull may end up failing when DNS transiently fails mid-pull for the same server, even while we still have an open connection to it.

alexlarsson commented 7 years ago

I think what happens is that gnome.org has multiple DNS servers, but generally only one of them is borked. So if we resolve once, we'll either fail the entire pull up front or succeed it entirely; whereas without caching, every lookup is a fresh chance to hit the borked server, so we're pretty much guaranteed to hit it at some point during a pull.

Also, I'm not sure we want a generic cache, but rather something that is part of OtPullData, i.e. one resolve per pull operation.

alexlarsson commented 7 years ago

Additionally, in terms of things like round-robin DNS mirroring, it just seems safer to use the same server for the entire pull operation.
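
To make that concrete, here is a hand-wavy sketch of the "one resolve per pull" idea using GIO's GResolver; the helper name and the idea of stashing the result on OtPullData are hypothetical, not the current code:

#include <gio/gio.h>

/* Hypothetical helper: resolve the remote's host exactly once at the start
 * of a pull, so every subsequent object fetch can reuse the same answer
 * instead of re-querying (and possibly hitting a broken DNS server). */
static gboolean
pull_resolve_host_once (const char  *hostname,
                        GList      **out_addresses, /* list of GInetAddress */
                        GError     **error)
{
  GResolver *resolver = g_resolver_get_default ();
  GList *addresses = g_resolver_lookup_by_name (resolver, hostname, NULL, error);

  g_object_unref (resolver);
  if (addresses == NULL)
    return FALSE; /* fail fast, before any objects have been fetched */

  /* A real OtPullData field would own this list; free it with
   * g_resolver_free_addresses() when the pull finishes. */
  *out_addresses = addresses;
  return TRUE;
}

The fetcher would then be pointed at one of these addresses for the whole operation (for example the way CURLOPT_RESOLVE pins a host in the sketch further up), which also gives the round-robin stickiness described here.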