ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
33.78k stars 2.47k forks

error: TemporaryNameServerFailure when using package management on Termux #14636

Open leap0x7b opened 1 year ago

leap0x7b commented 1 year ago

Zig Version

0.11.0-dev.1606+3c2a43fdc

Steps to Reproduce and Observed Behavior

  1. Create a Zig project using either zig init-lib or zig init-exe
  2. Create a build.zig.zon file:
    .{
        .name = "viisi",
        .description = "A RISC-V hobby computer inspired by old 80s/90s UNIX workstations",
        .version = "0.1.0",
        .dependencies = .{
            .clap = .{
                .url = "https://github.com/Hejsil/zig-clap/archive/272d8e2088b2cae037349fb260dc05ec46bba422.tar.gz",
            },
        },
    }
  3. Run zig build

Expected Behavior

It should be able to resolve github.com and download the file.

hryx commented 1 year ago

I'm also getting this on Arch Linux with systemd-networkd and systemd-resolved. Working just fine on macOS though. Not sure if this helps but:

$ strace --trace=network zig build test
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 8
bind(8, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
sendto(8, "1\334\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\1\0\1", 28, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 28
sendto(8, "2\245\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\34\0\1", 28, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 28
sendto(8, "1\334\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\1\0\1", 28, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 28
sendto(8, "2\245\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\34\0\1", 28, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 28
error: TemporaryNameServerFailure
+++ exited with 1 +++

andrewrk commented 1 year ago

A related issue, #14900, was solved recently. Did it solve this issue as well?

hryx commented 1 year ago

I see a different error now, error.ConnectionFailed. The strace output looks identical to before. Let me know if there are any better diagnostics I can provide.

leap0x7b commented 1 year ago

Still doesn't work. I also got error.ConnectionFailed.

hryx commented 1 year ago

I went to debug this by altering std.http.Client, but got a strange result before I even changed any code.

I checked out 602029bb2 (the commit of the latest release at the time of this test), built stage3 (Release) and stage4 (Debug), and ran zig build on a project with dependencies, making sure to clear the global package cache between runs. Both of those locally built zig executables successfully fetched deps.

But I downloaded the official release with the same commit hash and got error.ConnectionFailed as before. I would run the official build through a debugger but of course it is stripped, so I'm trying to think of a next step.

mikdusan commented 1 year ago

@hryx,

just to rule it out, does this run without error?

const std = @import("std");

pub fn main() !void {
    var general_purpose_allocator = std.heap.GeneralPurposeAllocator(.{}){};
    const gpa = general_purpose_allocator.allocator();

    var http_client: std.http.Client = .{ .allocator = gpa };
    defer http_client.deinit();

    const uri = try std.Uri.parse("http://github.com");
    var req = try http_client.request(uri, .{}, .{});
    defer req.deinit();
}

hryx commented 1 year ago

@mikdusan It does not! Good idea to test by using an HTTP client directly.

Error output:

```
error: ConnectionFailed
/home/hryx/lib/std/net.zig:45:9: 0x46a95a in parseIp (main)
        return error.InvalidIPAddressFormat;
        ^
/home/hryx/lib/std/net.zig:75:29: 0x46a549 in parseExpectingFamily (main)
            os.AF.UNSPEC => return parseIp(name, port),
                            ^
/home/hryx/lib/std/net.zig:1414:48: 0x476584 in linuxLookupNameFromDns (main)
    if (ap[0].len < 4 or (ap[0][3] & 15) == 2) return error.TemporaryNameServerFailure;
                                               ^
/home/hryx/lib/std/net.zig:1358:5: 0x4778ec in linuxLookupNameFromDnsSearch (main)
    return linuxLookupNameFromDns(addrs, canon, name, family, rc, port);
    ^
/home/hryx/lib/std/net.zig:996:17: 0x4785e7 in linuxLookupName (main)
                try linuxLookupNameFromDnsSearch(addrs, canon, name, family, port);
                ^
/home/hryx/lib/std/net.zig:933:9: 0x428661 in getAddressList (main)
        try linuxLookupName(&lookup_addrs, &canon, name, family, flags, port);
        ^
/home/hryx/lib/std/net.zig:709:18: 0x35685f in tcpConnectToHost (main)
    const list = try getAddressList(allocator, name, port);
                 ^
/home/hryx/lib/std/http/Client.zig:896:9: 0x3209ac in connect (main)
        return error.ConnectionFailed;
        ^
/home/hryx/lib/std/http/Client.zig:994:23: 0x30f194 in request (main)
        .connection = try client.connect(host, port, protocol),
                      ^
/home/hryx/tmp/issue14636/main.zig:11:15: 0x30e942 in main (main)
    var req = try http_client.request(uri, .{}, .{});
              ^
```

I added a log right before the error return in linuxLookupNameFromDns with the values of ap:

info: ap0: 7c8f850200010000000000000667697468756203636f6d0000010001 = githubcom
info: ap1: 7f20850200010000000000000667697468756203636f6d00001c0001 = Tgithubcom
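
For reference, the check that raises the error, (ap[0][3] & 15) == 2, reads the RCODE field from the fourth byte of the DNS reply. The sketch below decodes the first four bytes of the ap0 dump above. Header and parseHeader are names made up for illustration and are not part of std.net. With those bytes the RCODE comes out as 2, which RFC 1035 defines as a server failure, so something did answer the query but reported that it could not resolve the name.

```zig
const std = @import("std");

/// The DNS header fields relevant to this check (RFC 1035, section 4.1.1).
/// Hypothetical type for illustration only.
const Header = struct {
    id: u16,
    is_response: bool,
    rcode: u8,
};

/// Decodes the first four bytes of a DNS message, or returns null if it is too short.
fn parseHeader(msg: []const u8) ?Header {
    if (msg.len < 4) return null;
    return Header{
        .id = (@as(u16, msg[0]) << 8) | msg[1], // big-endian transaction ID
        .is_response = (msg[2] & 0x80) != 0, // QR bit
        .rcode = msg[3] & 0x0f, // low four bits of the second flags byte
    };
}

pub fn main() void {
    // First four bytes of the ap0 dump logged above: 7c 8f 85 02.
    const ap0 = [_]u8{ 0x7c, 0x8f, 0x85, 0x02 };
    const header = parseHeader(&ap0) orelse return;
    std.debug.print("id=0x{x} response={} rcode={d}\n", .{ header.id, header.is_response, header.rcode });
}
```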

hryx commented 1 year ago

I fixed the issue on my machine. I'll report what I found in case it helps @leap0x7b or others, but I'm left to speculate about the original root cause.

The issue was mundane: my /etc/resolv.conf was supposed to be a symlink to /run/systemd/resolve/stub-resolv.conf, but it was a plain file. When systemd-resolved is used, it needs to be a link to the stub file. The solution was to remove /etc/resolv.conf and make it a link, but it could probably also be solved by reinstalling the relevant package.

My guess is that my system got into a bad state long ago but didn't show symptoms until the Zig HTTP client tried to resolve DNS by reading resolv.conf directly. It was an empty file (except for a comment), so name resolution obviously failed. I don't understand why every other program was still able to use DNS — maybe glibc falls back to discovering a local nameserver, while musl doesn't. In fact, Zig built locally with glibc was able to make these HTTP requests, even with my previously bad system state.
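
To illustrate the failure mode described above: a resolv.conf that contains only a comment yields zero nameserver entries. The sketch below counts such entries in resolv.conf-style text. countNameservers is a hypothetical helper, not the parser std.net uses, and both sample strings are assumed contents rather than copies of the files on the affected machine.

```zig
const std = @import("std");

/// Counts `nameserver` entries in resolv.conf-style text,
/// skipping blank lines and comments. Hypothetical helper.
fn countNameservers(contents: []const u8) usize {
    const keyword = "nameserver";
    var count: usize = 0;
    var start: usize = 0;
    while (start < contents.len) {
        const end = std.mem.indexOfScalarPos(u8, contents, start, '\n') orelse contents.len;
        const line = std.mem.trim(u8, contents[start..end], " \t\r");
        start = end + 1;

        if (line.len == 0 or line[0] == '#' or line[0] == ';') continue;
        if (!std.mem.startsWith(u8, line, keyword)) continue;
        // Require whitespace and a value after the keyword.
        if (line.len > keyword.len and (line[keyword.len] == ' ' or line[keyword.len] == '\t')) {
            count += 1;
        }
    }
    return count;
}

pub fn main() void {
    // Assumed sample contents: a comment-only file and a typical stub file.
    const broken = "# managed by some service\n";
    const stub = "nameserver 127.0.0.53\noptions edns0 trust-ad\nsearch .\n";
    std.debug.print("broken: {d}, stub: {d}\n", .{ countNameservers(broken), countNameservers(stub) });
}
```

With no entries, a resolver has to fall back to some default. The strace output earlier in the thread shows the queries going to 127.0.0.1:53, which would be consistent with a loopback fallback.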

As for how my system got into that state, who knows, but I have had at least one service that modifies /etc/resolv.conf, and it could have been a since-fixed bug in the service or even package install script. I wasn't able to reproduce the issue after reinstalling Tailscale and NetworkManager, for example.

Anyway, no Zig bug here as far as I am concerned.

pfgithub commented 1 year ago

On Android (termux), /etc/resolv.conf doesn't exist

const std = @import("std");

pub fn main() !void {
    var general_purpose_allocator = std.heap.GeneralPurposeAllocator(.{}){};
    const gpa = general_purpose_allocator.allocator();

    var http_client: std.http.Client = .{ .allocator = gpa };
    defer http_client.deinit();

    const uri = try std.Uri.parse("http://github.com");
    var req = try http_client.request(.GET, uri, .{
        .allocator = gpa,
    }, .{});
    defer req.deinit();
}

openat(AT_FDCWD, "/etc/resolv.conf", O_RDONLY|O_NOCTTY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)

wget and other programs work fine

$PREFIX/etc/resolv.conf does exist (/data/data/com.termux/files/usr/etc/resolv.conf)

It seems that packages in the Termux package manager have to patch this path: https://github.com/termux/termux-packages/pull/14738/files

Applying these patches to std/net.zig appears to fix the issue for using http_client.request, but zig would have to be recompiled with these changes to fix zig build:

-   const file = fs.openFileAbsoluteZ("/etc/hosts", .{}) catch |err| switch (err) {
+   const file = fs.openFileAbsoluteZ("/data/data/com.termux/files/usr/etc/hosts", .{}) catch |err| switch (err) {
-   const file = fs.openFileAbsoluteZ("/etc/resolv.conf", .{}) catch |err| switch (err) {
+   const file = fs.openFileAbsoluteZ("/data/data/com.termux/files/usr/etc/resolv.conf", .{}) catch |err| switch (err) {

I'm not sure what to do about this?

Some possible solutions:

truemedian commented 1 year ago

Until the problem is solved in Zig, the likely temporary solution seems to be the one laid out at https://wiki.termux.com/wiki/Differences_from_Linux, which you linked. You can use termux-chroot from proot to get back FHS compliance for now.

floooh commented 8 months ago

+1, I'm also hitting this problem on my Ubuntu laptop. For some reason, specifically github.com takes 10 seconds to resolve when pinging (github.io is fast). Git operations are also delayed by those 10 seconds but eventually work. The Zig package manager however fails with TemporaryNameServerFailure.

(PS: zig 0.12.0-dev.2236+32e88251e)

PS: it works after changing DNS servers from my router to 8.8.8.8 and 8.8.4.4 (make sure that systemd actually sees those changes via resolvectl status). I had to log out of and back into the desktop session after changing the DNS settings in the KDE control panel.

PPS: FWIW, on my Mac it's always fast, no matter if my router is used for DNS or the Google DNS servers.

JerwuQu commented 7 months ago

Great stuff @pfgithub!

Having looked at #14146 a bit, I also got interested in wanting to get this fixed. Unfortunately I don't have an awesome idea on how to solve it after the $PREFIX idea was rejected for that one.

I think adding a --dns x.y.z.w argument for zig fetch and zig build makes the most sense, and it should be okay in practice as well since it should only be required when (re)fetching dependencies.

There could also be fallback nameservers for popular providers like 1.1.1.1 or 8.8.8.8, but I'm not as hyped on that idea since it could bypass what the user actually wants, for example if they have a resolv.conf elsewhere (as in the case of Termux).

pfgithub commented 7 months ago

@JerwuQu Wouldn't this mean any program using zig http would need to have its own --dns argument? This is a problem with the HTTP client in the standard library, not just zig fetch and zig build

JerwuQu commented 7 months ago

@pfgithub That is very true. I honestly didn't consider that at first, only thinking about the zig build angle.

In that case I feel more strongly that an environment variable for specifying DNS servers (or an alternative resolv.conf path) is needed. I think specifying nameservers is better than pointing to an alternative resolv.conf path, since it then turns into a feature that could also be used by FHS systems rather than a workaround for non-FHS systems. This could be in the same category as the widely used HTTP_PROXY and friends, which I think are also reasonable for the stdlib HTTP client to respect.
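
As a rough sketch of the environment-variable idea, the snippet below parses a comma-separated list of IPv4 nameservers of the kind such a variable might hold. Everything here is an assumption: ZIG_NAMESERVERS is a made-up name that neither Zig nor its standard library reads, parseNameservers is a hypothetical helper, and std.net.Address.parseIp4 is used as it existed in the standard library around the versions mentioned in this thread.

```zig
const std = @import("std");

/// Parses a comma-separated list of IPv4 nameservers, e.g. "8.8.8.8, 8.8.4.4",
/// into `out` using port 53. Returns how many addresses were written.
/// Hypothetical helper; malformed entries are reported as errors.
fn parseNameservers(value: []const u8, out: []std.net.Address) !usize {
    var count: usize = 0;
    var start: usize = 0;
    while (start < value.len) {
        const end = std.mem.indexOfScalarPos(u8, value, start, ',') orelse value.len;
        const entry = std.mem.trim(u8, value[start..end], " \t");
        start = end + 1;

        if (entry.len == 0) continue; // tolerate empty items such as a trailing comma
        if (count == out.len) return error.TooManyNameservers;
        out[count] = try std.net.Address.parseIp4(entry, 53);
        count += 1;
    }
    return count;
}

pub fn main() !void {
    // In a real program this string would come from the environment, for
    // example from a variable named ZIG_NAMESERVERS (a made-up name).
    const value = "8.8.8.8, 8.8.4.4";
    var servers: [3]std.net.Address = undefined;
    const n = try parseNameservers(value, &servers);
    std.debug.print("parsed {d} nameserver(s)\n", .{n});
}
```

A resolver honoring such a variable would presumably consult it before reading resolv.conf, so it would act as an override rather than a replacement for the system configuration.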

I searched quite a bit and couldn't find any good names that are shared between multiple projects. One project is resolvconf-override, which uses NAMESERVER1 and NAMESERVER2. Another project I found is SkyDNS, which uses SKYDNS_NAMESERVERS. It seems most large projects respect resolv.conf with an argument override (e.g. cURL). I believe the main reason for this is that there isn't a universally used environment variable already.

I therefore see two options:

The downside here is that package maintainers can't reasonably set this variable for their users, and would still need to patch the stdlib to get their resolv.conf as default. Requiring users of Zig and Zig-made projects to set this envvar just to have networked applications working is not desirable. Should there then be a flag for compilation (same realm as #18778) to set where the system resolv.conf is located?

I would be keen to hear the thoughts of a Zig maintainer on this issue.