Open leap0x7b opened 1 year ago
I'm also getting this on Arch Linux with systemd-networkd and systemd-resolved. Working just fine on macOS though. Not sure if this helps but:
$ strace --trace=network zig build test
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 8
bind(8, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
sendto(8, "1\334\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\1\0\1", 28, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 28
sendto(8, "2\245\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\34\0\1", 28, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 28
sendto(8, "1\334\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\1\0\1", 28, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 28
sendto(8, "2\245\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\34\0\1", 28, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 28
error: TemporaryNameServerFailure
+++ exited with 1 +++
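The two payloads in the sendto calls can be decoded by hand to confirm what was being asked. This is a minimal sketch, assuming the plain RFC 1035 query layout (12-byte header, length-prefixed labels, then QTYPE and QCLASS) and Zig 0.11 or later. `firstQuestionType` is a hypothetical helper written for this illustration, not something from std. strace prints bytes in octal, so `\34` is 28 (the AAAA record type) and `\1` is 1 (the A record type).

```zig
const std = @import("std");

/// Returns the QTYPE of the first question in a DNS message, or null if the
/// message is too short. Hypothetical helper; does not handle name compression.
fn firstQuestionType(packet: []const u8) ?u16 {
    var i: usize = 12; // skip the fixed-size header
    while (i < packet.len and packet[i] != 0) {
        i += 1 + @as(usize, packet[i]); // jump over one length-prefixed label
    }
    // i now points at the zero byte that ends the name; QTYPE is the next two bytes.
    if (i + 2 >= packet.len) return null;
    return (@as(u16, packet[i + 1]) << 8) | packet[i + 2];
}

pub fn main() void {
    // Payloads copied from the strace output, with octal escapes rewritten as hex.
    const a_query = "1\xdc\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x06github\x03com\x00\x00\x01\x00\x01";
    const aaaa_query = "2\xa5\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x06github\x03com\x00\x00\x1c\x00\x01";
    // Prints "1 28": one A query and one AAAA query for github.com.
    std.debug.print("{d} {d}\n", .{ firstQuestionType(a_query).?, firstQuestionType(aaaa_query).? });
}
```

Both queries are sent twice and no recvfrom with an answer appears in the trace, which suggests the resolver retried and then gave up.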
A related issue, #14900, was solved recently. Did it solve this issue as well?
I see a different error now, error.ConnectionFailed. The strace output looks identical to before. Let me know if there are any better diagnostics I can provide.
Still doesn't work. I also got error.ConnectionFailed.
I went to debug this by altering std.http.Client, but got a strange result before I even changed any code.
I checked out 602029bb2 (the commit of the latest release at the time of this test), built stage3 (Release) and stage4 (Debug), and ran zig build on a project with dependencies, making sure to clear the global package cache between runs. Both of those locally built zig executables successfully fetched deps.
But I downloaded the official release with the same commit hash and got error.ConnectionFailed as before. I would run the official build through a debugger, but of course it is stripped, so I'm trying to think of a next step.
@hryx, just to rule it out, does this run without error?
    const std = @import("std");

    pub fn main() !void {
        var general_purpose_allocator = std.heap.GeneralPurposeAllocator(.{}){};
        const gpa = general_purpose_allocator.allocator();
        var http_client: std.http.Client = .{ .allocator = gpa };
        defer http_client.deinit();
        const uri = try std.Uri.parse("http://github.com");
        var req = try http_client.request(uri, .{}, .{});
        defer req.deinit();
    }
@mikdusan It does not! Good idea to test by using an HTTP client directly.
Error output:
I added a log right before the error return in linuxLookupNameFromDns with the values of ap:
info: ap0: 7c8f850200010000000000000667697468756203636f6d0000010001 = githubcom
info: ap1: 7f20850200010000000000000667697468756203636f6d00001c0001 = Tgithubcom
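The logged hex can be read as DNS messages. This is a rough sketch, assuming ap holds the raw answer packets as received and that the header follows RFC 1035, where the low four bits of the fourth byte are the response code. `responseCode` and `rcodeName` are hypothetical helpers for this illustration, written against Zig 0.11 or later.

```zig
const std = @import("std");

/// Response code: the low four bits of the fourth header byte (RFC 1035).
fn responseCode(packet: []const u8) ?u8 {
    if (packet.len < 4) return null;
    return packet[3] & 0x0f;
}

/// Names for the response codes defined in RFC 1035.
fn rcodeName(rcode: u8) []const u8 {
    return switch (rcode) {
        0 => "NOERROR",
        1 => "FORMERR",
        2 => "SERVFAIL",
        3 => "NXDOMAIN",
        4 => "NOTIMP",
        5 => "REFUSED",
        else => "other",
    };
}

pub fn main() !void {
    // Hex string copied from the ap0 log line above.
    var buf: [32]u8 = undefined;
    const packet = try std.fmt.hexToBytes(&buf, "7c8f850200010000000000000667697468756203636f6d0000010001");
    const rcode = responseCode(packet).?;
    // Prints "rcode 2 (SERVFAIL)".
    std.debug.print("rcode {d} ({s})\n", .{ rcode, rcodeName(rcode) });
}
```

Both logged packets carry the flag bytes 85 02, so both decode to SERVFAIL with zero answer records. That would fit a local server on port 53 that answers but cannot resolve the name.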
I fixed the issue on my machine. I'll report what I found in case it helps @leap0x7b or others, but I'm left to speculate about the original root cause.
The issue was mundane: my /etc/resolv.conf was supposed to be a symlink to /run/systemd/resolve/stub-resolv.conf, but it was a plain file. When systemd-resolved is used, it needs to be a link to the stub file. The solution was to remove /etc/resolv.conf and make it a link, but it could probably also be solved by reinstalling the relevant package.
My guess is that my system got into a bad state long ago but didn't show symptoms until the Zig HTTP client tried to resolve DNS by reading resolv.conf directly. It was an empty file (except for a comment), so name resolution obviously failed. I don't understand why every other program was still able to use DNS — maybe glibc falls back to discovering a local nameserver, while musl doesn't. In fact, Zig built locally with glibc was able to make these HTTP requests, even with my previously bad system state.
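To illustrate why a comment-only file leaves a resolver with nothing to query, here is a simplified sketch of scanning resolv.conf-style text for nameserver entries. It is not the parser in std.net; `countNameservers` is a hypothetical helper, and the sample contents are illustrative rather than copied from the affected machine. It assumes Zig 0.11 or later.

```zig
const std = @import("std");

/// Counts `nameserver` entries in resolv.conf-style text, ignoring blank
/// lines and anything after a `#` or `;` comment character.
fn countNameservers(contents: []const u8) usize {
    var count: usize = 0;
    var lines = std.mem.splitScalar(u8, contents, '\n');
    while (lines.next()) |raw_line| {
        // Keep only the part of the line before a comment character.
        const end = std.mem.indexOfAny(u8, raw_line, "#;") orelse raw_line.len;
        var tokens = std.mem.tokenizeAny(u8, raw_line[0..end], " \t\r");
        const keyword = tokens.next() orelse continue;
        if (!std.mem.eql(u8, keyword, "nameserver")) continue;
        // The keyword only counts when an address follows it.
        if (tokens.next() != null) count += 1;
    }
    return count;
}

pub fn main() void {
    // Illustrative contents: a comment-only file and a stub-style file.
    const comment_only = "# managed elsewhere\n";
    const stub_style = "nameserver 127.0.0.53\noptions edns0 trust-ad\n";
    // Prints "0 1".
    std.debug.print("{d} {d}\n", .{ countNameservers(comment_only), countNameservers(stub_style) });
}
```

The strace output earlier in the thread shows queries going to 127.0.0.1:53, which would fit a resolver that falls back to localhost when it finds no nameserver entries, though that link is a guess.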
As for how my system got into that state, who knows, but I have had at least one service that modifies /etc/resolv.conf, and it could have been a since-fixed bug in the service or even a package install script. I wasn't able to reproduce the issue after reinstalling Tailscale and NetworkManager, for example.
Anyway, no Zig bug here as far as I am concerned.
On Android (termux), /etc/resolv.conf doesn't exist
    const std = @import("std");

    pub fn main() !void {
        var general_purpose_allocator = std.heap.GeneralPurposeAllocator(.{}){};
        const gpa = general_purpose_allocator.allocator();
        var http_client: std.http.Client = .{ .allocator = gpa };
        defer http_client.deinit();
        const uri = try std.Uri.parse("http://github.com");
        var req = try http_client.request(.GET, uri, .{
            .allocator = gpa,
        }, .{});
        defer req.deinit();
    }
openat(AT_FDCWD, "/etc/resolv.conf", O_RDONLY|O_NOCTTY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
wget and other programs work fine.
$PREFIX/etc/resolv.conf does exist (/data/data/com.termux/files/usr/etc/resolv.conf).
Packages in the termux package manager have to patch this folder it seems: https://github.com/termux/termux-packages/pull/14738/files
Applying these patches to std/net.zig appears to fix the issue for using http_client.request, but zig would have to be recompiled with these changes to fix zig build:
- const file = fs.openFileAbsoluteZ("/etc/hosts", .{}) catch |err| switch (err) {
+ const file = fs.openFileAbsoluteZ("/data/data/com.termux/files/usr/etc/hosts", .{}) catch |err| switch (err) {
- const file = fs.openFileAbsoluteZ("/etc/resolv.conf", .{}) catch |err| switch (err) {
+ const file = fs.openFileAbsoluteZ("/data/data/com.termux/files/usr/etc/resolv.conf", .{}) catch |err| switch (err) {
I'm not sure what to do about this?
Some possible solutions:
$PREFIX and change the paths for /etc/resolv.conf and /etc/hosts
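As a rough illustration of the $PREFIX idea, the path could be derived from an optional prefix instead of being hard-coded. This is a hypothetical sketch, not how std.net behaves. `resolvConfPath` is an invented name, and the caller is assumed to have already read the environment variable. It assumes Zig 0.11 or later.

```zig
const std = @import("std");

/// Builds the resolv.conf path under an optional prefix, falling back to the
/// FHS location when no prefix is given. Hypothetical helper for illustration.
fn resolvConfPath(buf: []u8, prefix: ?[]const u8) ![]const u8 {
    const p = prefix orelse return "/etc/resolv.conf";
    return std.fmt.bufPrint(buf, "{s}/etc/resolv.conf", .{p});
}

pub fn main() !void {
    var buf: [256]u8 = undefined;
    // Prefix value taken from the Termux path quoted above.
    const termux = try resolvConfPath(&buf, "/data/data/com.termux/files/usr");
    // Prints "/data/data/com.termux/files/usr/etc/resolv.conf".
    std.debug.print("{s}\n", .{termux});
    // Prints "/etc/resolv.conf".
    std.debug.print("{s}\n", .{try resolvConfPath(&buf, null)});
}
```

The same approach would apply to /etc/hosts. A prefix that does not fit the buffer makes bufPrint return an error rather than a truncated path.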
Until the problem is solved in Zig, the likely temporary solution seems to be the one laid out in the page you linked, https://wiki.termux.com/wiki/Differences_from_Linux. You can use termux-chroot from proot to get back FHS compliance for now.
+1, I'm also hitting this problem on my Ubuntu laptop. For some reason, specifically github.com takes 10 seconds to resolve when pinging (github.io is fast). Git operations are also delayed by those 10 seconds but eventually work. The Zig package manager however fails with TemporaryNameServerFailure.
(PS: zig 0.12.0-dev.2236+32e88251e)
PS: it works after changing DNS servers from my router to 8.8.8.8 and 8.8.4.4 (make sure that systemd actually sees those changes via resolvectl status). I had to log out of and back into the desktop session after changing the DNS settings in the KDE control panel.
PPS: FWIW, on my Mac it's always fast, no matter if my router is used for DNS or the Google DNS servers.
Great stuff @pfgithub!
Having looked at #14146 a bit, I also got interested in getting this fixed.
Unfortunately I don't have an awesome idea on how to solve it after the $PREFIX idea was rejected for that one.
I think adding a --dns x.y.z.w argument for zig fetch and zig build makes the most sense, and it should be okay in practice as well since it should only be required when (re)fetching dependencies.
There could also be fallback nameservers for popular providers like 1.1.1.1 or 8.8.8.8, but I'm not as hyped on that idea since it could bypass what the user actually wants, for example if they have a resolv.conf elsewhere (as in the case of Termux).
@JerwuQu Wouldn't this mean any program using zig http would need to have its own --dns argument? This is a problem with the HTTP client in the standard library, not just zig fetch and zig build.
@pfgithub That is very true. I honestly didn't consider that at first, only thinking about the zig build angle.
In that case I feel more strongly that an environment variable for specifying DNS servers (or an alternative resolv.conf path) is needed. I think specifying nameservers is better than pointing to an alternative resolv.conf path, since it then turns into a feature that could also be used by FHS systems rather than a workaround for non-FHS systems.
This could be in the same category as the widely used HTTP_PROXY and friends, which I think are also reasonable for the stdlib HTTP client to respect.
I searched quite a bit and can't find any good names that are shared between multiple projects.
One project is resolvconf-override, which uses NAMESERVER1 and NAMESERVER2.
Another project I found is SkyDNS, which uses SKYDNS_NAMESERVERS.
It seems most large projects respect resolv.conf with an argument override (e.g. cURL). I believe the main reason for this is that there isn't a universally used environment variable already.
I therefore see two options:
1. NAMESERVERS or DNS_NAMESERVERS, and let Zig applications set a precedent for other projects. Perhaps in the future we would see this supported as commonly as HTTP_PROXY.
2. ZIG_NAMESERVERS, to solve the specific Zig case.
The downside here is that package maintainers can't reasonably set this variable for their users, and would still need to patch the stdlib to get their resolv.conf as default. Requiring users of Zig and Zig-made projects to set this envvar just to have networked applications working is not desirable.
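To make the environment-variable proposal concrete, here is a rough sketch of turning such a value into DNS server addresses. Nothing like this exists in std today. `parseNameservers` is a hypothetical helper, the comma or whitespace separated format is an assumption, and reading the variable itself (whatever it ends up being named) is left to the caller. It assumes a Zig version where std.net.Address.parseIp is available (0.11 through at least 0.13).

```zig
const std = @import("std");

/// Parses a list such as "1.1.1.1, 8.8.8.8" into addresses on port 53.
/// Returns the filled part of `out`; entries beyond out.len are ignored.
/// Fails if any entry is not a valid IPv4 or IPv6 address.
fn parseNameservers(value: []const u8, out: []std.net.Address) ![]std.net.Address {
    var n: usize = 0;
    var it = std.mem.tokenizeAny(u8, value, ", \t");
    while (it.next()) |token| {
        if (n == out.len) break;
        out[n] = try std.net.Address.parseIp(token, 53);
        n += 1;
    }
    return out[0..n];
}

pub fn main() !void {
    var storage: [4]std.net.Address = undefined;
    // Example value; in practice this would come from the environment.
    const servers = try parseNameservers("1.1.1.1, 8.8.8.8", &storage);
    // Prints "2 nameservers".
    std.debug.print("{d} nameservers\n", .{servers.len});
}
```

Rejecting the whole value on the first invalid entry is a deliberate choice in this sketch: a typo in the variable then fails loudly instead of silently dropping a server.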
Should there then be a flag for compilation (same realm as #18778) to set where the system resolv.conf is located?
I would be keen to hear the thoughts of a Zig maintainer on this issue.
Zig Version
0.11.0-dev.1606+3c2a43fdc
Steps to Reproduce and Observed Behavior
zig init-lib or zig init-exe
zig build
Expected Behavior
It should be able to resolve github.com and download the file.