ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

HTTP client connection pool timeouts causing EndOfStream #21316

Open kassane opened 2 months ago

kassane commented 2 months ago

Zig Version

0.13.0

Steps to Reproduce and Observed Behavior

How to Reproduce?

$ git clone https://github.com/allyourcodebase/boost-libraries-zig
$ cd boost-libraries-zig
$ zig build # optionally add --fetch

[!NOTE] Basically, the project is 115 packages that list all their includePaths; the sources are linked only optionally.

Output: https://github.com/allyourcodebase/boost-libraries-zig/actions/runs/10722077115/job/29732148313

zig build --build-file $PWD/tests/build.zig --summary all -freference-trace -Dtarget=native-windows-msvc
D:\a\boost-libraries-zig\boost-libraries-zig/tests\..\build.zig.zon:18:20: error: unable to discover remote git server capabilities: EndOfStream
D:\a\boost-libraries-zig\boost-libraries-zig/tests\..\build.zig.zon:30:20: error: unable to discover remote git server capabilities: EndOfStream
D:\a\boost-libraries-zig\boost-libraries-zig/tests\..\build.zig.zon:38:20: error: unable to discover remote git server capabilities: EndOfStream
D:\a\boost-libraries-zig\boost-libraries-zig/tests\..\build.zig.zon:134:20: error: unable to discover remote git server capabilities: EndOfStream
D:\a\boost-libraries-zig\boost-libraries-zig/tests\..\build.zig.zon:142:20: error: unable to discover remote git server capabilities: EndOfStream
D:\a\boost-libraries-zig\boost-libraries-zig/tests\..\build.zig.zon:150:20: error: unable to discover remote git server capabilities: EndOfStream
D:\a\boost-libraries-zig\boost-libraries-zig/tests\..\build.zig.zon:222:20: error: unable to discover remote git server capabilities: EndOfStream
D:\a\boost-libraries-zig\boost-libraries-zig/tests\..\build.zig.zon:238:20: error: unable to discover remote git server capabilities: EndOfStream
D:\a\boost-libraries-zig\boost-libraries-zig/tests\..\build.zig.zon:262:20: error: unable to discover remote git server capabilities: EndOfStream
Error: Process completed with exit code 1.

Reference

Expected Behavior

No issue.

zadockmaloba commented 1 month ago

Experienced something similar. Happens when fetching nested dependencies.

E.g. libpq has dependencies on openssl, zstd, and zlib.

When building it standalone, it fetches its dependencies correctly. But when we add libpq as a dependency of another project, it fails to fetch (on Linux at least; it works fine on macOS Sonoma).


I have not looked deeply into it, but I think the issue occurs when the URL has the git+https prefix.

zadockmaloba commented 1 month ago

Check on this: git: fatal: Could not read from remote repository

It may actually be an ssh issue and not a zig fetch issue. At least, that solved the issue for me.

agagniere commented 1 month ago

@zadockmaloba I reproduced your issue here (only on Mac for some reason).

I then realised the difference between zstd and the other dependencies:

OK:

.url = "git+https://github.com/allyourcodebase/zlib?ref=1.3.1#0918....",

Not OK:

.url = "git+https://github.com/allyourcodebase/zstd.git?ref=1.5.6-1#324a...",

Notice the extra .git!

Removing the .git solved the issue: after

ianprime0509 commented 1 month ago

@zadockmaloba regarding

It may actually be an ssh issue and not a zig fetch issue.

Zig's git+http(s) fetch support does not use SSH or even the git binary in any way, so any Git SSH configuration will not affect the fetch process for Git dependencies.


@agagniere Zig supports fetching both git+https://github.com/allyourcodebase/zstd and git+https://github.com/allyourcodebase/zstd.git; running zig fetch with either URL will produce the same output. (This is because GitHub accepts both repository paths without redirecting, not because of any special behavior in Zig.)


Certain solutions may seem to work only because this issue is very sporadic and can't be reproduced 100% consistently. Even with the GitHub Actions run linked in the issue description, the next attempt of the same run succeeded: https://github.com/allyourcodebase/boost-libraries-zig/actions/runs/10722077115

I tried to investigate this a bit further: I hacked together an ugly patch to get std.crypto.tls.Client to write its keys to an SSLKEYLOGFILE so Wireshark would be able to decrypt the TLS traffic, and I ran zig build --fetch on the provided project a few times until it failed to fetch some of the packages. In all three failed package fetches I've reproduced so far, what happened is the initial /info/refs request (used to discover Git server capabilities) was sent on a reused connection which the server appears to have already attempted to close, e.g.:

image

I'm not at all knowledgeable when it comes to low-level networking stuff, though, so unfortunately this is as far as my investigation has gotten so far.

ianprime0509 commented 1 month ago

I'm not sure how I didn't notice this when I was debugging last night, but the server closing the connection is occurring 30 seconds after the last activity on the connection, so this seems to be a case of the server enforcing an idle timeout on the connection and closing it:

Screenshot from 2024-10-12 15-00-30

But the connection remains in the pool, and is reused for the next request despite being closed. So, in other words, there needs to be some sort of check when acquiring a connection from the pool to make sure that it's still valid to use (and/or enforce an idle timeout within the pool to proactively evict connections which are likely to be unusable).
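To make the two suggested mitigations concrete, here is a minimal sketch in Python (not Zig, and not the actual std.http.Client code). `IdlePool`, `looks_alive`, and the 20-second default are all hypothetical names and values chosen for illustration; the default is simply picked to stay under the 30-second server timeout observed above. On acquire, the pool drops connections that have been idle too long, and it also drops any connection that is readable while idle, since that means either EOF or unexpected bytes such as a TLS close alert.

```python
import socket
import time
from collections import deque


def looks_alive(sock):
    """Best-effort check that an idle connection has not been closed by the peer."""
    prev_timeout = sock.gettimeout()
    sock.setblocking(False)
    try:
        sock.recv(1, socket.MSG_PEEK)  # peek only: do not consume any data
    except BlockingIOError:
        return True  # nothing to read and no close observed: still usable
    except OSError:
        return False  # connection reset or otherwise broken
    finally:
        sock.settimeout(prev_timeout)
    # Readable while idle: either EOF or unexpected bytes. Do not reuse.
    return False


class IdlePool:
    """Hypothetical pool that validates connections before handing them out."""

    def __init__(self, connect, max_idle_seconds=20.0, clock=time.monotonic):
        self._connect = connect            # callable returning a new socket
        self._max_idle = max_idle_seconds  # assumption: below the server's idle timeout
        self._clock = clock
        self._idle = deque()               # entries are (socket, released_at)

    def acquire(self):
        while self._idle:
            sock, released_at = self._idle.pop()  # most recently released first
            fresh = self._clock() - released_at < self._max_idle
            if fresh and looks_alive(sock):
                return sock
            sock.close()  # evict stale or dead connections
        return self._connect()

    def release(self, sock):
        self._idle.append((sock, self._clock()))
```

A check like this cannot close the race completely, because the server may still close the connection between the check and the write. A client would probably also want to retry an idempotent request once on a fresh connection if it sees EndOfStream on a reused one.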

arafel commented 1 month ago

If it helps anyone, currently I can reproduce this every time:

$ zig fetch --save=network git+https://github.com/ikskuh/zig-network
error: unable to discover remote git server capabilities: TlsInitializationFailed

Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal

(x86_64)

ianprime0509 commented 1 month ago

@arafel that is most likely unrelated to the original issue here (EndOfStream vs TlsInitializationFailed), especially if you're able to reproduce it every time with a single dependency having no transitive dependencies. My first thought would be to check whether your system CA certificates are properly installed (on Ubuntu, the ca-certificates package): on a fresh Ubuntu 20.04 Docker container, ca-certificates is not installed by default, and I can reproduce your issue, but after installing ca-certificates it works fine. See also #14168 for a proposal which might improve this situation in the future, if that turns out to be your issue.
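As a quick way to run the check suggested above, the following Python sketch looks for a system CA bundle file. The candidate paths are assumptions: the first is the file that the Debian/Ubuntu ca-certificates package generates, the second is a common location on Red Hat-style systems. Whether Zig reads these exact paths is not confirmed here, and `find_ca_bundles` is a hypothetical helper name.

```python
import os

# Assumption: common Linux CA bundle locations, not a list taken from Zig.
CANDIDATE_BUNDLES = [
    "/etc/ssl/certs/ca-certificates.crt",  # Debian/Ubuntu (ca-certificates package)
    "/etc/pki/tls/certs/ca-bundle.crt",    # Red Hat-style systems
]


def find_ca_bundles(candidates=CANDIDATE_BUNDLES):
    """Return the candidate bundle files that exist and are non-empty."""
    return [p for p in candidates if os.path.isfile(p) and os.path.getsize(p) > 0]


if __name__ == "__main__":
    found = find_ca_bundles()
    if found:
        print("CA bundle(s) found:", ", ".join(found))
    else:
        print("No CA bundle found; on Ubuntu, try installing ca-certificates.")
```

If nothing is found on Ubuntu, installing the ca-certificates package and retrying the fetch is the next step described above.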