rust-lang / cargo

The Rust package manager
https://doc.rust-lang.org/cargo
Apache License 2.0
12.87k stars 2.43k forks source link

Print a warning if DNS lookup take too long and is retried. #9211

Open matklad opened 3 years ago

matklad commented 3 years ago

Recently, I've been observing Cargo being stuck on the "Downloading n crates" message for quite a while. In theory, this shouldn't be happening, as I have a very good internet connection here. I am not sure what caused this, there are two variables: I moved to a new place with a faster Internet, and bought a new PC.

The backtrace is not really illuminating:

(gdb) thread apply all bt

Thread 1 (Thread 0x7f2334d0aa80 (LWP 14311) "cargo"):
#0  0x00007f2334dfe329 in poll () from /nix/store/gafigwfaimlziam6qhw1m8dz4h952g1n-glibc-2.32-35/lib/libc.so.6
#1  0x0000561c3597d6d5 in Curl_poll ()
#2  0x0000561c35979bc9 in Curl_multi_wait ()
#3  0x0000561c359796b0 in curl_multi_wait ()
#4  0x0000561c3596b2db in curl::multi::Multi::wait ()
#5  0x0000561c3540a091 in cargo::core::package::Downloads::wait ()
#6  0x0000561c35407493 in cargo::core::package::PackageSet::get_one ()
#7  0x0000561c352f313c in cargo::core::source::Source::download_now ()
#8  0x0000561c35339fff in cargo::ops::common_for_install_and_uninstall::select_dep_pkg ()
#9  0x0000561c35328801 in cargo::ops::cargo_install::install_one ()
#10 0x0000561c35325e3f in cargo::ops::cargo_install::install ()
#11 0x0000561c3521086c in cargo::commands::install::exec ()
#12 0x0000561c35204ddd in cargo::cli::main ()
#13 0x0000561c3526b4f8 in cargo::main ()
#14 0x0000561c351ff976 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#15 0x0000561c351ff98c in std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h38785f359b3e8a4a ()
#16 0x0000561c35b970e7 in core::ops::function::impls::{{impl}}::call_once<(),Fn<()>> () at /rustc/a5a775e3f9e8043dad405e00aee0ae60882a7b71/library/core/src/ops/function.rs:259
#17 std::panicking::try::do_call<&Fn<()>,i32> () at /rustc/a5a775e3f9e8043dad405e00aee0ae60882a7b71//library/std/src/panicking.rs:379
#18 std::panicking::try<i32,&Fn<()>> () at /rustc/a5a775e3f9e8043dad405e00aee0ae60882a7b71//library/std/src/panicking.rs:343
#19 std::panic::catch_unwind<&Fn<()>,i32> () at /rustc/a5a775e3f9e8043dad405e00aee0ae60882a7b71//library/std/src/panic.rs:431
#20 std::rt::lang_start_internal () at /rustc/a5a775e3f9e8043dad405e00aee0ae60882a7b71//library/std/src/rt.rs:51
#21 0x0000561c3526d855 in main ()

Naturally, when I killed cargo and re-run with

env CARGO_HTTP_DEBUG=true CARGO_LOG=cargo::ops::registry=debug

cargo got unstuck relatively promptly. Though, I think I might have caught the slowness nontheless?

See the log here: https://gist.github.com/matklad/ca62d3a9f9a34ba6d7cfede0efedf38a

Note

  Downloaded rustfilt v0.2.1
  Downloaded 1 crate (9.0 KB) in 6.74s

If I try to reproduce it (which, naturally, I can't anymore), I get much shorter timing:

12:04:47|~/tmp/hello|HEAD⚡?
λ rm ~/.cargo/registry/src/github.com-1ecc6299db9ec823/rustfilt-0.2.1/ -rf && rm ~/.cargo/registry/cache/github.com-1ecc6299db9ec823/rustfilt-0.2.1.crate -rf && cargo install rustfilt
    Updating crates.io index
  Downloaded rustfilt v0.2.1
  Downloaded 1 crate (9.0 KB) in 0.86s

This reminds me of this beauty: https://github.com/rust-lang/cargo/issues/3564#issuecomment-273939998

λ cargo --version 
cargo 1.51.0-beta (34170fcd6 2021-02-04)
matklad commented 3 years ago

Naturally, the log has 6 seconds wide hole in it:

[2021-02-27T08:55:00Z DEBUG cargo::ops::registry] http-debug: *   Trying 140.82.121.6:443...
...SNIP...
[2021-02-27T08:55:01Z DEBUG cargo::ops::registry] http-debug: * Connection #0 to host github.com left intact

[2021-02-27T08:55:01Z DEBUG cargo::ops::registry] Version {
        version: "7.74.0-DEV",
        host: "unknown",
        feature_ipv6: true,
        feature_ssl: true,
        feature_libz: true,
        feature_ntlm: false,
        feature_gss_negotiate: false,
        feature_debug: false,
        feature_spnego: false,
        feature_largefile: false,
        feature_idn: false,
        feature_sspi: false,
        feature_async_dns: true,
        feature_conv: false,
        feature_tlsauth_srp: false,
        feature_ntlm_wb: false,
        feature_unix_domain_socket: true,
        ssl_version: "OpenSSL/1.1.1h",
        libz_version: "1.2.11",
        iconv_version_num: "0",
        brotli_version_num: "0",
        nghttp2_version_num: "1214a",
        nghttp2_version: "1.33.90",
        protocols: [
            "file",
            "http",
            "https",
            "mqtt",
        ],
    }

[2021-02-27T08:55:06Z DEBUG cargo::ops::registry] http-debug: *   Trying 52.85.243.102:443...                                                    
...SNIP...
[2021-02-27T08:55:08Z DEBUG cargo::ops::registry] http-debug: * Connection #1 to host static.crates.io left intact
matklad commented 3 years ago

I think I've debugged this! The problem was misconfigured DNS. I figured this out when I tried upgrading my system in parallel with stuck cargo, and saw the following output from my systems package manager:

 direnv allow
direnv: loading ~/projects/wasmer/.envrc
direnv: using nix
direnv: ([/nix/store/wmgfjn5z4dj3z46548gl1zpnk06b42w3-direnv-2.27.0/bin/direnv export fish]) is taking a while to execute. Use CTRL-C to give up.
warning: unable to download 'https://cache.nixos.org/i776f62w3vlkyybs0fy64645zg6509is.narinfo': Couldn't resolve host name (6); retrying in 341 ms
warning: unable to download 'https://cache.nixos.org/mpkz2cwrf4bh957c3bbx010g5243w91g.narinfo': Couldn't resolve host name (6); retrying in 250 ms
warning: unable to download 'https://cache.nixos.org/i776f62w3vlkyybs0fy64645zg6509is.narinfo': Couldn't resolve host name (6); retrying in 623 ms
warning: unable to download 'https://cache.nixos.org/mpkz2cwrf4bh957c3bbx010g5243w91g.narinfo': Couldn't resolve host name (6); retrying in 640 ms
error: interrupted by the user

It would be nice if Cargo printed an error message along these lines as well.

petr-tik commented 2 years ago

thanks for filing this. I may have experienced a similar problem with a corporate ssh config pointing to an internal crates registry and was wondering if they overlap enough to share a solution.

Specifically, I have 2 questions:

  1. Is there any heuristic timeout value (static or dynamically calculated from ping) that can be used to automatically interrupt downloading with an error message.
  2. What's the most actionable information that we can give back to the user to fix their network problems?