Closed: william-alhant closed this issue 1 month ago
I encountered the same issue, the first download attempt would fail, but the second would succeed.
I did a bit more testing. I tried to download packages from the same server with curl, and then extract them with tar/unzip: it is pretty fast, and nothing bad happens. So now I would lean towards an issue in pixi/rattler or reqwest/hyper...
That's weird. I have seen this as well, but I haven't been able to figure out why you get:
DEBUG hyper::proto::h1::conn: parse error (connection closed before message completed) with 0 bytes
DEBUG hyper::proto::h1::conn: error shutting down IO: Broken pipe (os error 32)
DEBUG hyper::client::client: client connection error: error shutting down connection: Broken pipe (os error 32)
I also don't know why I get this. Is there any other log or test that could help?
I tried to sample a call stack at random during download/install with the pstack tool, on a recompiled debug build, and got this:
Thread 9 (Thread 0x2adc0daf4700 (LWP 18931)):
#0 0x00002adc0bfe3d47 in flock () from /lib64/libc.so.6
#1 0x0000557b03a077d3 in fslock::unix::lock::h7a6e969c7a5dd50c (fd=13) at src/unix.rs:322
#2 0x0000557b03a07f80 in fslock::LockFile::lock::ha7b3f0a00b46253b (self=0x2adc0dae3ef0) at src/lib.rs:167
#3 0x0000557b03a0802a in fslock::LockFile::lock_with_pid::hc0667a16a00d9df6 (self=0x2adc0dae3ef0) at src/lib.rs:208
#4 0x0000557b0353af18 in rattler_networking::authentication_storage::backends::file::FileStorage::lock::heb4a62236029d1cf (self=0x557b07403700) at src/authentication_storage/backends/file.rs:55
#5 0x0000557b0353cfc2 in _$LT$rattler_networking..authentication_storage..backends..file..FileStorage$u20$as$u20$rattler_networking..authentication_storage..StorageBackend$GT$::get::hab54fcc510bbb6a5 (self=0x557b07403700, host=...) at src/authentication_storage/backends/file.rs:102
#6 0x0000557b03534865 in rattler_networking::authentication_storage::storage::AuthenticationStorage::get::h004c48059d2c2053 (self=0x557b075d5b20, host=...) at src/authentication_storage/storage.rs:88
#7 0x0000557b03535dd4 in rattler_networking::authentication_storage::storage::AuthenticationStorage::get_by_url::h2bd86cb3a1935aab (self=0x557b075d5b20, url=...) at src/authentication_storage/storage.rs:136
#8 0x0000557b03526e08 in _$LT$rattler_networking..authentication_middleware..AuthenticationMiddleware$u20$as$u20$reqwest_middleware..middleware..Middleware$GT$::handle::_$u7b$$u7b$closure$u7d$$u7d$::h1923ea75e8e75464 () at src/authentication_middleware.rs:29
#9 0x0000557b03a0bd2e in _$LT$core..pin..Pin$LT$P$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h5865df82401df94a (self=..., cx=0x2adc0daf21a0) at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/future/future.rs:125
#10 0x0000557b03a0bd2e in _$LT$core..pin..Pin$LT$P$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h5865df82401df94a (self=..., cx=0x2adc0daf21a0) at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/future/future.rs:125
#11 0x0000557b02923822 in reqwest_middleware::client::ClientWithMiddleware::execute_with_extensions::_$u7b$$u7b$closure$u7d$$u7d$::h95b8bea2d2aa0552 () at /home/wilalh01/.cargo/registry/src/index.crates.io-6f17d22bba15001f/reqwest-middleware-0.2.4/src/client.rs:160
#12 0x0000557b02923360 in reqwest_middleware::client::RequestBuilder::send::_$u7b$$u7b$closure$u7d$$u7d$::hf89aaa89b10acd4f () at /home/wilalh01/.cargo/registry/src/index.crates.io-6f17d22bba15001f/reqwest-middleware-0.2.4/src/client.rs:314
#13 0x0000557b01fb7018 in rattler_package_streaming::reqwest::tokio::get_reader::_$u7b$$u7b$closure$u7d$$u7d$::hb5c5c819077f717f () at /home/wilalh01/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rattler_package_streaming-0.18.0/src/reqwest/tokio.rs:35
#14 0x0000557b01fb769a in rattler_package_streaming::reqwest::tokio::extract_conda::_$u7b$$u7b$closure$u7d$$u7d$::ha861a16ffa2bc992 () at /home/wilalh01/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rattler_package_streaming-0.18.0/src/reqwest/tokio.rs:100
#15 0x0000557b01fb894c in rattler_package_streaming::reqwest::tokio::extract::_$u7b$$u7b$closure$u7d$$u7d$::h24e32e6b0fcf6afe () at /home/wilalh01/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rattler_package_streaming-0.18.0/src/reqwest/tokio.rs:132
#16 0x0000557b021a2e9f in rattler::package_cache::PackageCache::get_or_fetch_from_url_with_retry::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::hce10fa620d924b68 () at /home/wilalh01/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rattler-0.18.0/src/package_cache.rs:213
[...]
There might be some sort of locking issue.
I checked with strace and it seems to confirm my suspicion:
44609 flock(12, LOCK_EX <unfinished ...>
44608 flock(13, LOCK_EX <unfinished ...>
44607 flock(15, LOCK_EX <unfinished ...>
44614 flock(16, LOCK_EX <unfinished ...>
44613 flock(17, LOCK_EX <unfinished ...>
44612 flock(14, LOCK_EX <unfinished ...>
44610 flock(18, LOCK_EX|LOCK_NB) = -1 EAGAIN (Resource temporarily unavailable) <0.000422>
44610 flock(18, LOCK_EX <unfinished ...>
44611 <... flock resumed> ) = 0 <6.045419>
44611 flock(19, LOCK_UN) = 0 <0.000261>
44609 <... flock resumed> ) = 0 <6.044859>
44609 flock(12, LOCK_UN <unfinished ...>
44611 flock(19, LOCK_EX|LOCK_NB <unfinished ...>
44609 <... flock resumed> ) = 0 <0.000421>
44611 <... flock resumed> ) = -1 EAGAIN (Resource temporarily unavailable) <0.000267>
44611 flock(19, LOCK_EX <unfinished ...>
44609 flock(12, LOCK_EX|LOCK_NB) = -1 EAGAIN (Resource temporarily unavailable) <0.000278>
44609 flock(12, LOCK_EX <unfinished ...>
44618 +++ exited with 0 +++
44608 <... flock resumed> ) = 0 <12.140519>
44608 flock(13, LOCK_UN) = 0 <0.000296>
44608 flock(13, LOCK_EX|LOCK_NB) = -1 EAGAIN (Resource temporarily unavailable) <0.000275>
44608 flock(13, LOCK_EX <unfinished ...>
44607 <... flock resumed> ) = 0 <15.201206>
44607 flock(15, LOCK_UN) = 0 <0.000319>
44607 flock(15, LOCK_EX|LOCK_NB) = -1 EAGAIN (Resource temporarily unavailable) <0.000258>
44607 flock(15, LOCK_EX <unfinished ...>
44614 <... flock resumed> ) = 0 <18.213597>
44614 flock(16, LOCK_UN) = 0 <0.000410>
44613 <... flock resumed> ) = 0 <18.215742>
44613 flock(17, LOCK_UN) = 0 <0.000271>
44612 <... flock resumed> ) = 0 <18.217978>
44614 flock(16, LOCK_EX|LOCK_NB) = -1 EAGAIN (Resource temporarily unavailable) <0.000044>
44614 flock(16, LOCK_EX <unfinished ...>
44612 flock(14, LOCK_UN <unfinished ...>
44613 flock(17, LOCK_EX|LOCK_NB <unfinished ...>
44612 <... flock resumed> ) = 0 <0.000388>
44613 <... flock resumed> ) = -1 EAGAIN (Resource temporarily unavailable) <0.000250>
44613 flock(17, LOCK_EX <unfinished ...>
44612 flock(14, LOCK_EX|LOCK_NB) = -1 EAGAIN (Resource temporarily unavailable) <0.000196>
44612 flock(14, LOCK_EX <unfinished ...>
44610 <... flock resumed> ) = 0 <30.000863>
Some of the flock syscalls take seconds to finish.
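The pattern in the trace (a non-blocking `LOCK_EX|LOCK_NB` failing with EAGAIN, followed by a plain `LOCK_EX` that blocks until another process unlocks) can be reproduced standalone. The sketch below is purely illustrative and unrelated to rattler's actual code; it just demonstrates the same syscall sequence with two processes contending for one lock file:

```python
import errno, fcntl, os, tempfile, time

path = os.path.join(tempfile.gettempdir(), "flock-demo.lock")
open(path, "w").close()

pid = os.fork()
if pid == 0:
    # Child: take the lock and hold it for a second, like a busy worker.
    child_f = open(path, "w")
    fcntl.flock(child_f, fcntl.LOCK_EX)
    time.sleep(1.0)
    fcntl.flock(child_f, fcntl.LOCK_UN)
    os._exit(0)

time.sleep(0.2)  # give the child time to grab the lock first
f = open(path, "w")

# Non-blocking attempt fails immediately with EAGAIN, as in the strace output:
#   flock(18, LOCK_EX|LOCK_NB) = -1 EAGAIN
nonblocking_failed = False
try:
    fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
except BlockingIOError as e:
    nonblocking_failed = (e.errno == errno.EAGAIN)

# The plain LOCK_EX call then blocks until the child releases the lock:
#   flock(18, LOCK_EX <unfinished ...> ... <... flock resumed> ) = 0
start = time.monotonic()
fcntl.flock(f, fcntl.LOCK_EX)
waited = time.monotonic() - start

fcntl.flock(f, fcntl.LOCK_UN)
os.waitpid(pid, 0)
os.unlink(path)
print(f"EAGAIN seen: {nonblocking_failed}, then blocked for {waited:.2f}s")
```

On a local filesystem the blocking call resumes as soon as the holder unlocks; over a loaded NFS mount, each lock round trip can take far longer, which matches the multi-second waits in the trace above.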
Actually, it might be an issue with my environment. My home directory (where pixi/rattler stores and locks `credentials.json`) is mounted on NFS. In my environment, NFS can often be under heavy load, and this would not be the first NFS-related issue I have seen.
Is there a way to use another location for `credentials.json`? I would like to see what happens when it is on regular local storage; maybe it would solve my issues...
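One way to test the NFS hypothesis without recompiling is to measure uncontended flock latency directly. This is a hypothetical probe script, not part of pixi; point it at the NFS-mounted home directory and at local storage and compare the numbers:

```python
import fcntl, os, tempfile, time

def flock_roundtrip_us(directory, iterations=200):
    """Mean microseconds per uncontended LOCK_EX/LOCK_UN cycle in `directory`."""
    lockfile = os.path.join(directory, "flock-latency-probe.lock")
    with open(lockfile, "w") as f:
        start = time.monotonic()
        for _ in range(iterations):
            fcntl.flock(f, fcntl.LOCK_EX)
            fcntl.flock(f, fcntl.LOCK_UN)
        elapsed = time.monotonic() - start
    os.unlink(lockfile)
    return elapsed / iterations * 1e6

# Local storage is the baseline; an NFS mount under load should stand out.
print("local tmp:", round(flock_roundtrip_us(tempfile.gettempdir()), 1), "us/cycle")
home = os.path.expanduser("~")
if os.access(home, os.W_OK):
    print("home dir :", round(flock_roundtrip_us(home), 1), "us/cycle")
```

If the home-directory numbers are orders of magnitude worse, that would support the NFS explanation.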
Edit: I tried recompiling with a change replacing the default `AuthenticationStorage` with a different path for `credentials.json`, outside my NFS home directory. It really seems to solve my issues: the download part of the installation goes really fast now.
I think it might be nice:
- to make the path configurable (unless it already is and I missed it)
I just saw #994 is already implementing that part :)
Hi @william-alhant, thanks for debugging this!
I think we should solve this by reading the entire authentication file once and then keeping it in a cache.
@william-alhant I've implemented a cache here: https://github.com/mamba-org/rattler/pull/573
I hope that helps with the issues you were seeing. In the future we should read the file only once, and lock it only once as well. File locks are very slow over NFS.
Since both suggestions have been implemented, I'm closing this for now.
Feel free to reopen if the issue persists.
Checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pixi, using `pixi --version`.
Reproducible example
Environment/reproduction:
os: Linux RHEL7
pixi version: 0.14
pixi.toml:
Issue description
Hello,
Thanks for developing pixi; I think it is quite nice, and I would like to use it at work on the project I'm working on. That said, I am encountering errors when downloading packages.
The error looks like this:
I ran this a few times; it does not always fail on the same package. As you can see, I am using a JFrog Artifactory mirror (set up as a channel in pixi.toml). I can't tell whether this is a client-side or a server-side issue. From my perspective, it could be either an abnormal connection termination and/or an oversight in error handling. The error occurs a couple of minutes after download/installation starts, when it runs on a loaded machine; I have had more success on a less loaded machine.
Here are some logs I gathered by setting `RUST_LOG=hyper::proto::h1::conn=debug,hyper::client=debug`:
Expected behavior
Expected behavior is to handle the error more gracefully and continue the download/installation if possible.