rust-lang / crates.io

The Rust package registry
https://crates.io
Apache License 2.0
2.94k stars 598 forks source link

Apparent race condition in "waiting for" recently-published package to be available #9531

Open joshlf opened 3 days ago

joshlf commented 3 days ago

This happened at approximately 9:20am Pacific time on 2024-09-27 in case it's useful to cross-reference against logs.

I ran the following commands at https://github.com/google/zerocopy/commit/ec2cfffc99c1f040282d48b2784c140fd5a0f53d in order to publish (absolute path prefixes have been redacted). Note that there are two commands being run here (see cargo publish -p zerocopy about 2/3 of the way down). I'm including the output verbatim in case that's relevant, but apologies for it being somewhat verbose.

$ cargo publish --package zerocopy-derive && cargo publish --package zerocopy
    Updating crates.io index
warning: `/.../.cargo/credentials` is deprecated in favor of `credentials.toml`
note: if you need to support cargo 1.38 or earlier, you can symlink `credentials` to `credentials.toml`
   Packaging zerocopy-derive v0.8.0-alpha.26 (/.../zerocopy/zerocopy-derive)
   Verifying zerocopy-derive v0.8.0-alpha.26 (/.../zerocopy/zerocopy-derive)
    Updating crates.io index
   Compiling proc-macro2 v1.0.86
   Compiling unicode-ident v1.0.13
   Compiling quote v1.0.37
   Compiling syn v2.0.77
   Compiling zerocopy-derive v0.8.0-alpha.26 (/.../zerocopy/target/package/zerocopy-derive-0.8.0-alpha.26)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 2.48s
    Packaged 81 files, 419.9KiB (61.2KiB compressed)
   Uploading zerocopy-derive v0.8.0-alpha.26 (/.../zerocopy/zerocopy-derive)
    Uploaded zerocopy-derive v0.8.0-alpha.26 to registry `crates-io`
note: waiting for `zerocopy-derive v0.8.0-alpha.26` to be available at registry `crates-io`.
You may press ctrl-c to skip waiting; the crate should be available shortly.
   Published zerocopy-derive v0.8.0-alpha.26 at registry `crates-io`
    Updating crates.io index
warning: `/.../.cargo/credentials` is deprecated in favor of `credentials.toml`
note: if you need to support cargo 1.38 or earlier, you can symlink `credentials` to `credentials.toml`
   Packaging zerocopy v0.8.0-alpha.26 (/.../zerocopy)
   Verifying zerocopy v0.8.0-alpha.26 (/.../zerocopy)
    Updating crates.io index
error: failed to verify package tarball

Caused by:
  failed to select a version for the requirement `zerocopy-derive = "=0.8.0-alpha.26"`
  candidate versions found which didn't match: 0.8.0-alpha.25, 0.8.0-alpha.24, 0.8.0-alpha.23, ...
  location searched: crates.io index
  required by package `zerocopy v0.8.0-alpha.26 (/.../zerocopy/target/package/zerocopy-0.8.0-alpha.26)`
  if you are looking for the prerelease package it needs to be specified explicitly
      zerocopy-derive = { version = "0.8.0-alpha.25" }
  perhaps a crate was updated and forgotten to be re-vendored?
Joshuas-MBP:zerocopy josh$ cargo publish -p zerocopy
    Updating crates.io index
warning: `/.../.cargo/credentials` is deprecated in favor of `credentials.toml`
note: if you need to support cargo 1.38 or earlier, you can symlink `credentials` to `credentials.toml`
   Packaging zerocopy v0.8.0-alpha.26 (/.../zerocopy)
   Verifying zerocopy v0.8.0-alpha.26 (/.../zerocopy)
    Updating crates.io index
   Compiling zerocopy v0.8.0-alpha.26 (/.../zerocopy/target/package/zerocopy-0.8.0-alpha.26)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.66s
    Packaged 453 files, 1.4MiB (217.7KiB compressed)
   Uploading zerocopy v0.8.0-alpha.26 (/.../zerocopy)
    Uploaded zerocopy v0.8.0-alpha.26 to registry `crates-io`
note: waiting for `zerocopy v0.8.0-alpha.26` to be available at registry `crates-io`.
You may press ctrl-c to skip waiting; the crate should be available shortly.
   Published zerocopy v0.8.0-alpha.26 at registry `crates-io`

If I'm reading this correctly, this line...

note: waiting for `zerocopy-derive v0.8.0-alpha.26` to be available at registry `crates-io`.

...successfully blocked until zerocopy-derive was published, but the subsequent cargo publish --package zerocopy command nonetheless observed zerocopy-derive as not yet published. My understanding is that the "waiting for" feature is intended to ensure that this use case (automatically publishing one crate immediately followed by publishing another one) always succeeds without a race condition.

Turbo87 commented 3 days ago

@rust-lang/cargo I'm not sure, but this might be on your side? https://github.com/rust-lang/crates.io-index/commits/ca08a581d1282b5ffe1813ebe2da56e4ba83594f looks like the index was updated in the correct order so idk what happened here. this might be a race condition or something like that on the index updates performed by cargo 🤔

epage commented 3 days ago

Yeah, I'm assuming this is on Cargo's side but I have no idea what it could be. The "wait for package" code is basically doing the same check that the resolver is doing when it failed here. We re-download the index until we see the entry. The fact the failed publish is in its own process guarantees there wasn't a stale in-memory cache (compared to our upcoming workspace-publish where its all in one process).

What version of Cargo was used for this?

joshlf commented 3 days ago

We re-download the index until we see the entry.

Is it possible that there's caching happening on the server side of things? Maybe load balancing or something else that'd introduce the possibility of eventual consistency? I'm obviously just guessing since I don't know anything about crates.io's backend design.

What version of Cargo was used for this?

$ cargo --version
cargo 1.78.0 (54d8815d0 2024-03-26)
ehuss commented 3 days ago

I would assume the Fastly purge is not atomic. I would guess that the first call hit a server that got purged (or was a cache miss), and the second hit a server that had not yet purged.

epage commented 3 days ago

So the assumption is that the "wait for publish" got a result back from an updated server and then the second publish got a stale result, overwriting the new file and causing it to be removed from our on-disk caches? And I assume we use an etag that is only Eq and not Ord which would let us report these problems?

ehuss commented 3 days ago

Correct, I believe if-none-match is etag equality.

Turbo87 commented 3 days ago

I guess it's time to /cc @rust-lang/infra since they are in charge of the CDNs and we don't have access to them ourselves.

Turbo87 commented 3 days ago

I assume we use an etag that is only Eq and not Ord which would let us report these problems?

FWIW the CDN is sending a last-modified header that could potentially be used to implement Ord behavior on the invalidations