rust-lang / cargo

The Rust package manager
https://doc.rust-lang.org/cargo
Apache License 2.0

Loading of a git source is slow due to always checking if the submodule is updated #14603

Open epage opened 6 days ago

epage commented 6 days ago

When you have some git dependencies or patches, loading them takes a long time.

For the traces from #14238, notice how slow the first resolve is compared to the second. A lot of the time is taken up checking each git dependency's submodules.

Note: this slowdown is most relevant for tooling like rust-analyzer or rust-defined completions, where Cargo may be called in interactive contexts without any compilation happening, or for "cargo script", where cargo build is performed every time you run your script.

Related: #14395

epage commented 6 days ago

Previously discussed at https://rust-lang.zulipchat.com/#narrow/stream/246057-t-cargo/topic/Redundant.20code.20in.20.60GitSouce.60.3F

epage commented 6 days ago

When we GitDatabase::copy_into a checkout, we skip it if the checkout is "fresh":
https://github.com/rust-lang/cargo/blob/9d66d13e4479f1d51347c20264713c2d6cd8b4fa/src/cargo/sources/git/utils.rs#L174-L185

We then unconditionally check for submodules:
https://github.com/rust-lang/cargo/blob/9d66d13e4479f1d51347c20264713c2d6cd8b4fa/src/cargo/sources/git/utils.rs#L186

Submodules should be as immutable as the checkout, so we should be able to guard the whole process with the freshness check.
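A minimal sketch of the difference, not Cargo's actual code: `Checkout`, `copy_into_current`, and `copy_into_proposed` are hypothetical names, and the counters stand in for the real git work. It also assumes the guarded version records freshness only after the submodule pass has finished.

```rust
/// Hypothetical stand-in for an on-disk checkout; not Cargo's real type.
#[derive(Debug, Default)]
struct Checkout {
    /// Models the marker that says "this checkout is complete".
    is_fresh: bool,
    /// How many times the commit was checked out.
    checkouts: u32,
    /// How many times the (expensive) submodule pass ran.
    submodule_updates: u32,
}

/// Behaviour described in the issue: the checkout is skipped when fresh,
/// but the submodule pass runs on every load.
fn copy_into_current(co: &mut Checkout) {
    if !co.is_fresh {
        co.checkouts += 1;
        co.is_fresh = true;
    }
    co.submodule_updates += 1; // unconditional
}

/// Proposed behaviour: one freshness check guards both steps.
/// Assumption: freshness is recorded last, after submodules succeed.
fn copy_into_proposed(co: &mut Checkout) {
    if co.is_fresh {
        return; // nothing to do, including submodules
    }
    co.checkouts += 1;
    co.submodule_updates += 1;
    co.is_fresh = true;
}
```

Loading the same source twice runs the submodule pass twice with `copy_into_current` and only once with `copy_into_proposed`.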

The main risk in doing that is if you switch back and forth between old cargo and a new cargo whose freshness check also covers the submodules, and the submodule update is interrupted under old cargo. New cargo will see that the freshness check passes, assume the submodules are valid, and go on its way. We could do extra fancy things with the file used for this, like in #14395, but it doesn't seem worth it.

epage commented 6 days ago

CC @osiewicz

osiewicz commented 6 days ago

It seems like old cargo will "heal" itself from the corrupted checkout, as it'll remove it and refetch everything. Given that, I would agree that we could just proceed as is? Alternatively, could we bundle submodule freshness check with the checkout freshness check?

epage commented 6 days ago

> It seems like old cargo will "heal" itself from the corrupted checkout, as it'll remove it and refetch everything. Given that, I would agree that we could just proceed as is? Alternatively, could we bundle submodule freshness check with the checkout freshness check?

The corner case is

  1. With old cargo
    1. See lack of freshness file
    2. Fetch commit
    3. Checkout commit
    4. Write freshness file
    5. Perform submodule update and get interrupted
  2. With new cargo
    1. See freshness file and assume commit is fetched, checked out, and submodules are updated, bypassing all other steps
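
The sequence above can be replayed with a small model. This is only a sketch: `CheckoutState`, `old_cargo`, and `new_cargo` are hypothetical names, the booleans stand in for real on-disk state, and the `interrupted` flag simulates the process being killed right before the submodule update.

```rust
/// Hypothetical model of the on-disk state of one git checkout.
#[derive(Debug, Default)]
struct CheckoutState {
    checked_out: bool,
    freshness_file: bool,
    submodules_updated: bool,
}

/// Old cargo, following the steps listed above: the freshness file is
/// written before the submodule update, which always runs.
fn old_cargo(state: &mut CheckoutState, interrupted: bool) {
    if !state.freshness_file {
        state.checked_out = true; // fetch + checkout commit
        state.freshness_file = true; // write freshness file
    }
    if interrupted {
        return; // killed before the submodule update
    }
    state.submodules_updated = true;
}

/// New cargo under the proposal: a present freshness file bypasses
/// every other step, including the submodule update.
fn new_cargo(state: &mut CheckoutState) {
    if state.freshness_file {
        return;
    }
    state.checked_out = true;
    state.submodules_updated = true;
    state.freshness_file = true;
}
```

Calling `old_cargo(&mut s, true)` followed by `new_cargo(&mut s)` leaves `freshness_file` set while `submodules_updated` is still false, which is the corner case. In this model, a later uninterrupted `old_cargo` run repairs the state, because old cargo performs the submodule update unconditionally.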

This kind of quick back and forth is likely when working directly with Cargo and an IDE running Cargo in the background.

That said, I think we would be fine with this. This will be limited to a specific set of cargo versions run in a specific order, with an interruption. Over time, the combination of versions will become less likely.

If we did something else about this, some potential mitigations