rust-lang / cargo

The Rust package manager
https://doc.rust-lang.org/cargo
Apache License 2.0

Add support for sparse git checkouts #11165

Open sdroege opened 1 year ago

sdroege commented 1 year ago

Problem

Currently, if a git repository is listed as a dependency in Cargo.toml, the whole repository is cloned, even though it might contain a lot of content the dependency does not need.

Proposed Solution

It would be nice if cargo supported sparse git checkouts so that only a specific subdirectory of the repository is cloned. This subdirectory would have to be specified in Cargo.toml together with the dependency.

cargo would then use this subdirectory as the root and assume it to be either a plain crate or a workspace, like it now does for the actual repository root.

Notes

No response

weihanglo commented 1 year ago

Thank you for the suggestion. I learnt new stuff today!

As I understand it, with sparse-checkout git still needs to clone the whole repository index. Only when paired with partial-clone can git avoid a full clone. That means Cargo needs support for both features from libgit2, which Cargo currently depends on for all git-related operations. Those two features in libgit2 are still under consideration/development, and you can track them from here and here. Until libgit2 supports them natively, there is little Cargo can do.

In the meantime, the Cargo team plans to experiment with replacing some of the git2 functionality in Cargo with gitoxide. The short-term goals of gitoxide don't seem to include either sparse-index or partial-clone. You might want to kindly ask them for their opinion on these features.

flying-sheep commented 7 months ago

Here’s the issue for partial clones: https://github.com/Byron/gitoxide/issues/1046

Byron commented 5 months ago

As an update, sparse-indices are supported in the sense that they can be read, and every index interaction does consider them. It's a bit of a fringe feature right now as well, but it's on the radar.

And indeed, partial checkouts would only help so much if they weren't accompanied by a way of reducing the initial download size. Today, a shallow clone, i.e. receiving only the data needed for the most recent commit, can already help (though presumably it also takes more time on the remote to generate).

As a next step, I imagine doing a partial clone with a blob filter, so only a single commit (or maybe even the whole history) without any blobs is downloaded. Despite being custom-generated, such a pack should be fast to produce, as blobs should be the most costly part here. Finally, gitoxide (partial) checkouts would have to be partial-repository aware, collect the missing blobs, and download them separately as part of the checkout. That pack would contain only the subset of blobs actually needed, which should give a speed boost on all sides.
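
For illustration only, the sequence described above (a blob-less, single-commit partial clone followed by a checkout limited to one subdirectory) can be approximated today with the stock git CLI. The sketch below just builds the argument lists and prints them; it does not run git, it is not how gitoxide or Cargo implement anything, and the function name, URL, and paths are hypothetical placeholders.

```rust
/// Convert a slice of string slices into an owned argument vector.
fn argv(args: &[&str]) -> Vec<String> {
    args.iter().map(|s| s.to_string()).collect()
}

/// Build git CLI invocations for a blob-less partial clone plus a sparse
/// checkout of a single subdirectory. Nothing is executed here.
/// (Hypothetical helper for illustration; not part of Cargo or gitoxide.)
fn sparse_clone_commands(url: &str, dest: &str, subdir: &str) -> Vec<Vec<String>> {
    vec![
        // Partial clone: skip all blobs, fetch only the most recent commit,
        // and start with a sparse working tree (top-level files only).
        argv(&[
            "git",
            "clone",
            "--filter=blob:none",
            "--sparse",
            "--depth=1",
            url,
            dest,
        ]),
        // Widen the sparse checkout to the wanted subdirectory; git then
        // fetches the blobs that are missing for it on demand.
        argv(&["git", "-C", dest, "sparse-checkout", "set", subdir]),
    ]
}

fn main() {
    // Placeholder URL and paths, chosen only for the example.
    let cmds = sparse_clone_commands(
        "https://example.com/big-repo.git",
        "big-repo",
        "crates/foo",
    );
    for cmd in cmds {
        println!("{}", cmd.join(" "));
    }
}
```

Dropping `--depth=1` would correspond to the "whole history without any blob" variant.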

ibraheemdev commented 3 months ago

Could Cargo use the git cli for sparse checkouts instead?

weihanglo commented 3 months ago

The idea is that Cargo avoids depending on external binaries. We can't control how external binaries evolve. It might become a compatibility issue. That's why net.git-fetch-with-cli is not the default.

Slightly off-topic: there is a more general idea that Cargo could provide a plugin interface for fetching sources. I don't remember whether there is already an issue for that.