Open sdroege opened 1 year ago
Thank you for the suggestion. I learnt new stuff today!
As I understand it, with sparse-checkout, git still needs to clone the whole repository index. Only when paired with partial-clone can git avoid a full clone. That means Cargo needs support for both features from libgit2, which Cargo currently depends on for all git-related operations. Those two features in libgit2 are still under consideration/development, and you can track them from here and here. Until libgit2 supports them natively, there is little Cargo can do.
In the meantime, the Cargo team plans to experiment with replacing some of the git2 functionality in Cargo with gitoxide. The short-term goals of gitoxide don't seem to include either sparse-index or partial-clone. You might want to kindly ask them about their opinions on these features.
Here’s the issue for partial clones: https://github.com/Byron/gitoxide/issues/1046
As an update, sparse-indices are supported in the sense that they can be read, and every index interaction does consider them. It's a bit of a fringe feature right now as well, but it's on the radar.
And indeed, partial checkouts would only help so much if they weren't accompanied by a way of reducing the initial download size. Today, a shallow clone, i.e. receiving only the data needed for the most recent commit, can already help (presumably, it also takes more time on the remote to generate).
As a next step, I imagine doing a partial clone with a blob filter, so only a single commit (or maybe even the whole history) without any blobs is downloaded. Despite being custom-generated, that pack should be fast to produce, as blobs should be the most costly part here. Finally, gitoxide (partial) checkouts would have to be partial-repository aware, collect the missing blobs, and download them separately as part of the checkout. That pack would only be the subset of blobs actually needed, which should be good for a speed-boost on all sides.
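For reference, the same sequence (shallow, blob-less clone, then a checkout restricted to one directory) can be expressed with the stock git CLI. Below is a minimal sketch in Rust that only builds the command lines and runs nothing. The function name, the URL, and the paths are made up for illustration, and this is what the git executable offers today, not something gitoxide, libgit2, or Cargo do.

```rust
/// Builds the `git` command lines for a shallow, blob-less, sparse clone.
/// Nothing is executed here; the caller decides how (and whether) to run them.
/// Hypothetical helper for illustration, not part of Cargo or gitoxide.
pub fn sparse_partial_clone_commands(url: &str, dest: &str, subdir: &str) -> Vec<Vec<String>> {
    fn owned(args: &[&str]) -> Vec<String> {
        args.iter().map(|s| s.to_string()).collect()
    }
    vec![
        // Download only the tip commit and its trees; the filter leaves blobs out,
        // and `--sparse` starts with just the top-level files checked out.
        owned(&["git", "clone", "--filter=blob:none", "--depth=1", "--sparse", url, dest]),
        // Restrict the working tree to one directory; git fetches the blobs
        // missing for that directory from the remote on demand.
        owned(&["git", "-C", dest, "sparse-checkout", "set", subdir]),
    ]
}
```

Calling it with a placeholder URL, `"checkout"`, and `"crates/foo"` yields two argument lists that could be handed to `std::process::Command` one after the other.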
Could Cargo use the git cli for sparse checkouts instead?
The idea is that Cargo avoids depending on external binaries. We can't control how external binaries evolve. It might become a compatibility issue. That's why net.git-fetch-with-cli is not the default.
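For anyone who does want the git executable for fetching, the option can be switched on per invocation through its environment variable form, CARGO_NET_GIT_FETCH_WITH_CLI. A minimal sketch, assuming `cargo` is on the PATH; the function name is made up, and the command is only constructed here, not run. Note that this only changes how fetching is done and does not by itself give sparse checkouts.

```rust
use std::process::Command;

/// Prepares a `cargo fetch` invocation that opts in to fetching with the
/// `git` executable. Hypothetical helper; the command is built but not spawned.
pub fn cargo_fetch_with_git_cli() -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("fetch");
    // Environment form of the `net.git-fetch-with-cli` configuration key.
    cmd.env("CARGO_NET_GIT_FETCH_WITH_CLI", "true");
    cmd
}
```

The caller would run it with `.status()` or `.output()` and handle the result.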
Slightly off-topic: there is a more general idea that Cargo could provide a plugin interface for fetching sources. I don't remember whether there is already an issue for that.
Problem
Currently if a git repository is listed as a dependency in Cargo.toml then the whole git repository is cloned, which might contain a lot of other unnecessary things.
Proposed Solution
It would be nice if cargo supported sparse git checkouts so that only a specific subdirectory of the repository is cloned. This subdirectory would have to be specified in Cargo.toml together with the dependency. cargo would then use this subdirectory as the root and assume it to be either a plain crate or a workspace, like it now does for the actual repository root.
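To make the "use this subdirectory as the root" part concrete, here is a rough sketch of how the root could be resolved once a checkout exists. Everything in it is hypothetical: there is no such Cargo API or manifest key today, and the function name, the `subdir` parameter, and the validation rule are assumptions made for illustration.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical: picks the directory Cargo would treat as the dependency root.
/// With no subdirectory it falls back to the repository root, as happens today.
pub fn dependency_root(checkout: &Path, subdir: Option<&str>) -> Result<PathBuf, String> {
    let subdir = match subdir {
        Some(s) => s,
        // No subdirectory given: use the repository root.
        None => return Ok(checkout.to_path_buf()),
    };
    let rel = Path::new(subdir);
    // Accept only plain relative components, so the root cannot escape the
    // checkout through `..`, `.`, or an absolute path.
    let plain = rel.components().all(|c| matches!(c, Component::Normal(_)));
    if subdir.is_empty() || !plain {
        return Err(format!("invalid subdirectory: {:?}", subdir));
    }
    Ok(checkout.join(rel))
}
```

The returned path would then be searched for a crate or workspace manifest in the same way the repository root is now.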
Notes
No response