swiftlang / swift-package-manager

The Package Manager for the Swift Programming Language
Apache License 2.0

SwiftPM requires downloading entire git repository, which can be much larger than necessary #6062

Open calda opened 1 year ago

calda commented 1 year ago

Description

When downloading a package dependency via git, the entire repository and all of its history are always downloaded. This is a problem because git history can't be removed in a backwards-compatible way, so there's no way to reduce the size of a repository that has already grown large.

For example, the lottie-ios repo (that provides a Swift package) is 300+ MB. Lots of package consumers reached out to us to let us know that this was impacting their workflow. In regions with slow internet this can be especially problematic: https://github.com/airbnb/lottie-ios/issues/1822

It would be great if there were a way to address this without affecting the package integration flow for package consumers (e.g. still support the same .package(url:from:) syntax). Some approaches could include:

  1. Perform shallow clones (although I've read that SPM intentionally does not do this)
  2. Only download the git branch / tag referenced by the dependency (e.g. using --single-branch) rather than the entire repo. This would allow us to rewrite the history of the master branch to be much smaller, while maintaining the existing history on a separate branch / tag that isn't downloaded by default.
  3. Improve support for binary dependencies, particularly in Xcode (the Xcode SPM integration doesn't support remote binary dependencies as far as I can tell). One interesting approach I've seen from Carthage is to automatically download and use the .xcframework.zip attached to the GitHub release if present. Supporting this as an option in SPM could be convenient.
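
As a rough illustration of what options 1 and 2 mean at the git level, the sketch below wraps the corresponding git clone flags. The helper names are hypothetical and this is not how SwiftPM invokes git; --depth, --single-branch, and --branch are standard git clone options.

```shell
#!/bin/sh
# Hypothetical helpers illustrating options 1 and 2; not SwiftPM code.

# Option 1: shallow clone. Only the newest commit of the default branch
# is transferred (--depth implies --single-branch).
shallow_clone() {
    url=$1 dest=$2
    git clone --quiet --depth 1 "$url" "$dest"
}

# Option 2: clone the full history of a single branch or tag and nothing else.
single_ref_clone() {
    url=$1 ref=$2 dest=$3
    git clone --quiet --single-branch --branch "$ref" "$url" "$dest"
}

# Example (downloads from the network):
#   single_ref_clone https://github.com/airbnb/lottie-ios.git master lottie-ios
```

With option 2, a rewritten and smaller master branch would be all that consumers download, while the old history could stay reachable from a ref that is not fetched by default.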

As a workaround, we created a separate lottie-spm package that just wraps the remote XCFramework binary dependency that we attach to our GitHub releases:

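// swift-tools-version:5.3
// Binary targets require tools version 5.3 or later; the real manifest may declare a newer one.
import PackageDescription

let package = Package(
  name: "Lottie",
  platforms: [.iOS("11.0"), .macOS("10.10"), .tvOS("11.0")],
  products: [.library(name: "Lottie", targets: ["Lottie"])],
  targets: [
    // Wraps the prebuilt XCFramework attached to the GitHub release.
    .binaryTarget(
      name: "Lottie",
      url: "https://github.com/airbnb/lottie-ios/releases/download/4.0.1/Lottie.xcframework.zip",
      checksum: "b6d8b0b81975d91965b8bb00cffb0eae4b3d94538b6950a90bc1366afd5d4239")
  ])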

This works decently well but adds extra release overhead and may be a bit confusing (since there are two separate repos for the same project).

neonichu commented 1 year ago

I think the ultimate solution to this will be the package registry which basically acts like option 2 of your list, probably even a bit better in practice.

One thing to keep in mind is that typically, there is no one tag that is being referenced, but a range of versions and SwiftPM needs to be able to look at the package manifests for possible versions to determine if they are usable (e.g. to check the tools-version). The package registry offers separate endpoints to get manifests without getting the entire source archive to solve this type of issue.

cgomez-rb commented 1 year ago

@neonichu I haven't worked on SPM a lot, but I was wondering if there is a way to reduce the scope of the package to only the Sources and Tests folders, without including test projects or images that could increase the size of the repository. In the end, if the user wants to see the test project, they can just pull the code from GitHub directly, but when importing from Xcode only the code needed for the package to work would be downloaded. If this is already possible, please disregard my comment and point me to how we can do it. Thanks.

neonichu commented 1 year ago

@cgomez-rb it is not possible for SwiftPM to know which files are used/unused by a package without fetching the entire repo first and evaluating the manifest. It might be an interesting option to have in swift package archive-source though, so that package authors can choose to only upload the minimal content as part of their releases to the registry.

calda commented 1 year ago

> It might be an interesting option to have in swift package archive-source though, so that package authors can choose to only upload the minimal content as part of their releases to the registry.

This would be great -- npm and CocoaPods both support this.

tomerd commented 1 year ago

I like the .npmignore idea (for registry publishing)

carloschaguendoml commented 1 year ago

Hi! What do you think about using the --depth 1 argument when the repository is cloned? It is another strategy used by CocoaPods.

aleufms commented 12 months ago

> Hi! What do you think about using the --depth 1 argument when the repository is cloned? It is another strategy used by CocoaPods.

In GitRepository.swift, in the fetch method, there is an explicit note saying that it intentionally does not create a shallow clone because it costs "more". Does anyone know what that means?

// Perform a bare clone.
//
// NOTE: We intentionally do not create a shallow clone here; the
// expected cost of iterative updates on a full clone is less than on a
// shallow clone.
guard !localFileSystem.exists(path) else {
    throw InternalError("\(path) already exists")
}

try self.clone(
    repository,
    repository.location.gitURL,
    path.pathString,
    ["--mirror"],
    progress: progressHandler
)

neonichu commented 12 months ago

@aleufms it's about what I mentioned above

> One thing to keep in mind is that typically, there is no one tag that is being referenced, but a range of versions and SwiftPM needs to be able to look at the package manifests for possible versions to determine if they are usable (e.g. to check the tools-version).

In the common case, we don't know a version upfront, only a range (e.g. 1.0 ..< 2.0), and SwiftPM potentially needs to look at every version in the range to see if it is viable (this depends on tools-versions and other dependencies, so it has to parse the manifest). This operation is faster when we have a full clone and can just switch between versions.
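
To illustrate why a full clone makes this cheap: once every tag is present locally, the manifest for each candidate version can be read straight from the object store without touching the network. The sketch below is a hypothetical helper (not SwiftPM code) that prints the first line of Package.swift, which normally carries the swift-tools-version comment, for every tag in a local clone.

```shell
#!/bin/sh
# Hypothetical helper, not SwiftPM code: print "<tag> <first manifest line>"
# for every tag in a local clone, without checking anything out.
list_manifest_headers() {
    repo=$1
    git -C "$repo" tag --list | while read -r tag; do
        # `git show <rev>:<path>` reads a file directly from the object store.
        header=$(git -C "$repo" show "$tag:Package.swift" 2>/dev/null | head -n 1)
        # Tags without a Package.swift at the repository root are skipped.
        if [ -n "$header" ]; then
            printf '%s %s\n' "$tag" "$header"
        fi
    done
}

# Example:
#   list_manifest_headers path/to/local/clone
```

With a shallow clone of a single tag, each additional candidate version would instead need another round trip to the server.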

deya-eldeen commented 11 months ago

Can't we make a small adjustment so that --depth=x is available as an option? It's understandable to keep the current behavior as the default, but it would help to at least have such an option when resolving with xcodebuild.

deya-eldeen commented 11 months ago

I even tried to set git config --global fetch.depth 1 but SPM was not affected.

deltamualpha commented 10 months ago

> > Hi! What do you think about using the --depth 1 argument when the repository is cloned? It is another strategy used by CocoaPods.

> In GitRepository.swift, in the fetch method, there is an explicit note saying that it intentionally does not create a shallow clone because it costs "more". Does anyone know what that means?

This is a drive-by comment, but that probably references this old issue about the CocoaPods approach to this problem: https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomment-193772935. It turns out that shallow clones can cause surprisingly high load on the GitHub side of the connection.

tayloraswift commented 9 months ago

has there been any progress on this?

xedin commented 5 months ago

I was thinking about this recently so I'd like to leave a couple of thoughts here:

finestructure commented 3 months ago

Early on when working on the Swift Package Index's build system it became clear that cloning git repositories would be taking up a lot of our budgeted build time if we did full clones. At the same time we had to be able to check out any tag, branch, or even sha, which seems to be the same requirement SwiftPM has.

The solution we've arrived at might not be feasible or practical for SwiftPM, because I suspect there's more jumping around between revisions than we do, but I thought I'd share it anyway. Perhaps it can be made to work or spark ideas.

Here's what we do to run a build for tag 0.3.5 of our own package SemanticVersion:

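# Start from an empty repository instead of cloning.
git init .
git remote add origin https://github.com/SwiftPackageIndex/SemanticVersion.git
# Fetch only the commit that tag 0.3.5 points to, without its history.
git fetch origin --depth=1 0.3.5
# Move the working tree to the fetched commit.
git reset --hard FETCH_HEAD
# Pull in submodules, if the package has any.
git submodule update --init --recursive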

This is based on the following SO answer: https://stackoverflow.com/questions/3489173/how-to-clone-git-repository-with-specific-revision-changeset/3489576#3489576 and has served us well over the years.
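
The same steps can be wrapped in a small helper that takes the repository URL, the ref, and a destination directory. This is only a sketch and assumes the ref is a tag or branch name; fetching a bare sha this way depends on the server allowing it. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the steps above; the function name and
# argument order are illustrative.
shallow_checkout() {
    url=$1 ref=$2 dest=$3
    git init --quiet "$dest" &&
    git -C "$dest" remote add origin "$url" &&
    # Transfer only the commit that $ref points to, without its history.
    git -C "$dest" fetch --quiet --depth=1 origin "$ref" &&
    git -C "$dest" reset --quiet --hard FETCH_HEAD &&
    git -C "$dest" submodule --quiet update --init --recursive
}

# Example (downloads from the network):
#   shallow_checkout https://github.com/SwiftPackageIndex/SemanticVersion.git 0.3.5 SemanticVersion
```

Each call pays for one fetch, so this fits a build system that checks out a single revision better than a resolver that has to inspect many versions.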

robinkunde commented 2 months ago

I've also had to wrestle with this issue recently, and have a few thoughts:

neonichu commented 2 months ago

@robinkunde see above, SwiftPM has to parse the package manifests during dependency resolution, knowing about the tags is not sufficient to do anything.

tayloraswift commented 2 months ago

could SwiftPM not simply download the text content of the manifest from its URL? even if this only works for GitHub repositories, it would still address 95% of use cases.
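
A minimal sketch of that idea for GitHub-hosted packages follows. It assumes the raw.githubusercontent.com URL layout (owner/repo/ref/path) and only looks at the root Package.swift, so version-specific manifests and private repositories would need extra handling. The helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, not SwiftPM code.

# Build the raw-content URL for Package.swift at a given ref.
# Assumes the repository is hosted on github.com.
raw_manifest_url() {
    owner=$1 repo=$2 ref=$3
    printf 'https://raw.githubusercontent.com/%s/%s/%s/Package.swift\n' \
        "$owner" "$repo" "$ref"
}

# Download the manifest text (needs curl and network access).
fetch_manifest() {
    curl --fail --silent --show-error --location "$(raw_manifest_url "$@")"
}

# Example:
#   fetch_manifest swiftlang swift-package-manager main | head -n 1
```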

robinkunde commented 2 months ago

Right, you have to combine ls-remote with another solution to get the manifests.

  1. For GitHub, this is straightforward using the raw file viewer. I don't know if GitHub places any restrictions on it or if authentication works differently from the HTTPS git interface.
  2. The other solution would be to perform a shallow clone/fetch, followed by sparse checkout, for the desired tags/branches/commits.

Quick and dirty example:

➜  ~ git init sparse_and_shallow_test
Initialized empty Git repository in /Users/robin.kunde/sparse_and_shallow_test/.git/
➜  ~ cd sparse_and_shallow_test
➜  sparse_and_shallow_test git:(main) git remote add origin https://github.com/swiftlang/swift-package-manager.git
➜  sparse_and_shallow_test git:(main) git config core.sparseCheckout true
➜  sparse_and_shallow_test git:(main) echo "/Package.swift" > .git/info/sparse-checkout
➜  sparse_and_shallow_test git:(main) git fetch --depth=1 origin main
remote: Enumerating objects: 2375, done.
remote: Counting objects: 100% (2375/2375), done.
remote: Compressing objects: 100% (1702/1702), done.
remote: Total 2375 (delta 339), reused 1677 (delta 197), pack-reused 0
Receiving objects: 100% (2375/2375), 1.89 MiB | 6.06 MiB/s, done.
Resolving deltas: 100% (339/339), done.
From https://github.com/swiftlang/swift-package-manager
 * branch            main       -> FETCH_HEAD
 * [new branch]      main       -> origin/main
➜  sparse_and_shallow_test git:(main) git checkout main
branch 'main' set up to track 'origin/main'.
Already on 'main'
➜  sparse_and_shallow_test git:(main) ll
total 56
-rw-r--r--  1 robin.kunde  staff    28K Jul 10 15:15 Package.swift

I think this can be further sped up by applying a fetch filter, but I'm not familiar with that feature yet.

EDIT: This may be as simple as git fetch --depth=1 --filter=blob:none origin main. I'll look into it some more later.

EDIT 2: I just realized git introduced a sparse-checkout command some time ago that should make things easier as well. There are other options aimed at large repos, such as sparse-index, manyFiles, untracked-cache, and assume-unchanged, that may be worth investigating too.
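
Putting the transcript and the first EDIT together, a helper along these lines might work. It is a sketch only: the function name is hypothetical, and how much the blob filter actually saves, or how well on-demand blob fetching behaves in a shallow repository, has not been measured here.

```shell
#!/bin/sh
# Hypothetical helper combining the sparse checkout from the transcript
# with the blob filter from the first EDIT; not SwiftPM code.
fetch_manifest_only() {
    url=$1 ref=$2 dest=$3
    git init --quiet "$dest" &&
    git -C "$dest" remote add origin "$url" &&
    # Restrict the working tree to the root manifest (non-cone pattern).
    git -C "$dest" config core.sparseCheckout true &&
    git -C "$dest" config core.sparseCheckoutCone false &&
    mkdir -p "$dest/.git/info" &&
    echo "/Package.swift" > "$dest/.git/info/sparse-checkout" &&
    # One commit deep, and ask the server to hold back file contents.
    # Servers without filter support fall back to a plain shallow fetch.
    git -C "$dest" fetch --quiet --depth=1 --filter=blob:none origin "$ref" &&
    git -C "$dest" checkout --quiet --detach FETCH_HEAD
}

# Example (downloads from the network):
#   fetch_manifest_only https://github.com/swiftlang/swift-package-manager.git main spm-manifest
```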