wordpress-mobile / gutenberg-mobile

Mobile version of Gutenberg - native iOS and Android
GNU General Public License v2.0

Change CI `checkout` approach for faster repo setup #6719

Closed by mokagio 5 months ago

mokagio commented 5 months ago

This experimental PR builds on top of a suggestion by @AliSoftware to add a --filter to the git clone command as explained in this blog post by GitHub.

Treeless clones

In some repositories, the tree data might be a significant portion of the history. Using --filter=tree:0, a treeless clone downloads all reachable commits, then downloads trees and blobs on demand. [...] Treeless clones are really only helpful for automated builds when you want to quickly clone, compile a project, then throw away the repository.

At first, I tried setting --filter=tree:0 via the env var Buildkite offers. That worked well for checking out gutenberg-mobile but didn't improve the situation for its submodules. I tinkered with the options but didn't get far. As far as I could tell, while one can customize some of the checkout behavior, Buildkite still adds certain parameters to the commands, and those interfered with the approach.
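For reference, the env var attempt might look like the sketch below. It assumes the variable in question is `BUILDKITE_GIT_CLONE_FLAGS` (the description does not name it), and `add_treeless_filter` is a hypothetical helper, not something Buildkite provides. As described above, flags set this way only affected the top-level clone, not the submodules.

```shell
# Assumption: BUILDKITE_GIT_CLONE_FLAGS is the variable the agent reads for
# extra `git clone` flags, and "-v" is its usual default.
# Hypothetical helper: append the treeless filter unless a filter is already set.
add_treeless_filter() {
  local flags="${BUILDKITE_GIT_CLONE_FLAGS:--v}"
  case " $flags " in
    *" --filter="*) ;;                        # a filter is already present, keep it
    *) flags="$flags --filter=tree:0" ;;      # otherwise request a treeless clone
  esac
  export BUILDKITE_GIT_CLONE_FLAGS="$flags"
}
```

Calling `add_treeless_filter` early in the job (for example from an environment hook, if that is how the pipeline sets variables) would leave any existing flags in place and only add the filter once.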

As such, I tried using a custom checkout script:

#!/usr/bin/env bash

# Hooks are sourced, so we need to set flags via `set`, not shebang.
set -eu

echo '[git-partial-clone-plugin] :git: Adding GitHub to known hosts'
ssh-keyscan -t rsa github.com >> "$HOME/.ssh/known_hosts"

# See https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/
#
# > Treeless clones are really only helpful for automated
# > builds when you want to quickly clone, compile a project,
# > then throw away the repository. In environments like
# > GitHub Actions using public runners, you want to minimize
# > your clone time so you can spend your machine time
# > actually building your software! Treeless clones might be
# > an excellent option for those environments.

echo '[git-partial-clone-plugin] :git: Cloning repo and submodules'
git clone --filter=tree:0 --recurse-submodules --also-filter-submodules "$BUILDKITE_REPO" "$BUILDKITE_BUILD_CHECKOUT_PATH"

cd "$BUILDKITE_BUILD_CHECKOUT_PATH"

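# Remove untracked and ignored files, including nested repositories,
# from every submodule and then from the top-level repo.
git submodule foreach --recursive "git clean -ffxdq"
git clean -ffxdq

echo '[git-partial-clone-plugin] :git: Checking out commit'
git checkout -f "$BUILDKITE_COMMIT"

# Point the submodules at the URLs and commits recorded by the checked-out commit.
git submodule sync --recursive
git submodule update --init --recursive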

The result was that checking out the repo now takes ~30 seconds where before it took ~5 minutes.
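One way to double-check on an agent that the checkout really is a partial clone is to look at the config Git records for the remote. The sketch below assumes the remote is named `origin`; `is_partial_clone` and `partial_clone_filter` are hypothetical helper names, not part of the script above.

```shell
# Returns 0 when the repo at $1 (default: current directory) marks "origin"
# as a promisor remote, which is what `git clone --filter=...` sets up.
is_partial_clone() {
  local repo="${1:-.}"
  [ "$(git -C "$repo" config --bool --get remote.origin.promisor 2>/dev/null)" = "true" ]
}

# Prints the filter spec recorded for "origin" (e.g. tree:0), or nothing if unset.
partial_clone_filter() {
  local repo="${1:-.}"
  git -C "$repo" config --get remote.origin.partialclonefilter 2>/dev/null || true
}
```

Running `git submodule foreach --recursive` with the same config lookup would be one way to see whether the filter reached the submodules as well.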

The improvement is so good and promising that I'm scared I'm missing something here. Like, why wouldn't Buildkite use this approach out of the box?

Still, I'm putting it out here for your feedback and consideration. Let me know what you think.

Note: The iOS steps running on the mac queue don't use the plugin because the Apple Silicon CI optimization for Git @jkmassel built already delivers great speed.

fluiddot commented 5 months ago

@mokagio I noticed in the article you shared that, for treeless clones, there's a warning related to submodules:

⚠️ Warning: While writing this article, we were putting treeless clones to the test beyond the typical limits. We noticed that repositories that contain submodules behave very poorly with treeless clones. Specifically, if you run git fetch in a treeless clone, then the logic in Git that looks for changed submodules will trigger a tree request for every new commit! This behavior can be avoided by running git config fetch.recurseSubmodules false in your treeless clones. We are working on a more robust fix in the Git client.

I wonder if applying the suggestion of using git config fetch.recurseSubmodules false might speed up the process. Although, I suspect that using the parameter --also-filter-submodules might be an alternative to this, is this accurate?
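For anyone who wants to try the setting from the warning, a minimal sketch follows. `disable_fetch_submodule_recursion` is a hypothetical helper name; whether the setting changes anything for this pipeline has not been measured here.

```shell
# Hypothetical helper: turn off submodule recursion for `git fetch` in the
# repository at $1 (default: current directory), as the article suggests
# for treeless clones.
disable_fetch_submodule_recursion() {
  local repo="${1:-.}"
  git -C "$repo" config fetch.recurseSubmodules false
}
```

In a checkout hook this could be called as `disable_fetch_submodule_recursion "$BUILDKITE_BUILD_CHECKOUT_PATH"` right after the clone, before any later fetch.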

AliSoftware commented 5 months ago

Like, why wouldn't Buildkite use this approach out of the box?

Maybe it's just an oversight on their part, not having tested partial checkouts vs submodules? I think it's worth opening a GitHub issue on Buildkite's repo to suggest that improvement, or alternatively contacting them by email or through their support forums.

mokagio commented 5 months ago

@fluiddot

I wonder if applying the suggestion of using git config fetch.recurseSubmodules false might speed up the process

I thought I tried that, but I went back and looked at my commits and the only config I set was submodule.alternateErrorStrategy info 😳

Although, I suspect that using the parameter --also-filter-submodules might be an alternative to this, is this accurate?

I can't say for sure. The docs say

       --also-filter-submodules
           Also apply the partial clone filter to any submodules in the repository.
           Requires --filter and --recurse-submodules. This can be turned on by default
           by setting the clone.filterSubmodules config option.

As far as I can see, the option was added in Git 2.36.0, as 2.35.0 doesn't have it. That version dates from 2022, while the GitHub article dates from 2020. So I think Git improved in the meantime, and the current approach, where we tell it to apply the filter to the submodules recursively, is the way to go.
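Since the flag depends on the Git version installed on the agent, a guard along these lines could keep the hook working on older agents. This is only a sketch: `version_at_least` and `git_supports_also_filter_submodules` are hypothetical helpers, and the 2.36.0 threshold is taken from the observation above.

```shell
# Returns 0 when dotted version $1 is >= $2, comparing the first three
# numeric components (missing or non-numeric parts count as 0).
version_at_least() {
  awk -v have="$1" -v want="$2" 'BEGIN {
    split(have, h, "."); split(want, w, ".")
    for (i = 1; i <= 3; i++) {
      a = h[i] + 0; b = w[i] + 0
      if (a > b) exit 0
      if (a < b) exit 1
    }
    exit 0
  }'
}

# Returns 0 when the installed Git is new enough for --also-filter-submodules.
# `git --version` prints e.g. "git version 2.39.3", so the third field is used.
git_supports_also_filter_submodules() {
  version_at_least "$(git --version | awk '{print $3}')" 2.36.0
}
```

A hook could then add `--also-filter-submodules` to the clone command only when `git_supports_also_filter_submodules` succeeds, and fall back to a plain `--filter=tree:0 --recurse-submodules` clone otherwise.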