Open japborst opened 2 years ago
I think it would be useful if it does not create too many problems for the user. How do you imagine this working? 😄 Should the user set a cache timeout themselves, and if they enable caching, should they expect errors such as merge conflicts that they have to deal with manually?
@lindell Admittedly I haven't thought deeply about this yet, but a first version could implement an algorithm such as the following, given a $CACHE_ROOT directory (I'm assuming GitHub terminology):
If $CACHE_ROOT/$org/$repo doesn't exist: check out as usual.
If $CACHE_ROOT/$org/$repo does exist: execute a number of commands to get it into a pristine state:
git clean -fdx
git fetch --depth=[the-configured-fetch-depth]
git remote prune origin
git remote set-head origin -a
git checkout [the-configured-base-branch]
git reset --hard origin/[the-configured-base-branch]
# ^ If not configured, could run the semantic equivalent of e.g.:
# git symbolic-ref refs/remotes/origin/HEAD \
# | sed "s,^refs/remotes/origin/,," \
# | xargs git checkout
# git reset --hard refs/remotes/origin/HEAD
git submodule update --recursive # If `multi-gitter` currently handles submodules; didn't check.
(I'm no Git guru, so perhaps there's a more straightforward way to reset the repository into a pristine state containing the n most recent commits on the configured target branch, but the overall gist would be the same: (a) re-use already-downloaded data, (b) update to match the most recent state, (c) clear any local modifications.)
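For concreteness, a minimal shell sketch of how the two cases could fit together. The $ORG, $REPO, $DEPTH and $BASE_BRANCH variables, and the HTTPS clone URL, are placeholders standing in for configuration multi-gitter would have to supply; this is an illustration of the idea above, not a proposed implementation.

# Sketch only: all variables are placeholders for multi-gitter configuration.
dir="$CACHE_ROOT/$ORG/$REPO"
if [ ! -d "$dir/.git" ]; then
    # Cache miss: clone as multi-gitter does today.
    git clone --depth="$DEPTH" "https://github.com/$ORG/$REPO.git" "$dir"
else
    # Cache hit: bring the existing checkout back to a pristine state.
    git -C "$dir" clean -fdx
    git -C "$dir" fetch --depth="$DEPTH" origin
    git -C "$dir" remote prune origin
    git -C "$dir" remote set-head origin -a
    git -C "$dir" checkout "$BASE_BRANCH"
    git -C "$dir" reset --hard "origin/$BASE_BRANCH"
fi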
I suppose there should also be a --trust-cached-repositories flag (better name TBD), so that during rapid prototyping the user can iterate on the script passed to multi-gitter run without incurring any I/O overhead.
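As a usage sketch of that flag (--trust-cached-repositories is just the proposal above and does not exist; the script name and org are made up, the other flags follow the run examples in the multi-gitter README):

# First run: populates the cache as usual.
multi-gitter run ./my-change.sh -O my-org -m "Apply change" -B apply-change --dry-run

# While iterating on my-change.sh: reuse the cached clones, skip all network I/O.
multi-gitter run ./my-change.sh -O my-org -m "Apply change" -B apply-change --dry-run --trust-cached-repositories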
@Stephan202 So in that case, multi-gitter would still need to fetch from the remote. I guess this could speed up the process in some cases with very big repos and small changes 🤔 For those use cases it would indeed be useful.
Indeed, we have a number of large repos that would benefit from this.
(Currently we have a repository containing all our other repositories as submodules, with various operations performed using git submodule foreach. This can be a bit unwieldy, but it does have the benefit of repository state updates being decoupled from modification operations, which avoids extensive waiting between trials, even when on a slow network.)
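For context, that workaround looks roughly like the following. The meta-repository layout and the sed one-liner are made-up illustrations of the workflow described above, not our actual scripts.

# In a "meta" repository that declares every other repository as a submodule:
git submodule update --init --remote                      # slow, network-bound: refresh every repo once
git submodule foreach 'sed -i "s/foo/bar/g" README.md'    # fast, offline: iterate on the actual change (GNU sed)
git submodule foreach 'git diff --stat'                   # fast, offline: inspect the result before pushing anything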
To give a little more flavour to the size of the problem: in our case (and I imagine at many other companies) running multi-gitter against the entire GitHub org means cloning hundreds of repos. Even using the default depth of 1, that still means fetching anywhere from a few MB up to, worst case, a GB.
I do agree that this is something that should be added! I will not have the time to look at this any time soon, but if you add it and create a PR, I'm happy to merge it 🙂
Hello!
When using multi-gitter I noticed that on every run the respective repos are always pulled. It would be great if this could be cached, to avoid long wait times to pull many repositories (especially when the entire org is specified).