lindell / multi-gitter

Update multiple repositories with one command
Apache License 2.0

Support caching repositories #235

Open · japborst opened 2 years ago

japborst commented 2 years ago

Hello!

When using multi-gitter I noticed that on every run the respective repos are always pulled.

It would be great if this could be cached, to avoid long wait times to pull many repositories (especially when the entire org is specified).

lindell commented 2 years ago

I think it would be useful if it does not create too much trouble for the user. How do you imagine this working? 😄 Should the user set a cache timeout themselves, and, if they enable caching, expect errors such as merge conflicts that they have to deal with manually?

Stephan202 commented 2 years ago

@lindell admittedly I haven't thought deeply about this yet, but a first version could implement an algorithm such as the following, given a $CACHE_ROOT directory (I'm assuming GitHub terminology):

I suppose there should also be a --trust-cached-repositories flag (better name TBD), so that during rapid prototyping the user can iterate on the script passed to multi-gitter run without incurring any IO overhead.
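The cache lookup described above might be sketched roughly as follows. This is only an illustration under stated assumptions: the `$CACHE_ROOT/<owner>/<repo>` layout, and the `plan_update`/`update_repo` helper names, are hypothetical, not part of multi-gitter.

```python
import subprocess
from pathlib import Path


def plan_update(cache_root: str, owner: str, repo: str) -> tuple[str, Path]:
    """Decide whether a repository must be cloned or can be refreshed.

    Returns ("clone", path) when no cached copy exists under
    $CACHE_ROOT/<owner>/<repo>, and ("fetch", path) when one does.
    """
    path = Path(cache_root) / owner / repo
    if (path / ".git").is_dir():
        return "fetch", path  # cached copy present: only fetch new objects
    return "clone", path      # no cache yet: do a full (shallow) clone


def update_repo(cache_root: str, owner: str, repo: str, url: str) -> Path:
    """Bring the cached copy of a repository up to date, cloning if needed."""
    action, path = plan_update(cache_root, owner, repo)
    if action == "clone":
        path.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(["git", "clone", "--depth", "1", url, str(path)],
                       check=True)
    else:
        # Refresh the cached copy instead of re-cloning everything.
        subprocess.run(["git", "-C", str(path), "fetch", "--depth", "1"],
                       check=True)
    return path
```

With a `--trust-cached-repositories`-style flag, the `fetch` branch could additionally be skipped entirely, so repeated prototyping runs touch only the local cache.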

lindell commented 2 years ago

@Stephan202 So in that case, multi-gitter would still need to fetch from the remote. I guess this could speed up the process in some cases with very big repos and small changes 🤔 For those use cases it would indeed be useful.

Stephan202 commented 2 years ago

Indeed, we have a number of large repos that would benefit from this.

(Currently we have a repository containing all our other repositories as submodules, with various operations performed using git submodule foreach. This can be a bit unwieldy, but does have the benefit of repository state updates being decoupled from modification operations, which avoids extensive waiting between trials, even when on a slow network.)

japborst commented 2 years ago

To give a little more flavour to the size of the problem: in our case (and, I imagine, at many other companies) running multi-gitter against the entire GitHub org means cloning hundreds of repos. Even using the default depth of 1, that still means fetching anywhere from a few MB up to, in the worst case, a GB per repository.

lindell commented 2 years ago

I do agree that this is something that should be added! I will not have the time to look at this any time soon, but if you add it and create a PR, I'm happy to merge it 🙂