seeraven / gitcache

Local cache for git repositories to speed up working with large repositories and multiple clones.
BSD 3-Clause "New" or "Revised" License

Update mirror repo remote url before each fetch #32

Open Youw opened 7 months ago

Youw commented 7 months ago

The remote URL of the same repository may differ slightly between gitcache invocations. The use case: two different CI jobs running on the same machine (not even concurrently or anything like that), but configured differently. One uses the form https://${GITHUB_PAT}@github.com/<repo>.git and the other uses git@github.com:<repo>.git with an ssh-agent running. Because the remote URL of the mirror repo is never updated, only the job that initially created the mirror works, and the other one fails: an ssh-agent is useless for an https:// URL, and a PAT is unusable with an SSH URL.

seeraven commented 7 months ago

I don't know whether I understood you correctly, but I guess you want to use one mirror regardless of the actual URL. For example, an initial clone from https://<github_pat>@github.com/<repo>.git creates that mirror and a second clone using git@github.com:<repo>.git updates the mirror and clones from it.

I think that can be solved by "normalizing" the URL further, reducing it to the repository path shared by both forms (`<repo>` in the examples above) and using that part as the index in the database. The interesting part will then be updating the mirror, because currently that uses the URL initially stored in the bare git repo. This will have to change to use either the URL specified in the git clone call or the URL (actually the push remote) stored in the currently used checkout, so that a git pull or git fetch works too.
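The normalization idea can be sketched as a small pure function. This is a minimal illustration, not gitcache's actual code: `normalize_url` is a hypothetical helper, and unlike the comment (which reduces the URL to the repository path only) it assumes the host should be kept as well, so that identically named repositories on different servers do not collide. It strips credentials, the scheme or scp-like `user@host:` prefix, and a trailing `.git`.

```python
import re
from urllib.parse import urlsplit

# Matches scp-like SSH syntax such as "git@github.com:owner/repo.git".
_SCP_LIKE = re.compile(r"^(?:[^@/]+@)?(?P<host>[^:/]+):(?P<path>[^/].*)$")


def normalize_url(url: str) -> str:
    """Reduce a git remote URL to "host/path" without credentials or ".git".

    Hypothetical helper for illustration; not gitcache's real implementation.
    """
    if "://" in url:
        # https://TOKEN@host/path or ssh://user@host/path
        parts = urlsplit(url)
        host = parts.hostname or ""  # .hostname drops "user:token@" and the port
        path = parts.path
    else:
        match = _SCP_LIKE.match(url)
        if match is None:
            raise ValueError(f"unsupported remote URL: {url!r}")
        host = match.group("host")
        path = match.group("path")
    path = path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"{host.lower()}/{path}"
```

With this, `https://${GITHUB_PAT}@github.com/<repo>.git` and `git@github.com:<repo>.git` map to the same database key, while the original URL is still needed separately for the actual fetch.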

Youw commented 7 months ago

You've understood it correctly.

> This will have to change to use either the URL specified by a git clone call or the URL

I've looked up how Jenkins does its native checkout (it doesn't do any node-based caching, but it does reuse the workspace if it already exists, so the situation is very similar), and it does exactly that: it always explicitly updates the remote fetch URL in the existing local repo with the one currently specified in the clone configuration.
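Applied to a gitcache mirror, that "always update the URL first" approach amounts to two git calls per update. The sketch below only assumes standard git commands (`git remote set-url` and `git fetch`); the function names and the remote name `origin` are hypothetical and not taken from gitcache. The commands are built as lists so the logic can be checked without running git.

```python
import subprocess
from typing import List


def refresh_mirror_commands(mirror_dir: str, remote_url: str,
                            remote: str = "origin") -> List[List[str]]:
    """Return the git commands that repoint the mirror, then fetch.

    Hypothetical helper for illustration only.
    """
    return [
        # Overwrite the stored fetch URL with the one the current caller uses,
        # so the caller's own credentials (PAT or ssh-agent) apply.
        ["git", "-C", mirror_dir, "remote", "set-url", remote, remote_url],
        # Fetch through the freshly set URL, pruning refs deleted upstream.
        ["git", "-C", mirror_dir, "fetch", "--prune", remote],
    ]


def refresh_mirror(mirror_dir: str, remote_url: str, remote: str = "origin") -> None:
    """Run the commands in order; raises CalledProcessError if one fails."""
    for cmd in refresh_mirror_commands(mirror_dir, remote_url, remote):
        subprocess.run(cmd, check=True)
```

One consequence worth weighing: with the `https://${GITHUB_PAT}@...` form, `git remote set-url` writes the token into the mirror's config file, just as the initial clone already does today.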

For the gitcache scenario, the "original remote URL of the current repo" is only available during a clone operation, not during pull/fetch operations. I see at least two possible options here: either apply the "update URL" step only for clone operations, or store the original remote URL in a custom git config property of the checked-out repo, so that it is available for other operations too (this second option no longer sounds as trivial as I originally imagined).
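The second option could look roughly like this. It is a sketch under explicit assumptions: the config key `gitcache.remoteurl` is hypothetical (git accepts arbitrary `section.key` names, but gitcache defines no such key as far as this thread shows), and the fallback to the checkout's push URL of `origin` follows seeraven's suggestion above. The git reader is injectable so the selection logic can be tested without a repository.

```python
import subprocess
from typing import Callable, List, Optional

# Hypothetical config key; not an existing gitcache setting.
CONFIG_KEY = "gitcache.remoteurl"


def store_url_command(checkout_dir: str, url: str) -> List[str]:
    """Command that records the clone-time URL in the checkout's local config."""
    return ["git", "-C", checkout_dir, "config", "--local", CONFIG_KEY, url]


def read_git(args: List[str]) -> Optional[str]:
    """Run git with args; return stripped stdout, or None on a non-zero exit."""
    result = subprocess.run(["git", *args], capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else None


def select_remote_url(checkout_dir: Optional[str], clone_url: Optional[str],
                      reader: Callable[[List[str]], Optional[str]] = read_git
                      ) -> Optional[str]:
    """Choose the URL to use when updating the mirror."""
    if clone_url:
        # Clone: the URL given on the command line wins.
        return clone_url
    if checkout_dir is None:
        return None
    # Pull/fetch: prefer the URL recorded at clone time ...
    stored = reader(["-C", checkout_dir, "config", "--get", CONFIG_KEY])
    if stored:
        return stored
    # ... otherwise fall back to the checkout's push URL for origin.
    return reader(["-C", checkout_dir, "remote", "get-url", "--push", "origin"])
```

The part that is not trivial, as noted above, is that every code path which creates or rewrites a checkout would have to keep this stored value in sync.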