seeraven / gitcache

Local cache for git repositories to speed up working with large repositories and multiple clones.
BSD 3-Clause "New" or "Revised" License

Add shallow clone #53

Open Darejkal opened 1 week ago

Darejkal commented 1 week ago

It seems that when using git clone with a specified depth (e.g. --depth 1), gitcache will still try to mirror the entire repo.

Youw commented 1 week ago

I believe that's pretty much the whole point of gitcache: to have a mirror of the entire repo. That is a potentially heavy operation, but only during the initial clone, so that at any point in the future the checkout/clone/etc. of the repo is as fast as working with a local copy of the data, and the full history is still available (often needed, e.g. to build a changelog).

Darejkal commented 1 week ago

This really limits the use of the program on large codebases, especially ones where it is necessary to perform a shallow clone before fetching the desired commit (this, for example). I'm sorry for not taking a deep dive into the gitcache codebase before raising the feature request. If shallow cloning is inconvenient to implement based on the current strategy, please permit me to suggest the following alternatives:

I think the second feature is better since (I believe) it will have much less impact on the existing codebase and still support complete mirroring.

Also, when using gitcache, I encountered cases where a lock before cloning would be necessary (multiple processes doing a fresh clone of the same repository). Apparently there is a locking mechanism only when updating an existing clone, so any race during cloning results in an error (a local folder is found, but without the corresponding .lock). I will raise a new issue here if there is no current workaround.
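To illustrate the kind of lock meant here, below is a minimal sketch of serializing the initial clone across processes. It is not gitcache's actual locking code: the names clone_lock and clone_once, the ".clone.lock" suffix and the do_clone callback are all hypothetical, and the lock is simply a file created atomically with O_CREAT | O_EXCL.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def clone_lock(lock_path, timeout=600.0, poll=0.5):
    """Hold an exclusive lock file while the body runs (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists,
            # so only one process can create the lock file.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        # Record the owner PID to help with manual debugging.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)


def clone_once(mirror_dir, do_clone):
    """Run do_clone(mirror_dir) only if no other process has cloned already."""
    with clone_lock(f"{mirror_dir}.clone.lock"):
        # Re-check inside the lock: another process may have finished the
        # clone while this one was waiting.
        if not os.path.isdir(mirror_dir):
            do_clone(mirror_dir)
```

A process that dies while holding the lock leaves a stale lock file behind. This sketch does not handle that case, so a real implementation would need some form of stale-lock detection.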

seeraven commented 20 hours ago

Hi and sorry for the delay, I was quite busy lately with other stuff.

Yes, the default behaviour of gitcache is to always perform a full clone of the specified remote repository into the cache directory and then execute the command you have given but on the cached version. I agree that especially with shallow clones this might be quite unexpected. The reason for this behavior is that it is IMHO the only safe way to provide a cache for the specified repository that can serve all further requests.

Nevertheless, I agree that adding a "robust" clone method makes sense (I've also encountered the git RPC errors but was luckily able to circumvent them by switching to ssh). I imagine having another option in the configuration to enable it which would then perform the initial clone of the repository (that is if the repository was not yet cached) always using a git clone ... --depth 1 followed by git fetch --unshallow. I have to test whether that works with bare repositories as well, but if that works out of the box such a feature should be easy to implement.
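As a rough sketch of that two-step idea (not the actual gitcache implementation): the function names and the extra_clone_args parameter below are hypothetical, and only the git options --depth 1 and --unshallow come from the description above. Whether this behaves as wanted for bare or mirror repositories is still untested.

```python
import subprocess


def robust_clone_commands(url, target_dir, extra_clone_args=()):
    """Return the two git commands for a shallow clone followed by unshallow."""
    return [
        # Step 1: a small initial transfer containing only the newest commit.
        ["git", "clone", *extra_clone_args, "--depth", "1", url, str(target_dir)],
        # Step 2: fetch the remaining history into the existing clone.
        ["git", "-C", str(target_dir), "fetch", "--unshallow"],
    ]


def robust_clone(url, target_dir, extra_clone_args=(), run=subprocess.run):
    """Run both steps in order; check=True raises if either git call fails."""
    for cmd in robust_clone_commands(url, target_dir, extra_clone_args):
        run(cmd, check=True)
```

The run parameter exists only so the command sequence can be checked without network access. A real call would be robust_clone(url, target_dir), with url and target_dir supplied by the caller.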

Regarding the second issue, I'll have to check. I thought it should lock, but I remember having added some features a while ago which might have impacted the locking.