Open jelmer opened 5 years ago
@jelmer Do you have an idea how to fix this in klaus? Maybe a cache that updates whenever a change has been made to the repository? (That is, somehow circumventing "properly" loading the repo in Dulwich, to save the time that takes.)
Yeah, I think we'd want a cache rather than actually reading the repositories every time. Perhaps we could make FancyRepo a wrapper for dulwich.Repo rather than being derived from it?
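To make the wrapper idea concrete, here is a minimal sketch, not klaus's actual code. `LazyRepo` and its `opener` argument are hypothetical names; in klaus the opener would presumably be something like `dulwich.repo.Repo`. The wrapper only opens the repository the first time an attribute is actually needed:

```python
from typing import Any, Callable, Optional


class LazyRepo:
    """Hypothetical wrapper that defers opening a repository until first use."""

    def __init__(self, path: str, opener: Callable[[str], Any]) -> None:
        self.path = path
        self._opener = opener  # assumption: e.g. dulwich.repo.Repo
        self._repo: Optional[Any] = None

    def _load(self) -> Any:
        # Open the underlying repository once and keep it around.
        if self._repo is None:
            self._repo = self._opener(self.path)
        return self._repo

    def __getattr__(self, name: str) -> Any:
        # Only invoked for attributes missing on the wrapper itself, so
        # reading `path` never triggers a load. Private names are not
        # forwarded, which also avoids recursion on half-built instances.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._load(), name)
```

With this shape, building the list of 20k wrappers costs almost nothing; the loading cost is only paid for repositories someone actually visits.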
OK will look into this soon.
Curious: Do you actually have that kind of use case with 1000+ repos?
Check this out.
I guess there are a lot more ways to do caching, but this is one of the simplest things to do.
On Thu, Jul 04, 2019 at 12:59:51AM -0700, Jonas Haag wrote:
> Curious: Do you actually have that kind of use case with 1000+ repos?

Yeah, I'm working on a project to automatically patch Debian Git repositories, and would like to display the delta somehow. There are ~20k of those. :)
Jelmer
I'll have to ramp up my benchmark repository then! Tested it with 1k repos, but let me test and optimize with 20k ;)
If you have the time maybe you could help me think about how we can cache ref listing. I was thinking about checking the stat() of some Git file or folder for cache invalidation; though I'm not sure there is such a thing as filesystem modification timestamp for "any of the recursive folders or files" that you could use for that. Other caching/cache invalidation ideas?
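A rough sketch of the stat()-based idea, under assumptions. Since a directory's mtime generally only changes when its direct entries are created, removed or renamed, this stats a handful of paths rather than relying on one recursive timestamp. The `WATCHED` list, `fingerprint` and `StatCache` are hypothetical, and the list is a guess: it would miss nested refs such as `refs/heads/feature/x`, and repositories that store refs differently.

```python
import os
from typing import Any, Callable, Dict, Optional, Tuple

# Assumption: a change to any of these paths signals that refs changed.
WATCHED = ("HEAD", "packed-refs", "refs/heads", "refs/tags")

Fingerprint = Tuple[Optional[Tuple[int, int]], ...]


def fingerprint(git_dir: str) -> Fingerprint:
    """Return (mtime_ns, size) for each watched path, or None if it is missing."""
    result = []
    for rel in WATCHED:
        try:
            st = os.stat(os.path.join(git_dir, rel))
            result.append((st.st_mtime_ns, st.st_size))
        except FileNotFoundError:
            result.append(None)
    return tuple(result)


class StatCache:
    """Cache one value per repository, recomputed when the fingerprint changes."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Fingerprint, Any]] = {}

    def get(self, git_dir: str, compute: Callable[[str], Any]) -> Any:
        fp = fingerprint(git_dir)
        cached = self._entries.get(git_dir)
        if cached is not None and cached[0] == fp:
            return cached[1]  # nothing watched has changed
        value = compute(git_dir)  # e.g. the expensive ref listing
        self._entries[git_dir] = (fp, value)
        return value
```

This still costs a few stat() calls per repository per lookup, which should be much cheaper than opening the repository, but it can return stale data when a change does not touch any watched path or lands within the filesystem's timestamp granularity.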
Of course we can always use simple time-based caching, particularly for information like repository description. But I'd rather use that as a last resort only.
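For comparison, the time-based fallback could look roughly like this. `TTLCache` is a hypothetical helper, not something klaus ships, and the clock is injectable only to make it easy to test:

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Keep computed values for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh
        value = compute()  # e.g. read the repository description
        self._entries[key] = (now, value)
        return value
```

The obvious downside is that a change can stay invisible for up to `ttl_seconds`, which is tolerable for descriptions but less so for refs.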
Also inotify etc. but I'm not too keen on integrating that TBH
I've worked around this for now by adding an app that just shows a single repository (a list of 10k repositories is not very usable anyway...) and loads that repository on demand. This works, but is a bit ugly since it has to duplicate some of the logic in klaus (e.g. the route table).
See e.g. https://janitor.debian.net/git/klaus
Filing this mostly to track the work I'm doing in this area. With ~2000 repositories loaded, klaus still works well. However, there are two caveats: