jonashaag / klaus

docker run klaus / pip install klaus — the first Git web viewer that Just Works™.
http://klausdemo.lophus.org

index page is really slow with 1000+ repositories #238

Open jelmer opened 5 years ago

jelmer commented 5 years ago

Filing this mostly to track the work I'm doing in this area. With ~2000 repositories loaded, klaus still works well. However, there are two caveats:

jonashaag commented 5 years ago

@jelmer Do you have an idea how to fix this in klaus? Maybe a cache that updates whenever a change has been made to the repository? (Somehow circumventing Dulwich, to save the time it takes to "properly" load the repo in Dulwich.)

jelmer commented 5 years ago

Yeah, I think we'd want a cache rather than actually reading the repositories every time. Perhaps we could make FancyRepo a wrapper for dulwich.Repo rather than being derived from it?
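A rough sketch of what the wrapper idea could look like, not klaus's actual code: `LazyRepo` and its `loader` argument are hypothetical names, and the assumption is that `loader` would be something like `dulwich.repo.Repo`. The index page would only touch cheap attributes such as `name`; the repository is opened the first time anything else is requested.

```python
import os


class LazyRepo:
    """Hypothetical wrapper: keeps a path and opens the repository on first use."""

    def __init__(self, path, loader):
        self.path = path
        self._loader = loader  # assumption: e.g. dulwich.repo.Repo
        self._repo = None

    @property
    def name(self):
        # Cheap metadata derived from the path alone; no repository access.
        base = os.path.basename(os.path.normpath(self.path))
        return base[:-4] if base.endswith(".git") else base

    def _load(self):
        # Open the underlying repository once and keep it.
        if self._repo is None:
            self._repo = self._loader(self.path)
        return self._repo

    def __getattr__(self, attr):
        # Only called for attributes not found on the wrapper itself.
        # Private names are never forwarded, which avoids recursion
        # if _repo has not been set yet.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return getattr(self._load(), attr)
```

With this shape, building a list of 2000 `LazyRepo` objects costs nothing beyond storing the paths; the price of loading is paid per repository, when it is first viewed.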

jonashaag commented 5 years ago

OK will look into this soon.

Curious: Do you actually have that kind of use case with 1000+ repos?

jonashaag commented 5 years ago

Check this out.

I guess there are a lot more ways to do caching, but this is one of the simplest things to do.

jelmer commented 5 years ago

On Thu, Jul 04, 2019 at 12:59:51AM -0700, Jonas Haag wrote:

> Curious: Do you actually have that kind of use case with 1000+ repos?

Yeah, I'm working on a project to automatically patch Debian Git repositories, and would like to display the delta somehow. There are ~20k of those. :)

Jelmer

jonashaag commented 5 years ago

I'll have to ramp up my benchmark repository then! Tested it with 1k repos, but let me test and optimize with 20k ;)

If you have the time, maybe you could help me think about how we can cache ref listing. I was thinking about checking the stat() of some Git file or folder for cache invalidation, though I'm not sure there is such a thing as a filesystem modification timestamp covering "any of the recursive folders or files" that you could use for that. Other caching/cache invalidation ideas?
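To make the stat() idea concrete, here is a minimal sketch under some assumptions: since there is no single recursive timestamp, it walks `refs/` and stats `HEAD` and `packed-refs`, combining the results into a fingerprint. `refs_fingerprint` and `StatCache` are made-up names, and whether the walk is cheap enough at 20k repos is untested.

```python
import os


def refs_fingerprint(git_dir):
    """Return a tuple that changes when HEAD, packed-refs or loose refs change."""
    stamps = []
    for name in ("HEAD", "packed-refs"):
        try:
            st = os.stat(os.path.join(git_dir, name))
            stamps.append((name, st.st_mtime_ns, st.st_size))
        except FileNotFoundError:
            stamps.append((name, None, None))
    # No recursive mtime exists, so visit every loose ref file explicitly.
    for root, dirs, files in os.walk(os.path.join(git_dir, "refs")):
        dirs.sort()  # deterministic traversal order
        for fname in sorted(files):
            full = os.path.join(root, fname)
            try:
                st = os.stat(full)
            except FileNotFoundError:
                continue  # ref removed while we were walking
            stamps.append((os.path.relpath(full, git_dir), st.st_mtime_ns, st.st_size))
    return tuple(stamps)


class StatCache:
    """Hypothetical cache: recompute a value only when the fingerprint changes."""

    def __init__(self, compute):
        self._compute = compute  # e.g. a function that lists refs via Dulwich
        self._entries = {}

    def get(self, git_dir):
        fingerprint = refs_fingerprint(git_dir)
        cached = self._entries.get(git_dir)
        if cached is not None and cached[0] == fingerprint:
            return cached[1]
        value = self._compute(git_dir)
        self._entries[git_dir] = (fingerprint, value)
        return value
```

One caveat: a ref rewritten in place within the filesystem's timestamp granularity, with the same file size, would go unnoticed, so this is a heuristic rather than a guarantee.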

Of course we can always use simple time-based caching, particularly for information like repository description. But I'd rather use that as a last resort only.
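For comparison, the time-based fallback is only a few lines. This is an illustrative sketch, not what klaus does: `TTLCache` is a hypothetical name, and `compute` stands in for whatever reads the description from disk.

```python
import time


class TTLCache:
    """Hypothetical time-based cache: values expire after ttl_seconds."""

    def __init__(self, compute, ttl_seconds, clock=time.monotonic):
        self._compute = compute  # e.g. a function that reads a repo description
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Missing or expired: recompute and remember when we did so.
        value = self._compute(key)
        self._entries[key] = (now, value)
        return value
```

The trade-off is the usual one: changes stay invisible for up to `ttl_seconds`, which is fine for descriptions but less attractive for ref listings.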

Also inotify etc. but I'm not too keen on integrating that TBH

jelmer commented 4 years ago

I've worked around this for now by adding an app that just shows a single repository (a list of 10k repositories is not very usable anyway...) and loads that repository on demand. This works, but is a bit ugly since it has to duplicate some of the logic in klaus (e.g. the route table).
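The on-demand part of that workaround can be sketched roughly as follows. This is not the code of the app mentioned above; `OnDemandRepos`, `loader` and `max_loaded` are hypothetical, and the routing that klaus would still need is left out.

```python
import os
from collections import OrderedDict


class OnDemandRepos:
    """Hypothetical registry: load a repository when first requested, keep a few."""

    def __init__(self, root, loader, max_loaded=32):
        self._root = root
        self._loader = loader  # assumption: e.g. dulwich.repo.Repo
        self._max = max_loaded
        self._loaded = OrderedDict()

    def get(self, name):
        if name in self._loaded:
            self._loaded.move_to_end(name)  # mark as most recently used
            return self._loaded[name]
        # Accept only plain directory names directly below the root.
        if name in ("", ".", "..") or os.path.basename(name) != name:
            raise KeyError(name)
        path = os.path.join(self._root, name)
        if not os.path.isdir(path):
            raise KeyError(name)
        repo = self._loader(path)
        self._loaded[name] = repo
        if len(self._loaded) > self._max:
            self._loaded.popitem(last=False)  # evict the least recently used
        return repo
```

Nothing is scanned at startup, so startup time no longer depends on the number of repositories; the cost is that there is no index page to list them.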

See e.g. https://janitor.debian.net/git/klaus