mgeier / sphinx-last-updated-by-git

Get the "last updated" time for each Sphinx page from Git
BSD 2-Clause "Simplified" License

Slowdown due to many invocations of git #33

Status: closed by oharboe 2 years ago

oharboe commented 2 years ago

Getting the git dates takes nearly as long as reading in the entire source code.

It should be possible to change the implementation to issue a few git commands for the entire project upfront to avoid invoking git multiple times per folder.
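A minimal sketch of the "few git commands upfront" idea, assuming a single repository with no submodules: one `git log --name-only` over the whole history, parsed so that the first (newest) commit seen for each path gives its "last updated" time. The names `parse_git_log`, `latest_commit_times` and `COMMIT_MARKER` are hypothetical and not part of sphinx-last-updated-by-git; the sketch also uses committer timestamps (`%ct`), which may not be the date the extension actually reports.

```python
from __future__ import annotations

import subprocess

# Hypothetical marker: --format=%x00%ct prints a NUL byte followed by the
# commit's committer timestamp. Paths cannot contain NUL, so a marker line
# can never be mistaken for a file name.
COMMIT_MARKER = "\x00"


def parse_git_log(output: str) -> dict[str, int]:
    """Map each path to the timestamp of the newest commit that touched it.

    `git log` lists commits newest first, so the first timestamp recorded
    for a path is its "last updated" time; later (older) ones are ignored.
    """
    latest: dict[str, int] = {}
    timestamp = None
    for line in output.split("\n"):
        if line.startswith(COMMIT_MARKER):
            timestamp = int(line[len(COMMIT_MARKER):])
        elif line and timestamp is not None:
            latest.setdefault(line, timestamp)
    return latest


def latest_commit_times(repo_dir: str) -> dict[str, int]:
    """Run a single `git log` for the whole repository.

    Paths in the result are relative to the repository root. Files inside
    submodules are not listed (only the submodule entry itself is).
    core.quotePath=false avoids octal-escaping non-ASCII names; paths with
    quotes or control characters would still come back quoted.
    """
    result = subprocess.run(
        ["git", "-c", "core.quotePath=false", "log",
         "--name-only", "--format=%x00%ct"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_git_log(result.stdout)
```

The trade-off is that one call has to walk the entire history even for files whose last change is recent, so on a repository with a very long history this single call could itself take a noticeable amount of time; whether it beats one call per folder would need measuring on a real project.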

mgeier commented 2 years ago

> Getting the git dates takes nearly as long as reading in the entire source code.

And is this a problem? What scale are we talking about, seconds or minutes?

Do you have an example project for benchmarking?

I've tried it on the NumPy docs (https://github.com/numpy/numpy/pull/18268) and it only takes a few seconds. I thought NumPy was already big, but maybe you have a much bigger project in mind?

If you have hundreds of directories with very few files each, the current implementation is indeed a bit wasteful. However, if you have tens of directories with hundreds of files each, it shouldn't really be a problem.

> It should be possible to change the implementation to issue a few git commands for the entire project upfront to avoid invoking git multiple times per folder.

The reason why I'm invoking git on each folder is that each folder could potentially be a separate Git repository or, more realistically, a separate Git submodule. To batch Git calls across subfolders, we would somehow have to find the Git repository (and submodule) boundaries. Any ideas how to do that?
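One possible way to find those boundaries, sketched under the assumption that every nested repository is registered as a submodule: ask Git once for the top-level directory and once for all submodule paths (recursively), then assign each source directory to the deepest root that contains it. The function names `repo_roots` and `group_by_repo` are hypothetical. Plain nested clones that are not submodules would not be detected, and `git submodule foreach` runs `pwd` through a shell, so its output is assumed to be a native absolute path (true on POSIX systems, not necessarily on Windows).

```python
from __future__ import annotations

import subprocess
from pathlib import Path, PurePath


def repo_roots(project_dir: str) -> list[Path]:
    """Return the repository root plus every (recursive) submodule root.

    Two Git calls in total, regardless of how many source directories exist.
    """
    def git(*args: str) -> str:
        return subprocess.run(
            ["git", *args], cwd=project_dir,
            capture_output=True, text=True, check=True,
        ).stdout

    top = git("rev-parse", "--show-toplevel").strip()
    # `foreach` runs `pwd` inside each submodule, printing its absolute path;
    # --quiet suppresses the "Entering ..." lines.
    submodules = git("submodule", "--quiet", "foreach", "--recursive", "pwd")
    return [Path(top)] + [Path(p) for p in submodules.splitlines() if p]


def group_by_repo(
    directories: list[PurePath], roots: list[PurePath]
) -> dict[PurePath, list[PurePath]]:
    """Assign each directory to the deepest root that contains it.

    Directories that lie outside every root are left out of the result.
    """
    # Check deeper roots first so a submodule wins over its superproject.
    deepest_first = sorted(roots, key=lambda r: len(r.parts), reverse=True)
    groups: dict[PurePath, list[PurePath]] = {}
    for directory in directories:
        for root in deepest_first:
            if directory == root or root in directory.parents:
                groups.setdefault(root, []).append(directory)
                break
    return groups
```

The source directories would need to be absolute, resolved paths (e.g. via `Path.resolve()`) so they compare equal to what Git prints. Each group could then be served by one `git log --name-only` run in that root, so the number of Git calls would grow with the number of repositories rather than the number of directories.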

And FYI, originally I was calling git on each and every file, which was really slow for large projects, but I refactored this in #25. This led to a huge improvement in runtime.

If you know how to make it even faster (while still not breaking in the presence of multiple Git repositories), please let me know!

And feel free to make pull requests!

oharboe commented 2 years ago

I will reopen if I can generate data. For now I have disabled this plugin for local builds, where performance matters the most.