Closed: oharboe closed this issue 2 years ago.
Getting the git dates takes nearly as long as reading in the entire source code.
And is this a problem? What scale are we talking about, seconds or minutes?
Do you have an example project for benchmarking?
I've tried it on the NumPy docs (https://github.com/numpy/numpy/pull/18268) and it only takes a few seconds. I thought that NumPy is already big, but maybe you have a much bigger project in mind?
If you have hundreds of directories with very few files each, the current implementation is indeed a bit wasteful. However, if you have tens of directories with hundreds of files each, it shouldn't really be a problem.
It should be possible to change the implementation to issue a few git commands for the entire project upfront to avoid invoking git multiple times per folder.
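For what it's worth, here is a minimal sketch of what such upfront batching could look like, assuming a single repository and that parsing one big `git log --name-only` run is acceptable. The function name and the parsing are my own illustration, not the plugin's code:

```python
import subprocess


def last_commit_dates(repo_root):
    """Rough sketch: collect the newest commit timestamp for every tracked
    file with a single ``git log`` call (history is printed newest-first).
    Not the plugin's actual implementation."""
    out = subprocess.run(
        ["git", "-C", str(repo_root), "log",
         "--name-only", "--no-renames", "--pretty=format:%x00%ct"],
        capture_output=True, text=True, check=True,
    ).stdout
    dates = {}
    # %x00 puts a NUL byte in front of each commit's timestamp, so the
    # output splits cleanly into one chunk per commit.
    for chunk in out.split("\x00")[1:]:
        lines = chunk.splitlines()
        timestamp = int(lines[0])          # Unix timestamp of this commit
        for name in lines[1:]:
            if name and name not in dates:  # first hit = newest commit touching the path
                dates[name] = timestamp
    return dates
```

With the output of one call like this, the date for every file in every folder of that repository could be looked up from the dictionary instead of spawning git again per folder.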
The reason why I'm invoking git on each folder is that each folder could potentially be a separate Git repository or, more realistically, a separate Git submodule.
To collect Git calls over subfolders, we would somehow have to find the Git repository (and submodule) boundaries.
Any ideas how to do that?
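Just thinking out loud: one way to find those boundaries might be to ask git itself which working tree each folder belongs to and group the folders by that answer; `git rev-parse --show-toplevel` also works inside submodules, since each submodule has its own top level. This is only an idea sketch (the function name and structure are mine, not existing plugin code):

```python
import subprocess
from collections import defaultdict
from pathlib import Path


def group_folders_by_repo(folders):
    """Sketch: map each repository/submodule root to the folders inside it."""
    groups = defaultdict(list)
    for folder in folders:
        result = subprocess.run(
            ["git", "-C", str(folder), "rev-parse", "--show-toplevel"],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            groups[Path(result.stdout.strip())].append(Path(folder))
        # folders that are not inside any Git working tree are simply skipped
    return groups
```

This still spawns one `git rev-parse` per folder, but that call is cheap compared to walking the history, and the expensive `git log` could then run once per group.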
And FYI, originally I was calling git on each and every file, which was really slow for large projects, but I refactored this in #25. This led to a huge improvement in runtime.
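To make the per-file versus per-folder difference concrete, here is a rough before/after sketch (the command lines are illustrative only, not the actual change in #25; parsing of the combined per-folder output is omitted, see the whole-repository sketch above for that idea):

```python
import subprocess


def date_per_file(path):
    """Old idea (slow): one git process for every single file."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%ct", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return int(out) if out else None


def log_per_folder(folder, filenames):
    """Per-folder idea: one git process covering all files in a folder
    (filenames relative to ``folder``); the combined --name-only output
    is then split up per file."""
    return subprocess.run(
        ["git", "-C", folder, "log", "--name-only", "--format=%ct",
         "--", *filenames],
        capture_output=True, text=True, check=True,
    ).stdout
```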
If you know how to make it even faster (while still not breaking in the presence of multiple Git repositories), please let me know!
And feel free to make pull requests!
I will reopen if I can generate data. For now I have disabled this plugin for local builds, where performance matters the most.