wfrisch / idlib

Identify embedded C and C++ libraries
GNU General Public License v3.0

Replace (some) git subprocesses with native libgit2/pygit2 #8

Closed: wfrisch closed this issue 6 months ago

wfrisch commented 6 months ago

Indexing is already slow, and it becomes a real problem if we want to add large libraries like ffmpeg, which has over 130,000 commits. The main culprit is git describe, which has to be run as a separate subprocess for every commit. libgit2 can do this in-process, and therefore much faster.
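For reference, the per-commit subprocess pattern described above can be sketched in Python roughly as follows. This is a hypothetical illustration, not idlib's actual indexing code: the function names are invented, and it assumes the commits to describe are the ones listed by git rev-list HEAD. Every commit costs one git describe process, which is what makes a 130,000-commit history expensive.

```python
import subprocess
from typing import Dict, List, Optional


def list_commits(repo_path: str) -> List[str]:
    """Return hashes of all commits reachable from HEAD (hypothetical helper, not idlib code)."""
    out = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split()


def describe_subprocess(repo_path: str, commit: str) -> Optional[str]:
    """Spawn one `git describe` process for a single commit (the slow path)."""
    proc = subprocess.run(
        ["git", "-C", repo_path, "describe", commit],
        capture_output=True, text=True,
    )
    if proc.returncode != 0:
        return None  # e.g. no annotated tag is reachable from this commit
    return proc.stdout.strip()


def describe_every_commit(repo_path: str) -> Dict[str, Optional[str]]:
    # One fork/exec per commit: acceptable for small repos, slow for ffmpeg-sized histories.
    return {c: describe_subprocess(repo_path, c) for c in list_commits(repo_path)}
```

The process startup cost is paid once per commit regardless of how cheap the describe itself is, so total time grows linearly with history length.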

libgit2 [1] is a portable, pure C implementation of the Git core methods provided as a re-entrant linkable library with a solid API, allowing you to write native speed custom Git applications in any language that supports C bindings.

libgit2 has Python bindings available in pygit2 [2].

[1] https://libgit2.org/
[2] https://www.pygit2.org/

wfrisch commented 6 months ago

A quick test suggests that calling pygit2.Repository.describe() in-process is about an order of magnitude faster than spawning git describe subprocesses. This is definitely worth it, and necessary for very large repos like ffmpeg.