Closed pmrowla closed 8 months ago
Hi @pmrowla , thanks for the contribution.
Most important is that tests don't pass, apparently it gets stuck in test_blob_filter
wit pypy
Thanks for the review, I'll make the suggested changes and take a look into the pypy issue.
I'll likely keep the PR marked as draft for now, I'm still testing to make sure this works with real-world LFS filtering so it's still possible this PR may change a bit.
After some investigation, it looks like pypy and cffi don't always mix well if an ffi callback is run outside of the main thread, which is what was causing the pypy tests to hang (since the libgit2 filtering can be done in a separate thread).
I've rewritten the underlying filter/writestream code to done entirely in the _pygit2 C-extension (so this PR no longer uses cffi at all)
The flaky pypy tests should now be taken care of. The issue was a difference thread.is_alive()
behavior in CPython vs pypy. The PR now blocks on explicit threading.Event()
to signal that the writer thread is done rather than relying on thread.is_alive()
which should resolve the pypy race condition.
@jdavid we'd appreciate it if we could get a release for this pushed whenever you can, thanks!
yes, this week, just have to finish the ongoing PRs
The release is done
pygit2.filter
API for registering pure-pythongit_filter_*
filters.pygit2.BlobIO()
wrapper for reading raw and filtered blob contents.BlobIO
will stream buffered content (instead of loading the entire blob into memory) if the underlying libgit2 filter(s) support the streaming filter API.Repository.get_attr(..., commit=...)
to support attr lookup by commit IDFilter.check()
but it is useful for anyone implementing filters to lookup commit specific attributes before actually running the filter (i.e. to pre-fetch LFS objects in a batch rather than streaming them one by one upon checkout)This allows the user to subclass
pygit2.Filter
and then register their own Python smudge/clean filters.In libgit2, you normally need to define both a
git_filter
and agit_writestream
implementation for your filter. This PR abstracts this out into a singlepygit2.Filter
class so the user does not need to work directly with thegit_writestream
linked list chain that is invoked when libgit2 does filtering.