libgit2-racket / libgit2

Low-level Racket bindings for the libgit2 C library
https://pkgs.racket-lang.org/package/libgit2
MIT License
3 stars 2 forks source link

git_revwalk_next: handle `GIT_ITEROVER` #1

Closed LiberalArtist closed 1 year ago

LiberalArtist commented 1 year ago

It is not an error: it signals the end of the walk.

Also, move _git_revwalk to revwalk.rkt and allocate the output _git_oid for the user.

Reported by @kengruven

Related to https://github.com/bbusching/libgit2/issues/11

LiberalArtist commented 1 year ago

Re @kengruven / https://github.com/bbusching/libgit2/issues/11#issuecomment-1621008762:

With the new version today (thanks!), I was able to get revwalk to loop the correct number of times from my HEAD, and return hashes. I used git_oid_fromstr to make a dummy OID to use as the out pointer.

One catch, so far: in C, git_revwalk_next returns GIT_ITEROVER to indicate the end of the list, which isn't exactly an error, but Racket's binding doesn't seem to understand the distinction. So when the Racket revwalk hits the end, it writes to stderr:

git_revwalk_next: an error occurred;
 the foreign function returned an error code,
 but git_error_last returned a null pointer
  error code: 'GIT_ITEROVER
  error class: #f
  context...:

The GIT_ITEROVER "error" case doesn't return normally, so there's no opportunity to check the return value to see if it's complete.

Both of these issues should now be fixed: git_revwalk_next now takes only one argument, a _git_revwalk, and returns either a _git_oid (which it has allocated for you) or #f to signal the end of iteration (rather than raising an exception).

This commit doesn't add any tests, but there currently aren't any tests for the revwalk functions. Ultimately I think we should adopt the clever tricks used in the upstream libgit2 test suite to create test repositories, but that will be a bigger project, so I don't think it should block this change.

Re https://github.com/bbusching/libgit2/issues/11#issuecomment-1627527920:

In libgit2, it seems you use revwalk by:

* creating a revwalk object

* adding/removing any globs/oids/tags/etc you want

* looping over the lazy sequence of oids it generates

Ideally, I think what I want as an interface in Racket is something that lets me add/remove all these types of things at once, at creation time:

(in-git-revs my-repo #:sort 'time
             'HEAD
             "foo/bar/*.json"
             (lambda (commit-id payload) ...)
             (not some-specific-oid-to-skip)
             (not "some-tag"))

which I can use with the for/* functions, and it delivers a lazy sequence of oids.

I absolutely agree that an interface along these lines would be more idiomatic for Racket than the deeply imperative and stateful upstream interface.

Nonetheless, I am hesitant to add it to this package, at least for now. I think this package should work at basically the same level of abstraction as the libgit2 C API, just adding a safe interface and automatic memory management. For actually writing Racket programs, it would definitely be desirable to use a higher-level interface, but I think the role of this package is to provide the low-level primitives that would make implementing higher-level interfaces possible.

To use the revwalk APIs as an example, the upstream docs say, "This revision walker uses a custom memory pool and an internal commit cache, so it is relatively expensive to allocate. For maximum performance, this revision walker should be reused for different walks." A high-level library implementing your in-git-revs might want to maintain a pool of revision walkers and lease or allocate one each time a sequence returned by in-git-revs is initiated. Obviously, there would be many design decisions to make. For this package, I think the first priority should be providing a robust and opinionated set of bindings: then we can freely explore different points in the design space for high-level APIs.

I am definitely in favor of higher-level interfaces! Unless someone else does first, I would hope eventually to create a higher-level package under this Github organization, maybe even in this very repository. I'm also not opposed to providing some conveniences when The Right Thing is fairly obvious. I just think there are some prerequisites that need to be addressed first.

In the mean time, here's how I'd use git_revwalk_next in for loops:

#lang racket
(require libgit2)
(let ([rw (git_revwalk_new
           (git_repository_open 
            "/home/philip/code/pkgs/libgit2"))])
  (git_revwalk_push
   rw
   (git_oid_fromstr
    "868bf3a47fbde14eee4b1b639ae21a078dfd9786"))
  (for/list ([oid (in-producer git_revwalk_next
                               #f
                               rw)])
    (git_oid_fmt oid)))
#; ; Result:
'("868bf3a47fbde14eee4b1b639ae21a078dfd9786"
  "21ba31e518e57e435718e630da9794ecada785ff"
  "9771ff3bf6ed1bf8c15d0a25fa2e129341fc4abb"
  "c1c4857041083b6664c1351ca19aba3153025d7d"
  "1a745dbbbc2d3f19c35b7ae680d81623d92fe056"
  "b08e07415a8828343f3747bd41889c017b205783")

I'm not really a Racket programmer, though, so input is welcome. Would this need a more explicit way to declare what each type of added/removed thing is? Can a Racket procedure be used as a callback from C?

  1. I think you would probably need some way to distinguish whether a string should be used with e.g. git_revwalk_push_glob vs. git_revwalk_push_ref.
  2. Yes, see docs here. We do currently provide git_revwalk_add_hide_cb, though I haven't reviewed it (or anything else in this file) yet.