nodegit / nodegit

Native Node bindings to Git.
https://www.nodegit.org/
MIT License
5.66k stars 692 forks source link

Using code-generation to generate #70

Closed nkallen closed 11 years ago

nkallen commented 11 years ago

I have made some breaking API changes on my branch. This is a WIP, but I'm showing early to get feedback early.

https://github.com/nkallen/nodegit/commit/458e08079a8089ba9a75f5046fc4d2b173164f3e

I'm investigating code generation instead of manually writing bindings libgit2. This branch code-gen's an easy class, GitOid, from a JSON description of the libgit2 interface. The JSON is based on the data from here, http://libgit2.github.com/libgit2/v0.18.0.json, but highly modified

IN THEORY you just mark methods as "async", "prototype", "constructor", etc. and it will "just work". It works for GitOid, all tests pass.

The new interface for the raw Oid is what I described:

var oid = git.raw.Oid.fromString("SHA1")
oid.sha() = ...

This API change has repercussions, changing some APIs to sync from async, as in Commit. ATM, the API is inconsistent, as Reference's sha() method is still async, because of indirect references. However, I propose to change the api to have have an IndirectReference class and a DirectReference class, with IndirectReference only having a resolve() method, and sha() being synchronous on DirectReference.

It is no longer possible to construct a raw Oid from scratch, as it would be in an invalid state; and further objects are now immutable.

Along the way I've identified what I believe to be memory leaks. I'm hoping to have the code-gen produce code without these leaks, if they are indeed real.

tbranyen commented 11 years ago

Oh super cool, nice work! I'm absolutely sure there are memory leaks as my early dive into Valgrind indicated.

A sync API does break a lot of the use cases provided by this library, but I'm sure its possible to generate async code as well.

nkallen commented 11 years ago

This is intended to generate both sync and async code -- the async isn't done yet, but hopefully it will be by friday.

A goal of my implementation though is to only make methods that do IO async. If they don't do IO, they are sync. The particular change mentioned here (Oid.fromStr() becoming sync) is due to the fact that the underlying libgit2 implementation does not do IO in these methods.

faceleg commented 11 years ago

This looks wonderful! @tbranyen we could generate async code as well, or just update the use cases, as implied by @nkallen's suggestions here: https://github.com/nodegit/nodegit/issues/68#issuecomment-19598940

tbranyen commented 11 years ago

That's fair and I agree with the changes :+1:

nkallen commented 11 years ago

With this commit:

https://github.com/nkallen/nodegit/commit/42c7fa76e467d1f1dd5a541d14759ddc931f6d8d

I am now code-gening GitBlob. This is noteworthy since GitBlob has real async methods, like lookup(). While I expect further roadblocks, this seems to suggest that most, if not all, of the raw api can be code-gen'd. Over the next week I will try to convert more and more classes.

Note that the API continues to change, for two reasons.

  1. As before, methods that don't perform IO no longer take callbacks. So content() (formerly rawcontent()) doesn't do IO, only lookup() does IO. lookup() remains async. This propagates through the convenience API. (Note also, that lookup() is now a method on the constructor: var blob = git.Blob.lookup(sha)).
  2. The raw API matches more closely to libgit2. For example, content() doesn't return a buffer, as libgit2 returns a void * and you must also call git_blob_size() to figure out what to do with the void *. The raw api thus becomes: blob.content().toBuffer(blob.size()). The convenience API, however, is unaffected by this change.
nkallen commented 11 years ago

I expect within a week I will be able to code-gen nearly everything. Two more things finished today.

Repo: https://github.com/nkallen/nodegit/commit/458d8719707bbd2e2cc5294a03c70a20fc396a58 Reference: https://github.com/nkallen/nodegit/commit/707675a296b1d1c833eefae8d99d8592b8573648

tbranyen commented 11 years ago

Awesome, can't wait for a pull request so we can review and version bump. Once we match the API 1:1 we should bump the version to a minor 0.1.0 imo.

nkallen commented 11 years ago

This latest branch converts TreeEntry, Commit, Signature, Time, and Object.

https://github.com/nkallen/nodegit/commit/cf7d3520e3cb6c6c3e21f46851b6eebb8feaa0f1

At this point it's worth noting that the API has continued to drift a bit. In addition to the fake async methods become sync, putting factory methods on constructors, etc., i've also moved some of the lookup/factory methods to repo so repo doesn't have to be threaded through the code as much.

The next thing I plan to do may be controversial. DiffList, Index, Error, Odb, Revwalk, Tag, Threads, and Tree remain to be converted. Of these, all are trivial except DiffList and Tree, because both use the C callback methods. AFAICT, there is no particular reason to use these, and they complicate the implementation dramatically: E.g., in DiffList

static void WalkWorkSendFile(uv_async_t *handle, int status /*UNUSED*/);
static void WalkWorkSendHunk(uv_async_t *handle, int status /*UNUSED*/);
static void WalkWorkSendData(uv_async_t *handle, int status /*UNUSED*/);
static void WalkWorkSendEnd(uv_async_t *handle, int status /*UNUSED*/);

With these have to a worry about caching and what runs in what threads.

There is a simpler version of the interface, IIUC, better suited for cross-language bindings, namely you can call git_diff_num_deltas and git_diff_get_patch with an index. The only trade-off is explicit memory management, but that's fairly trivial.

The convenience API could preserve the event emitter style for Tree if you want, but why not ditch it for a simpler interface like this?

function dfs(tree, error) {
  tree.entries().forEach(function (entry) {
    if (entry.isDirectory()) entry.getTree(dfs)
    else if (entry.isFile()) entry.getBlob(f)
  });
}

It's a little weird because of the asynchronicity, though not that bad, and it does have the theoretical possibility of a stack overflow, but that's totally unrealistic given a real-world filesystem. In any case, you could build an event emitter on top of this fairly trivially if you wish.

nkallen commented 11 years ago

Everything but DiffList and Tree is now done.

tbranyen commented 11 years ago

Hey @nkallen since you're being such a good steward I added you as an owner to the project :) Really appreciate all the help.

nkallen commented 11 years ago

Thanks @tbranyen !

In general I should warn you I'm not a good steward of open-source projects, but I anticipate being heavily involved with nodegit for the next month or so.

In any case,

https://github.com/nkallen/nodegit/tree/codegen

Now does Tree. I preserved the event emitter api on the convenience code, but you'll notice the C++ implementation no longer uses callback methods. This dramatically simplifies the C++ code. You may want to take a look at https://github.com/nkallen/nodegit/blob/codegen/lib/tree.js#L62 . I intend to perform this same transformation to DiffList and RevWalk.

faceleg commented 11 years ago

Great job @nkallen, this project has sorely needed more contributions. I love all that you're doing.

nkallen commented 11 years ago

Latest commit does Diff https://github.com/nkallen/nodegit/commit/d9ea5a6123736566593a52e726498f2ebf89393c . The convenience API around diff is basically gone, I intend to rebuild it. But all of the basic functionality works, if inconveniently.

nkallen commented 11 years ago

Only class left is RevWalk, a fairly easy one. I will do that tomorrow hopefully, although I'm flying to Croatia so we'll see, could take till the weekend. After RevWalk I expect to make a pass through the raw and convenience API to make them both as nice as possible, with fairly conservative changes. It's possible I'll want to switch from using decorators to inheritance as I think I can make the raw API much nicer with very little work -- the codegen allows me to say which libgit methods are in which classes, so I can put branch.lookup(repo, branchname) on repo.lookupBranch(branchname) just by changing the json, which makes the the "raw" layer seem almost OO, and thus the convenience layer becomes very lightweight. If the raw api becomes nice enough, decorators are a pain because it's lots of boilerplate delegation, whereas inheritance would be preferable if the convenience layer is as thin as I hope it can be.

I need to make sure that raw API has decent documentation, I'm not yet familiar with how we do documentation in C++ bindings but I'll figure it out. After that, I'll rewrite the examples to reflect the changes in the API. Then we'll need to do some extensive testing of the code to make sure this shit actually works. The trickiest part is memory management, making sure 1) there are no memory leaks and 2) we aren't freeing memory prematurely, which leads to segfaults.

Once we're all satisfied that this works fairly robustly, I'll do a minor amount of refactoring, and then we can merge this to master and tick the version #.

nkallen commented 11 years ago

Sorry I disappeared, I've had no internet for a week. It's actually quite difficult to program without being able to copy and paste from stackoverflow. I did get some work done, however.

I'm nearing a release candidate as most things seem to work. A bunch more of libgit is now supported. It's hard to tell, though, since I haven't yet figured out how to generate documentation. I've reworked all the examples in in the examples directory: https://github.com/nkallen/nodegit/tree/codegen/example. In particular take a look at https://github.com/nkallen/nodegit/blob/codegen/example/general.js which is a transliteration of the libigt2 version.

Everything passes jslint etc., unit tests pass, etc. It's worthwhile to start reviewing code now. All that's left is stress testing, bug fixing, documentation, and incorporating your feedback.

tbranyen commented 11 years ago

No worries man. I pulled your branch recently and noticed all the tests pass and it looks really good. Something we need to figure out is our build process. Right now it's too disjointed and I'd like to either consolidate with either a Makefile or Grunt (personal biased on that one).

tbranyen commented 11 years ago

I noticed that the test in repo.js that asserts the proper error message for /etc/hosts fails in Linux, because it is not prefixed with /private. We should either drop this test or ensure it runs properly on other operating systems.

tbranyen commented 11 years ago

@nkallen How are you feeling about a pull request into a wip branch here so we can review and leave comments?

nkallen commented 11 years ago

Created pull request into WIP.

nkallen commented 11 years ago

I've pushed a few more changes today, dealing with support for treebuilder. This code is messy, but fortunately it's in javascript. I think the API is nice-enough, checkout https://github.com/nodegit/nodegit/blob/wip/example/new-commit.js

tbranyen commented 11 years ago

Awesome man. Any thoughts on clone.h? On Aug 11, 2013 1:12 PM, "Nick Kallen" notifications@github.com wrote:

I've pushed a few more changes today, dealing with support for treebuilder. This code is messy, but fortunately it's in javascript. I think the API is nice-enough, checkout examples/new-commit.js

— Reply to this email directly or view it on GitHubhttps://github.com/nodegit/nodegit/issues/70#issuecomment-22461217 .

nkallen commented 11 years ago

AFAICT, the main complexity of clone.h is handling progress callbacks ("75% done") without threading or memory corruption issues. I estimate worst case it's a week, best case 1 day.

My top priority is to get a new project of mine, gitdb https://github.com/nkallen/gitdb , to a stable release candidate. This is what's motivating any nodegit bug fixes and API changes ATM. I have about a full week of work to do to get it to this point. I'm still on vacation (in Leipzig ATM) so one week of work currently takes me 3 calendar weeks.

After this, though, I will have time to work on clone. But at the moment, I think the WIP is mostly blocked on: making sure it's stable and getting adequate documentation. If you could help with the latter I would appreciate it. We have to document every method. It's not a huge amount of work but we can't create a release candidate without it.

tbranyen commented 11 years ago

Absolutely man. Just started my sabbatical and this is a priority for me. Would like to refresh the site, get better up-to-date documentation, and ensure all reported issues are corrected.

nkallen commented 11 years ago

I implemented remote-fetch and clone in https://github.com/nodegit/nodegit/blob/wip/example/clone.js and https://github.com/nodegit/nodegit/blob/wip/example/fetch.js

It works AFAICT but -- I've not yet implemented the progress callbacks. These are the hard part. In node you are supposed to use uv_async_send to do stuff like this, but there are very few examples of how to use it http://nikhilm.github.io/uvbook/threads.html . I think I get the general idea but we'll see. It will take a couple of days.

nkallen commented 11 years ago

Alright -- I've started working on documentation. Some preliminary stuff is up here: http://www.nodegit.org/nodegit/

I hope to get the documentation fairly polished this week. I'd like to merge WIP into master after this. AFAICT, the code is of equal or greater stability than the existing master, and with documentation we're ready for the next step, which is getting broader user feedback.

tbranyen commented 11 years ago

Hey @nkallen I have a few items on my plate before we merge as well. I need to finish the installer rewrite (about 50% there). This will include Windows support. I'll worry about the binary distribution after we merge into master since that will be more work.

I also need to integrate the new logo and do some housekeeping with the examples.

tbranyen commented 11 years ago

Great job on the documentation by the way!

nkallen commented 11 years ago

Sounds good.. but let's keep the blocking tasks to a minimum -- the sooner we merge though the sooner I can fix the most important bugs. I'd rather do them while I have some momentum with this codebase. We can always add more stuff later.

nkallen commented 11 years ago

Hi Tim,

I was thinking that it might be beneficial for the community to merge the nodegit and node-gitteh projects. I know there are some differences in design philosophy, but the codegen provides a new opportunity to think through whether a compromise can be found. In particular I noticed that the maintainer of gitteh is stepping down: https://github.com/libgit2/node-gitteh/issues/68 -- this might be a good opportunity to think this through. I'm chiming in on the discussion shortly.

faceleg commented 11 years ago

:+1 so long as we don't take their name.

tbranyen commented 11 years ago

@nkallen we were asked to join the libgit2 community about two years ago when things weren't all that stable, so I declined. I think after merging the wip branch and putting a little polish we should definitely move to the libgit2 organization. We don't really need this one.

nkallen commented 11 years ago

@tbranyen what do you think of merging to master and improving the install script in another branch? i already have people using wip and it seems to be fairly stable.

tbranyen commented 11 years ago

Sounds good to me. I've got it pretty much working in Windows; hopefully I'll have a pull request at the end of the week.

nkallen commented 11 years ago

Merged to master. Will hold off on publishing a new package till your merge.