ipfs / go-ipld-git

ipld handlers for git objects
MIT License
57 stars 19 forks source link

Update go-ipld-git to a go-ipld-prime codec #46

Closed willscott closed 3 years ago

willscott commented 3 years ago

The existing go-ipld-git codec allows for translation between git file data, and go-ipld-format node representations of the IPLD dag. This work attempts to instead provide a go-ipld-prime representation of that dag.

fix #45

rvagg commented 3 years ago

@willscott @mvdan this codec changes all of the struct fields to camel-case, object -> Object, etc. Would you mind if we change it back for consistency with the old codec and the JS one? It's also closer to the native format's idea of naming as well. I can make the change here if you're both OK with that.

rvagg commented 3 years ago

Some additional thoughts here where the differences in forms can be seen from a pathing perspective: https://github.com/ipfs/go-ipfs/pull/8287

If you're OK with it, I'd like to try and get the forms here back to the original.

mvdan commented 3 years ago

That sounds fine to me - I'm not sure if ipld-prime codegen expects capitalized names, for the sake of exported names in the generated Go code.

willscott commented 3 years ago

I also have no problem with changing name case back for consistency

rvagg commented 3 years ago

0436bc7c4 now has the field names matching the original (lower-case, and some other minor adjustments). tagType is unfortunate since we can't use type in the codegen. Tree is now a map too so you can look up by directory name as per the original codec, which lets us path the same way back in the sharness test.

I think we need to do something about this strict need to match Go conventions on these names since it's not always going to match what you want your data model form to look like.

mvdan commented 3 years ago

Out of curiosity, would it be enough to add "repr" mapping of field names so that the repr layer uses lowercase names?

Aside from that, I wonder if it's a bit weird for the schema to have the field names be lowercase, but the type names be uppercase.

rvagg commented 3 years ago

Out of curiosity, would it be enough to add "repr" mapping of field names so that the repr layer uses lowercase names?

Sadly, no, because it's pathing at the data model layer that we want to care about, not encoding. Having renames is only going to help if we cared about these objects going into a standard codec (dag-cbor, dag-json), otherwise they're kind of useless here.

rvagg commented 3 years ago

I wonder if it's a bit weird for the schema to have the field names be lowercase, but the type names be uppercase.

This seems fine to me, it actually looks pretty reasonable to my eyes (eyes that have been trained on C-family-style and aren't as accustomed to the upper-case first character rule, granted). Have a look at the schemas that Hannah wrote up in https://github.com/ipfs/go-unixfsnode/blob/main/data/gen/main.go (although the schema field names don't match what's being generated, but you can see the same thing going on there).

mvdan commented 3 years ago

SGTM, thanks for confirming on both accounts. I've replied to https://github.com/ipld/go-ipld-prime/issues/206 :)

rvagg commented 3 years ago

@mvdan @willscott I've brought this up to what I think is a mergable state:

and a few other minor things.

The main difference that I'm in two minds about is date handling - we're pulling out date and timezone and keeping them faithfully in their original format so round-trips work properly. The original codec presented them as a nice stringified date but with the original values kept internally. An approach I've toyed with in the Bitcoin codec (in JS at least and I was thinking of doing the same in Go) was to do both - keep originals and present the nice usable forms. It just raises questions about what happens when/if they get out of sync. Two possible approaches would be: (1) error if they're out of sync when serializing, or (2) ignore the usable forms entirely when serializing.

For now I think it'd be fine to just leave this as is and leave it to the user to reform the stringified version if they want it, we could add back in a utility function to do that if needed, or evolve this code further in the future.

IMO this can be merged now.

willscott commented 3 years ago

Making sure we've got all the loose ends covered here:

aschmahmann commented 3 years ago

Yes, and a review for other breaking changes. mergeTag seems like a breaking change that seems like something we can just restore. I'm not sure if there are others at the moment.

Posting what the release notes for this release would look like would also be great since as soon as this is merged we're going to cut a release anyway and that release is going to need release notes 😄.

rvagg commented 3 years ago

Have fixed up tagType to type thanks to Eric's hint, and mergeTag is now mergetag, plus a few more readability and clarity tweaks.

Original code has a MergeTag as well as a Tag with the main difference that one has a text field and the other a message field but otherwise they are the same. This one just has Tag now with message which I think is more in line with the git model. That'll be a breaking change from a DAG traversal perspective so we should note it in the release notes.

Will put together some basic release notes next week.

rvagg commented 3 years ago

Release notes:


BREAKING CHANGES

warpfork commented 3 years ago

Might I suggest we drop that change info into a CHANGELOG.md file right in the diff? PRs disappear from memory quickly and aren't easily indexed over.

(I'm pretty happy with the loose format in go-ipld-prime/CHANGELOG.md#released-changes, if having an outline to riff from greases the wheels. :))

warpfork commented 3 years ago

(Informational note, no action required: I found myself nerdsniped by mergetag. Yes, that's a thing. Yes, keeping it alllowercase is correct. Docs here: http://github.com/git/git/blob/master/Documentation/technical/signature-format.txt .)

rvagg commented 3 years ago

Have added a changelog and fixed the readme bork.

Normally I don't like changelogs because it's too easy for them to get out of sync, but recently I've adopted the release-on-every-merge practice in JS and it includes both tag + github release + changelog updates and all I need to do is make sure my commit messages are informative and standardised, e.g. https://github.com/rvagg/cborg/blob/master/CHANGELOG.md & https://github.com/rvagg/cborg/releases & https://www.npmjs.com/package/cborg all from pressing the merge button. Can recommend, although Go folks have other things to worry about in the release process I suppose.

warpfork commented 3 years ago

Sweet. I think this is officially ready for the mergeparty (later this week? early next week?) where we land a bunch of these similar PRs in other repos at once.

rvagg commented 3 years ago

I have created a draft release & tag for this, v0.1.0: https://github.com/ipfs/go-ipld-git/releases

I'm working on some assumptions here about Go (and PL) release practices that may not be quite correct. So someone correct me in the next day or two before I press merge and create the release & tag. I would prefer to use semver properly and do a v1.0.0 but I don't think we're in the habit of doing that with Go so I'm assuming v0.1.0 is appropriate, a tag is sufficient and a GitHub release is a bonus.