martinvonz / jj

A Git-compatible VCS that is both simple and powerful
https://martinvonz.github.io/jj/
Apache License 2.0
7.24k stars 240 forks source link

lib: add support for `.mailmap` files #3951

Open emilazy opened 1 week ago

emilazy commented 1 week ago

Explanation, motivation, backstory, and design decisions are in the commit message, and I’ve included comprehensive tests and hopefully‐adequate documentation.

Pulling in the people who actively discussed this on Discord for review, hope that’s okay: @nasamuffin @arxanas @PhilipMetzger

Some things I’m unsure about and could use feedback on:

Checklist

If applicable:

emilazy commented 1 week ago

Looks like these ended up being immediately relevant to https://github.com/martinvonz/jj/actions/runs/9644975580/job/26598226134?pr=3951:

That said, this is not exclusive to the Git backend. The .mailmap name and format is perfectly generic, already shared between Git and Mercurial, and applies to all systems that bake names and emails into commits, including the current local backend. The code uses Gitoxide, but only as a convenient implementation of the file format; in a hypothetical world where the Git backend was removed without Jujutsu changing its notion of commit signatures, gix-mailmap could be used standalone, or replaced with a bespoke implementation.

I added pub(crate) to jj_lib::git_backend: signature_{from,to}_git, as they’re used by the mailmap module to interface with gix-mailmap. Since this isn’t a feature specific to the Git backend, it feels wrong to have that dependency, but I’m not sure where would be good to move them to. A new jj_lib::git_util module, perhaps?

I can add a separate direct gix_mailmap dependency, rather than adding it to the main gix dependency, to signal that it’s not being depended on as part of the Git backend and make the check happy. Suggestions for where to move the signature conversion functions would be appreciated though!

emilazy commented 1 week ago

I factored out the conversion functions into a new private jj_lib::signature_util module (bikeshedding on the name etc. welcome), and hopefully the tests will run now. As I explained in the commit message this isn’t specific to the Git backend (gix-mailmap is just an implementation detail), so nothing there is conditional.

emilazy commented 1 week ago

For what it’s worth, the best alternative DSL API I came up with was commit.{author,committer}([canonical=true]) and {author,committer}{pattern, [canonical=true]), but that required adding support for keyword arguments to the template DSL and didn’t seem obviously better than what I ended up with currently.

emilazy commented 1 week ago

(Gah, messed up when splitting up commits; force‐pushed again.)

emilazy commented 1 week ago

Okay, I’ve dropped the canonicalizing functions from Commit and added corresponding convenience helpers on the Mailmap type instead, along with explanatory documentation on the non‐canonicalizing Commit methods, and pointers for obtaining a Mailmap. That seems better‐factored and should probably be sufficient to nudge tooling developers onto the right path. Thank you both for the feedback; hopefully this is better from a code organization POV.

I also separated out the signature conversion, and renamed get_wc_commit_mailmap to get_current_mailmap, which I dislike a lot less. It turns out that Git actually does support empty email addresses, and that real repositories (e.g. Nixpkgs) have such commits in them; it would be a practical use of .mailmap to correct these, e.g. <real@email.example> Author Name <>, so rather than bailing early on empty signatures I just convert them to Gitoxide’s types as‐is. It’s fewer Git‐specific details in backend‐agnostic code, less churn in the rest of the codebase, and overall better behaviour, so it turned out nicely.

I’ve currently left the {author,committer}{,_raw} revset and template functions rather than moving them to a global option. Git has e.g. %an/%aN pretty format pairs which are analogous to this; I assume they only have the option because it historically used to be opt‐in and they don’t have a real query language. It feels more flexible and declarative to encode the intent in the function names rather than having their semantics depend on more global state. But if people feel strongly about just having a global option to disable it I’m fine implementing that instead.

emilazy commented 1 week ago

Thank you for taking a look!

GitHub won't let me comment on the commit message :) but in my opinion, you could say a little more - that the mailmap will introduce a concept of pre- and post-mapped identities. Right now, it's implied here.

That’s not something I hear often :)

I think this is a good idea and I’ve added another few paragraphs (after my editor ate the first draft…); I’ve called them raw signatures and canonical signatures, as that’s the best terminology I can think of currently and it matches the code and comments. I’ve also tweaked a few doc comments to allude to these concept more explicitly, although there isn’t an in‐depth exposition in the code currently. Possibly this should be recorded more permanently in developer documentation somewhere but hopefully that can be left to a follow‐up PR – the design docs section, perhaps? Although I wouldn’t exactly say that this is a stellar piece of de novo design to flaunt, just a pragmatic stop‐gap for arguably‐regrettable decisions the majority of DVCSes have made up to this point…

Are committer and author the only types of identities that are here? What happens to, say, commit message footers like Signed-off-by: (which are mailmapped in Git, IIRC)? Is that handled differently?

As far as I can tell authors and committers are the only places Jujutsu gets and processes signatures from currently, and there is no special support for trailers. It seems like git shortlog --group=… canonicalizes identity information in trailers, but git log doesn’t on display; a strange state of affairs that speaks to the unfortunate whack‐a‐mole you talked about and that I’m trying to avoid repeating. I guess it makes some sense with the nature of trailers as an ad‐hoc interpretation layer on top of unstructured commit messages, which I also think is regrettable. Hopefully Jujutsu can at least do better there, even if it has to lower whatever additional metadata it ends up recording down to trailers for now for Git compatibility! I’ve discussed this in the commit message too as another example of where the raw–canonical signature distinction and guidelines for which to use would be relevant.