FR: `jj blame`/`jj point-finger` a blame layer.

PhilipMetzger commented 7 months ago

Is your feature request related to a problem? Please describe. Jujutsu currently lacks a blame layer which determines who was responsible for which line of code. This is commonly called a blame layer.

Describe the solution you'd like Add a jj blame command and a trait as integration point for large repos, as it may be very expensive to calculate each line for a file which was changed many times.

Additional context

Maybe choosing a approach like Sapling could be useful, as it has support to blame different types of things like commit metadata, the whole file or a incomplete file.

ilyagr commented 7 months ago

blame might be related to absorb (#170, #64).

This thought might be pure distraction, but it'd be nice to not focus on lines here, so that we could later use difftastic-like diff to attribute syntactic hunks.

PhilipMetzger commented 7 months ago

This thought might be pure distraction, but it'dbe nice to not focus on lines here, so that we could later use difftastic-like diff to attribute syntactic hunks.

No, that's actually a great point. If we want to support difftastic-like diffs, having them available in blame would be very cool. This is actually very related to trait I proposed, as Code Review tools may want to use this functionality too.

joyously commented 7 months ago

FYI: Pijul chose to call it credit instead. There is also annotate.

ilyagr commented 7 months ago

FYI: Pijul chose to call it credit instead. There is also annotate.

Reminds me of this song: https://youtu.be/UQHaGhC7C2E?si=n20KACC6SZmK480N&t=235

It's not the best Tom Lehrer song, but it might be worth a snicker depending on the mood.

Ideally, jj blame should only show the lines with bugs.

PhilipMetzger commented 7 months ago

I'm totally onboard with calling it jj annotate or jj credit as long as the blame alias exists, so everyone will be happy.

thoughtpolice commented 7 months ago

Ideally, jj blame should only show the lines with bugs.

It would be funny if we called it jj credit, and then make jj blame different and claim it "Only shows lines of code that have bugs" and then make it only show your own commits (aka mine()). I'm not saying we should do this, just that it would be pretty funny.

steveklabnik commented 7 months ago

I have often said "git blame is an alias for whoami"...

epage commented 7 months ago

So not familiar with git absorb enough to know how it is related or how the difftastic stuff would work but thought I'd chime in with some "blame" UI ideas I've been playing with.

A problem I have with git is it shows too much metadata by default, making it harder to browse

Who made the change doesn't matter as much as the change itself. From there, I can find out who made it
Knowing the time is good but it is hard to tell the time relation between two lines to know which happened first
I'm not going to remember a commit hash for doing things off of; it is only good for copy and paste which slows me down

In git-dive, I only show a reference relative to the reference you specified on the command-line. I'm just now installing jj so i don't know all of the syntax for that but in git terms, this means that by default all of the commits show HEAD~1, HEAD~40, etc. If I then git dive HEAD~30 Cargo.toml, it then will increment from there (HEAD~42, etc). This gives me an easy to remember number for then running git log HEAD~42 and an easy-to-mentally-parse way of comparing relative time of commits. When working on projects that release from main and use tags, I have considered making them relative to the tags but the question sometimes is "which tag".

As I use a merge-commit workflow on multi-commit PRs, I find having the code only annotate for the merge-commit is both more informative (giving me the context of the logical series its attached to and a quick reference for the PR to open it to see the review) and speeds things up.

I also syntax highlight the code using syntactic.

What I would like to add is a TUI that let's you move back and forth through history, pop up a log viewer for the commit of the currently selected hunk, move into the merge-commits, etc.

As mentioned, I'm just now kicking the tires on jj but if it works out, I'll set aside time to help get this implemented.

screenshot

![image](https://github.com/martinvonz/jj/assets/60961/32dfe1bd-1afa-417c-8f6e-ed9710711ac0)

PhilipMetzger commented 7 months ago

A problem I have with git is it shows too much metadata by default, making it harder to browse

Who made the change doesn't matter as much as the change itself. From there, I can find out who made it

Knowing the time is good but it is hard to tell the time relation between two lines to know which happened first

I'm not going to remember a commit hash for doing things off of; it is only good for copy and paste which slows me down

Yes, that's the goal of having a abstract interface for the UI of jj annotate/jj blame as many different systems interact with the commit metadata. This then allows us to build different layers for each use-case, such as commit metadata only, the structural diff and more. Then each effective backend can compose its functionality with the different blame layers, as maybe your internal codesearch tool only wants the author and the related metadata when it landed.

As I use a merge-commit workflow on multi-commit PRs, I find having the code only annotate for the merge-commit is both more informative (giving me the context of the logical series its attached to and a quick reference for the PR to open it to see the review) and speeds things up.

If you can write a blame layer for your use-case, I'd be happy to accept it.

What I would like to add is a TUI that let's you move back and forth through history, pop up a log viewer for the commit of the currently selected hunk, move into the merge-commits, etc.

We already ship a TUI, so if you miss functionality please move the work upstream (scm-diff-editor).

jyn514 commented 7 months ago

that link gives a 404, is the repo private?

PhilipMetzger commented 7 months ago

that link gives a 404, is the repo private?

Should be good now, I used the name of the binary instead of the repo, which is scm-record, sorry.

ilyagr commented 7 months ago

@epage 's comment of two other TUIs for blame that seem carefully designed: https://jonas.github.io/tig/ and maybe https://magit.vc/ (which is tightly integrated with Emacs and the resto of magit; on second thought I don't quite remember whether the blame interface specifically is notable).

For tig, I feel that it has a lot of good functionality, but the key bindings for blame are confusing.

Update: I just tried using tig for blame, and the keybindings are really confusing. Here they are: https://github.com/jonas/tig/issues/1315. Other than that, it's quite good.

epage commented 7 months ago

My big problem with full-featured git TUIs is that I had a hard time figuring out how to navigate it. I didn't want to learn a whole other tool just to accomplish one job that it might do well.

This was one of the ideas for git dive: design a TUI as a fancier pager for individual commands. This keeps you more in the your regular CLI workflow and then can drop in to specialized TUIs that are focused on doing one job very well and you can interact with it as you need.

And yes, discoverable key bindings are a must.

epage commented 6 months ago

Another thing I view as a "git failing" is that git supports files to ignore revs for blame but there isn't a well-known name for this that git auto-picks up. There is a config setting for the file but git doesn't support a committed config file.

I'd propose either

A well known name to auto-detect
Repo config that is per-user and shared across all users

PhilipMetzger commented 6 months ago

Another thing I view as a "git failing" is that git supports files to ignore revs for blame but there isn't a well-known name for this that git auto-picks up. There is a config setting for the file but git doesn't support a committed config file.

I think your wrong here, as you can see both llvm and chromium check-in their .git-blame-ignore-revs file.

I'd propose either

A well known name to auto-detect

Repo config that is per-user and shared across all users

Both options are good a solution. Ideally, we should just read a file that contains either revisions or a revset.

epage commented 6 months ago

I think your wrong here, as you can see both llvm and chromium check-in their .git-blame-ignore-revs file.

You've observed a convention but that doesn't mean the convention is natively supported by git. I just re-confirmed by checking git blames documentation, git config's documentation, and searching git's repo and nothing is coming up to say that this file is natively supported.

ilyagr commented 6 months ago

I think you need to set a config to use the file. This is, of course, not ideal.

https://git-scm.com/docs/git-blame#Documentation/git-blame.txt---ignore-revs-fileltfilegt

git blame is indeed severely under-documented, it would benefit from a tutorial. I'm sure the Git project would happily accept documentation improvements, but this is not a trivial thing to improve.

OTOH, you could add these tips to git-dive docs. I think that the way blame is usually used is, by nature, an interactive process, so interactive tools are really the way to go here. (Oh, I just realized that git-dive is not quite interactive. It is pretty, though.)

I know three tools now I'd recommend for blame: git-dive, tig, and git gui (which is, unfortunately, not perfect; it goes with gitk which gives some paper cuts). The last two don't focus on blame and don't seem to make any reference to features like blame's --ignore-revs-file, but all of them could.

epage commented 6 months ago

(Oh, I just realized that git-dive is not quite interactive. It is pretty, though.)

Interaction is on the roadmap but I might just instead help out with jj once I can get back to learning it...

Similarly, adopting the convention by auto-detecting the file is on the roadmap.

PhilipMetzger commented 6 months ago

You've observed a convention but that doesn't mean the convention is natively supported by git.

I'm sorry, it was only meant for the "There is a config setting for the file but git doesn't support a committed config file." part of your statement, as it is checked-in. It is sad though that is not a convention.

epage commented 6 months ago

I'm sorry, it was only meant for the "There is a config setting for the file but git doesn't support a committed config file." part of your statement, as it is checked-in. It is sad though that is not a convention.

That is referring to .gitconfig, not .git-blame-ignore-revs. If you could commit a .gitconfig then you can point it to your blame-ignore file and be done. There are several other use cases I've run into where this would be helpful with git.

ilyagr commented 6 months ago

Another interesting tidbit: GitHub supports this convention. But not GitLab, according to https://black.readthedocs.io/en/stable/guides/introducing_black_to_your_project.html.

ilyagr commented 1 month ago

For the name, I still like credit, but I don't like annotate as much. I think it's not a terrible option, but it's a bit vague for my taste.

To throw out a few options out there, if we want a neutral tone, I like attribute, though it's annoyingly also a noun. A thesaurus suggests associate, which might be better. There's also trace which makes sense to me.

epage commented 1 month ago

imo the value of blame is people are used to that name and people will easily find it. That can be solved with an alias.

My concern with any name that tries to do a positive spin on blame is that, at least for me, "who made a change" is secondary to "finding the commit/PR that introduced the change"' (see the description, see the conversation, any linked items, etc). In finding the change, you sometimes need to peel back several layers of irrelevant commits (with hints like relative times to know which ones are worth skipping past).

This is why I went with the name dive for my tool. I think trace can also work.

Some phrases that could help inspire a name include

"find the origin of a change"
"show the context of a change"

ilyagr commented 1 month ago

Interesting point. origin is a nice suggestion. context feels a bit better than annotate to me as well.

I think blaming or crediting a commit might make sense. Certainly, this makes sense for blame: "I blame the weather" makes as much sense as (or more than :) ) "I blame the weatherman!".

I was thinking "attribute the line to a commit", "trace which commit this line comes from", "which commit is each line associated with?". The first of these is how I would explain to a person what the command does, but it's annoying that "attribute" is a noun that could make somebody think of setting an attribute.

yuja commented 1 month ago

@ilyagr any thoughts on whether annotate/blame/credit should be top-level or subcommand of file?

I agree with the following rule, but annotate is file oriented, and can be considered a single tree command with historical metadata. https://github.com/martinvonz/jj/issues/3812#issuecomment-2169019978

About naming, I think annotate/blame is good for discoverability. Git calls it blame, and Mercurial (and CVS iirc) use annotate. If we need to invent new name, I like trace.

ilyagr commented 1 month ago

any thoughts on whether annotate/blame/credit should be top-level or subcommand of file?

I'm not really sure, but it's an interesting question. As you point out, this contradicts the rule I invented in the comment you linked, but it does feel very natural in other ways. Should the rule be modified, perhaps?

I also wonder about discoverability: would we want a top-level alias if the command is under "file"? Would people find it regardless?

scott2000 commented 1 month ago

If the command supports a template to render the information being added to each line, something generic like jj annotate (or jj file annotate) might be more accurate than jj blame or jj credit, since it could be used for other purposes besides finding the author or commit which introduced a change. For instance, you might just be wondering how old a function is, so you might want to annotate with the .ago() relative timestamp only, or you might want to see which lines are present in a previous release, so you might use a template to highlight only lines which are in the corresponding branch/tag.

martinvonz / jj

FR: `jj blame`/`jj point-finger` a blame layer. #2988