ymattw / ydiff

View colored, incremental diff in workspace or from stdin, side by side and auto paged.

Highlight just changed characters, not lines. #104

Closed. ElectricRCAircraftGuy closed this issue 1 week ago

ElectricRCAircraftGuy commented 3 years ago

Please just highlight changed characters, not lines. This can be done by parsing the output of:

git diff --word-diff-regex=.

See here: https://stackoverflow.com/questions/3231759/how-can-i-visualize-per-character-differences-in-a-unified-diff-file/7870727#7870727

You may also need:

git diff --word-diff-regex='[^[:space:]]|([[:alnum:]]|UTF_8_GUARD)+'

I'm pretty sure this is how meld gets its char-by-char highlighting information: I just did a cross-comparison, and the output from either of the two commands above seems to match meld's highlighting nearly perfectly.
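To make the parsing idea concrete, here is a minimal sketch of splitting one line of plain word-diff output into unchanged, deleted, and inserted segments. It assumes the default plain markers, `[-removed-]` and `{+added+}`. The function name `parse_word_diff_line` and the sample line are hypothetical, and nothing here is taken from ydiff's code.

```python
import re

# Plain word-diff markers: [-removed-] and {+added+}.
# Assumption: the markers never occur literally in the diffed text itself.
MARKER = re.compile(r"\[-(.*?)-\]|\{\+(.*?)\+\}")


def parse_word_diff_line(line):
    """Split one word-diff line into (kind, text) segments.

    kind is "equal", "delete", or "insert".
    """
    segments = []
    pos = 0
    for match in MARKER.finditer(line):
        # Text between two markers is unchanged.
        if match.start() > pos:
            segments.append(("equal", line[pos:match.start()]))
        if match.group(1) is not None:
            segments.append(("delete", match.group(1)))
        else:
            segments.append(("insert", match.group(2)))
        pos = match.end()
    # Trailing unchanged text after the last marker.
    if pos < len(line):
        segments.append(("equal", line[pos:]))
    return segments


if __name__ == "__main__":
    # Illustrative input, not captured from a real git run.
    for kind, text in parse_word_diff_line("import [-foo-]{+bar+}"):
        print(kind, repr(text))
```

A renderer could then highlight only the "delete" and "insert" segments. A real parser would also have to handle hunk headers and lines that contain the marker sequences literally.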

If you add char-by-char highlighting, your tool would be much closer to meld and much more functional and useful.

In case someone else is able to add this support, please also drop in some notes here to orient us to your code so we would know where these parsing changes would need to be added.

Also, can you please explain your tool?

  1. When is text colored, and what colors mean what?
    1. Where is this in your code?
  2. When is text highlighted, and what colors mean what?
    1. Where is this in your code?
  3. When is text underlined, and what means what?
    1. Where is this in your code?

I see some fantastic opportunities for beautification here. Again, I really think this could be made to look very similar to meld, which would be awesome!

ymattw commented 1 week ago

difflib may treat the difference between two texts as either a full-line change or a partial change. For example, import foo vs. import bar is treated as a full-line change rather than as only foo changing to bar. I had also noticed this undesired behavior, and it confused me for a long time. I didn't dig into it until recently, when I found the hardcoded "similarity ratio" threshold of 0.74999.
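The effect of such a threshold can be reproduced with difflib's public `SequenceMatcher.ratio()`. The sketch below is only an illustration: the threshold value is the one quoted above, but the helper names and the exact comparison against the threshold are assumptions, not difflib's or ydiff's actual code path.

```python
from difflib import SequenceMatcher

# Value quoted in the comment above. Exactly where and how it is compared
# inside difflib or ydiff is an assumption made for this sketch.
THRESHOLD = 0.74999


def similarity(old, new):
    """Return difflib's similarity ratio, 2 * matches / total length."""
    return SequenceMatcher(None, old, new).ratio()


def looks_like_partial_change(old, new, threshold=THRESHOLD):
    """Hypothetical helper: True if the pair is similar enough to be
    shown as a partial (intra-line) change rather than a full-line one."""
    return similarity(old, new) > threshold


if __name__ == "__main__":
    pairs = [("import foo", "import bar"), ("import foo", "import foo2")]
    for old, new in pairs:
        ratio = similarity(old, new)
        print(f"{old!r} vs {new!r}: ratio={ratio:.3f}, "
              f"partial={looks_like_partial_change(old, new)}")
```

import foo and import bar share only the 7-character prefix "import " out of 20 characters in total, so the ratio is 2 * 7 / 20 = 0.7. That is below the threshold, which is consistent with the pair being treated as a full-line change.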

I did some experiments, and IMHO highlighting every changed character makes the result too verbose to read. I've decided to highlight only at the word level, similar to git diff --word-diff, but with words split so that camel-cased and snake-cased variable names are taken into account. I don't think users will ever need to customize that rule.

Check out the latest version! See also details in split_to_words().
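For readers who want a feel for the word-level rule before reading the source, here is a rough sketch. It is not ydiff's actual split_to_words(): the regex, the function names `split_words` and `changed_words`, and the exact splitting rules are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical tokenizer: splits camelCase and snake_case names into parts.
# Alternatives, in order: acronym run (HTTP), capitalized or lowercase word
# (Server, foo), digit run, whitespace run, any other single character.
WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+|\s+|[^\sA-Za-z0-9]")


def split_words(text):
    """Split text into tokens; joining the tokens restores the text."""
    return WORD.findall(text)


def changed_words(old, new):
    """Return (tag, old_part, new_part) for each non-equal token run."""
    old_words, new_words = split_words(old), split_words(new)
    matcher = SequenceMatcher(None, old_words, new_words)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append(
                (tag, "".join(old_words[i1:i2]), "".join(new_words[j1:j2]))
            )
    return changes


if __name__ == "__main__":
    print(split_words("fooBar_baz"))
    print(changed_words("import foo", "import bar"))
    print(changed_words("maxRetryCount", "maxRetryLimit"))
```

With this kind of tokenization, a rename such as maxRetryCount to maxRetryLimit is reported as a change to one name part only, not to the whole identifier or to individual characters.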