kira7 / google-diff-match-patch

Automatically exported from code.google.com/p/google-diff-match-patch
Apache License 2.0

Uppercase letter boundary in semantic cleanup #72

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Adding an "uppercase letter boundary" rule to the semantic cleanup score would 
make code diffs look nicer, because edits would line up with camelCase words.

For example,
cleanupS<ins>emanticS</ins>core
would be changed to
cleanup<ins>Semantic</ins>Score

This might be useful in regular text too (e.g. with people's names: 
M[cM]illan => [Mc]Millan)

The change seems to be easy and straightforward:

private int diff_cleanupSemanticScore(String one, String two) {
...
  boolean uppercase1 = Character.isUpperCase(char1);
  boolean uppercase2 = Character.isUpperCase(char2);
...
  else if (!uppercase1 && uppercase2) {
    // One point for upper case.
    return 1;
  }
  return 0;
}
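
To make the effect concrete, here is a minimal, self-contained sketch of the proposed rule on its own. It is not the library's full scoring function: the class name `UppercaseBoundarySketch` and the helper `boundaryScore` are hypothetical, the other boundary rules elided above are left out, and returning 0 for an empty side is only a placeholder. It assumes the score of a candidate position is the sum of the scores on both sides of the edit.

```java
// Illustrative sketch only: implements just the proposed uppercase rule.
final class UppercaseBoundarySketch {

    // Hypothetical helper: scores the boundary between the end of `one`
    // and the start of `two` using only the uppercase-letter rule.
    static int boundaryScore(String one, String two) {
        if (one.isEmpty() || two.isEmpty()) {
            // Placeholder: edge handling is elided in the snippet above.
            return 0;
        }
        char char1 = one.charAt(one.length() - 1);
        char char2 = two.charAt(0);
        boolean uppercase1 = Character.isUpperCase(char1);
        boolean uppercase2 = Character.isUpperCase(char2);
        if (!uppercase1 && uppercase2) {
            // One point for a lowercase-to-uppercase transition.
            return 1;
        }
        return 0;
    }

    // Assumed combination: score of an edit position is the sum of the
    // boundary scores before and after the edit.
    static int positionScore(String before, String edit, String after) {
        return boundaryScore(before, edit) + boundaryScore(edit, after);
    }

    public static void main(String[] args) {
        // cleanupS<ins>emanticS</ins>core
        int unshifted = positionScore("cleanupS", "emanticS", "core");
        // cleanup<ins>Semantic</ins>Score
        int shifted = positionScore("cleanup", "Semantic", "Score");
        System.out.println("unshifted=" + unshifted + " shifted=" + shifted);
        // prints "unshifted=0 shifted=2"
    }
}
```

With only this rule in play, the shifted position scores higher, so a cleanup pass that picks the best-scoring position would prefer cleanup<ins>Semantic</ins>Score.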

Original issue reported on code.google.com by 2sa...@gmail.com on 13 Jun 2012 at 5:00

GoogleCodeExporter commented 8 years ago
(Just realized that the McMillan example is not valid: the shift will already 
apply there because of the word boundary. But I believe you get the idea.)

Original comment by 2sa...@gmail.com on 13 Jun 2012 at 5:02