Open colin-kiegel opened 7 years ago
Sounds reasonable, but I won't have time to work on this in the near future. Happy to take + review PRs. :)
Is this the reason why I get edit distance 2 on a="9", b="99" when split is " " or "\n"?
version 2.0.0
#[test]
fn test_diff() {
    init();
    let a = "9";
    let b = "99";
    assert_diff!(a, b, " ", 0);
}
If the edit distance is based on LCS, this should be 1, the same as with split="".
'assertion failed: edit distance between "9" and "99" is 2 and not 0, see diffset above'
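For anyone wondering where the 2 comes from, here is a rough sketch of a token-level LCS distance. It is not the crate's actual code: `lcs_len` and `token_distance` are hypothetical helpers, and the sketch assumes the distance counts one removal plus one insertion for every token outside the longest common subsequence.

```rust
/// Length of the longest common subsequence of two token slices
/// (classic dynamic-programming table).
fn lcs_len(a: &[&str], b: &[&str]) -> usize {
    let mut table = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for i in 0..a.len() {
        for j in 0..b.len() {
            table[i + 1][j + 1] = if a[i] == b[j] {
                table[i][j] + 1
            } else {
                table[i][j + 1].max(table[i + 1][j])
            };
        }
    }
    table[a.len()][b.len()]
}

/// Hypothetical helper (an assumption, not part of `difference`):
/// number of removed plus inserted tokens when both inputs are
/// split on `split` and compared via their LCS.
fn token_distance(a: &str, b: &str, split: &str) -> usize {
    let ta: Vec<&str> = a.split(split).collect();
    let tb: Vec<&str> = b.split(split).collect();
    ta.len() + tb.len() - 2 * lcs_len(&ta, &tb)
}
```

Under that assumption, splitting on " " or "\n" leaves each input as a single token ("9" and "99"), the two tokens are not equal, and the result is one removal plus one insertion. Splitting on "" compares character by character, so only the extra "9" counts.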
difference can currently split via a single character, where " " is suggested to achieve word-level splits. However, other whitespace, like tabs, would not lead to word-splits. And punctuation does not lead to a word-split either.
Example: Ok("hey") and Ok("hey ho!") will be split into words like this:
Ok("hey")
Ok("hey
+ho!")
Therefore difference will treat these strings as completely different, since no word is identical.
Suggestion: split words via a regexp, so that the two strings are split like this:
Ok
+(
+"
+hey
+"
+)

Ok
+(
+"
+hey
+ho
+!
+"
+)
So, difference would be able to detect some overlap in the given example and only treat ho! as an insertion! :-)
The difference crate could also export a reasonable default regexp to split words. However, what you want will most likely very much depend on the context.
This suggestion is based on git diff --word-diff-regex=<regex>
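To make the proposed splitting concrete, here is a minimal sketch using only the standard library. `word_tokens` is a hypothetical helper, not an existing `difference` API, and it hard-codes one possible notion of a word (roughly what a regexp like `\w+|[^\w\s]` would match) instead of accepting a user-supplied regexp.

```rust
/// Hypothetical tokenizer (an assumption, not part of `difference`):
/// runs of alphanumeric characters or '_' become one token, every
/// other non-whitespace character is its own token, and whitespace
/// is dropped.
fn word_tokens(s: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut word_start: Option<usize> = None;
    for (i, c) in s.char_indices() {
        if c.is_alphanumeric() || c == '_' {
            // Remember where the current word began.
            if word_start.is_none() {
                word_start = Some(i);
            }
        } else {
            // A non-word character ends any word in progress.
            if let Some(start) = word_start.take() {
                tokens.push(&s[start..i]);
            }
            // Punctuation is kept as a single-character token.
            if !c.is_whitespace() {
                tokens.push(&s[i..i + c.len_utf8()]);
            }
        }
    }
    // Flush a word that runs to the end of the input.
    if let Some(start) = word_start {
        tokens.push(&s[start..]);
    }
    tokens
}
```

With this, Ok("hey") and Ok("hey ho!") share the tokens Ok, (, ", hey, " and ), so an LCS-based diff over these tokens would only need to report ho and ! as insertions. A real implementation would presumably let the caller pass the regexp, as git does.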