Closed danieljl closed 2 years ago
That's intentional, or rather that's just how the system works, but it can be annoying. The solution is to use the `TextDiffRemapper`:
    use similar::TextDiff;
    use similar::utils::TextDiffRemapper;

    fn main() {
        let old = "";
        let new = "á";
        // from_chars diffs char by char, so the ops count chars, not bytes.
        let diff = TextDiff::from_chars(old, new);
        // The remapper maps each op back to slices of the original strings.
        let remapper = TextDiffRemapper::from_text_diff(&diff, old, new);
        let changes: Vec<_> = diff.ops()
            .iter()
            .flat_map(move |x| remapper.iter_slices(x))
            .collect();
        dbg!(changes);
    }
Thanks for the pointer!
Hi,
First, thank you for this great crate.
I expected the length field of `DiffOp` to be the same as `str::len`, i.e. the length in bytes of the text when encoded as UTF-8. They turned out to be different: the former is instead the number of Unicode scalar values (~ code points). Is this a bug or expected? If it's expected, is there a way to get the byte length from a `DiffOp`?

Minimal working example:
The code above will output:
Tested on v2.1.0 and main branch (236a299ff01b8d4bdfc95c6439c1302c8422ae13).
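For anyone landing here later, the mismatch between a char count and `str::len` can be reproduced with the standard library alone. The sketch below does not use `similar` at all, and `char_range_to_byte_range` is a hypothetical helper written for illustration, not part of the crate. It also assumes the `á` above is the single precomposed scalar value U+00E1; a decomposed `a` plus combining accent would give different numbers.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of `similar`): converts a range of
/// char indices into the matching byte range of `text`.
/// Indices past the last char are clamped to `text.len()`.
fn char_range_to_byte_range(text: &str, chars: Range<usize>) -> Range<usize> {
    // Byte offset at which the char with the given index starts.
    let byte_offset = |char_idx: usize| {
        text.char_indices()
            .map(|(byte_idx, _)| byte_idx)
            .nth(char_idx)
            .unwrap_or(text.len())
    };
    byte_offset(chars.start)..byte_offset(chars.end)
}

fn main() {
    // "á" written as one precomposed scalar value (assumption, see above).
    let new = "\u{e1}";

    println!("chars: {}", new.chars().count()); // prints "chars: 1"
    println!("bytes: {}", new.len()); // prints "bytes: 2"

    // A char-based op covering chars 0..1 corresponds to bytes 0..2.
    let bytes = char_range_to_byte_range(new, 0..1);
    println!("byte range: {:?}", bytes); // prints "byte range: 0..2"
}
```

The helper rescans the string on every call, which is fine for short inputs. For many ops on a long text, the `TextDiffRemapper` approach from the reply is probably the better fit, since the slices it yields are plain `&str` and their `len()` is already in bytes.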