mitsuhiko / similar

A high level diffing library for rust based on diffs
https://insta.rs/similar
Apache License 2.0
949 stars 31 forks

Mapping diff changes to a stream #26

Closed kirawi closed 3 years ago

kirawi commented 3 years ago

I'm trying to integrate similar into https://github.com/helix-editor/helix/pull/228, but I've been having a lot of difficulty working out at which index in the &str each Change should be applied. I'm trying to use Change::old_index() and Change::new_index(), since they sound like what I need, but my code keeps panicking because it unwraps a None value.
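The panic in the pattern `x.old_index().unwrap_or(x.new_index().unwrap())` can be reproduced without similar at all. `Option::unwrap_or` receives an already-evaluated argument, so `new_index().unwrap()` runs for every change, including deletions, for which similar reports no new index. `unwrap_or_else` takes a closure and only evaluates it when the first option is `None`. A minimal std-only sketch, where `FakeChange` is a hypothetical stand-in for `similar::Change`:

```rust
use std::panic;

/// Hypothetical stand-in for `similar::Change`: a deletion has an old index
/// but no new index, an insertion has a new index but no old one.
struct FakeChange {
    old_index: Option<usize>,
    new_index: Option<usize>,
}

/// Mirrors the original pattern. `unwrap_or` gets an already-evaluated value,
/// so `new_index.unwrap()` runs first and panics for deletions.
fn index_eager(c: &FakeChange) -> usize {
    c.old_index.unwrap_or(c.new_index.unwrap())
}

/// `unwrap_or_else` runs its closure only when `old_index` is `None`.
fn index_lazy(c: &FakeChange) -> usize {
    c.old_index
        .unwrap_or_else(|| c.new_index.expect("a change has at least one index"))
}

fn main() {
    let delete = FakeChange { old_index: Some(3), new_index: None };
    let insert = FakeChange { old_index: None, new_index: Some(5) };

    assert_eq!(index_lazy(&delete), 3);
    assert_eq!(index_lazy(&insert), 5);

    // Silence the default panic message, then confirm the eager version panics.
    panic::set_hook(Box::new(|_| {}));
    let eager = panic::catch_unwind(|| index_eager(&delete));
    let _ = panic::take_hook(); // restores the default hook
    assert!(eager.is_err());
    println!("unwrap_or panicked on a deletion; unwrap_or_else did not");
}
```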

The text is a ropey::Rope. I iterate over it with ropey::iter::Chunks and then decode each chunk with a user-selected encoding through encoding_rs, although the text is UTF-8 the entire way through.

```rust
/// (from, to, replacement)
pub type Change = (usize, usize, Option<Tendril>); // https://docs.rs/tendril/0.4.2/tendril/struct.Tendril.html

let iter = self.text.chunks(); // ropey::Rope::chunks()
let iter_len = iter.clone().count();
let mut decoder = encoding.new_decoder(); // encoding_rs::Encoding::new_decoder()
let mut changes: Vec<Change> = Vec::new();

for (i, chunk) in iter.enumerate() {
    // Check if this is the last element in the iterator.
    let is_last = i == iter_len - 1;
    let capacity = Self::calculate_decode_capacity(&mut decoder, chunk.as_bytes());
    let mut buf = String::with_capacity(capacity);
    let mut total_read = 0;

    // Loop until the entire chunk has been decoded into `buf`.
    loop {
        let (result, read, ..) =
            decoder.decode_to_string(chunk[total_read..].as_bytes(), &mut buf, is_last);
        // Track how many bytes we have read so far, in case we need to allocate more
        // capacity to `buf`.
        total_read += read;

        // Check if we need to allocate more capacity to `buf`, otherwise append
        // to `changes`.
        match result {
            encoding_rs::CoderResult::InputEmpty => {
                debug_assert_eq!(total_read, chunk.len());
                let diff = similar::TextDiff::from_unicode_words(chunk, &buf);
                let diff_ops = diff.ops();
                let diff_changes = diff_ops
                    .iter()
                    .flat_map(|x| diff.iter_changes(x))
                    .filter_map(|x| {
                        let index = x.old_index().unwrap_or(x.new_index().unwrap());
                        let value = x.value();

                        match x.tag() {
                            similar::ChangeTag::Delete => {
                                Some((index, index + value.chars().count(), None))
                            }
                            similar::ChangeTag::Insert => {
                                Some((index, index + value.chars().count(), Some(value.into())))
                            }
                            similar::ChangeTag::Equal => None,
                        }
                    });
                changes.extend(diff_changes);

                break;
            }
            encoding_rs::CoderResult::OutputFull => {
                debug_assert!(buf.len() > total_read);
                let needed_capacity =
                    Self::calculate_decode_capacity(&mut decoder, chunk[total_read..].as_bytes());
                buf.reserve(needed_capacity);
            }
        }
    }

    if is_last {
        break;
    }
}
```
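As an aside on the loop above: `iter.clone().count()` walks the rope's chunks once just to find the last one. A `Peekable` iterator can flag the final element in a single pass, which also makes the trailing `if is_last { break; }` unnecessary. A minimal sketch, where `for_each_with_last` is a hypothetical helper and a `Vec<&str>` stands in for `rope.chunks()` (any iterator works the same way):

```rust
/// Hypothetical helper: calls `f(item, is_last)` for each item, flagging the
/// final one without counting the iterator first.
fn for_each_with_last<I, F>(iter: I, mut f: F)
where
    I: IntoIterator,
    F: FnMut(I::Item, bool),
{
    let mut iter = iter.into_iter().peekable();
    while let Some(item) = iter.next() {
        // `peek` looks one element ahead without consuming it.
        let is_last = iter.peek().is_none();
        f(item, is_last);
    }
}

fn main() {
    // Stand-in for the rope's chunks.
    let chunks = vec!["Hello, ", "wor", "ld!"];
    let mut seen = Vec::new();
    for_each_with_last(chunks, |chunk, is_last| seen.push((chunk, is_last)));
    assert_eq!(seen, vec![("Hello, ", false), ("wor", false), ("ld!", true)]);
    println!("{:?}", seen);
}
```

The same `while let` loop can be written inline in place of `enumerate()`, since ropey's `Chunks` is an ordinary iterator.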

The code has other logic errors (for example, the index is relative to the current Chunk rather than to the overall ropey::Rope), but I don't think they matter for the unwrapping problem.
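Even without the panic, similar's `old_index()` and `new_index()` count tokens (words, for `from_unicode_words`), not characters, and an insertion's `new_index()` points into the new text rather than the old one. One way to get document-relative character ranges is to walk the changes in diff order, keep a running position in the old text, and start that position at the chunk's offset within the rope. The sketch below assumes `(from, to, replacement)` means "replace old chars `from..to` with `replacement`", uses `String` in place of `Tendril`, and defines `Tag` as a hypothetical stand-in for `similar::ChangeTag` so it compiles with the standard library alone:

```rust
/// Hypothetical stand-in for `similar::ChangeTag`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tag {
    Equal,
    Delete,
    Insert,
}

/// (from, to, replacement), in char offsets of the *old* text.
type Change = (usize, usize, Option<String>);

/// Turns an ordered stream of diff tokens into edits against the old text.
/// `base` is the char offset of the chunk within the whole document, so the
/// resulting ranges are document-relative rather than chunk-relative.
fn to_changes<'a>(base: usize, tokens: impl IntoIterator<Item = (Tag, &'a str)>) -> Vec<Change> {
    let mut pos = base; // current position in the old text, in chars
    let mut changes = Vec::new();
    for (tag, value) in tokens {
        let len = value.chars().count();
        match tag {
            // Unchanged text: just advance past it.
            Tag::Equal => pos += len,
            // Deleted text occupies `pos..pos + len` in the old text.
            Tag::Delete => {
                changes.push((pos, pos + len, None));
                pos += len;
            }
            // Inserted text is absent from the old text: the range is empty
            // and the old-text position does not move.
            Tag::Insert => changes.push((pos, pos, Some(value.to_string()))),
        }
    }
    changes
}

fn main() {
    // "the quick fox" -> "the slow fox", diffed word by word.
    let tokens = vec![
        (Tag::Equal, "the "),
        (Tag::Delete, "quick"),
        (Tag::Insert, "slow"),
        (Tag::Equal, " fox"),
    ];
    // Pretend this chunk starts at char 100 of the rope.
    let changes = to_changes(100, tokens);
    assert_eq!(
        changes,
        vec![(104, 109, None), (109, 109, Some("slow".to_string()))]
    );
    println!("{:?}", changes);
}
```

An adjacent Delete/Insert pair could also be merged into a single replacement such as `(104, 109, Some("slow"))`; whether that is preferable depends on how the consumer applies the changes.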

kirawi commented 3 years ago

Never mind, I figured it out. I'll explain it with // comments in the code for that PR, if anyone is interested.