mitsuhiko / similar

A high level diffing library for rust based on diffs
https://insta.rs/similar
Apache License 2.0
949 stars 31 forks source link

Support for `Bytes` from `bytes` crate #45

Open xfbs opened 1 year ago

xfbs commented 1 year ago

Hey all,

Awesome crate! I'm using this in diff.rs to diff crates in the browser.

I'm using the Bytes crate to keep crate files in memory because it lets me keep stuff around in a reference-counted way and it also lets me create views into that. I have to use a bit of an ugly hack right now to make this work with similar because you guys currently only support byte slices &[u8] , but I want to turn these back into Bytes structs.

Looking at the code, it should be fairly simple to add native support for Bytes into the crate by implementing the DiffableStr trait for Bytes, perhaps as a feature flag.

Is this something you guys would be interested in? If so, I might implement it and create a merge request.

Cheers, Patrick!

mitsuhiko commented 1 year ago

Very happy to accept this behind a feature flag.

xfbs commented 1 year ago

Awesome to hear!

So, I looked into it a bit. Looks like it is not quite as simple as I had thought, for one simple reason: the DiffableStr trait returns references to the structure that it is implemented on. That makes sense for str and [u8], since those would just be slices (&str and &[u8]). But for Bytes, this does not make sense: we want these to return Bytes as well.

pub trait DiffableStr: Hash + PartialEq + PartialOrd + Ord + Eq + ToOwned {
   fn tokenize_lines(&self) -> Vec<&Self>;
   ....

I think to make this work, there's two options:

  1. Introduce a type Output = ... for the DiffableStr. So for example the Output for be &[u8] for [u8]:

    impl DiffableStr for [u8] {
        type Output = &[u8];
        fn tokenize_lines(&self) -> Vec<Self::Output>;
    }

    and &str for str:

    impl DiffableStr for str {
        type Output = &str;
        fn tokenize_lines(&self) -> Vec<Self::Output>;
    }

    and Bytes for Bytes:

    impl DiffableStr for Bytes {
        type Output = Bytes;
        fn tokenize_lines(&self) -> Vec<Self::Output>;
    }
  2. Change the trait to return Self (not a reference, but the Self type) and implement it for &str, &[u8] and Bytes instead of for [u8] and str and Bytes. I'm not sure if this will work the way I thin it will.

Either of these would be breaking changes (changing a public trait). Do you think getting this working is important enough to warrant making a breaking change to the API?

mitsuhiko commented 1 year ago

I think changing the traits is fine. I think the odds that someone implements them is not very high. I just worry that you might run into limitations actually introducing a different output type in practice because you will need GATs and similar supports rust versions that don't have them yet. Potentially there is a way to make this work by going through a layer of indirection like implementing it for &str instead.

For what it's worth I already ran into similar issues before which is why there is DiffableStr and DiffableStrRef. That's how for instance String turns into &str.