Use DiffPlex to produce better diffs

drhumlen commented 4 years ago

Hi! I've long been wishing for better diffs when comparing bigger objects/records or strings. I've searched around for a library to do better string diffing, and found a library called DiffPlex that does just that. I've posted some example screenshot below of the current diff rendering vs DiffPlex's. Imo it becomes a lot easier to see exactly what is changed in bigger object.

I've added some failing tests to the Sample project (not sure if it's the right place or not) to illustrate the new rendering.

This is still work in progress, but I rather show it off now before I put more time into it in case there's some reasons we can't do it this way. (Perfomance, compatability, preference..?). (Note also that I temporarily removed net461 because I couldn't get net461 to run on my machine before my patience ran out. (I'm on MacOS and every installer from microsoft seems to be .exe files))

What do you think?

Rendering examples:

haf commented 4 years ago

@drhumlen Thank you for your suggestion and contribution! I hope you stick around and do more great stuff like this.

I really like the improved outputs, but I don't like taking on new dependencies. I see two alternatives;

porting the algorithm here which is possible under apache 2.0
creating a separate package, so that the diff is not as good by default, and making the diff algorithm configurable by adding a package

drhumlen commented 4 years ago

I really like the improved outputs, but I don't like taking on new dependencies.

Yeah, I understand that. It's always good to keep dependencies to a minimum to avoid being ... well dependent.

I was actually thinking about making the diffing function a user-configurable thing – so you can pass whichever differ function you want to use to Expecto. That way, you as a user can opt in on DiffPlex – or some other diffing library if you want to.

Porting it is also interesting. At first I thought this diffing algorithm was very complex, but it doesn't look too daunting looking at the code you linked. Perhaps just porting it over is the easiest and best result? 🤔

It's possible to do both also. Make the differ a configuration for the expecto user and port over the algorithm as a "better default" (perhaps with the option to use the current differ).

drhumlen commented 4 years ago

It's also possible to add a gutter with linenumber & color coding the change: blue=modified, green=whole line added, red=whole line deleted, yellow="imaginary" – just like many git tools support. Not sure if it's too much with numbers or not 🤔perhaps make it configurable? And of course don't display line numbers when there's only one line.

If this feature is worth having, it also needs to be part of the abstraction. I think it's pretty cool, but perhaps highlighting words will suffice?

haf commented 4 years ago

diffing function a user-configurable thing

This is not really necessary if we make it good by default; sometimes choice is a curse, too, and I've never seen any other testing library that lets the user change the diff function; or even seen anybody asking for that feature in the other libs I've used.

Porting it is also interesting. At first I thought this diffing algorithm was very complex, but it doesn't look too daunting looking at the code you linked. Perhaps just porting it over is the easiest and best result? 🤔

Well, you'd have to port those 400-something LoC, but you could run the same tests that are already written in C# against your implementation, so you wouldn't have to port those. It would probably take an hour or four to port the code. It's probably a fun exercise and it would drastically improve the stability of this code base, compared to taking a dependency.

It's also possible to add a ...

I'll let you decide how it should look for now; what you've done is much better than what we already have.

haf commented 4 years ago

As for building with net461/mono; you first install mono, and then do source .env in the repo. The .env file sets up the path to the .Net 4.6.1 ref asms.

drhumlen commented 4 years ago

Well, you'd have to port those 400-something LoC, but you could run the same tests that are already written in C# against your implementation, so you wouldn't have to port those. It would probably take an hour or four to port the code. It's probably a fun exercise and it would drastically improve the stability of this code base, compared to taking a dependency.

Yea, after reading some more through the library, it seems my initial gut-feeling was more right. There is significant work done by the DiffPlex, and it would be almost a little silly to rewrite all of that in F# – even if parts of it is trivial conversion.

What if we just grabbed the C# code directly and places it inside this repo instead of converting it to F#? I could go ahead and delete those (few) files we don't need to keep it small/simple. No reason to re-invent the wheel..? I guess it's possible to rewrite it in F#, but it's tedious work...

It's either that, or making improved diffs an addon with its own repo. But I agree with you, maybe that's making it too flexible, and also a bit harder for a newcomer to have to add several packages just to get a nice diff.

haf commented 4 years ago

What if we just grabbed the C# code directly and places it inside this repo instead of converting it to F#? I could go ahead and delete those (few) files we don't need to keep it small/simple. No reason to re-invent the wheel..? I guess it's possible to rewrite it in F#, but it's tedious work...

Sure, let's go for this then; but then it would be a dependency named Expecto.Plugins.Diffs (or the like), and we have to design a way such that it gets used if it's available. So C# code in this repo; a copy (hi there @mmanela), which we can update from upsteam when needed. Including tests, which also should run as part of the CI pipeline. And no direct dependency on Diffs from the main Expecto; but only a replacement mechanism there. How does this sound?

mmanela commented 4 years ago

Hi there, I am curious why the hesitancy of taking one dependency? You are free to copy the code, but it is often nicer for the community to take a lib dependency and the contribute changes and improvements back as you find things to enhance.

haf commented 4 years ago

@mmanela I'm sure we'll repatriate code if we find improvements.

Why not a dependency:

I've taken "one dependency" before in Suave's source code, on FsPickler, believing it would be semver and schema compatible in minor releases, but it turned out it was not. The author of FsPickler thought this was fine, and we had to do-it-ourselves.

The exact same happened for Argu in this lib, which caused us to build our own argument parser in this library. Logging; we're importing my Logary Facade lib, so that we can freeze its API, and Logary itself has a large test suite to avoid ABI incompatibilities and make sure Logary.Facade.Adapter doesn't go out of sync. There's an explicit version field in the source code, too.

In this case, looking at the list of releases https://www.nuget.org/packages/DiffPlex/ there's not a single major version bump, which means that you've either kept the ABI/API breaking changes to zero the last 8 years, or there have been breaking changes that haven't been versioned as such. I haven't analysed the repository though.

Further, as this project is integrated into FAKE, Rider and VSCode, I think they deserve a stable API.

Taking a dependency also means we'll have to #r "DiffPlex" as well as #r "Expecto" in any and all script files that use Expecto; this is why there's e.g. Expecto.Hopac ; just so that we can have an abstract core that open for extension and closed for modification.

@drhumlen Thinking like above, perhaps we'll just have a 'sattelite' assembly and reference DiffPlex from it, like we do with Hopac; however, we'd still need the simple integration to happen once it's referenced.

mmanela commented 4 years ago

I understand and at least for diffplex, you are right I have kept the api very stable since there is not much need to change it

drhumlen commented 4 years ago

@drhumlen Thinking like above, perhaps we'll just have a 'sattelite' assembly and reference DiffPlex from it, like we do with Hopac; however, we'd still need the simple integration to happen once it's referenced.

@haf: Yeah, I'm trying to figure out the best way to do it now. On the top of my head, we could introduce a new function in Expect.fs:

let equal' diffPrinter (actual : 'a) (expected : 'a) message =
  ...
   if actual <> expected then
      failtestf "%s.\n%s" message (diffPrinter actual expected)

let equal = equal' printVerses  // <- the current behaviour/diff printer

and then let users import

open Expecto
open Expecto.DiffPlexDiffs

let expectEqual = Expect.equal' betterDiffPrinter

and then go on with their tests:

test "This is my test as an Expecto user" {
  expectEqual (1 + 1) 2 "One plus one is two"
}

instead of the usual Expect.equal (1+1) 2 "One plus one is two"

...

But that doesn't feel quite optimal if the rest of your tests starts with Expect._____

Alternatively, we could make Expect more of a class

type Expecto(diffPrinter) =
  member __.equal actual expected message =
    ....
  member __.isNone ......

and then let users do

open Expecto open Expecto.DiffPlexDiffs

let Expect = Expecto(betterDiffPrinter)

test "This is my test" { Expect.equal (1+1) 2 "Business as usual here" }

That also opens up rooms for inserting more user-configurable things into the Expect-module in the future.

But then again, I do like how the current Expect-module works without the need for instantiating a class. Hmm...

drhumlen commented 4 years ago

A bit off-topic: When writing my own tests, I find that Expect.equal is the function is use by far the most. I also rather use more time to describe the test-case well instead of using the message-field – it's almost always redundant. I rather spend more time perfecting the name of helper functions and local values to make the test understandable.

...Which eventually led to me introduce a custom operator for assertions in my own test code base:

let (==>) actual expected = Expect.equal actual expected "" // <- I don't need this message field

Then I can write my tests simply like so:

test "1+1 should be 2" {
   1 +1 ==> 2
}

test "When a user is created, it appears in the list of users with status Created." {
  let event: UserCreated = {Name = "Peter"}

  applyEvent [] event
  ==> [ {Name = "Peter"; Status = Created} ]
}

(We're doing event sourcing, so we have a lot of tests for our reducers (event appliers) & command handlers.)

I find this way of writing asserts lets me drop a lot of parentheses, while also being very visually guiding with the "arrow syntax". I initially had many tests written very nestedly with Expect.equal (......) (..........) "", but I've slowly rewritten many tests to use the arrow and found that they become a more readable.

The reasons I'm mentioning it is because I think the Expect.equal function is by far the most useful one. You could do almost all the others things with it: user.IsVerified ==> true instead of Expect.isTrue user.IsVerified "user must be verified".

I'm not sure if it's wise to pollute the namespace with a lot of symbolic operators, but perhaps one is not too unreasonable – for example ==> or some other suggestions if you have one? Maybe put it inside a module for symbols so it doesn't appear in scope when open Expecto, but enforce open Expecto.OperatorSyntax?

Just thougth I'd put it out there. Maybe this belongs in its own thread though.

drhumlen commented 4 years ago

On-topic again: the easiest would be to just depend on DiffPlex (like i've done initially) and not bother with all the hassle of either porting code over to F# or confusing the end user with a plugin-like system.

...And if DiffPlex's API turns out to be unstable, we can always just go back to the old differ, or copy over the code that worked into this repo. Also, as long as we depend on a specific version of DiffPlex, I don't see why it should ever stop working? Also, @mmanela himself confirms that DiffPlex's api has been very stable and will probably continue to be since there's not much need to change it (ever)?

drhumlen commented 4 years ago

Ok, I just went ahead trying out the Expecto.Hopac approach – making a seperate module and keeping Expecto (core) depedency-free.

I'm new to paket, and had to do a lot of wrestling with my IDE and Paket along the way, so some of the Paket-code/dependency tree maybe unnecessary/wrong. Please give feedback if there's something strange there.

Since the Colorization code is an internal/private module for Expecto, I have to pass a colorization function to the Expecto.Differ module's function – which looks very strange, but is is necessary to keep the color scheming consistent I think?

I'm still not quite done yet. String diffing doesn't use the provided differ function in every case, so that needs to be straightened out. But I'm awaiting feedback from you for now in case you've changed your mind on the dependency situation, or have a better solution :)

haf commented 4 years ago

BTW; DVar https://github.com/haf/DVar/blob/master/src/DVar.Tests/Program.fs — it's also present in Logary (and its Facade in this repo)

haf commented 4 years ago

I'm new to paket, and had to do a lot of wrestling with my IDE and Paket along the way, so some of the Paket-code/dependency tree maybe unnecessary/wrong. Please give feedback if there's something strange there.

Looks fine to me!

drhumlen commented 4 years ago

Thanks for the feedback. I'll refactor my code. It was never meant to be in a finished state; I just needed some feedback on the general direction before putting too much time into it.

I'm still the most concerned with: 1) How the user will "plugin" this and have it work without having to do anything weird. 2) How to pass the colourisation function to the new plugin. The way I've done it now works, but it's not too pretty 🤔 DVar is new to me, but looks interesting

drhumlen commented 4 years ago

Here I use a mutable variable to set the defaultDiffPrinter. I'm usually against mutable variables, but this gives us the un-intrusive plugin experience with only a single small mutable variable. Only catch is the user have to remember to set the defaultDiffPrinter before the tests run. See the commit above^

drhumlen commented 4 years ago

...Or, I guess the Expecto.Diff could just have the side-effect when running. But that's really dark magic: An import that modifies the program..?

drhumlen commented 4 years ago

@haf : Here's a big object diff

drhumlen commented 4 years ago

Feel free to checkout the branch and do the necessary steps to make this become mergeable. I'm not sure how to use DVar, or check if a module is available/loadable like you described. But I think the mutable variable does a pretty good job; it's just a hassle to remember to set it before running one's tests.

drhumlen commented 4 years ago

@haf : What will it take to get this merged now? The only thing that I feel is a bit "unsolved" now is the use of a variable to switch the differ function. It's possible to use a ref too, but that's essentially the same. Do you have any opinion on this? Is it acceptable?

Regarding tree-walking records, I think that's not really needed when you have a really good diff. You immediately see what the diff is, and the whole diff, so ... is it needed? Tree-walking also only works for records afaik, so if you have an seq of records, or a record that contains a seq, then it won't help you anyway.

If I could wish for one thing (from F# in general?), it would be better pretty-printing. The default, %A, could've been so much nicer without the bloated semicolons and sometimes weird array indentation.

jzabroski commented 4 years ago

This is pretty sweet and I'm going to steal this for my Xunit asserts.

haf commented 4 years ago

@drhumlen No there's just the coordination of the config; we can merge but then I lock down master for releases until we have a nicer public reconfiguration API... Are you sure you don't want to take a stab at solving the DVar integration? :) There are examples of its usages in Logging.fs

haf commented 4 years ago

🙌 @drhumlen And thank you very much for the contribution :)

lydell commented 3 years ago

Wow, this sounds amazing! I’m so happy I found this, and a bit sad I didn’t find it sooner. Would you mind explaining how to enable this? I can’t figure it out. 🙏

@jzabroski Did you end up doing something similar for Xunit? I’d love to steal that for projects not using Expecto 😄

baronfel commented 3 years ago

@lydell there's a code sample here in this PR demonstrating it, but you have to

add the Expecto.Diff package, and
configure it using the code I linked above

lydell commented 3 years ago

@baronfel Thank you so much! 🥇

Also, I tried using Expecto.Diff.equals in Xunit and it seems to work! Edit: Actually, all Expect.foo functions seems to work just fine in Xunit. That’s awesome! I get great Rider integration via Xunit, and great assertions via Expecto! (Though it feels a little bit wasteful to essentially have two test runners installed, but I guess it doesn’t matter in practice.)

haf / expecto

Use DiffPlex to produce better diffs #346

Rendering examples: