lambdaisland / deep-diff2

Deep diff Clojure data structures and pretty print the result
Eclipse Public License 1.0
296 stars 18 forks source link

don't perform unnecessary work when diffing maps #45

Closed latacora-paul closed 1 year ago

latacora-paul commented 1 year ago

I was seeing some pretty bad performance when performing a large diff and tracked down the primary cause. I'm certain there are other improvements that can be made but I'll keep the change small for now :). This change runs ~15x faster for my use case.

Math identity illustrating why this change is okay:

(A ∩ B) ∪ (B - A) = B

Before:

Screenshot 2023-06-05 at 11 10 20 AM

After:

Screenshot 2023-06-05 at 11 23 14 AM
alysbrooks commented 1 year ago

Thanks for digging in and profiling!

I didn't intuitively understand the identity (perhaps due to unfamiliarity with the notation) until I thought about it more.

It's basically saying the intersection (i.e. what's in A and B):

Venn diagram with only the overlap of two circles highlighted

Combined with the difference (i.e., what's only in B):

Venn diagram showing only the right circle and not the overlap highlighted

is just B:

Venn diagram showing the entire right circle highlighted, including the part that overlaps

The code looks good and it passes our tests (including property-based tests), so this seems like a solid change. I have a nagging worry there's some implementation detail that would make these two not equivalent.

latacora-paul commented 1 year ago

I also had some uneasiness (why was it like that in the first place if it could be so simple?).

However, as far as I can tell the order of act-ks doesn't matter since it's diffing maps that make no guarantees about entry order anyways. The passing tests gave me increased confidence in the safety of the change. @alysbrooks if there's something additional you think needs to be done just let me know :)

plexus commented 1 year ago

I think enough due diligence has been done here. @latacora-paul could you still add a CHANGELOG entry? then this is ready to go out.

latacora-paul commented 1 year ago

could you still add a CHANGELOG entry?

done!

alysbrooks commented 1 year ago

I will note that I didn't notice tests running any faster after this change, but there's a lot of noise on our tests and I don't think they involve many large maps. I did a quick test using criterium on my computer on a 100-element hash-map, and it was several times faster and somewhat faster for a small hash-map.