r3labs / diff

A library for diffing golang structures
Mozilla Public License 2.0

Performance guidance #27

Closed nehbit closed 4 years ago

nehbit commented 4 years ago

Hi there,

Lovely library, thanks for making it! We have been using it in Aether (https://aether.app - github.com/nehbit/aether) for the past week, and it's been working splendidly so far.

While I realise the best way to figure out suitability is to test, I would love to know if you have any guidance on what to expect from this library in terms of performance. In essence, how many elements would you be comfortable diffing in, say, a for loop? 10, 100, 1000, 10000 or more? Can we rely on this as a component in a stream pipeline that handles hundreds of thousands of messages on a continuous basis, for example?

The way we're detecting changes right now is to convert the object to JSON with a fast JSON library, take the SHA256 hash, and compare the hashes. In most cases we don't actually need to know what changed, just that the object has changed. Would this library offer a better, faster way to do this? Essentially, a way to stop processing as soon as any difference is found and report that there is a diff.
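For reference, the JSON-then-hash comparison described above might look roughly like the following. This is a minimal sketch using only the standard library: `encoding/json` stands in for the faster third-party JSON library, and the `Message` type and the `fingerprint` and `changed` helpers are hypothetical names for illustration, not Aether's actual code.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// Message is a hypothetical payload type used only for illustration.
type Message struct {
	ID   int      `json:"id"`
	Body string   `json:"body"`
	Tags []string `json:"tags"`
}

// fingerprint serialises v to JSON and returns the SHA-256 digest of the bytes.
func fingerprint(v interface{}) ([sha256.Size]byte, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return [sha256.Size]byte{}, err
	}
	return sha256.Sum256(b), nil
}

// changed reports whether a and b serialise to different JSON.
// It says nothing about what differs, only that something does.
func changed(a, b interface{}) (bool, error) {
	ha, err := fingerprint(a)
	if err != nil {
		return false, err
	}
	hb, err := fingerprint(b)
	if err != nil {
		return false, err
	}
	return ha != hb, nil // fixed-size arrays are comparable with != in Go
}

func main() {
	old := Message{ID: 1, Body: "hello", Tags: []string{"a"}}
	same := Message{ID: 1, Body: "hello", Tags: []string{"a"}}
	edited := Message{ID: 1, Body: "hello!", Tags: []string{"a"}}

	fmt.Println(changed(old, same))   // false <nil>
	fmt.Println(changed(old, edited)) // true <nil>
}
```

One caveat with this approach: `encoding/json` only sees exported fields, so a change to an unexported field would go unnoticed.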

Compared to Go's own JSON implementation, this library is likely faster, given that Go's implementation also relies on reflect. However, I don't think the specific JSON library we use relies on reflect.

I'd love to know if you have a feel for what scale you'd be comfortable using this library at, performance-wise. In the meantime, I'll probably run my own tests and post back with the results.

purehyperbole commented 4 years ago

Hey, glad you're liking the library! :)

While I haven't personally done any performance testing, this library, like encoding/json, makes heavy use of reflection, so performance is probably not going to be great. Without knowing the structure of your data, it's hard to say for sure.

Right now, diff.Changed() actually runs a full diff and doesn't return as soon as a change is detected, as it's mostly just there for convenience. This is something we could change, however.

Is your current approach spending more time encoding to JSON or hashing? It's probably unlikely, but if it's the latter, there are a number of much faster hash functions out there that may be worth a look if your use case doesn't need collision resistance.
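As one illustration of swapping the hash, here is a sketch that uses FNV-1a from the standard library's `hash/fnv` package in place of SHA-256. FNV-1a is non-cryptographic, so it offers no collision resistance against adversarial input, and whether it is actually faster for a given payload is something to measure rather than assume. The `Message` type and `fastFingerprint` helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// Message is a hypothetical payload type used only for illustration.
type Message struct {
	ID   int    `json:"id"`
	Body string `json:"body"`
}

// fastFingerprint serialises v to JSON and returns a 64-bit FNV-1a hash of it.
// The hash is not collision resistant; use it only where that is acceptable.
func fastFingerprint(v interface{}) (uint64, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return 0, err
	}
	h := fnv.New64a()
	h.Write(b) // Write on a hash.Hash never returns an error
	return h.Sum64(), nil
}

func main() {
	a, _ := fastFingerprint(Message{ID: 1, Body: "hello"})
	b, _ := fastFingerprint(Message{ID: 1, Body: "hello"})
	c, _ := fastFingerprint(Message{ID: 1, Body: "hello!"})

	fmt.Println(a == b) // true: equal values give equal fingerprints
	fmt.Println(a == c) // false for these two inputs
}
```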

nehbit commented 4 years ago

Thanks! Yeah, I ended up doing some testing on my own and found that if you don't need the diffs directly, you are far better off with reflect.DeepEqual().
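A minimal sketch of the reflect.DeepEqual() approach, for anyone landing here later. The `Message` type and `hasChanged` helper are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// Message is a hypothetical payload type used only for illustration.
type Message struct {
	ID   int
	Body string
	Tags []string
}

// hasChanged reports whether two messages differ, without describing how.
func hasChanged(a, b Message) bool {
	return !reflect.DeepEqual(a, b)
}

func main() {
	a := Message{ID: 1, Body: "hello", Tags: []string{"a", "b"}}
	b := Message{ID: 1, Body: "hello", Tags: []string{"a", "b"}}
	c := Message{ID: 1, Body: "hello", Tags: []string{"a", "c"}}

	fmt.Println(hasChanged(a, b)) // false
	fmt.Println(hasChanged(a, c)) // true

	// DeepEqual treats a nil slice and an empty slice as different.
	fmt.Println(hasChanged(Message{ID: 1}, Message{ID: 1, Tags: []string{}})) // true
}
```

The nil-versus-empty slice behaviour in the last line is worth keeping in mind if values are sometimes decoded from the wire and sometimes built in code.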

I also tried to add an early cutoff to the library so that diff.Changed() returns as early as possible. To my surprise it only made a 30% difference, even if the first change was pretty much on the first item in a struct. That was a bust.

Other than that, I tried converting to JSON, taking a SHA256 hash, and comparing the hashes. This turned out to be the fastest, by a small margin.
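A rough way to reproduce this kind of comparison on your own data is sketched below. It is not the harness used for the numbers above, and it claims no timings. It uses `encoding/json` rather than a third-party JSON library, and `Message`, `equalByHash`, `equalByReflect`, and `bench` are hypothetical names. `testing.Benchmark` lets a plain program run a benchmark without `go test`.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
	"reflect"
	"testing"
)

// Message is a hypothetical payload type used only for illustration.
type Message struct {
	ID   int      `json:"id"`
	Body string   `json:"body"`
	Tags []string `json:"tags"`
}

// equalByHash compares two messages via the SHA-256 of their JSON encoding.
func equalByHash(a, b Message) bool {
	ja, errA := json.Marshal(a)
	jb, errB := json.Marshal(b)
	if errA != nil || errB != nil {
		return false
	}
	return sha256.Sum256(ja) == sha256.Sum256(jb)
}

// equalByReflect compares two messages with reflect.DeepEqual.
func equalByReflect(a, b Message) bool {
	return reflect.DeepEqual(a, b)
}

// bench times repeated calls of eq(a, b) using the testing package.
func bench(eq func(a, b Message) bool, a, b Message) testing.BenchmarkResult {
	return testing.Benchmark(func(tb *testing.B) {
		for i := 0; i < tb.N; i++ {
			eq(a, b)
		}
	})
}

func main() {
	testing.Init() // registers testing flags when running outside "go test"

	a := Message{ID: 1, Body: "hello", Tags: []string{"a", "b", "c"}}
	b := Message{ID: 1, Body: "hello", Tags: []string{"a", "b", "c"}}

	// Each line prints the iteration count and ns/op for one strategy.
	fmt.Println("reflect.DeepEqual:", bench(equalByReflect, a, b))
	fmt.Println("JSON + SHA-256:   ", bench(equalByHash, a, b))
}
```

Results will depend heavily on the shape and size of the structs, so it is worth substituting a representative value of your own type for `Message`.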

I ended up keeping this library in and started making better use of diffs, so the problem solved itself: I now actually need the diffs. 🙂