lishunli / kryo

Automatically exported from code.google.com/p/kryo
BSD 3-Clause "New" or "Revised" License

Request new feature/API addition: Kryo.equals() #131

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
A common need when dealing with graphs of objects is to compare two object 
graphs for equality.

Just as it does with the copy() operation, Kryo already has all the machinery 
necessary to implement an equals() method that calculates whether two objects 
are equal with respect to Kryo serialization.

The new Kryo API additions could be simple:

  public <T> boolean equals(T obj1, T obj2);
  public <T> boolean equals(T obj1, T obj2, Serializer ser);

Obviously, users can write their own version of this method by serializing the 
two objects and comparing the resulting bytes, but this has several 
disadvantages:

* Requires users to write their own code
* Requires both object graphs to be serialized in full, whereas a built-in 
equals() method can stop at the first difference
* Built-in equals() can omit comparisons between any identical nested objects 
in the graph (e.g., no need to compare the "foo" property when obj1.getFoo() == 
obj2.getFoo()).
* Inefficient use of memory: a built-in equals() does not need to store any 
serialized bytes in memory, whereas user code must store both serialized outputs
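For comparison, here is a minimal Java sketch of the two approaches the list contrasts. It does not use Kryo: the hypothetical `GraphWriter` interface stands in for a Kryo `Serializer`, and `naiveEquals` / `earlyExitEquals` are illustrative names. `naiveEquals` is the user-written version (serialize both graphs completely, keep both byte arrays); `earlyExitEquals` buffers only the first graph and aborts while writing the second one at the first byte that differs.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical stand-in for a Kryo Serializer: writes one object graph to a stream.
interface GraphWriter<T> {
    void write(T obj, DataOutputStream out) throws IOException;
}

final class SerializedEquality {
    private SerializedEquality() {}

    // Thrown internally to abort serialization at the first differing byte.
    private static final class Mismatch extends IOException {
        private static final long serialVersionUID = 1L;
    }

    // Checks every written byte against a reference array instead of storing it.
    private static final class ComparingStream extends OutputStream {
        private final byte[] expected;
        private int position;

        ComparingStream(byte[] expected) {
            this.expected = expected;
        }

        @Override
        public void write(int b) throws IOException {
            if (position >= expected.length || expected[position] != (byte) b) {
                throw new Mismatch();
            }
            position++;
        }
    }

    static <T> byte[] toBytes(T obj, GraphWriter<T> writer) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            writer.write(obj, new DataOutputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // User-written version: serializes both graphs fully and keeps both byte arrays.
    static <T> boolean naiveEquals(T a, T b, GraphWriter<T> writer) {
        return Arrays.equals(toBytes(a, writer), toBytes(b, writer));
    }

    // Buffers only the first graph; writing the second stops at the first difference.
    static <T> boolean earlyExitEquals(T a, T b, GraphWriter<T> writer) {
        byte[] expected = toBytes(a, writer);
        ComparingStream cmp = new ComparingStream(expected);
        try {
            writer.write(b, new DataOutputStream(cmp));
        } catch (Mismatch e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The second output must not be a strict prefix of the first.
        return cmp.position == expected.length;
    }
}
```

Both versions still treat any byte difference as inequality, which is the strictest possible definition.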

Please add a Kryo.equals() method (or whatever you'd like to call it).

Original issue reported on code.google.com by archie.c...@gmail.com on 3 Sep 2013 at 8:07

GoogleCodeExporter commented 9 years ago
Hi,
This proposal sounds interesting, and it could be really useful. We need to think 
about it.

One thing I'm wondering about is the exact semantics of "equals":
- If the classes of the arguments passed to Kryo.equals have a dedicated "equals" 
method, then I guess that method should be invoked and do all the work, right? 

- If there is a serializer for a given user-defined class T and this serializer 
has a dedicated "equals" method, then that method should be used for comparison.

- And if a class does not provide its own "equals" method (and therefore 
inherits it from Object) and the serializers do not provide one either, then Kryo 
should make use of the meta-information it has collected about the types and 
check for structural equivalence.

Does it cover all possible cases? Do we see any issues with it, such as 
incompatibility with standard equals() methods?
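A minimal Java sketch of this three-tier lookup, under stated assumptions: `TieredEquals`, `graphEquals` and `registerComparer` are hypothetical names, the registered comparer map stands in for the proposed serializer-level "equals" method, and tier 3 uses plain reflection over declared fields rather than Kryo's own type metadata. It does not track visited objects, so a cyclic graph would recurse forever, and newer JVMs may refuse reflective access to JDK-internal fields.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

final class TieredEquals {
    // Hypothetical registry standing in for "the serializer for T has its own equals".
    private final Map<Class<?>, BiPredicate<Object, Object>> serializerComparers = new HashMap<>();

    void registerComparer(Class<?> type, BiPredicate<Object, Object> comparer) {
        serializerComparers.put(type, comparer);
    }

    boolean graphEquals(Object a, Object b) {
        if (a == b) return true;
        if (a == null || b == null || a.getClass() != b.getClass()) return false;
        Class<?> type = a.getClass();
        // Tier 1: the class declares its own equals(), so defer to it.
        if (overridesEquals(type)) return a.equals(b);
        // Tier 2: a serializer-level comparison has been registered for the class.
        BiPredicate<Object, Object> comparer = serializerComparers.get(type);
        if (comparer != null) return comparer.test(a, b);
        // Tier 3: structural comparison (arrays element by element, objects field by field).
        return type.isArray() ? arraysEqual(a, b) : fieldsEqual(a, b, type);
    }

    private static boolean overridesEquals(Class<?> type) {
        try {
            return type.getMethod("equals", Object.class).getDeclaringClass() != Object.class;
        } catch (NoSuchMethodException e) {
            throw new AssertionError("every class inherits equals(Object)", e);
        }
    }

    private boolean arraysEqual(Object a, Object b) {
        int length = Array.getLength(a);
        if (length != Array.getLength(b)) return false;
        for (int i = 0; i < length; i++) {
            if (!graphEquals(Array.get(a, i), Array.get(b, i))) return false;
        }
        return true;
    }

    private boolean fieldsEqual(Object a, Object b, Class<?> type) {
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int mods = f.getModifiers();
                // Static and transient fields are skipped in this sketch.
                if (Modifier.isStatic(mods) || Modifier.isTransient(mods)) continue;
                f.setAccessible(true);
                try {
                    if (!graphEquals(f.get(a), f.get(b))) return false;
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return true;
    }
}
```

The order of the tiers is the design choice under discussion: putting the class's own equals() first means a user override always wins, even over a registered serializer comparison.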

-Leo 

Original comment by romixlev on 4 Sep 2013 at 12:07

GoogleCodeExporter commented 9 years ago
BTW, there are already several libraries that can compare arbitrary object graphs, e.g.:
https://code.google.com/p/deep-equals/  (the whole logic in a single class!)
http://www.unitils.org/tutorial-reflectionassert.html

And here is a related StackOverflow question:
http://stackoverflow.com/questions/1449001/is-there-a-java-reflection-utility-to-do-a-deep-comparison-of-two-objects

Original comment by romixlev on 4 Sep 2013 at 1:05

GoogleCodeExporter commented 9 years ago
Regarding your questions...

I think this new "equals" should not attempt to implement the existing 
definition of Object.equals(). As you point out, we already have libraries that 
can do that.

Instead, it should implement "equals with respect to Kryo serialization".

This is actually a more useful definition in many cases. Here's my particular 
use case: suppose you are persisting an object graph to disk after every change 
(or on some trigger, etc.). An obvious optimization is: "Don't persist the 
object graph to disk if nothing has actually changed". What does that mean? 
That really means: don't persist the object graph to disk if, when the 
persisted object is later deserialized, nothing in the resulting object graph 
will be different from what we would have gotten from the previous serialized 
version. Similar use cases arise when you replace the phrase "persist to disk" 
with "broadcast to all other nodes in a cluster", etc.
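The "don't persist if nothing changed" optimization can be sketched in Java using the strict byte-for-byte test. `ChangeDetectingPersister`, its `serializer` function (standing in for Kryo serialization) and its `sink` (disk, cluster broadcast, etc.) are hypothetical names. Byte equality is conservative for this use case: identical bytes always deserialize to an identical graph, so a skipped write is always safe, while bytes that differ for meaningless reasons only cost a redundant write.

```java
import java.util.Arrays;
import java.util.function.Consumer;
import java.util.function.Function;

// Writes a graph only when its serialized form differs from the last one written.
final class ChangeDetectingPersister<T> {
    private final Function<T, byte[]> serializer; // hypothetical stand-in for Kryo serialization
    private final Consumer<byte[]> sink;          // hypothetical: write to disk, broadcast, ...
    private byte[] lastWritten;

    ChangeDetectingPersister(Function<T, byte[]> serializer, Consumer<byte[]> sink) {
        this.serializer = serializer;
        this.sink = sink;
    }

    // Returns true if the graph was written, false if the write was skipped.
    boolean persist(T graph) {
        byte[] bytes = serializer.apply(graph);
        if (lastWritten != null && Arrays.equals(lastWritten, bytes)) {
            return false; // identical bytes deserialize to an identical graph
        }
        sink.accept(bytes);
        lastWritten = bytes;
        return true;
    }
}
```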

So two object graphs are "equals with respect to Kryo serialization" if, when 
serialized, they generate the "same output", or more precisely, the two object 
graphs that you would get by deserializing the two serialized outputs are 
indistinguishable in any meaningful way. The definition of "meaningful" should 
be clear, e.g., different system hash codes is not a meaningful difference, but 
a different topology of object references is (i.e., differently shaped object 
graph).

How do we determine this version of "equals"? You would think that "same stream 
of bytes" would be the right test, but it can be stricter than necessary. There may be embedded 
identifiers, or non-deterministic ordering effects in the serialized data, for 
example, due to different iteration order of a HashMap based on objects' system 
hash codes (which are effectively arbitrary). Another example: back-references in the output 
may have different internal IDs, but they should be considered equal if they 
refer to the same earlier object. Etc.
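One way to make "refers to the same earlier object" precise is to walk both graphs in parallel while recording, by identity, which object in graph 1 corresponds to which object in graph 2. A back-reference then matches only if both sides point at the same previously matched pair, whatever IDs a serializer would assign. This minimal Java sketch does that for a hypothetical `GraphNode` type; it compares in-memory graphs, not Kryo's serialized bytes, handles shared references and cycles, and is recursive, so very deep graphs could overflow the stack.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical node type: a value plus references to other nodes (possibly shared or cyclic).
final class GraphNode {
    final String value;
    final List<GraphNode> refs = new ArrayList<>();

    GraphNode(String value) {
        this.value = value;
    }
}

final class GraphEquality {
    // Objects already paired, in both directions, keyed by identity rather than equals().
    private final Map<GraphNode, GraphNode> leftToRight = new IdentityHashMap<>();
    private final Map<GraphNode, GraphNode> rightToLeft = new IdentityHashMap<>();

    static boolean equal(GraphNode a, GraphNode b) {
        return new GraphEquality().match(a, b);
    }

    private boolean match(GraphNode a, GraphNode b) {
        if (a == null || b == null) return a == b;
        GraphNode pairedWithA = leftToRight.get(a);
        GraphNode pairedWithB = rightToLeft.get(b);
        if (pairedWithA != null || pairedWithB != null) {
            // Back-reference: both sides must refer to the same earlier pair.
            return pairedWithA == b && pairedWithB == a;
        }
        leftToRight.put(a, b);
        rightToLeft.put(b, a);
        if (!Objects.equals(a.value, b.value) || a.refs.size() != b.refs.size()) return false;
        for (int i = 0; i < a.refs.size(); i++) {
            if (!match(a.refs.get(i), b.refs.get(i))) return false;
        }
        return true;
    }
}
```

With this definition, a root holding the same child twice is not equal to a root holding two distinct but identical children: the object values agree, but the topology differs.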

I am not familiar enough with the details of Kryo serialization to identify 
all of these cases. However, it can simply be the Serializer's job to figure 
this out. There can be a default strategy, which is just to compare the 
serialization output byte for byte. But for cases where this is too 
strict a test, the Serializer can override the logic as necessary.
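A minimal Java sketch of that split of responsibilities, assuming a hypothetical `ComparingSerializer` contract (this is not Kryo's actual `Serializer` API). The default `graphEquals` compares the serialized bytes; the hypothetical `StringSetSerializer` writes elements in iteration order, so its bytes depend on that order, and it therefore overrides `graphEquals` with an order-insensitive test.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Set;

// Hypothetical serializer contract; not Kryo's actual Serializer API.
interface ComparingSerializer<T> {
    void write(T obj, DataOutputStream out) throws IOException;

    default byte[] toBytes(T obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            write(obj, new DataOutputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Default strategy: equal if and only if the serialized bytes are identical.
    default boolean graphEquals(T a, T b) {
        return Arrays.equals(toBytes(a), toBytes(b));
    }
}

// Hypothetical serializer whose output depends on iteration order, so it relaxes the default.
final class StringSetSerializer implements ComparingSerializer<Set<String>> {
    @Override
    public void write(Set<String> set, DataOutputStream out) throws IOException {
        out.writeInt(set.size());
        for (String s : set) {
            out.writeUTF(s);
        }
    }

    @Override
    public boolean graphEquals(Set<String> a, Set<String> b) {
        return a.equals(b); // acceptable here because String defines equals()/hashCode()
    }
}
```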

One interesting case is handling of Sets. Because Set contents iterate in 
indeterminate order, how would an equals() implementation know which item from 
Set #1 to compare with which item from Set #2? You would have to compare the 
first item in Set #1 with successive items in Set #2 until you find a match, 
then repeat. The converse comparison could be done at the same time. This would 
require maintaining a list of unmatched items from each Set. SortedSets are not 
a problem, however, nor are Maps whose keys can be sorted.
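The matching procedure just described can be sketched in Java. `UnorderedMatch.sameElements` and its `equivalent` predicate (which in Kryo would be the recursive serialization-level comparison) are hypothetical names. Each element of the first set is compared with the still-unmatched elements of the second until a match is found; when the sizes are equal, pairing every element of the first set also covers the converse check. The greedy pairing is only safe when `equivalent` is a true equivalence relation (symmetric and transitive), and it costs O(n²) comparisons.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiPredicate;

final class UnorderedMatch {
    private UnorderedMatch() {}

    // True if every element of s1 can be paired with a distinct equivalent element of s2.
    static <T> boolean sameElements(Collection<? extends T> s1, Collection<? extends T> s2,
                                    BiPredicate<? super T, ? super T> equivalent) {
        if (s1.size() != s2.size()) return false;
        List<T> unmatched = new ArrayList<>(s2); // elements of s2 not yet paired
        for (T x : s1) {
            boolean found = false;
            for (Iterator<T> it = unmatched.iterator(); it.hasNext(); ) {
                if (equivalent.test(x, it.next())) {
                    it.remove(); // each element of s2 may be used only once
                    found = true;
                    break;
                }
            }
            if (!found) return false;
        }
        return true; // sizes are equal, so nothing in s2 is left unmatched
    }
}
```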

An alternative strategy is simply to punt and use Set.equals() to compare Sets. 
However, this would generate a false negative, e.g., for this class:

  public class Foo {
     private int value;   // even if foo1.value == foo2.value, !foo1.equals(foo2) 
  }

Though one could argue the real problem is that Foo should be overriding 
equals() and hashCode().
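The false negative is easy to reproduce. In this minimal Java sketch, `Foo` mirrors the class above (no `equals()`/`hashCode()` overrides, so it inherits identity equality from `Object`), while the hypothetical `FixedFoo` adds value-based overrides; the helper `setsEqual` (also hypothetical) builds two single-element `HashSet`s and compares them with `Set.equals()`.

```java
import java.util.HashSet;
import java.util.Set;

// Mirrors Foo above: no equals()/hashCode() override, so Object's identity equality applies.
final class Foo {
    private final int value;

    Foo(int value) {
        this.value = value;
    }
}

// Hypothetical variant that compares by value, as the comment suggests Foo should.
final class FixedFoo {
    private final int value;

    FixedFoo(int value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof FixedFoo && ((FixedFoo) o).value == value;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }
}

final class SetEqualsDemo {
    private SetEqualsDemo() {}

    // Builds two single-element HashSets and compares them with Set.equals().
    static <E> boolean setsEqual(E x, E y) {
        Set<E> first = new HashSet<>();
        first.add(x);
        Set<E> second = new HashSet<>();
        second.add(y);
        return first.equals(second);
    }
}
```

`setsEqual(new Foo(1), new Foo(1))` returns false even though the two sets are structurally identical, while `setsEqual(new FixedFoo(1), new FixedFoo(1))` returns true.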

In any case, these questions are not show-stoppers, because by letting the 
Serializer be responsible for doing the comparison, all options can be 
available and we can push these definitional questions onto the user where they 
belong.

Original comment by archie.c...@gmail.com on 4 Sep 2013 at 1:55

GoogleCodeExporter commented 9 years ago

Original comment by romixlev on 2 Oct 2013 at 3:08