Open GoogleCodeExporter opened 9 years ago
Hi,
This proposal sounds interesting. And it can be really useful. We need to think
about it.
One thing I'm wondering about is the exact semantics of "equals":
- if classes of arguments passed to Kryo.equals have a dedicated "equals"
method, then I guess this one should be invoked and do all the job, right?
- If there is a serializer for a given user-defined class T and this serializer
has a dedicated "equals" method, then this one should be used for comparison
- And if a class does not provide its own "equals" method (and therefore
inherits it from Object) and the serializers do not provide one either, then Kryo
should make use of the meta-information it has collected about the types and
check for structural equivalence.
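The three-step lookup above could be sketched roughly as follows. This is only an illustration under assumptions: EqualsResolution, register, areEqual and the Point class are hypothetical names and not part of Kryo's API, and the structural check uses plain reflection rather than Kryo's collected type information. It also does not handle cyclic object graphs.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiPredicate;

// Hypothetical sketch; none of these names exist in Kryo.
public class EqualsResolution {
    // Stand-in for "the serializer of class T has a dedicated equals".
    private final Map<Class<?>, BiPredicate<Object, Object>> serializerEquals = new HashMap<>();

    public void register(Class<?> type, BiPredicate<Object, Object> eq) {
        serializerEquals.put(type, eq);
    }

    public boolean areEqual(Object a, Object b) {
        if (a == b) return true;
        if (a == null || b == null || a.getClass() != b.getClass()) return false;
        Class<?> type = a.getClass();
        // Arrays never override equals; compare their contents instead.
        if (type.isArray()) return Objects.deepEquals(a, b);
        // Case 1: the class declares its own equals, so let it do the job.
        if (overridesEquals(type)) return a.equals(b);
        // Case 2: a serializer-provided comparison is registered for the class.
        BiPredicate<Object, Object> eq = serializerEquals.get(type);
        if (eq != null) return eq.test(a, b);
        // Case 3: fall back to a field-by-field structural comparison.
        return structurallyEqual(a, b);
    }

    static boolean overridesEquals(Class<?> type) {
        try {
            return type.getMethod("equals", Object.class).getDeclaringClass() != Object.class;
        } catch (NoSuchMethodException e) {
            throw new AssertionError(e); // every class inherits equals(Object)
        }
    }

    // Walks the instance fields of the class and its superclasses.
    // Not cycle-safe: a cyclic graph would recurse forever.
    private boolean structurallyEqual(Object a, Object b) {
        for (Class<?> c = a.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) continue;
                f.setAccessible(true);
                try {
                    if (!areEqual(f.get(a), f.get(b))) return false;
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return true;
    }

    // Example class that inherits equals from Object.
    public static class Point {
        int x, y;
        public Point(int x, int y) { this.x = x; this.y = y; }
    }
}
```

With this sketch, two Point instances with the same coordinates compare equal through case 3, although Point.equals() itself would report them as different.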
Does it cover all possible cases? Do we see any issues with it such as
incompatibility with standard equals methods or something like this?
-Leo
Original comment by romixlev
on 4 Sep 2013 at 12:07
BTW, there are enough libs that can compare any object graphs, e.g.
https://code.google.com/p/deep-equals/ (the whole logic in a single class!)
http://www.unitils.org/tutorial-reflectionassert.html
And here is a related StackOverflow question:
http://stackoverflow.com/questions/1449001/is-there-a-java-reflection-utility-to-do-a-deep-comparison-of-two-objects
Original comment by romixlev
on 4 Sep 2013 at 1:05
Regarding your questions...
I think this new "equals" should not attempt to implement the existing
definition of Object.equals(). As you point out, we already have libraries that
can do that.
Instead, it should implement "equals with respect to Kryo serialization".
This is actually a more useful definition in many cases. Here's my particular
use case: suppose you are persisting an object graph to disk after every change
(or on some trigger, etc.). An obvious optimization is: "Don't persist the
object graph to disk if nothing has actually changed". What does that mean?
That really means: don't persist the object graph to disk if, when the
persisted object is later deserialized, nothing in the resulting object graph
will be different from what we would have gotten from the previous serialized
version. Similar use cases arise when you replace the phrase "persist to disk"
with "broadcast to all other nodes in a cluster", etc.
So two object graphs are "equals with respect to Kryo serialization" if, when
serialized, they generate the "same output", or more precisely, the two object
graphs that you would get by deserializing the two serialized outputs are
indistinguishable in any meaningful way. The definition of "meaningful" should
be clear: e.g., differing system hash codes are not a meaningful difference, but
a different topology of object references (i.e., a differently shaped object
graph) is.
How do we determine this version of "equals"? You would think that "same stream
of bytes" would suffice, but not necessarily. There may be embedded
identifiers, or non-deterministic ordering effects in the serialized data, for
example, due to different iteration order of a HashMap based on objects' system
hash codes (which are random). Another example: back-references in the output
may have different internal IDs, but they should be considered equal if they
refer to the same earlier object. Etc.
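To illustrate the ordering problem, here is a small self-contained sketch. It is not Kryo code: naiveWrite is a hypothetical stand-in for a serializer that writes map entries in iteration order, and the ordering difference is triggered by a String hash collision ("Aa" and "BB" have the same hashCode) rather than by system hash codes, simply because that is easy to reproduce. The exact iteration order is an implementation detail of HashMap; on the OpenJDK implementation, colliding keys are typically iterated in insertion order.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class MapOrderDemo {
    // Hypothetical naive "serializer": writes entries in iteration order.
    public static byte[] naiveWrite(Map<String, Integer> map) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            byte[] key = e.getKey().getBytes(StandardCharsets.UTF_8);
            out.write(key.length);         // key length (low byte only)
            out.write(key, 0, key.length); // key bytes
            out.write(e.getValue());       // value (low byte only)
        }
        return out.toByteArray();
    }

    // Builds a HashMap by inserting the keys in the given order.
    // The value depends only on the key, so the maps are equal as maps.
    public static Map<String, Integer> mapInOrder(String... keys) {
        Map<String, Integer> m = new HashMap<>();
        for (String k : keys) {
            m.put(k, k.charAt(0) - 'A'); // "Aa" -> 0, "BB" -> 1
        }
        return m;
    }
}
```

Comparing naiveWrite(mapInOrder("Aa", "BB")) with naiveWrite(mapInOrder("BB", "Aa")) byte-for-byte can report a difference even though the two maps are equal according to Map.equals(). Copying each map into a TreeMap before writing removes the dependence on insertion order.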
I am not enough of an expert on the details of Kryo serialization to identify
all of these cases. However, it can simply be the Serializer's job to figure
this out. There can be a default strategy which is just to compare
byte-for-byte the output from serialization. But for cases where this is too
strict a test, the Serializer can override the logic as necessary.
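A minimal sketch of that idea, assuming a hypothetical ComparingSerializer interface (this is not Kryo's Serializer class, and toBytes and serializedEquals are made-up names): the default comparison is byte-for-byte, and a serializer for order-insensitive data overrides it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SerializerEqualsSketch {
    // Hypothetical interface, for illustration only.
    public interface ComparingSerializer<T> {
        byte[] toBytes(T value);

        // Default strategy: compare the serialized output byte-for-byte.
        default boolean serializedEquals(T a, T b) {
            return Arrays.equals(toBytes(a), toBytes(b));
        }
    }

    // The default is good enough here: equal strings give equal bytes.
    public static final ComparingSerializer<String> STRICT =
            s -> s.getBytes(StandardCharsets.UTF_8);

    // Treats a String[] as an unordered collection: the bytes depend on
    // element order, so the byte-for-byte default would be too strict.
    public static final ComparingSerializer<String[]> UNORDERED =
            new ComparingSerializer<String[]>() {
                @Override
                public byte[] toBytes(String[] v) {
                    return String.join("\n", v).getBytes(StandardCharsets.UTF_8);
                }

                @Override
                public boolean serializedEquals(String[] a, String[] b) {
                    String[] x = a.clone();
                    String[] y = b.clone();
                    Arrays.sort(x); // compare in a canonical order
                    Arrays.sort(y);
                    return Arrays.equals(x, y);
                }
            };
}
```

Here UNORDERED.serializedEquals() accepts {"x", "y"} and {"y", "x"} as equal even though their serialized bytes differ.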
One interesting case is handling of Sets. Because Set contents iterate in
indeterminate order, how would an equals() implementation know which item from
Set #1 to compare with which item from Set #2? You would have to compare the
first item in Set #1 with successive items in Set #2 until you find a match,
then repeat. The converse comparison could be done at the same time. This would
require maintaining a list of unmatched items from each Set. SortedSets are not
a problem however, nor are Maps where the keys can be sorted.
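The matching procedure for unordered Sets might look roughly like this. It is a sketch under assumptions: matchUnordered is a hypothetical helper, the element comparison is passed in as a predicate, and instead of keeping an unmatched list per Set it checks the sizes up front and tracks unmatched items of the second collection only. The greedy matching is only correct if the predicate is an equivalence relation, and the cost is quadratic in the Set size.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiPredicate;

public class UnorderedMatch {
    // Returns true if every item of 'first' can be paired with a distinct
    // item of 'second' that the predicate considers the same.
    public static <T> boolean matchUnordered(Collection<T> first,
                                             Collection<T> second,
                                             BiPredicate<T, T> same) {
        if (first.size() != second.size()) return false;
        List<T> unmatched = new ArrayList<>(second); // items not yet paired
        for (T item : first) {
            boolean found = false;
            for (Iterator<T> it = unmatched.iterator(); it.hasNext(); ) {
                if (same.test(item, it.next())) {
                    it.remove(); // each item may be matched only once
                    found = true;
                    break;
                }
            }
            if (!found) return false;
        }
        return unmatched.isEmpty();
    }
}
```

For example, two Sets of int[] holding the same arrays are unequal under Set.equals(), because arrays use identity equality, but they match under matchUnordered when the predicate is Arrays.equals.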
An alternative strategy is just to punt and use Set.equals() to compare Sets.
However, this would generate a false negative, e.g., for Sets containing this class:
public class Foo {
    private int value; // even if foo1.value == foo2.value, !foo1.equals(foo2)
}
Though one could argue the real problem is that Foo should be overriding
equals() and hashCode().
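A small sketch of both sides of that argument (FooSetDemo, FixedFoo and singletonSetsEqual are made-up names for illustration): with the Foo above, Set.equals() reports a difference for Sets holding equal values, and overriding equals() and hashCode() makes the difference go away.

```java
import java.util.Collections;

public class FooSetDemo {
    // Same shape as Foo above: inherits identity-based equals from Object.
    public static class Foo {
        private final int value;
        public Foo(int value) { this.value = value; }
    }

    // Variant that overrides equals() and hashCode() based on the field.
    public static class FixedFoo {
        private final int value;
        public FixedFoo(int value) { this.value = value; }

        @Override
        public boolean equals(Object o) {
            return o instanceof FixedFoo && ((FixedFoo) o).value == value;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(value);
        }
    }

    // Compares two one-element Sets using the standard Set.equals().
    public static boolean singletonSetsEqual(Object a, Object b) {
        return Collections.singleton(a).equals(Collections.singleton(b));
    }
}
```

singletonSetsEqual(new Foo(1), new Foo(1)) is false, while singletonSetsEqual(new FixedFoo(1), new FixedFoo(1)) is true.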
In any case, these questions are not show-stoppers, because by letting the
Serializer be responsible for doing the comparison, all options can be
available and we can push these definitional questions onto the user where they
belong.
Original comment by archie.c...@gmail.com
on 4 Sep 2013 at 1:55
Original comment by romixlev
on 2 Oct 2013 at 3:08
Original issue reported on code.google.com by
archie.c...@gmail.com
on 3 Sep 2013 at 8:07