oasp / oasp4j

The Open Application Standard Platform for Java
Apache License 2.0
60 stars 303 forks source link

Strategy for equals() and hashCode() of TOs #241

Closed hohwille closed 8 years ago

hohwille commented 9 years ago

In #55 we discussed if equals() and hashCode() should actually be implemented for TOs. I still say so and others said we should just keep default implementation of Object.

hohwille commented 9 years ago

@mmatczak suggested a smart concept that we just discussed: Simply add an additional property of type UUID that acts as additional unique key but does not replace the primary key. The UUID is generated at construction time (both on client and server) when a new instance of the entity object is created. This also helps to reference transient objects. Then implementation of equals and hashCode is trivial based on that.

hohwille commented 9 years ago

One thing to consider is that UUID generation is expensive (at least type 4 that is used by UUID.randomUUID()). But if the implementation to generate them is abstracted and can be replaced this can be an interesting option. Should at least be considered for further discussion and evaluation.

maybeec commented 9 years ago

The UUID is generated at construction time (both on client and server) when a new instance of the entity object is created.

Comparing the UUID for equals method is not the meaning we want to express with equals methods. Following this UUID approach you can also simple compare the Object#toString() return values as they are unique for each object instance.

Or how should the implementation of equals should look like referencing the UUID? With equals it should be possible to compare different object instances to have the same properties!

hohwille commented 9 years ago

With the UUID approach you can have multiple "clones" of the same object that are considered equal. Using the PK (id field) is not possible as the PK is only present when the object has been persisted. However, this would still lead to a different "semantic" of equals as then a modified entity would be equal to the original entity before it was modified. That is different in the current approach. However, the main reason is having an entity in a set, list or map and therefore the suggested UUID would be a KISS solution. So besides general clutter the equals method would be return Objects.equals(this.uuid, other.uuid). However, each instance will always have a uuid assigned and each "clone" would share the same UUID.

I have tested the performance of UUID.randomUUID() and this is far beyond being neglect-able. So I would not hardwire this as the standard solution. From all other aspects, I would agree that this is a good approach.

With "clone" I do not mean "Object.clone" - you also need to consider an object serialized (to XML, JSON, Java-Serialized-Data, etc.) and deserialized back (otherwise we could still use Object.equals default impl.).

maybeec commented 9 years ago

With the UUID approach you can have multiple "clones" of the same object that are considered equal. Using the PK (id field) is not possible as the PK is only present when the object has been persisted.

Ok, I agree this is quite a benefit.

However, the main reason is having an entity in a set, list or map and therefore the suggested UUID would be a KISS solution.

Yes it is KISS on implementation side. Nevertheless, in my opinion there are several drawbacks you have to consider establishing such a solution in a project.

  1. equals is a central function in Java, which is not only used by collections to insert and remove entries. It will be also used several times by the programmer with the semantics defined by the Java language. Breaking with this semantics might disagree with the programmers mental model. Thus there might arise different programming errors caused by this semantic changes.
  2. As stated by yourself UUID generation has a non neglect-able performance impact.

It might be an intersting solution for a small piece of projects.

I would suggest the standard way of implementing equals and hash methods and I have found a nice an easy maintainable solution for that. What do you think about this?:

EqualsBuilder HashCodeBuilder

hohwille commented 9 years ago

@may-bee for 1. no contract about equals or hashCode says anything that would be inconsistent with the UUID approach. Read the Javadocs of Object.

No 2. is the major issue for me as well. We could probably use other impls of UUID or just drop this idea.

HashCodeBuilder ist obsolete since java7 introduced the Objects class that also supports simplifying equals but not as compact as EqualsBuilder. However, I prefer the pure java7 way. Unfortunately I could not find a way to make eclipse generate equals/hashCode in this style but I could think about an eclipse plugin that could solve this based on templates :)

maybeec commented 9 years ago

HashCodeBuilder ist obsolete since java7 introduced the Objects class that also supports simplifying equals but not as compact as EqualsBuilder. However, I prefer the pure java7 way.

Nice, I have not known it. So what about the following

@Override
    public int hashCode() {
        return Objects.hashCode(this);
    }

@Override
    public boolean equals(Object obj) {
        return Objects.deepEquals(this, obj);
    }

Easy but maybe also too generic, what do you think? I am quite concerned about the deepEquals. But it may be worth a try.

hohwille commented 9 years ago

deepEquals relies on equals so you created an infinity loop. Same for your suggestion on hashCode :) You need to explicitly call Objects.hashCode(this.field1, this.field2, ...) and accordingly for equals. Otherwise reflection would be required what is a performance issue and these methods need to be rather fast. It is a pity that java does not solve this problem natively. Lombok does though. Simon stricly voted against it but it can make such things so much easier and avoid a lot of boilerplate code.

maihacke commented 9 years ago

I still do not get the point why should find a implementation for equals and hash code. As I already said I had never problems with sticking to the default implementation. This approach is obviously very KISS. To the Id approach why should object a.equals(b) be true if their uuid match? What does it mean? Both objects are interchangeable? But that's not true in most case.

MarcoRose commented 8 years ago

What do we have today? A high number of TOs with the usual lengthy field-by-field calculation of hashcode and equals. My proposal: Check restaurant for equals and hashcose usage and implement with class Objects:

    @Override
  public int hashCode()
  {
      return Objects.hash(
         this.state, this.waiterId);
  }

This is short, works on copies of objects, works on objects that incidently have the same content.

MarcoRose commented 8 years ago

didn't want to close -> reopen

maihacke commented 8 years ago

As @MarcoRose requested a decision for 1.5.0 release. Here is my point of view:

hohwille commented 8 years ago

If that is the summary here, then I would not include this ticket into 1.5.0 and move it to discussion. Or if it is just me who cares about comparing TOs with equals then overvote me.

hohwille commented 8 years ago

I had a dicussion with @maihacke about this. In general there are differen options that have been discussed above. The main problem is that we currently have an inconsistent state in the code what is the worst option so we should react now instead of waiting even longer. As the approach suggested by @maihacke is the most simplest and does not cause any problems with the sample application we go for that. Meanwhile we can discuss further if we come up with a new strategy that convinces all and then rework again.

hohwille commented 8 years ago

To sum this up as technical task:

amarinso commented 8 years ago

I don't fully get the details, I've been reading to be able to comment on this issue and found some interesting posts like this one: http://www.ideyatech.com/effective-java-equals-and-hashcode/

But in practice we haven't had problems with default implementation on our projects. What has been your real experience? Maybe we have been doing it wrong in the first place because we've been using openSessionInView (anti)pattern that leads to less identity problems on entities...

What I propose is that we add an explanation to the documentation alerting of the problems that can arise and suggesting solutions.

hohwille commented 8 years ago

From the posting that @amarinso quoted:

Some programmers do not override the equals and hashCode methods until such time comes that they need the instances of their Class to serve as map keys or set elements. As a good practice, always override the equals and hashCode methods right after writing or designing your Class. Always remember, “No Class is an Island”.

@maihacke suggested that this time to override the methods never came in their large project (with many sub-projects). Has someone an argument why this would really be required and there is no other solution? I failed to give a reasonable scenario and therefore agreed on the most simple approach as a first step even though I was suggesting to implement all equals and hashCode methods properly in our sample. We also had long discussions about putting objects in maps and then doing changes to such an instance what causes their hash if hashCode is manually implemented, etc.

hohwille commented 8 years ago

@maihacke does the example in the post with entities from different sessions/transactions convince you to rethink this? IMHO this may only be relevant for batches if at all. In such case it might be fully sufficient to create a solution outside the entity. See e.g. http://m-m-m.sourceforge.net/apidocs/net/sf/mmm/util/lang/api/EqualsChecker.html http://m-m-m.sourceforge.net/apidocs/net/sf/mmm/util/lang/api/HashCodeFunction.html http://m-m-m.sourceforge.net/apidocs/net/sf/mmm/util/collection/base/HashKey.html etc. Wrapping the entities that all have an ID to be consistent with equals meaning ID equality is very easy.

hohwille commented 8 years ago

See also #321

hohwille commented 8 years ago

PR #320 has been merged. All equals and hashCode Methods have been removed. We rely on Object default. Further discussion and arguments about reintroducing these methods on discussion channels (google groups or Yammer) please. Last TODOs in #321.

hohwille commented 8 years ago

For the record: A common case is that you write unit tests and assert that a result is equal to a given object. To make a clear example: I may want to write a test that is checking that JSON mapping is working fine. So I create an instance of a transfer object, populate with some test data and serialize it to JSON and then from the JSON I deserializer back to the TO. If can not use equals how would I check that the deserialized object holds the same data as the initial test object? This would result in the fact that I would implement the code of equals in my test case. If I do that in multiple test-cases this is getting really bad.

I also went through our use-cases and also find a lot of cases where equals is used.

I still do not buy @maihacke his arguments that properly implementing equals is pointless.

maihacke commented 8 years ago

For your mentioned exampled I would write a generic assertion which compares to object hierarchies. This is quite simpler and less code as implementing/generating equals. In general I you should not implement your "equals-logic" in the testcases. We did this in many of my projects and it worked quite well. As you already that this will lead to bad code duplication. But thats nothing special. You should write assertions for your testcases which could be reused in many tests. Sorry - I'm still not convinced that implementing equals is a good idea at all.

hohwille commented 8 years ago

So you are saying that instead of writing:

assertThat(resultRollection).contains(expectedEntries);

You suggest to write

assertThat(resultRollection).hasSize(expectedEntries.size());
for (ElementType element : resultCollection) {
  boolean equals = false;
  for (ElementType otherElement : expectedEntries) {
    if (EqualsUtil.equals(element, otherElement)) {
      equals = true;
    }
  }
  assertThat(equals).isTrue();
}

Or would you rewrite all tools where AssertJ ist just one of many?

maihacke commented 8 years ago

Our example was a mapping test - I would write something like this:

person = new Person(.... personTo = dozer.map(personObj, PersonTo.class); assertThat(this.dozer.map(personTo, Person.class)).isEqualToComparingFieldByField(person);

maihacke commented 8 years ago

And for your case I would either write:

assertThat(resultRollection).contains(expectedEntries);

or something with more business meaning:

assertThat(resultRollection).containsPersons("Marta", "Peter", "Uwe");

...but thats just a matter of taste and how you exactly implement your assertions.and what exactly do you wanna test. The point is that finding a element in a list and the question if two elements are equal are not that simple to answer. If you think of technical attributes like id, version should they be part of the equality or not? Probably you need both variants in your application, depending on the context. So this can't be implemented in a single equal method. On top there are serious issues with the corresponding hashcode implementation which can't be applied on mutable field or all your collections containing those elements are possibly corrupted. We did this before and had serious trouble. Now we skip the equals implementation and there is not a single problem at all.

ivanderk commented 8 years ago

Not to add to an already long discussion, but I would like to suggest that this is not a matter of "style" nor "taste". And performance criteria of these kinds should not even enter the discussion in a web app.

Writing this kind of logic again and again in order to avoid the limitations of the Java type system, i.e. just providing for reference types, is error-prone. it will lead to bugs.

It is not just a matter of "convenience" to be able to treat a class instance as an instance of a Value type (Value Class in Scala; unfortunately a bit limited). It will avoid unnecessary code and therefore mistakes. Especially when you are talking about Value Objects (as opposed to Entities) from a DDD point of view.

Typically a value object should be immutable and equality should not be based upon identity. So, I, avoiding the also contentious issue of immutability for the moment. I would suggest that equals() and hashCode() should be implemented for TOs.

maihacke commented 8 years ago

I'm not sure if I get your point correctly. Our TOs are not immutable, I don't think that you suggested that, don't you? And why should they implement equals/hashcode then? I'm little but confused now ;-)

maihacke commented 8 years ago

When you all vote for implementing hashcode and equals, how would you exactly do this? How do you cope with the problem that A) you must not changed values, which are part of hashcode? Otherwise you sets will break... B) If two objects are equal, then they must have the same hash code. So you must implement hashcode when implement equals (with a subset of the "equality properties". But none of this attributes is allowed to change?

maihacke commented 8 years ago

I have antoher point here given the following example. Say we have simple data model as follows:

class Course { String title Collection members; ... }

class Student { long id; long version; String enrolmentNumber; String firstname; String lastname; ... }

A student must not be multiple times in a class. How would you implement equals and hashcode? How would you implement the following methods?

public void addStudentToClass(Student student, Course course) { ... }

public boolean isStudentInClass(Student student, Course course) { ... }

What if the enrolmentNumber is the business key for students. When applying for a class students enter their name and enrolment number.

amarinso commented 8 years ago

For future reference, I've found these resources very interesting on the subject

Equals and HasCode https://developer.jboss.org/wiki/EqualsandHashCode?_sscc=t

Don't Let Hibernate Steal Your Identity http://www.onjava.com/pub/a/onjava/2006/09/13/dont-let-hibernate-steal-your-identity.html?page=1

Joshua Bloch chapter 3 of Effective Java http://courses.cs.washington.edu/courses/cse373/12sp/homework/6/ch3_effectivejava-excerpt.pdf

\ the JPA hashcode equals dilema on StackOverflow** http://stackoverflow.com/questions/5031614/the-jpa-hashcode-equals-dilemma

All in all I think that is a matter of purity vs convenience. For normal use cases having to deal with equals/hashCode is cumbersome for the programmers. If it has to be implemented there should be a way to automate it event at a cost of performance (if performance is then a real problem then it should be addressed specifically)

Maybe there are utilities like pojomatic or other that can be useful...

hohwille commented 8 years ago

If you think of technical attributes like id, version should they be part of the equality or not? Probably you need both variants in your application, depending on the context.

  • Equality can always be checked via the identity operator ==.
  • For comparing IDs you can do Objects.equals(a.getId(), b.getId())
  • The regular equals method should always compare all fields.
maihacke commented 8 years ago

The regular equals method should always compare all fields. -> That's not true, see those links @amarinso provided

I thinks there is no progress in our discussion. Nobody did answer my questions, so the arguments are still in place. If we follow @amarinso arguments "For normal use cases having to deal with equals/hashCode is cumbersome for the programmers." I think we should not change anything and do not implement equals and hashcode as long as nobody presents a solution which adresses all problems we identified so far.

amarinso commented 8 years ago

What if...

we document properly the problem in the architecture document and let decide the engagements based on their needs.

soyrochus commented 8 years ago

+1 for comment by Angel. Let´s document this clearly and offer a possible solution for the use case in which all fields must or may be incorporated in the determination of value & hascode.

Could be implemented from scratch or, prefereably in my view, through the @EqualsAndHashCode annotation as offered by Project Lombok https://projectlombok.org/features/EqualsAndHashCode.html

hohwille commented 8 years ago

For Lombok see #24 and continue discussion there if you like. To focus the discussion here: We should consider all comments about optimized equals and hashCode implementations using IDs oder UUIDs as reverted. Let us just discuss about the generic implementation that compares all fields. I am also with @amarinso and @soyrochus. In case we would agree and having equals/hashCode we could next discuss how to implement and maintain them (CobiGen, Pojomatik, Lombok, etc.). That is a later step.

hohwille commented 8 years ago

A student must not be multiple times in a class. How would you implement equals and hashcode?

Like we use to do before we changed due to this issue and how CobiGen did generate these methods: By comparing all fields. We used Objects.equals(x,y) and Objects.hash() for compactness.

How would you implement the following methods?

@Entity
public class Course {
  private Set<Student> students;
  public Set<Student> getStudents() { if (this.students == null) { this.students = new HashSet<>(); } return this.students; }
  ...
}

public void addStudentToClass(Student student, Course course) {
   course.getStudents().add(student);
}

public boolean isStudentInClass(Student student, Course course) {
   course.getStudents().contains(student);
}

This would work with hibernate, etc. You can consider using deleteOrphaned or maintain the relation individually. Still it is a business constaint if we have to check for duplicates (two students with same names, address, phone, etc.). This would not yet be prevented by my simple code. But this is nothing to address by a technical equals method. I think this is where our discussion went wrong in the first place when talking about "optimizations" of equals using UUIDs, etc. Simply forget these ideas and keep discussing here. If I want the default of equals, I use explicitly == and that is a stronger signal to the code-reader what is going on here.

hohwille commented 8 years ago

FYI: The major problem is that Java did not build the default of using all fields into the language and the compiler.

hohwille commented 8 years ago

So I tried to open my mind and revisited what can be done. I tried isEqualToComparingFieldByField from AsserJ and that really works well. It is even better than using isEqualTo as in case of a mismatch you get the detailed info what field does not match.

Further, I did some evil stuff and put objects with custom hashCode() methods into a HashSet. Then I modified the object and tried to remove it. The fun thing is that the JDK is then unable to ever remove the element from the set. Even iterator.remove() is unable to remove the element - what actually means that the implementation of the JDK is very poor as it recalculates the hash and then uses this to search the object to remove again. After debugging into the deep spots I would agree that we should probably really go for what Simon suggested. Be aware that JPA/hibernate will also tweak objects after they have been added to sets and your code is completed when the TX gets commited (e.g. increases @Version).

maihacke commented 8 years ago

:+1:

hohwille commented 8 years ago

I went through the code of AssertJ and discovered that isEqualToComparingFieldByField is not working recursive. So if you really need a deep equal comparison this is not working. In other words a deep clone of a complex object would not be isEqualToComparingFieldByField with its original source object. That still is a problem in testing.

However, after the pain points that we illustrated we can still think of providing a deep equal comparison where this is really needed.

hohwille commented 8 years ago

Further insights and investigation: If you do not override equals and hashCode you will get the default implementation from java.lang.Object. That returns an hash of the internal object-ID assigned by the JVM on object creation. This again is more or less a clear random. All this again implies that you have more or less random values returned by your hashCode() method (the value does not change if invoked multiple times though). Now if you use Set implementations such as HashSet then the hashCode indirectly implies the order of your Set. So the Iterator is returning the elements in a different order for the structurally same elements with every launch of your JVM (e.g. your test-case). This can cause some confusion. Especially it makes a solution for a deep equals comparison of container objects really tricky as you can

as everything relies on hashCode() what then is a random value and behaves indeterministic. In other words comparing two Set instances this way implies O(n²).

hohwille commented 8 years ago

If you have an indeterministic order you can not even build the serialized representation (JSON or serialization byte[]) of each of the objects to compare and compare that instead.

maihacke commented 8 years ago

What Jörg said is obviously true, but I since Sets are unordered per definition it would be bad to design to rely on any order. If you need determinism you have to use a different data structure List, SortedSet, ... or sort the set by yourself.

hohwille commented 8 years ago

Still the problem is that you need to compare two objects of a class that has a field of type Set. How would you consider that they are structurally deep equal to each other? It is possible to implement such an assertion but this is rather complex to implement as well and complex to compute (O(n²) already for each Set).