google / error-prone

Catch common Java mistakes as compile-time errors
https://errorprone.info
Apache License 2.0
6.82k stars 740 forks source link

Raison d'être of EqualsGetClass is debatable #1144

Closed jcayzac closed 5 years ago

jcayzac commented 5 years ago

What version of Error Prone are you using?

2.3.2

Does this issue reproduce with the latest release?

Yes

What did you do?

I wrote an Foo#equals(Object) override with the following implementation:

  @Override
  public boolean equals(final Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;

    final Foo that = (Foo) o;
    return field1.equals(that.field1)
        && field2.equals(that.field2)
        && field3.equals(that.field3);
  }

What did you expect to see?

No error prone warning.

What did you see instead?

An EqualsGetClass advocating to replace my perfectly-valid code with a bogus implementation breaking symmetry, a strong requirement of Object#equals(Object).

cushon commented 5 years ago

Implementations using instanceof can be symmetric as long as subtypes don't add additional attributes and include them in their equals implementations, which they shouldn't.

The getClass approach violates the Liskov substitution principle if it's used on non-final classes.

For more information, see Effective Java Item 8 and this interview with Josh Bloch.

jcayzac commented 5 years ago

Thanks. I remember reading J.Bloch comment on that part and I agree with his (more nuanced) take on the problems with getClass(). I think he also makes it clear there are pros and cons to both approaches and that it's a matter of opinion. I think "fishy" was a poor choice of word, sorry about that. Since I can silence the warning with -Xep:EqualsGetClass:OFF I'll close the issue.

Thanks for the quick reply!

ejona86 commented 5 years ago

I'm surprised Error Prone is taking a position on this.

Basically, the problem with that the Liskov substitution principle is that it generally doesn't apply. A subclass isn't equal: you couldn't replace one instance with the other, because then you would lose information. Someone created the subclass instead of the base class for a reason. Collections are the only common case it seems to hold in code that I write.

And this is evident in the examples and Josh's discussion. It says "innocuous". What is an innocuous change to a subclass, such that you could replace one with another? Is there a non-container example, in practice, you could reference?

as long as subtypes don't add additional attributes and include them in their equals implementations, which they shouldn't.

But it sounds like that is broken when comparing the subtypes; two instances are different and shouldn't be used interchangeably (if you could use them interchangeably, then you don't need the additional attribute).

I had thought we had all agreed that equals is basically broken when there's inheritance because it means different things at different times, and we all went off and will do our own thing.

ejona86 commented 5 years ago

This came up in https://github.com/grpc/grpc-java/pull/4965 , where the case in question is a class that has a private constructor (and is extended by two private classes).

kevinb9n commented 5 years ago

Pasting from @jbduncan

@kevinb9n, thank you very much for writing up an initial explanation for why instanceof is preferred over getClass() for custom equals() implementations, over in commit b9078a9 (which I assume was written in response to issue #1144)!

However, I admit that I'm rather confused by the explanation given, because from what I've understood so far, it goes into why inheritance and custom equals() impls should never be mixed, rather than why instanceof is preferable to getClass() when inheritance and equals() are mixed.

Furthermore, it's a bit late in the evening for me to read through my copy of Effective Java to understand why Joshua Bloch thinks that instanceof is preferable, so I'm lacking the necessary context.

Thus I wonder if you have any plans to improve the documentation w.r.t. all this? :)

kevinb9n commented 5 years ago

Well, you see why it was labeled "my first pass". Unfortunately, the entire argument is structured weirdly.

The deep and complex point we need to get across is this: once any type (class or interface) chooses its equals() behavior, as something distinct from Object.equals(), then all subtypes of that type are stuck with that exact behavior. They can override it to optimize, but if they try to change the meaning in any way, it will not pan out because it will either fail to function as a well-behaved instance of the supertype, or break the equals() contract, or maybe some other terrible fate. All this should be uncontroversial and imho should be taught as the basic consequence of the equals() contract and the meaning of subtyping.

This leaves you having to make a decision about whether you want anything but your leaf types to specify/implement equals behavior at all, and Eric is of course right that Collection types are some of the rare examples where this is done. It is done with the conscious decision that all types below that point are implementation-detail types - we choose not to care too much about the "information lost" if I put an ArrayList into a set and then find a LinkedList there in its place later. That doesn't happen much outside of collections.

Now, if someone accepts the above reasoning, then I think the need to use instanceof over getClass follows closely from it?

The current text now is flawed because it makes it appear that we're arguing "here's what to do when altering the equals() behavior in a subtype" when it's really trying to say "look how that just doesn't work at all and you should go with composition."

kevinb9n commented 5 years ago

As for being surprised that Error Prone takes a position on this: every case we have seen of getClass() being used in this way has appeared to us to be either a bug happening or a bug waiting to happen. It also trips people up who forget to cover the null case. So this doesn't feel to me like taking a philosophical stance, it feels like taking a "bugs are bad" stance.

kevinb9n commented 5 years ago

@jcayzac "I think he also makes it clear there are pros and cons to both approaches and that it's a matter of opinion."

I am curious how you formed that impression, because I see the words "broken" and "unacceptable". (Which I believe are correct, and don't worry, it's not that I simply agree with EJ 100% of the time. It's really 98% at best. :-))

Anyone is welcome to disable the check, but I think the effect of disabling it will be to allow a certain category of bugs into your system. What would be the benefit to doing that, I wonder?

ejona86 commented 5 years ago

I am curious how you formed that impression

Josh said in linked interview:

Because of these problems, I didn't even bother discussing the getClass approach. But it turns out that because it does let you add aspects while preserving the equals contract, some people favor it. So I just want to get the information out there that it has disadvantages too. The biggest disadvantage is the fact that you get two objects that appear equal (because they are equal on all the fields) but they are not equal because they are of different classes. This can cause surprising behavior.

To me using getClass() is similar to Object.equals(), as far as subclasses are concerned. The subclass can't make itself equal to the parent class, but it can still define its own equals() for its own class. This provides the most freedom to subclasses to choose equality with their own instances, but it prevents them from being equal to their base class. However, the system may end up depending on equality in such a way that causes problems for subclasses.

Using instanceof dictates subclass's equals. Although a subclass could try to define equality specially with its own class, it is guaranteed to make equals() no longer transitive. It also requires care from subclasses when implementing hashCode(). It seems you should 1) make your equals() final or 2) have the leafs define equality. Note that (2) means you may be required to only extend abstract classes.

So as far as this warning of "equals method on non-final class," it seems these are the options in such a scenario: 1) use getClass(), 2) use instanceof and make equals() final, or 3) "don't do that". Those options are all able to be proved via ErrorProne, and ErrorProne could also detect the NPE case when using getClass().

Even if you disagree with (1) as being an option, it doesn't seem just "use instanceof instead of getClass()" is appropriate.

As far as the Liskov substitution principle being broken for getClass(), that's not strictly true, depending on the class's semantics of equals. The requirements in Object.equals() can still be met (reflexive, symmetric, transitive, consistent). If the class in question is not a value type, then the equality is comparing behavior and every non-trivial subclass (so only excluding, X extends Y {}) would behave differently in some way.

kevinb9n commented 5 years ago

Ah, sorry @jcayzac, I misunderstood which source you were referring to. Your characterization of the interview excerpt is accurate. Thanks @ejona86.

kevinb9n commented 5 years ago

On the other hand, the interview is from 2002, whereas Effective Java has been revised up to last year, and "broken"/"unacceptable" is the wording Josh has gone with for the long haul.

Something I want to clarify. I have claimed that "once any type (class or interface) chooses its equals() behavior, as something distinct from [identity equality], then all subtypes of that type are stuck with that exact behavior."

This is technically incorrect, because the supertype does have the option of specifying weak equality. To do this, they would write Javadoc that looks like this

/**
 * <b>May</b> return {@code true} if {@code other} is a {@code Point} instance whose
 * {@code x} and {@code y} coordinates are equal respectively to those of this point.
 * <b>Warning:</b> this method is <b>not guaranteed</b> to return {@code true} in this
 * case. <b>Important:</b>. This method <b>must</b> return {@code false} if either
 * coordinate differs.
 */

Weak equality is better than nothing. It is the kind we specified for com.google.common.base.Function (though, side note, you are now heavily dissuaded from depending on its equals behavior at all).

If you do this, then you will leave some freedom to your subtypes in how they want to define equality.

All that said, virtually no one ever wants this behavior, and absolutely no one ever actually bothers to specify it. Weak contracts are a pain. So this is ultimately an uninteresting footnote, and I stand by my claim.

Breaking substitutability permits bugs. Breaking the equals contract permits bugs. It is impossible to allow freedom of equals behavior to subclasses without doing one or the other. Therefore that freedom does not truly exist. Therefore getClass confers zero advantages over instanceof in equals.

ejona86 commented 5 years ago

On the other hand, the interview is from 2002, whereas Effective Java has been revised up to last year, and "broken"/"unacceptable" is the wording Josh has gone with for the long haul.

The discussion still seems only regarding value types. Also, he does not call out the issue with transitivity at all.

As an API designer, I feel I have the right to disallow things like CounterPoint, in the similar way I have the right to make my class final. I completely agree that CounterPoint doesn't work with getClass(), but I don't see why I must make it permitted.

Breaking substitutability permits bugs.

I don't understand the hang-up on this. Object itself does not permit substitutability because it considers all things unequal. I don't understand why you can say that my equals function must allow different classes to be equal when Object itself doesn't.

"permits bugs" is a really weak bar, as writing code permits bugs. The question is how likely are such bugs to occur, and for something like Function, it would seem a non-issue. Any time someone is comparing one Function to another it is not weird to expect they are 100% interchangeable. (I don't disagree that it is generally better to use composition instead of inheritance, but to say it must always be so feels like designing in a vacuum; other constraints may prevail and that has little to do with equals().) I also go further to say that any subclass of Function must be different from its parent, otherwise the base class should have been used.

Breaking the equals contract permits bugs.

The only thing discussed so far that would be virtually guaranteed to break the equals contract is "instanceof without final", as it breaks transitivity. I know that people may not care about transitivity, but that is much more obviously "breaking equals" vs "you should follow principal X and I prefer that principal X in your situation means Y."

lowasser commented 5 years ago

Okay. Stepping in to take a shot at this:

The only thing discussed so far that would be virtually guaranteed to break the equals contract is "instanceof without final", as it breaks transitivity.

My understanding of the claim being made here is that if you have Point and ColoredPoint, that the sane approaches are:

I read @ejona86 to be discussing the scenario where ColoredPoint extends Point, Points are equal to Points, and ColoredPoints are equal to ColoredPoints, and a Point cannot be equal to a ColoredPoint. To be clear: this is an equivalence relation; it is reflexive, symmetric, and transitive; I believe it to technically fulfill the equals contract.

I understand Kevin to be claiming that despite technically fulfilling the contract, this behavior is surprising and bug-prone. Consider a Set<Point>: if ColoredPoints appear in this Set, I think it's reasonable to say that users are likely to be confused about what elements appear in that set and what deduplication happens. If, as an implementation detail change, some method changes from returning a Point to a ColoredPoint, and other code adds the result of that method to a Set, the behavior will change drastically, and I can readily imagine highly difficult-to-trace bugs resulting from that change.

That potential bug is not an issue if either ColoredPoint inherits equals from Point without modification, or if we replace inheritance with composition and ColoredPoint does not extend Point. In the former case, the behavior is preserved; in the latter case, the change to replace Point with ColoredPoint breaks the compile.

tbroyer commented 5 years ago

Fwiw, I agree with you @lowasser, I'd also add the case of using the Point as a key in a Map, which is actually the exact situation in grpc.

ejona86 commented 5 years ago

I read @ejona86 to be discussing the scenario where ColoredPoint extends Point, Points are equal to Points, and ColoredPoints are equal to ColoredPoints, and a Point cannot be equal to a ColoredPoint.

Yes, that implementation is technically valid (although may be confusing, in this case). And I was also considered the opposite case: if a Point can be equal to a ColoredPoint, then ColoredPoint must not use "color" in its equals function (for transitivity). That also means you should be able to pass a Point instead of a ColoredPoint and expect nothing to break (which doesn't sound true).

If, as an implementation detail change, some method changes from returning a Point to a ColoredPoint...

Yes. I agree there's issue with this transition and instanceof may be most appropriate (although it seems we all agree that composition is superior). I'm not arguing against this case. I'm mainly arguing that there are cases different from that one.

That potential bug is not an issue if either ColoredPoint inherits equals from Point without modification, or if we replace inheritance with composition and ColoredPoint does not extend Point.

Yes, which I had mentioned with '2) use instanceof and make equals() final, or 3) "don't do that"'.

I have come up with a more concrete example of ExponentialBackOffPolicy. Assume for the moment that it has an equals method, say, to make testing easier. (We test some code that parses configuration and we want to verify the policy was created correctly.) And then someone wants to extend it to change isBackOffRequired(). The subclass should not be substituted for the parent class.

I've discussed some with @kevinb9n and it seems we came to a partial understanding (not to say agreement!). But we're letting things sit for the moment since we have "real work to do."

Some things that seemed interesting from our discussion (and I hope I'm not putting words in Kevin's mouth; none of this is binding and was in a quick discussion):

  1. getClass() really doesn't matter as much if people didn't abuse inheritance as much as they do and instead favored composition and interfaces. This impacts the ExponentialBackOffPolicy case, for example. But we also live in the real world
  2. When talking about Liskov substitution principle, Object.equals() seems somehow "special" that provides it with an exemption. But without much thought it was hard to say why
  3. The only real use case for overriding an equals() method that is using instanceof (like Point) is for performance, as it has to provide the same output as its superclass