google / error-prone

Catch common Java mistakes as compile-time errors
https://errorprone.info
Apache License 2.0
6.82k stars 740 forks source link

DistinctVarargsChecker produces a lot of false positives #4544

Open pkoenig10 opened 1 month ago

pkoenig10 commented 1 month ago

DistinctVarargsChecker only compares arguments using their textual representation, which is an unreliable way to determine if two arguments are equivalent.

For example, DistinctVarargsChecker emits a warnings for this code:

Set.of(UUID.randomUUID(), UUID.randomUUID())
pkoenig10 commented 1 month ago

It looks like this has been the behavior since DistinctVarargsChecker was introduced, but I started seeing a lot more false positives after the change from https://github.com/google/error-prone/pull/4449. I guess we frequently use Set.of and Map.of with these types of helpers methods.

cushon commented 4 weeks ago

Thanks for the report.

I think there are a few places where checks make assumptions that methods are idempotent in situations like this. Another example is IdentityBinaryExpression where it discourages foo() && foo() even though foo() could be non-idempotent and return different results, and suggests refactoring for clarity if it's deliberate. I think it's reasonable to disagree about whether that's worth discouraging, I'm just trying to provide some background on the thinking behind the check.

Was the example you encountered literally Set.of(UUID.randomUUID(), UUID.randomUUID()), or is there any more context you can share about the false positives you encountered? Maybe there are heuristics that could be added to the check to cover some of the common cases.

pkoenig10 commented 4 weeks ago

The examples I encountered weren't exactly that, but were effectively the same thing. This was primarily in test code where we were generating IDs and entities to for tests.

Set.of(createUserId(), createUserId())

I don't have any great ideas for a heuristic, I can see how you might want this check if the method being invoked here was something like an accessor.

graememorgan commented 6 days ago

I think it's reasonable to disagree about whether that's worth discouraging, I'm just trying to provide some background on the thinking behind the check.

I think the other consideration (possibly the main one?) is that we'd have a spectacular quantity of false negatives if we only flagged provably-identical results.

We have some logic in ConstantExpressions to identify expressions which are guaranteed to produce the same result, but it requires listing known pure methods, which is a bit of a challenge.