google / built_collection.dart

Immutable Dart collections via the builder pattern.
https://pub.dev/packages/built_collection
BSD 3-Clause "New" or "Revised" License
275 stars 53 forks source link

`BuiltList` won't compute hash as part of `==`. #269

Open lrhn opened 1 year ago

lrhn commented 1 year ago

The implementation of BuiltList.operator== used this.hashCode != other.hashCode as a short-cut in operator ==. However, computing hashCode can be as expensive as doing equality, since it has to visit all the same objects (otherwise the hash code won't be consistent with equality) and do a computation on each.

This change makes the hashCode only be used if it's already computed by both this and other, in which case it should be free.

It's a change in performance-behavior. A single == won't do double work, but repeated ==s on the same objects might have been more efficient if the hash code was computed eagerly. For elements in a set or map, the hash code will be computed anyway, so it should not affect those. Having a list of BuiltLists and doing repeated indexOf with the same values on the list might suffer.

I believe the trade-off is worth it, because most lists are not compared for equality, and those that are, are either one-off comparisons, or used as map keys or set elements. (But be free to disagree.)

davidmorgan commented 1 year ago

Thanks Lasse!

The use case I had in mind when I wrote this was for built_value models. Although built_value classes have the option to cache hashCode, they don't by default. So I thought, if there is some model e.g.

abstract class Model {
  BuiltList<Account> get accounts;
}

abstract class Account {
  BuiltList<Card> cards;
}

then I thought it might be a win if there is some caching of the list hash codes. It's not that the lists are expected to be large--rather that they may contain complex objects that don't cache their hash codes.

But, this was just a guess, I didn't (and still don't) have data to back it up.

I guess a primary use case for built_collection is in UIs, where change detection between very similar data models is probably important. But then it's actually "quickly return true" that's important, not "quickly return false". So I don't think it's particularly convincing.

I thought about calculating a more robust hash to allow quickly returning "almost certainly true" but did not actually explore that option :) in the end we would need real data to know if it's worth it.

Did you have any particular app/codebase in mind for this PR? If you have any real data that can be quite compelling against my lack of data ;)

Thanks.

lrhn commented 1 year ago

No specific use, I just ended up in the method when doing a code review and wondered about the design.

It seemed like overkill to check hash codes first, and then do equals afterwards anyway, precisely because the true path is usually the longest already. Having cached hash codes makes it an efficient quick-check, but if the hash code is not cached, computing it is likely as expensive as doing the equals check. That can still be valuable if it's expected that the hash code will be used again anyway, but for something like adding to a hash set or map, the hash code will be computed before the == check, so checking whether there is a cached hash seemed like the best compromise.