vavr-io / vavr

vʌvr (formerly called Javaslang) is a non-commercial, non-profit object-functional library that runs with Java 8+. It aims to reduce the lines of code and increase code quality.
https://vavr.io
Other
5.72k stars 636 forks source link

Make equality checks configurable for Hash based collections #2041

Open chb0github opened 7 years ago

chb0github commented 7 years ago

Currently, TreeSet allows for the definition of a Comparator and I have often felt like being able to define the hashing/equals function in the same way for HashSet etc. would be seriously useful. It came up today in discussion

Eg.

Example 1: public static <T> HashSet<T> of(Function<? super T,?> keyProperty, T... values)

This would use the function keyProperty to get a property value to be used with a default hash/equals strategy which currently delegates to HashArrayMappedTrie So, the current public static <T> HashSet<T> of(T... values) could set it's hashing function to Function.identity() by default.

Then, when supplying the value to be added/removed, etc to the the underlying Trie, it would always delegate to the Hashing Function

This would be a handy addition that would be easy enough to implement. Thoughts?

chb0github commented 7 years ago

I may have already answered myself with HashSet.distinctBy(Function<? super T> keyExtractor)

Which makes for a fluent function like HashSet.distinctBy(MyObject::id).add(new MyObject(1));

ruslansennov commented 7 years ago

I may have already answered myself with HashSet.distinctBy(Function<? super T> keyExtractor)

If I understood you correctly, it will not help you. For your use case you should use wrapper class which holds the object and the hashing function. Or you can overwrite hashCode method in your object.

It is possible to add requested functionality to HashSet and HashMap, but I'm not sure this would be seriously useful as you said... @danieldietrich ?

chb0github commented 7 years ago

@ruslansennov Something close already exists: HashSet.distinctBy(MyObject::getId).of(new MyObject()); But it doesn't appear that the resultant set from that call actually returns a HashSet where the hashing is based on the result of the function.

As for hash/equals - Yes, I could do that; it's Java 101. To do that would require me modifying the original object or creating a new object. And, if we are following the java equals/hash bible, then equals must reject any subclass.

This would be an easy implementation. HashSet would contain a Function that would extract the key with every call and pass it along for the hash/equals compute.

In the constructor you would have

private HashSet(Function<? super T, ?> keyExtractor) {
   this.keyExtractor = keyExtractor
}

private HashSet() {
   this(Function.identity());
}
@Getter
@AllArgsConstructor
@EqualsAndHash 
public class MyObject {
     private Integer id;
     private String something;
}
Set<MyObject> mySet = HashSet.keyedBy(MyObject::getId).add(new MyObject(1,"foo"));
mySet.contains(new MyObject(1,null); // true, second argument is ignored

Set<MyObject> oldSet = HashSet.keyedBy(MyObject::getId).add(new MyObject(1,"foo"));
oldSet.contains(new MyObject(1,null)); // false, assuming it doesn't blow up with a null

I am looking for functional parity with TreeSet

Basically, a method to override identity without defining a new object. Just like TreeSet

It seems like HashSet.distinctBy comes close to filling the role but it's like a view.

danieldietrich commented 7 years ago

I think we should not bake a key-extractor functionality into HashSet. I see several cons:

We could still achieve the behavior by

chb0github commented 7 years ago

I concede these points but see them as not detrimental.

You're alternatives counter points:

When I look at how it's currently done the hashCode is determined based on the key input and thus this should/would be a transparent change.

Using a HashMap is essentially the same but more verbose (IMO) - I also thought of a downside to my own thoughts (just to prove I am reasonable): Once the set is created with such a function it would be entirely lost that it's not behaving as the default set. Then again, this case exists for TreeMap too.

The whole idea is for sake of convenience and code brevity. Since it's your show @danieldietrich if you're entirely against the idea then just close the ticket.

danieldietrich commented 7 years ago

I like the idea. We could do it by introducing

@FunctionalInterface
interface Equal<T> {

    static <T> Equal<T> naturalEquality() {
        return Objects::equals;
    }

    boolean areEqual(T t1, T t2);
}

Our HashSet and HashMap are backed by HashArrayMappedTrie (HAMT) nodes, similar to TreeSet and TreeMap are backed by RedBlackTree (RBT). Currently RBT nodes own a reference to the underlying Comparator. We should remove that ref and add the comparator as argument to each method of RBT. Then we are able to keep only a single ref to the comparator in the wrapper TreeSet or TreeMap.

For HashSet/HashMap it would work the same way. Only these wrappers store a reference to Equal. The methods of the HAMT types need an additional argument of type Equal.

nfekete commented 7 years ago

I would recommend EqualityChecker as name for such an interface.

danieldietrich commented 7 years ago

Equal is a common name. For example it is known from Scalaz (see here and here).

Equal is a predicate a type is attributed with. EqualityChecker is more like a noun, a subject that does s.th. on its own. This is not the way we think. In fact Equal (resp. areEqual) is a function that is called from the outside. A function is used, it does not act by itself.