hrldcpr / pcollections

A Persistent Java Collections Library
Other
765 stars 79 forks source link

HashTreePMap perf: compare only keys (not values) for plus & minus operations #83

Closed AndreasVoellmy closed 4 years ago

AndreasVoellmy commented 4 years ago

https://github.com/hrldcpr/pcollections/blob/d88d5a52dd53b8ce0f4a824412b2878876dae8ea/src/main/java/org/pcollections/HashPMap.java#L121

I may be mistaken, but it looks to me like, when doing plus(key, value), the logic will first get the sequence of items whose keys have the same hash as the given key, and then the code will (due to line 121 in HashPMap and eventually the code for minus() in ConsPStack ) attempt to delete the element with the same key and value (the line first.equals(e) in ConsPStack), which requires applying equals() on values.

This seems like it will be quite slow for some use cases where values are complex and may have expensive equals methods.

Since this is a map, why not have plus/minus only compare keys?

hrldcpr commented 4 years ago

Hi, thanks for the report!

I don't actually see where values are ever being compared in this code, only keys (as desired).

The line you referenced is just removing a specific index from the entries list, so that doesn't do any comparing at all. And that index comes from keyIndexIn which just loops through the entries and compares keys (not values) https://github.com/hrldcpr/pcollections/blob/d88d5a52dd53b8ce0f4a824412b2878876dae8ea/src/main/java/org/pcollections/HashPMap.java#L149

(same story with minus)

But definitely let me know if I'm missing something! And apologies this code is a bit old and hard to follow 🙃

hrldcpr commented 4 years ago

oh I think maybe the confusion is that you were looking at the implementation of ConsPStack.minus(Object) as opposed to ConsPStack.minus(int)

AndreasVoellmy commented 4 years ago

Oh, I see the issue. I was using version 2.1.2, whose implementation in ConsPStack.minus(int) is

    public ConsPStack<E> minus(final Object e) {
        if(size==0)
            return this;
        if(first.equals(e)) // found it
            return rest; // don't recurse (only remove one)
        // otherwise keep looking:
        ConsPStack<E> newRest = rest.minus(e);
        if(newRest==rest) return this;
        return new ConsPStack<E>(first, newRest);
    }

    public ConsPStack<E> minus(final int i) {
        return minus(get(i));
    }

So you see that it calls minus with the object, and that contains the value, and equals on this entry is invoked in the expression first.equals(e).

But I see that in the latest version this implementation is changed. Now it is

public ConsPStack<E> minus(final int i) {
    if (i < 0 || i >= size) throw new IndexOutOfBoundsException("Index: " + i + "; size: " + size);
    else if (i == 0) return rest;
    else return new ConsPStack<E>(first, rest.minus(i - 1));
  }

which does not have the problem. So we can close this issue.

hrldcpr commented 4 years ago

Ah interesting! I forgot that change ever happened (glad it did though! in #30 apparently)

Thanks for the report though, glad we double-checked.