vigna / fastutil

fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues.
Apache License 2.0
1.81k stars 199 forks source link

Feature Request: Implement Object2DoubleOpenHashMap.mergeDouble to avoid double-hashing #336

Open mhansen opened 4 days ago

mhansen commented 4 days ago

Hi, I was recently optimising some hashmap heavy code. We got a big speedup from moving from HashMap<Object, Double> to Object2DoubleOpenHashMap, so first let me say thank you for fastutil, it's great.

We noticed an opportunity in the profiling afterwards though. I expected mergeDouble to have an optimisation to hash the input once, then find the position of the data, then replace the data in that position. Like HashMap.compute.

But in the profile I see that mergeDouble is implemented only in the interface as Object2DoubleMap.mergeDouble, which doesn't know about hashing, and is implemented in terms of calling double getDouble(Object) then put(Object, double) This ends up hashing the Object twice, and finding the hashmap position twice (including calling .equals to see if it's the right position).

It's not a big deal or anything; we're more than happy with the performance improvements delivered. But I thought I'd report it in case it's an idea you like. Here's a flamegraph:

image

vigna commented 3 days ago

Mmmmh. There is a merge implementation for hash maps, so I'm wondering why it isn't called. Don't you see it in a stack trace? In any case, that wouldn't solve your problem, in the sense that fastutil internal functions find and insert do not carry the hash. You must understand fastutil was designed more than 20 years ago, when compound operations such as merge and compute were not in the interface. It is right that in the present situation it would be better to have find and insert accepting a hash, so that compound operations compute it just once. It's a good suggestion, I just don't know when I'll find the bandwidth to implement it (but your are free to send PRs 😂; just joking, I know it's complex code).

mhansen commented 3 days ago

Thank you for this background!

I'll go off what I see in the javadoc as it's a little hard to link to the generated code.

There is a merge implementation for hash maps, so I'm wondering why it isn't called. Don't you see it in a stack trace?

I see there's an implementation for Object2DoubleOpenHashMap.merge(K k, double v, BiFunction<? super Double,? super Double,? extends Double> remappingFunction): https://javadoc.io/static/it.unimi.dsi/fastutil/8.5.15/it/unimi/dsi/fastutil/objects/Object2DoubleOpenHashMap.html#merge(K,double,java.util.function.BiFunction)

That's the boxing function that takes Double.

But the primitive mergeDouble(K key, double value, DoubleBinaryOperator remappingFunction) function is listed under "Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2DoubleMap". So there isn't a specialization/override of mergeDouble for Object2DoubleOpenHashMap.

It is right that in the present situation it would be better to have find and insert accepting a hash, so that compound operations compute it just once.

I'm OK with this way to do it; but I was thinking that we could keep the current definition of find what returns an int, and use that returned position to update in-place. insert looks like it takes a position too?

I have no expectation that you get around to it, soon or ever. I might be able to suggest a PR if we can figure out that it's feasible.