Open mhansen opened 4 days ago
Mmmmh. There is a merge implementation for hash maps, so I'm wondering why it isn't called. Don't you see it in a stack trace? In any case, that wouldn't solve your problem, in the sense that fastutil internal functions find and insert do not carry the hash. You must understand fastutil was designed more than 20 years ago, when compound operations such as merge and compute were not in the interface. It is right that in the present situation it would be better to have find and insert accepting a hash, so that compound operations compute it just once. It's a good suggestion, I just don't know when I'll find the bandwidth to implement it (but your are free to send PRs 😂; just joking, I know it's complex code).
Thank you for this background!
I'll go off what I see in the javadoc as it's a little hard to link to the generated code.
There is a merge implementation for hash maps, so I'm wondering why it isn't called. Don't you see it in a stack trace?
I see there's an implementation for Object2DoubleOpenHashMap.merge(K k, double v, BiFunction<? super Double,? super Double,? extends Double> remappingFunction)
: https://javadoc.io/static/it.unimi.dsi/fastutil/8.5.15/it/unimi/dsi/fastutil/objects/Object2DoubleOpenHashMap.html#merge(K,double,java.util.function.BiFunction)
That's the boxing function that takes Double
.
But the primitive mergeDouble(K key, double value, DoubleBinaryOperator remappingFunction)
function is listed under "Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2DoubleMap". So there isn't a specialization/override of mergeDouble
for Object2DoubleOpenHashMap
.
It is right that in the present situation it would be better to have find and insert accepting a hash, so that compound operations compute it just once.
I'm OK with this way to do it; but I was thinking that we could keep the current definition of find
what returns an int
, and use that returned position to update in-place. insert
looks like it takes a position too?
I have no expectation that you get around to it, soon or ever. I might be able to suggest a PR if we can figure out that it's feasible.
Hi, I was recently optimising some hashmap heavy code. We got a big speedup from moving from
HashMap<Object, Double>
toObject2DoubleOpenHashMap
, so first let me say thank you for fastutil, it's great.We noticed an opportunity in the profiling afterwards though. I expected
mergeDouble
to have an optimisation to hash the input once, then find the position of the data, then replace the data in that position. LikeHashMap.compute
.But in the profile I see that
mergeDouble
is implemented only in the interface asObject2DoubleMap.mergeDouble
, which doesn't know about hashing, and is implemented in terms of callingdouble getDouble(Object)
thenput(Object, double)
This ends up hashing theObject
twice, and finding the hashmap position twice (including calling.equals
to see if it's the right position).It's not a big deal or anything; we're more than happy with the performance improvements delivered. But I thought I'd report it in case it's an idea you like. Here's a flamegraph: