vigna / fastutil

fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues.
Apache License 2.0
1.81k stars 199 forks source link

LongOpenHashSet.removeAll(LongCollection) too slow #50

Closed schabe77 closed 7 years ago

schabe77 commented 7 years ago

If I call LongOpenHashSet.removeAll(LongCollection), then AbstractLongCollection.removeAll(LongCollection) ist called.

For each long of the given collection, this method iterates over the whole Set until it finds a match. If a match is found, it's removed.

That may be useful for Lists, but for Sets it doesn't make sense, because it's extremely slow on large Sets.

For Sets it should just call remove for each given collection element.

schabe77 commented 7 years ago

For AbstractLongCollection.removeAll(Collection<?>) it works correctly (calls remove(Object), which is defined in AbstractLongSet). it.unimi.dsi.fastutil.longs.LongOpenHashSet.remove(long):446it.unimi.dsi.fastutil.longs.AbstractLongSet.remove(Object):106 it.unimi.dsi.fastutil.longs.AbstractLongCollection.removeAll(AbstractLongCollection.java:332)

for AbstractLongCollection.removeAll(LongCollection) the AbstractLongCollection.rem(long) method is called, which iterates over the whole Set for each element to remove. it.unimi.dsi.fastutil.longs.LongOpenHashSet$SetIterator.remove(LongOpenHashSet.java:628) it.unimi.dsi.fastutil.longs.AbstractLongCollection.rem(AbstractLongCollection.java:265) it.unimi.dsi.fastutil.longs.AbstractLongCollection.removeAll(AbstractLongCollection.java:170)

vigna commented 7 years ago

OK. I now understand the problem. It's somewhat complicated—the rem() method was born out of necessity, as remove(int) has two meanings for non-lists and for lists. So a different method was needed to remove type-specific elements.

Given the last few lines of AbstractSet.drv, I think my idea (fifteen years ago :cough:) was that every implementor of Set should have implemented rem() of a specific type, not remove(). This idea clearly got lost somewhere in the process—all Set implementations implement remove() on primitive types.

As a result, rem() is called in removeAll(). The solution is retouching all non-list classes so that they override rem().

vigna commented 7 years ago

Try this beta: http://vigna.di.unimi.it/fastutil-7.0.14.jar and let me know if it works for you.

vigna commented 7 years ago

BTW, can you send me your name by email? I usually add bug reporters to the changelog.

schabe77 commented 7 years ago

Hi Sebastiano,

I tried it and it works like a charm.

LongSet set = new LongOpenHashSet(); for (long i = 0L; i <= 1_000_000L; i++) { set.add(i); } LongSet removals = new LongOpenHashSet(); for (long i = -20_000L; i <= 20_000L; i++) { removals.add(i); } long start = System.currentTimeMillis(); set.removeAll(removals); System.out.println(System.currentTimeMillis() - start);

I tested set.removeAll(removals); and set.removeAll((Collection<Long>) removals);

with 7.0.13 the first is extremely slow, in 7.0.14 it works perfectly as expected.

Thanks for the fast bugfix and for your wonderful software!