Inserting 1000000 elements...
Java HashMap test took 0.106 sec.
HashMap test took 22.126 sec.
Java TreeSet test took 0.204 sec.
TreeSet test took 0.773 sec.
HashSet test took 21.423 sec.
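For context, a micro-benchmark of this kind could look roughly like the sketch below. This is an illustrative reconstruction only (plain wall-clock timing, no JIT warm-up), not the exact test that produced the numbers above:

import java.util.concurrent.TimeUnit;

// Illustrative sketch of an insertion benchmark comparing java.util.HashMap
// with the persistent javaslang.collection.HashMap.
public class InsertBenchmark {

    static final int N = 1_000_000;

    public static void main(String[] args) {
        System.out.println("Inserting " + N + " elements...");

        long t0 = System.nanoTime();
        java.util.Map<Integer, Integer> javaMap = new java.util.HashMap<>();
        for (int i = 0; i < N; i++) {
            javaMap.put(i, i);
        }
        print("Java HashMap", t0);

        long t1 = System.nanoTime();
        javaslang.collection.HashMap<Integer, Integer> map = javaslang.collection.HashMap.empty();
        for (int i = 0; i < N; i++) {
            map = map.put(i, i); // persistent: every put returns a new map
        }
        print("Javaslang HashMap", t1);
    }

    static void print(String label, long startNanos) {
        long millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
        System.out.println(label + " test took " + millis / 1000.0 + " sec.");
    }
}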
I believe the problems are memory allocation and GC. It's so sad
Mmhhh, we will see - I will investigate it with the profiler - there is always room for improvement :)
Replacing
private final Array<AbstractNode<K, V>> subNodes;
by
private final Object[] subNodes;
in the HAMT will solve most of the problems
Ok - can you do it?
Is the subNodes array fixed size, or does it have to grow? If it is fixed size (which is typical for tries), then we should instantiate it once without growing it later. Mmhh, but on write access a full copy has to be made in O(n)... don't know - have to look at the impl
ok
No, it was a bad idea. It would be a mutable object. We need another trick
I will investigate it...
What!? Seems to be the sum operation (maybe when calculating the node size, taking all children into account...) Will have a closer look now...
Oohh yes - it is most probably the usage of Match (within TraversableOnce.sum et al.) because it uses lambda reflection.
There are two types of Match: MatchMonad and MatchFunction. In performance-critical cases, like here, one should use the MatchFunction, because the lambda serialization is only done once and then it should be ultra-performant!
I will check that now!
// before: Javaslang HashMap bench took 22.126 sec.
this.size = subNodes.map(HashArrayMappedTrie::size).sum().intValue();
// after: Javaslang HashMap bench took 9.753 sec.
this.size = subNodes.map(HashArrayMappedTrie::size).fold(0, (i, j) -> i + j);
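(The gain here presumably comes from bypassing the Match-based dispatch that sum() goes through, as noted above; the Integer boxing via intValue() is tackled separately in the next step.)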
Will now profile & improve further...
Primitive boxing/unboxing is evil. I will provide a simple size function for the nodes... easy!
Calculating the size with primitives only brought:
Javaslang HashMap bench took 5.761 sec.
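A minimal sketch of what "size with primitives only" might look like; the node type and its size() method are assumptions about the internals, not the actual source:

// Hypothetical sketch: each node exposes a primitive int size(), so summing
// the children involves no Integer boxing and no Match-based sum().
static <K, V> int sizeOf(Iterable<? extends AbstractNode<K, V>> subNodes) {
    int size = 0;
    for (AbstractNode<K, V> subNode : subNodes) {
        size += subNode.size();
    }
    return size;
}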
Question: Why do we still use List, e.g. in IndexedNode? LeafNode should be ok, but there are cases where we use list.get(index). Using Array here would be better. Later (with 2.0.1) we may use Vector (backed by a bit-mapped vector trie)
I think we should not calculate the size in general. Instead it should move from the HAMT to HashMap, maybe as Lazy
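A sketch of what that could look like, assuming the map can iterate its backing trie; the class and field names below are made up for illustration:

import javaslang.Lazy;

// Hypothetical sketch: the size no longer lives in the HAMT nodes; the map
// computes it lazily, at most once, and only when somebody actually asks.
final class LazySizedMap<K, V> {

    private final Iterable<?> entries;   // stands in for the backing HAMT
    private final Lazy<Integer> size;

    LazySizedMap(Iterable<?> entries) {
        this.entries = entries;
        this.size = Lazy.of(this::count); // nothing is computed on construction
    }

    int size() {
        return size.get();               // counted on the first call only
    }

    private int count() {
        int n = 0;
        for (Object ignored : entries) {
            n++;
        }
        return n;
    }
}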
Without calculating the size: Javaslang HashMap bench took 3.711 sec.
Now Array.update is eating all the time. Copying the internal array is O(n). We need to exchange Array with the Vector impl mentioned above. Then it will really fly!!
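For illustration, a persistent array update typically looks like the sketch below, which is where the O(n) per write comes from (a generic copy-on-write sketch, not the actual Array source):

import java.util.Arrays;

final class PersistentArrays {
    // Generic copy-on-write update: the whole backing array is copied for
    // every single write, so each update costs O(n) in the array length.
    static <T> T[] update(T[] array, int index, T element) {
        T[] copy = Arrays.copyOf(array, array.length);
        copy[index] = element;
        return copy;
    }
}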
We need to replace occurrences of
List.insert(int, Object)
Array.update(int, Object)
within HashArrayMappedTrie (HAMT).
We accomplish this by implementing Vector (see #568) and using it as the default Seq implementation throughout the HAMT.
@ruslansennov btw - the Array implementation is absolutely fine! It is exactly the same as in Scala.
Mmmhh, Vector.prepend and Vector.insert are also O(n). Therefore Vector is no viable alternative to the existing HAMT subNodes types Array and List. I think we are fine here and the HashMap stays as is for now.
Vector.prepend and Vector.insert are also O(n)
We can make prepend effectively constant via the bit-mapped vector trie by providing an offset, as Scala also does (I think), and insert O(n/2) worst case (not sure how Scala does this).
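Roughly, the offset idea is that the structure remembers where its logical element 0 starts, so a prepend writes into free space in front of it instead of shifting everything. A very reduced, flat sketch of that idea follows; a real bit-mapped vector trie is tree-shaped and far more involved, and only its fixed-size leftmost leaf would be copied:

import java.util.Arrays;

// Reduced illustration of the "offset" trick on a single flat buffer.
// In this flat form the copy is O(capacity); in the trie only the leftmost
// leaf (at most 32 slots) is copied, making prepend effectively constant.
final class OffsetBuffer<T> {

    private final Object[] buffer;
    private final int offset;   // index of the first used slot
    private final int length;   // number of used slots

    OffsetBuffer(Object[] buffer, int offset, int length) {
        this.buffer = buffer;
        this.offset = offset;
        this.length = length;
    }

    OffsetBuffer<T> prepend(T element) {
        if (offset == 0) {
            // in the real trie a new leaf/level would be allocated here
            throw new UnsupportedOperationException("growing not sketched");
        }
        Object[] copy = Arrays.copyOf(buffer, buffer.length);
        copy[offset - 1] = element;            // only one slot is written
        return new OffsetBuffer<>(copy, offset - 1, length + 1);
    }

    @SuppressWarnings("unchecked")
    T get(int index) {
        return (T) buffer[offset + index];     // reads are shifted by the offset
    }
}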
@paplorinc This sounds great! We should definitely take a look at the Scala source. Daniel Spiewak did a great job contributing to the Scala collection library. Please view his talk starting at minute 45:00. You can find the source code here.
Daniel Spiewak said in his talk that the bit-mapped vector trie even performed better than the mutable Java version for very big collections (Map?). The bit-mapped vector trie is great for indexed/random access reads. We should keep our initial bit-mapped vector trie as simple as possible (along the existing Scala versions) and apply specific optimizations later.
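The reason indexed reads are cheap: a 32-way bit-mapped vector trie consumes 5 bits of the index per level, so get(i) descends at most log32(n) levels. A textbook sketch of the lookup (not Javaslang's or Scala's actual code):

final class VectorTrieRead {
    // Textbook indexed read in a 32-way (5 bits per level) vector trie.
    // 'shift' is 5 * (height - 1); leaves are plain Object[] arrays of length 32.
    static Object get(Object[] root, int index, int shift) {
        Object[] node = root;
        for (int level = shift; level > 0; level -= 5) {
            node = (Object[]) node[(index >>> level) & 31];
        }
        return node[index & 31];
    }
}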
I already have an optimized create, iterate, get, update, and am working on drop, which will provide tail and init also.
I'm hoping that I can achieve effectively constant time for drop(right) and prepend - and maybe even merge and random insert - but it gets quite complicated (there's no real documentation). I'm basically reinventing everything, as the optimized Scala and Clojure code is too cryptic for me (and Clojure doesn't even provide prepend or slice).
I'm hoping that I can achieve effectively constant time for (...) I'm basically reinventing everything (...)
That's a noble goal. But for the health of the code-base it is most important that the code stays simple, understandable and maintainable - by more than one person ;)
Please keep in mind that it is not our goal to have a universal collection that performs overall in O(1) (to put it in exaggerated terms). Every algorithm has its strengths and weaknesses. We should embrace these as developers. The user has to choose the right collection for the right purpose.
If you are able to achieve quick wins by tweaking the base implementation of a bit-mapped vector trie, that's ok.
Please also keep in mind that performance often comes at the cost of memory. Such improvements are candidates to be left out of the code base. Examples of improvements we rejected are the storage of additional variables per node (like a lazy hash code) - or the recent drop optimizations, etc.
Just saying.
@ruslansennov I made some simple tests and it seems that HashMap is relatively slow compared to other collections when inserting elements:
This was my test: