Trove4j is an order of magnitude faster

vigna / fastutil

fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues.

Apache License 2.0

1.8k stars 197 forks source link

Trove4j is an order of magnitude faster #141

Closed tomerz90 closed 4 years ago

tomerz90 commented 4 years ago

I ran this code:

FastRandom random = new FastRandom();
    Long2ObjectLinkedOpenHashMap map = new Long2ObjectLinkedOpenHashMap(100_000_000);

    for (int i = 0; i < 100_000_000; i++) {
      map.put(random.nextLong(), Arrays.asList(1));
    }

    System.out.println("Created");

    long start = System.currentTimeMillis();
    for (long i = 0; i < 100_000_000_0; i++) {
      map.get(i);
    }
    System.out.println(System.currentTimeMillis() - start);

It took 140 seconds, just the get part. The same code with a trove equivalence of a Long to Object hash map takes 51 seconds with the same load factor. This sounds weird to me since I understood fastutil is supposed to be significantly faster, am I doing something wrong?

vigna commented 4 years ago

I don't know, you're not providing the trove code. Maybe it has faster negative queries.

tomerz90 commented 4 years ago

The classes are generated so I cant link them, the dependency is:

<dependency>
      <groupId>net.sf.trove4j</groupId>
      <artifactId>trove4j</artifactId>
      <version>3.0.3</version>
    </dependency>

Insertion seems to be a lot faster as well, however it seems trove is not maintained anymore so it would be awesome if this lib could give comparable performance.

tomerz90 commented 4 years ago

The performance for the same above code when all keys are in the map is 5.2 seconds for trove and for 52 seconds fastutil

vigna commented 4 years ago

No. Not the Trove source code. YOUR TEST CODE.

tomerz90 commented 4 years ago

Oh, like I said above, its the same just replacing the map:

FastRandom random = new FastRandom();
    TLongObjectHashMap map = new TLongObjectHashMap(100_000_000);

    for (int i = 0; i < 100_000_000; i++) {
      map.put(random.nextLong(), Arrays.asList(1));
    }

    System.out.println("Created");

    long start = System.currentTimeMillis();
    for (long i = 0; i < 100_000_000_0; i++) {
      map.get(i);
    }
    System.out.println(System.currentTimeMillis() - start);

vigna commented 4 years ago

You're replacing with a different map. The other map is linked. It keeps track of a doubly linked list. You're comparing apples and oranges.

tomerz90 commented 4 years ago

I tried it with Long2ObjectOpenHashMap as well, same performance... Is there any specific implementation I should try?

vigna commented 4 years ago

Frankly I don't know. It cannot have the same performance because it is accessing much more memory and doing much more stuff. 🤷🏻‍♂️

I would check twice everything you're doing a maybe use JMH instead of a loop. It also appears you're not repeating the test, which means you might get garbage collection in the middle. Moreover, testing all negative keys in a sequence is going to trigger different behaviour, but it's a quite unlikely scenario.

vigna commented 4 years ago

And, you're not letting the JIT compiler do its job.

tomerz90 commented 4 years ago

Yes but its the same for both, meaning if one doesnt have time to be jitted, neither is the other impl. Same for the GC im guessing, but I might be wrong. OK, thanks anyway!

vigna commented 4 years ago

Many people have tested these implementations, and the results are all the same. I'm just pointing out the obvious shortcomings of your "benchmark".

tomerz90 commented 4 years ago

"...and the results are all the same", what are the results? That fastutil should be faster?

incaseoftrouble commented 4 years ago

Yes but its the same for both, meaning if one doesnt have time to be jitted, neither is the other impl. Same for the GC im guessing, but I might be wrong.

This is very dangerous to assume. For example, allocation strategies might be different (-> triggers GC behaviour at different times). Similarly, JIT highly depends on how the code is structured. Even things like string annotations in asserts can make or break JIT optimizations (as I have learned the hard way ...). Whenever you want to compare performance in a synthesized benchmark, definitely use JMH (or similar). It is so so easy to get misleading benchmarks. Or use them both in "real world" applications (i.e. algorithms) and measure differences. This is the most interesting measurement anyway, in my opinion.

What vigna (probably) is saying that if you measure equal perfomance for linked and non-linked maps then its likely that your way of measuring is flawed.