nfergu / hashtableperf

Tests hash table performance across 3 different languages: C++, Java, and Python
Apache License 2.0
5 stars 0 forks source link

Introduction

This project tests hash table performance across 3 different languages: C++, Java, and Python (including Cython). For C++, the STL's unordered_map is used. For Java, the java.util.HashMap class is used. For Python, the dict type is used.

Each test repeatedly generates numbers between zero and 9999 and puts these into a hash table. Another random number is then retrieved from the hash table. This is repeated for one billion iterations. There is a warm-up period of one hundred million iterations before the main test.

Note that the test makes no attempt to isolate has table performance from the other parts of the test, so it also tests the performance of loop iteration, and some integer arithmetic. My assumption when designing the tests was that loop iteration and integer arithmetic would be cheap compared to the hash table operations. However this does not appear to be the case for some languages (like Python, for example). See the "Observations" section for further details.

Results

The following results were obtained by running the tests:

The results are presented in terms of iterations per second. In other words, how many times we can be update an entry in the hash table, then retrieve another entry, in one second (higher is better).

So, compared to the fastest Java test, the fastest C++ test is about 1.05 times slower, and the Python 2 test is around 30 times slower. Cython is around 4 times slower than the fastest Java test.

The following graph shows these results:

Performance Graph

Observations

Some observations on the results:

Configuration

Platform

All tests were performed on an Intel E5-2609 v2 @ 2.50GHz running Ubuntu 14.04.

GCC

GCC version 4.9.3 was used for the GCC 4 tests. GCC version 5.3.0 was used for the GCC 5 tests. The following options were used in both cases:

g++ -otesttableperf -std=c++11 -O3 src/test_table_perf.cpp

This compiles with an optimization level of 3. Different optimization levels were tried. Optimization levels 2 and 3 were the fastest (but there was not a significant difference levels 2 and 3).

Clang

Clang version 3.8.0 was used for the Clang tests. The following options were used:

clang++ -otesttableperf -std=c++11 -O3 src/test_table_perf.cpp

This compiles with an optimization level of 3. Different optimization levels were tried. Optimization levels 2 and 3 were the fastest (but there was not a significant difference levels 2 and 3).

Java

The Java 8 tests were performed (compiled and executed) using Oracle JDK 1.8.0._74. The Java 7 tests were performed using Oracle JDK 1.7.0_79. The tests were executed using the following options:

java -mx512m -XX:NewSize=128m -XX:MaxNewSize=128m -XX:-UseParallelOldGC -XX:ParallelGCThreads=1 TestTablePerf

These options have the effect of forcing frequent garbage collection (about 4 times per second) in the test. This means that the majority of the garbage is collected by the end of the test loop (but just to be sure we call System.gc() before completing the test). A single thread is used for garbage collection, since all of our tests are single-threaded.

Python

The Python 2 tests were performed using Python 2.7.11. The Python 3 tests were performed using Python 3.5.1. No options were specified for the python interpreter.

Cython

The Cython tests were performed using Cython version 0.24, along with Python version 2.7.11 and GCC version 5.3.0. The following options were used for Cython and GCC:

cython --embed -o src/test_table_perf_pyx.c  src/test_table_perf_pyx.pyx
gcc -O3 -I /usr/include/python2.7 -o src/test_table_perf_pyx src/test_table_perf_pyx.c -lpython2.7 -lpthread -lm -lutil -ldl

This compiles with an optimization level of 3. Other optimization levels were tried, but level 3 proved to be the fastest.

Running the Tests

GCC

g++ -otesttableperf -std=c++11 -O2 src/test_table_perf.cpp
./testtableperf

This assumes you have GCC installed.

Clang

clang++ -otesttableperf -std=c++11 -O2 src/test_table_perf.cpp
./testtableperf

This assumes you have Clang installed.

Java

javac -d . src/TestTablePerf.java
java -mx512m -XX:NewSize=128m -XX:MaxNewSize=128m -XX:-UseParallelOldGC -XX:ParallelGCThreads=1 TestTablePerf

This assumes you have a JDK installed.

Python 2

python2 src/test_table_perf.py

This assumes you have python 2 installed.

Python 3

python3 src/test_table_perf-python3.py

This assumes you have python 3 installed.

Cython

cython --embed -o src/test_table_perf_pyx.c  src/test_table_perf_pyx.pyx
gcc -O3 -I /usr/include/python2.7 -o src/test_table_perf_pyx src/test_table_perf_pyx.c -lpython2.7 -lpthread -lm -lutil -ldl
src/test_table_perf_pyx

This assumes you have Cython and GCC installed.

Feedback

I would appreciate feedback on the tests, particularly on possible performance improvements. Please open an issue or pull request in Github.