Inserting 1000000 elements...
Java HashMap test took 0.106 sec.
HashMap test took 22.126 sec.
Java TreeSet test took 0.204 sec.
TreeSet test took 0.773 sec.
HashSet test took 21.423 sec.
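For context, a micro-benchmark of this kind could look roughly like the sketch below. This is an illustrative reconstruction only (plain wall-clock timing, no JIT warm-up), not the exact test that produced the numbers above:

import java.util.concurrent.TimeUnit;

// Illustrative sketch of an insertion benchmark comparing java.util.HashMap
// with the persistent javaslang.collection.HashMap.
public class InsertBenchmark {

    static final int N = 1_000_000;

    public static void main(String[] args) {
        System.out.println("Inserting " + N + " elements...");

        long t0 = System.nanoTime();
        java.util.Map<Integer, Integer> javaMap = new java.util.HashMap<>();
        for (int i = 0; i < N; i++) {
            javaMap.put(i, i);
        }
        print("Java HashMap", t0);

        long t1 = System.nanoTime();
        javaslang.collection.HashMap<Integer, Integer> map = javaslang.collection.HashMap.empty();
        for (int i = 0; i < N; i++) {
            map = map.put(i, i); // persistent: every put returns a new map
        }
        print("Javaslang HashMap", t1);
    }

    static void print(String label, long startNanos) {
        long millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
        System.out.println(label + " test took " + millis / 1000.0 + " sec.");
    }
}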
I believe the problems are memory allocation and GC. It's so sad
Mmhhh, we will see - I will investigate it with the profiler - there is always room for improvement :)
Replacing
private final Array<AbstractNode<K, V>> subNodes;
by
private final Object[] subNodes;
in the HAMT will solve most of the problems
Ok - can you do it?
Is the subNodes array fixed size, or does it have to grow? If it is fixed size (which is typical for tries), then we should instantiate it once without growing it later. Mmhh, but on write access a full copy has to be made in O(n)... don't know - have to look at the impl
ok
No, it was a bad idea. It would be a mutable object. We need another trick
I will investigate it...
What!? Seems to be the sum operation (maybe when calculating the node size, taking all children into account...) Will have a closer look now...
Oohh yes - it is most probably the usage of Match (within TraversableOnce.sum et al.) because it uses lambda reflection.
There are two types of Match: MatchMonad and MatchFunction. In performance-critical cases, like here, one should use the MatchFunction, because the lambda serialization is only done once and then it should be ultra-performant!
I will check that now!
// before: Javaslang HashMap bench took 22.126 sec.
this.size = subNodes.map(HashArrayMappedTrie::size).sum().intValue();
// after: Javaslang HashMap bench took 9.753 sec.
this.size = subNodes.map(HashArrayMappedTrie::size).fold(0, (i, j) -> i + j);
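(The gain here presumably comes from bypassing the Match-based dispatch that sum() goes through, as noted above; the Integer boxing via intValue() is tackled separately in the next step.)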
Will now profile & improve further...
Primitive boxing/unboxing is evil. I will provide a simple size function for the nodes... easy!
Calculating the size with primitives only brought:
Javaslang HashMap bench took 5.761 sec.
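A minimal sketch of what "size with primitives only" might look like; the node type and its size() method are assumptions about the internals, not the actual source:

// Hypothetical sketch: each node exposes a primitive int size(), so summing
// the children involves no Integer boxing and no Match-based sum().
static <K, V> int sizeOf(Iterable<? extends AbstractNode<K, V>> subNodes) {
    int size = 0;
    for (AbstractNode<K, V> subNode : subNodes) {
        size += subNode.size();
    }
    return size;
}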
Question: Why do we still use List, e.g. in IndexedNode? LeafNode should be ok, but there are cases where we use list.get(index). Using Array here would be better. Later (with 2.0.1) we may use Vector (backed by a bit-mapped vector trie)
I think we should not calculate the size in general. Instead it should move from the HAMT to HashMap, maybe as Lazy
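A sketch of what that could look like, assuming the map can iterate its backing trie; the class and field names below are made up for illustration:

import javaslang.Lazy;

// Hypothetical sketch: the size no longer lives in the HAMT nodes; the map
// computes it lazily, at most once, and only when somebody actually asks.
final class LazySizedMap<K, V> {

    private final Iterable<?> entries;   // stands in for the backing HAMT
    private final Lazy<Integer> size;

    LazySizedMap(Iterable<?> entries) {
        this.entries = entries;
        this.size = Lazy.of(this::count); // nothing is computed on construction
    }

    int size() {
        return size.get();               // counted on the first call only
    }

    private int count() {
        int n = 0;
        for (Object ignored : entries) {
            n++;
        }
        return n;
    }
}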
Without calculating the size: Javaslang HashMap bench took 3.711 sec.
Now Array.update is eating all the time. Copying the internal array is O(n). We need to exchange Array with the Vector impl mentioned above. Then it will really fly!!
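For illustration, a persistent array update typically looks like the sketch below, which is where the O(n) per write comes from (a generic copy-on-write sketch, not the actual Array source):

import java.util.Arrays;

final class PersistentArrays {
    // Generic copy-on-write update: the whole backing array is copied for
    // every single write, so each update costs O(n) in the array length.
    static <T> T[] update(T[] array, int index, T element) {
        T[] copy = Arrays.copyOf(array, array.length);
        copy[index] = element;
        return copy;
    }
}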
We need to replace occurrences of
List.insert(int, Object)
Array.update(int, Object)
within HashArrayMappedTrie (HAMT).
We accomplish this by implementing Vector (see #568) and using it as the default Seq implementation throughout the HAMT.
@ruslansennov btw - the Array implementation is absolutely fine! It is exactly the same as in Scala.
Mmmhh, Vector.prepend and Vector.insert are also O(n). Therefore Vector is no viable alternative to the existing HAMT subNodes types Array and List. I think we are fine here and the HashMap stays as is for now.
Vector.prepend and Vector.insert are also O(n)
We can make prepend effectively constant via the bit-mapped vector trie by providing an offset, as Scala also does (I think), and insert O(n/2) worst case (not sure how Scala does this).
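Roughly, the offset idea is that the structure remembers where its logical element 0 starts, so a prepend writes into free space in front of it instead of shifting everything. A very reduced, flat sketch of that idea follows; a real bit-mapped vector trie is tree-shaped and far more involved, and only its fixed-size leftmost leaf would be copied:

import java.util.Arrays;

// Reduced illustration of the "offset" trick on a single flat buffer.
// In this flat form the copy is O(capacity); in the trie only the leftmost
// leaf (at most 32 slots) is copied, making prepend effectively constant.
final class OffsetBuffer<T> {

    private final Object[] buffer;
    private final int offset;   // index of the first used slot
    private final int length;   // number of used slots

    OffsetBuffer(Object[] buffer, int offset, int length) {
        this.buffer = buffer;
        this.offset = offset;
        this.length = length;
    }

    OffsetBuffer<T> prepend(T element) {
        if (offset == 0) {
            // in the real trie a new leaf/level would be allocated here
            throw new UnsupportedOperationException("growing not sketched");
        }
        Object[] copy = Arrays.copyOf(buffer, buffer.length);
        copy[offset - 1] = element;            // only one slot is written
        return new OffsetBuffer<>(copy, offset - 1, length + 1);
    }

    @SuppressWarnings("unchecked")
    T get(int index) {
        return (T) buffer[offset + index];     // reads are shifted by the offset
    }
}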
@paplorinc This sounds great! We should definitely take a look at the Scala source. Daniel Spiewak did a great job contributing to the Scala collection library. Please view his talk starting at minute 45:00. You can find the source code here.
Daniel Spiewak said in his talk that the bit-mapped vector trie even performed better than the mutable Java version for very big collections (Map?). The bit-mapped vector trie is great for indexed/random access reads. We should keep our initial bit-mapped vector trie as simple as possible (along the existing Scala versions) and apply specific optimizations later.
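The reason indexed reads are cheap: a 32-way bit-mapped vector trie consumes 5 bits of the index per level, so get(i) descends at most log32(n) levels. A textbook sketch of the lookup (not Javaslang's or Scala's actual code):

final class VectorTrieRead {
    // Textbook indexed read in a 32-way (5 bits per level) vector trie.
    // 'shift' is 5 * (height - 1); leaves are plain Object[] arrays of length 32.
    static Object get(Object[] root, int index, int shift) {
        Object[] node = root;
        for (int level = shift; level > 0; level -= 5) {
            node = (Object[]) node[(index >>> level) & 31];
        }
        return node[index & 31];
    }
}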
I already have an optimized create, iterate, get, update, and am working on drop, which will provide tail and init also.
I'm hoping that I can achieve effectively constant time for drop(right) and prepend - and maybe even merge and random insert - but it gets quite complicated (there's no real documentation). I'm basically reinventing everything, as the optimized Scala and Clojure code is too cryptic for me (and Clojure doesn't even provide prepend or slice).
I'm hoping that I can achieve effectively constant time for (...) I'm basically reinventing everything (...)
That's a noble goal. But for the health of the code-base it is most important that the code stays simple, understandable and maintainable - by more than one person ;)
Please keep in mind that it is not our goal to have a universal collection that performs overall in O(1) (to put it in exaggerated terms). Every algorithm has its strengths and weaknesses. We should embrace these as developers. The user has to choose the right collection for the right purpose.
If you are able to achieve quick wins by tweaking the base implementation of a bit-mapped vector trie, that's ok.
Please also keep in mind that performance often comes at the cost of memory. Such improvements are candidates to be left out of the code base. Examples of improvements we rejected are the storage of additional variables per node (like a lazy hash code) - or the recent drop optimizations, etc.
Just saying.
@ruslansennov I made some simple tests and it seems that HashMap is relatively slow compared to other collections when inserting elements:
This was my test: