pingidentity / ldapsdk

UnboundID LDAP SDK for Java
Other
334 stars 81 forks source link

Undersized maps and sets (load factor not taken into account) #50

Closed frankk3 closed 6 years ago

frankk3 commented 6 years ago

Hi everyone!

May I start a discussion about pre-sizing of maps and sets: according to openJDK8's javadoc, HashMaps are rehashed if more entries are stored than the product of the load factor and the current capacity:

When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

This applies e.g. for HashMaps / LinkedHashMaps / HashSets and LinkedHashSets (but not for Lists): you can store only 0.75 * initialCapacity of elements.

Having a look at unboundId LDAP SDK sources, it looks like these collections are most often undersized if the number of elements is known in advance and all entries are stored immediately after creating the map, e.g. com.unboundid.ldap.sdk.Entry lines 234, 283, etc:

this.attributes = new LinkedHashMap<>(attributes.length);

or com.unboundid.ldap.sdk.Attribute lines 1734ff:

final HashSet<ASN1OctetString> unNormalizedValues = new HashSet<>(values.length);
Collections.addAll(unNormalizedValues, values);

For best performance, rehashing should be avoided, i.e. the initial capacity of maps or sets should take the load factor into account, e.g.

final int initialCapacity = (int) (expectedNumberOfElements/.75f) + 1

resulting in

this.attributes = new LinkedHashMap<>((int) (attributes.length/.75f) + 1);

To illustrate the rehashing algorithm (internal map size is always a power of two, rehashing takes place if load factor based threshold is exceeded), I've attached a short test class:

InitialCapacity.java.txt

Usually rehashing shouldn't make a big difference if only a few elements are stored, but if there are many different ones, it's worth to avoid rehashing if possible.

What's your opinion about that?

Best regards

Frank

dirmgr commented 6 years ago

This should be addressed by the changes in commit 56f3d1315e5289a82929b3699d923cc4a4faf53c

frankk3 commented 6 years ago

Many thanks that you took care of my comment and that you realized the changes with your commit 56f3d13 (a really impressive one) so quickly.

I'm looking forward to the next release 4.0.9.