vigna / fastutil

fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues.
Apache License 2.0

Modularized version (of smaller size, e.g. without Big collections) #1

Closed kno10 closed 8 years ago

kno10 commented 9 years ago

At 16 MB, fastutil is a pretty large package.

It would be nice to have a modularized version. For example, many users may not need the Big variants. I imagine that the Big variants add a considerable amount of size to fastutil.

I can also imagine that a lot of users only need double, int and long, but it seems difficult to decide where to draw the line for a "core" subset that most people use.

Right now, I'm still using Trove, which is just 2.5 MB. I like Trove iterators more than Java-compatible iterators (in particular for iterating over primitive maps, it.key() and it.value() are both readable and avoid creating "Entry" objects), and whenever I tried to switch to fastutil, gsc, or Koloboke/HFT, I found a lot of code to be less elegant because of Java-style iterators. The other thing holding me back is the size: fastutil, for example, would instantly double the size of our project.
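To make the iterator-style difference concrete, here is a minimal sketch in plain Java. It does not use Trove or fastutil: `IntDoubleMap`, `Cursor`, `advance()`, `key()` and `value()` are hypothetical names that only imitate the cursor style described above, and the map is a deliberately tiny fixed-capacity table. The boxed `java.util.HashMap` loop is included for contrast.

```java
import java.util.HashMap;
import java.util.Map;

final class CursorSketch {
    /** Hypothetical fixed-capacity int -> double map using linear probing (no removal, no resizing). */
    static final class IntDoubleMap {
        private final int[] keys;
        private final double[] values;
        private final boolean[] used;
        private int size;

        IntDoubleMap(int capacity) {
            if (capacity < 2) throw new IllegalArgumentException("capacity must be at least 2");
            keys = new int[capacity];
            values = new double[capacity];
            used = new boolean[capacity];
        }

        void put(int key, double value) {
            int i = Math.floorMod(key, keys.length);
            // One slot always stays free, so this probe loop terminates.
            while (used[i] && keys[i] != key) i = (i + 1) % keys.length;
            if (!used[i]) {
                if (size + 1 == keys.length) throw new IllegalStateException("map is full");
                used[i] = true;
                keys[i] = key;
                size++;
            }
            values[i] = value;
        }

        Cursor cursor() { return new Cursor(); }

        /** Cursor-style iterator: advance() moves, key()/value() read primitives in place. */
        final class Cursor {
            private int pos = -1;

            boolean advance() {
                while (++pos < keys.length) {
                    if (used[pos]) return true;
                }
                return false;
            }

            int key() { return keys[pos]; }
            double value() { return values[pos]; }
        }
    }

    /** Cursor loop: no Entry objects and no boxing. */
    static double sumCursor(IntDoubleMap m) {
        double sum = 0;
        for (IntDoubleMap.Cursor it = m.cursor(); it.advance(); ) {
            sum += it.key() * it.value();
        }
        return sum;
    }

    /** Standard Java loop: keys and values are boxed Integer/Double objects. */
    static double sumBoxed(Map<Integer, Double> m) {
        double sum = 0;
        for (Map.Entry<Integer, Double> e : m.entrySet()) {
            sum += e.getKey() * e.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        IntDoubleMap primitive = new IntDoubleMap(8);
        Map<Integer, Double> boxed = new HashMap<>();
        for (int k = 1; k <= 3; k++) {
            primitive.put(k, k * 0.5);
            boxed.put(k, k * 0.5);
        }
        System.out.println(sumCursor(primitive));
        System.out.println(sumBoxed(boxed));
    }
}
```

Both loops compute the same sum; the cursor version only reads from the backing arrays, while the `HashMap` version goes through `Map.Entry` and unboxes each key and value.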

vigna commented 9 years ago

Trove is deadly slow. If you're happy with it, just use it :-). Many people find consistency with the Java interfaces an asset rather than a liability...

Sent from my Android device with K-9 Mail. Please excuse my brevity.

kno10 commented 9 years ago

One may argue that Java APIs are dead slow, too. And there are a number of design errors that cannot be undone because of backwards compatibility, such as Iterator.remove(), or Map.Entry. If you want maximum performance, it may be necessary to break with them, or make them optional.

Indeed, Trove is slower than fastutil in many benchmarks (but I wouldn't call it "dead slow"); which is why I have been looking at alternatives. But fastutil and Koloboke are too fat by now in my opinion. At the <100k range of entries, the differences are not that big; and the popular benchmark did not include iteration over entries at all. So the cost of "new MapEntry" in fastutil (and others) hasn't been benchmarked there.

I just wanted to let you know why I am not using fastutil so far, despite it shining in some benchmarks.

vigna commented 9 years ago

Hash maps have a fast entry set that returns a fast iterator that does not create any object. It is a safe, Java-compatible way of avoiding object creation. Did you try it?
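The idea of an iterator that "does not create any object" can be sketched in plain Java as below. This is an illustration under stated assumptions, not fastutil's code: `TinyMap`, `MutableEntry`, `getIntKey()`, `getDoubleValue()` and `fastIterator()` are names chosen for this sketch, and the backing store is a pair of parallel arrays instead of a real hash table. The point is that the fast iterator still implements `java.util.Iterator` but hands back one shared, mutable entry on every call to `next()`.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class FastEntrySketch {
    /** Type-specific entry view: primitive getters, so no boxing on access. */
    interface Entry {
        int getIntKey();
        double getDoubleValue();
    }

    static final class MutableEntry implements Entry {
        int key;
        double value;
        public int getIntKey() { return key; }
        public double getDoubleValue() { return value; }
    }

    /** Hypothetical stand-in for a primitive map; parallel arrays keep the sketch short. */
    static final class TinyMap {
        private final int[] keys;
        private final double[] values;

        TinyMap(int[] keys, double[] values) {
            if (keys.length != values.length) throw new IllegalArgumentException("length mismatch");
            this.keys = keys.clone();
            this.values = values.clone();
        }

        /** Regular iterator: allocates a fresh entry per element. */
        Iterator<Entry> iterator() { return new EntryIterator(false); }

        /** Fast iterator: reuses one entry, so iteration allocates nothing per element. */
        Iterator<Entry> fastIterator() { return new EntryIterator(true); }

        private final class EntryIterator implements Iterator<Entry> {
            private final boolean reuse;
            private final MutableEntry shared = new MutableEntry();
            private int pos;

            EntryIterator(boolean reuse) { this.reuse = reuse; }

            public boolean hasNext() { return pos < keys.length; }

            public Entry next() {
                if (!hasNext()) throw new NoSuchElementException();
                MutableEntry e = reuse ? shared : new MutableEntry();
                e.key = keys[pos];
                e.value = values[pos];
                pos++;
                return e;
            }
        }
    }

    /** Sums all values using the fast iterator. */
    static double sum(TinyMap m) {
        double total = 0;
        for (Iterator<Entry> it = m.fastIterator(); it.hasNext(); ) {
            total += it.next().getDoubleValue();
        }
        return total;
    }
}
```

The trade-off of this design is that an entry returned by the fast iterator is only valid until the next call to `next()`; code that stores entries for later use needs the regular iterator.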

If you think that the difference is small I think you should review your benchmarks :-)


smack42 commented 9 years ago

While I don't like big library jar files either, I don't see them as a real problem. Just use a tool like ProGuard in your build process (skip the obfuscator if you don't want to use it). It includes in the output jar file only those classes (and methods/fields) that your application actually uses.