magro / kryo-serializers

More kryo serializers
Apache License 2.0
381 stars 120 forks source link

Guava Table support and smaller fixes #108

Closed maraswrona closed 5 years ago

maraswrona commented 5 years ago

TL;DR;

I noticed that the Guava's Table structure is not directly supported by this library. I need this in my project, so I decided to share my work.

What's inside:

Details

So I'm using kryo-serializers + subzero + hazelcast + guava. I noticed I couldn't serialize the ImmutableTable properly so I decided to investigate why. Surprisingly some of the Table implementations are serialized just fine (e.g. HashBasedTable) but that is only because they are handled by the generic FieldSerializer, so their private fields are basically inferred and serialized. The reason why this doesn't work for ImmutableTable implementation is that it internally uses some private implementation of ImmutableMap (e.g. Column or Row) which in turn are serialized by the FieldSerializer as regular Map, and fail during deserialization.

This PR fixes both sides of this problem: it adds direct support for Table implementations (including ImmutableTable) as well as registers additional private implementations of ImmutableMap with ImmutableMapSerializer.

Additionally I noticed that the TreeMultimapSerializer didn't cover non-default comparators. I added support for those in TreeBasedTableSerializer, so decided to fix it for TreeMultimapSerializer too.

Please take a look and review before merging. I'll appreciate any feedback!

coveralls commented 5 years ago

Coverage Status

Coverage increased (+1.2%) to 87.158% when pulling e7d7ce79dda098e3544879e79f832d7d318584e7 on maraswrona:master into c8cc4f84856810642925babd11659325d7a1d928 on magro:master.

coveralls commented 5 years ago

Coverage Status

Coverage increased (+1.3%) to 87.256% when pulling 655686f057e1f72f2ad25126cc6a8eb9f2313789 on maraswrona:master into c8cc4f84856810642925babd11659325d7a1d928 on magro:master.

magro commented 5 years ago

Awesome! Can you please also update the readme?

maraswrona commented 5 years ago

I updated the readme.

I also changed the implementation of Table serialization strategy. The first attempt was to serialize a collection of cells as R + C + V triplets. I've run some benchmarks and the output produced by this approach is 10% bigger than what you get from the default FieldSerializer. It's especially inefficient for very dense tables which are filled e.g. 100%. Then basically we serialize rows and columns multiple times unnecessarily.

Now the strategy goes through the rows and through the columns for each row in nested loop and saves row first, and then its colums and values. It is 5% more efficient than the FieldSerializer and 12% more efficient than the previous implementation.

I keep quite a lot of these Table objects in my Hazelcast cluster, so it matters to me if they don't occupy to much memory, so I wanted to measure this. I hope I didn't make any mistakes.

magro commented 5 years ago

Great, thanks! I just pushed version 0.44 to maven central.