Closed GoogleCodeExporter closed 8 years ago
Attached is v5.diff
It's around 30 KB for the 3 files (I added a base class for both num strategies).
It'll be in a new module.
If you can do perf tests, that would be great.
I'll look to sweep the other issues and then make a release by the end of the
week.
Original comment by david.yu...@gmail.com
on 28 Mar 2012 at 10:21
Attachments:
@rev 1453
Original comment by david.yu...@gmail.com
on 29 Mar 2012 at 5:09
Hi David,
I did some quick-and-dirty benchmarking, though a very limited one. I attach the test
here. It is based on the ObjectSchemaTest from the protostuff-runtime JUnit tests.
I had to comment out some checks that assume proper serialization of
graphs with shared nodes. Kryo had problems with those (either because it cannot
handle them, or because I don't know how to configure Kryo properly).
The Kryo version used is 1.04:
<dependency>
  <groupId>com.googlecode</groupId>
  <artifactId>kryo</artifactId>
  <version>1.04</version>
</dependency>
Here are my numbers:
Size of serialized representation using JavaSerializer: 3348
Speed of JavaSerializer: 100000 iterations in 77885 ms; 1.2839 iterations/ms
Size of serialized representation using KryoSerializer: 2367
Speed of KryoSerializer: 100000 iterations in 79588 ms; 1.2565 iterations/ms
Size of serialized representation using protostuff-default-id-strategy: 2381
Speed of protostuff-default-id-strategy: 100000 iterations in 39516 ms; 2.5306 iterations/ms
Size of serialized representation using protostuff-strict-num-id-strategy: 679
Speed of protostuff-strict-num-id-strategy: 100000 iterations in 29536 ms; 3.3857 iterations/ms
Size of serialized representation using protostuff-auto-num-id-strategy: 677
Speed of protostuff-auto-num-id-strategy: 100000 iterations in 30702 ms; 3.2571 iterations/ms
So, the new implementation seems to win against the others in both speed and
size. But this of course depends on the user-defined classes, their
complexity, and the complexity of the object graphs.
The test itself can easily be applied to anything else by exchanging the
user-defined classes and the complex structures built from them. If you have
something more realistic that could be used for benchmarking other frameworks, try
it out or let me know.
Original comment by romixlev
on 30 Mar 2012 at 12:58
Attachments:
BTW, the Kryo problem with shared nodes is known. A patch was submitted on the
corresponding issue just today or yesterday ;-)
http://code.google.com/p/kryo/issues/detail?id=38
Original comment by romixlev
on 30 Mar 2012 at 2:43
Great stuff!
Note that you can also share the schemas from DefaultIdStrategy to
ExplicitIdStrategy.
// snip
public static class Share implements IdStrategy.Factory
{
    final ExplicitIdStrategy.Registry r = new ExplicitIdStrategy.Registry();
    final IdStrategy share = RuntimeEnv.ID_STRATEGY;

    <T> Share registerPojo(Class<T> clazz, int id)
    {
        HasSchema<T> wrapper = share.getSchemaWrapper(clazz, true);
        r.registerPojo(wrapper.getSchema(), wrapper.getPipeSchema(), id);
        return this;
    }

    public IdStrategy create()
    {
        return r.strategy;
    }

    public void postCreate()
    {
        registerPojo(Foo.class, 1)
            .registerPojo(Bar.class, 2);
    }
}
This will yield even better performance because the schema is no longer lazily
loaded.
I'll put some new docs (after the release) regarding the new features and
performance tips.
1.0.5 will be released today. Let me know if you have any other concerns.
Original comment by david.yu...@gmail.com
on 30 Mar 2012 at 4:04
Ignore what I said regarding sharing :-)
I forgot it was intentionally made isolated ...
Original comment by david.yu...@gmail.com
on 30 Mar 2012 at 4:28
I attach a slightly improved benchmark. It uses Kryo's ability to register
classes before serializing, and its compression support. In addition, it uses
multithreading to do serialization in parallel.
With these changes, Kryo does a bit better. It produces smaller results, and
with compression enabled even the smallest. But when it comes to performance,
protostuff-runtime still beats it. I guess protostuff would beat
Kryo+compression if protostuff supported compression as well.
Another interesting finding was that protostuff has no problems with
concurrency: the same serializer/strategy can be used from different
threads. Kryo, however, is very sensitive to multithreading, and each thread
should use its own copy of the serializers, otherwise problems occur.
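A common workaround for that kind of thread sensitivity is to give each thread its own serializer instance via ThreadLocal. A sketch of the pattern, with a hypothetical `KryoLike` class standing in for the real Kryo (which is not on the classpath here):

```java
// Per-thread-instance pattern for serializers that are not thread-safe.
public class PerThreadSerializer
{
    // Hypothetical stand-in for a non-thread-safe serializer such as Kryo 1.x.
    public static class KryoLike
    {
        public byte[] serialize(Object obj)
        {
            return String.valueOf(obj).getBytes();
        }
    }

    private static final ThreadLocal<KryoLike> SERIALIZER =
        new ThreadLocal<KryoLike>()
        {
            @Override
            protected KryoLike initialValue()
            {
                // class registrations, buffer sizes, etc. would go here,
                // repeated identically for every thread
                return new KryoLike();
            }
        };

    public static byte[] serialize(Object obj)
    {
        // each thread transparently gets (and reuses) its own instance
        return SERIALIZER.get().serialize(obj);
    }
}
```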
Original comment by romixlev
on 30 Mar 2012 at 4:55
Forgot to attach the file :-)
Original comment by romixlev
on 30 Mar 2012 at 4:56
Attachments:
The weird thing with Kryo is that it trades off forward-backward compatibility
for smaller serialized size. It's actually a lot like protobuf with variable-length
encoding, but it took away the biggest feature.
Compression is easily pluggable.
See
https://github.com/jboss-switchyard/core/blob/master/runtime/src/main/java/org/switchyard/internal/io/SerializerType.java
Original comment by david.yu...@gmail.com
on 30 Mar 2012 at 5:14
> Compression is easily pluggable.
> See
> https://github.com/jboss-switchyard/core/blob/master/runtime/src/main/java/org/switchyard/internal/io/SerializerType.java
Thanks. I quickly hacked something together along these lines. The result:
- protostuff with compression (gzip, highest compression level) is still faster
than Kryo without compression
- but the size of the compressed protostuff representation is about 2
times bigger than the size of Kryo's compressed representation. This probably
indicates that the protostuff format is not so well suited for compression, because
it already tries to pack bits as tightly as possible.
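Such a GZIP layer is conceptually just a wrapper around the bytes any of the serializers produce. A minimal sketch (class and method names are my own, not from the attached test):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Compresses/decompresses the byte[] produced by any serializer.
// BEST_COMPRESSION matches the "highest compression level" mentioned above.
public class GzipLayer
{
    public static byte[] gzip(byte[] raw) throws IOException
    {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gos = new GZIPOutputStream(bos)
        {
            {
                // def is the protected Deflater of DeflaterOutputStream
                def.setLevel(Deflater.BEST_COMPRESSION);
            }
        };
        gos.write(raw);
        gos.close();
        return bos.toByteArray();
    }

    public static byte[] gunzip(byte[] compressed) throws IOException
    {
        GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(compressed));
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int n; (n = gis.read(buf)) != -1;)
            bos.write(buf, 0, n);
        return bos.toByteArray();
    }
}
```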
Original comment by romixlev
on 31 Mar 2012 at 8:31
What's the dataset? If there are a lot of strings, the compression ratio will be
good. But if it's mostly numbers, you won't gain a lot from compression.
(This is also stated in the official protobuf docs.)
If you want a better comparison, use the same compression algorithm (and level)
for both. That would give you a better understanding once you see the results.
Original comment by david.yu...@gmail.com
on 31 Mar 2012 at 9:00
David, I used the same JUnit test that I attached before, so the dataset follows
that unit test and is the same in both cases. I just added GZIP compression for
protostuff.
This is what I get:
Size of serialized representation using JavaSerializer: 3438
Speed of JavaSerializer: 2000 iterations in 1794 ms; 4.4593 iterations/ms
Size of serialized representation using KryoSerializer: 1510
Speed of KryoSerializer: 2000 iterations in 1217 ms; 6.5735 iterations/ms
Size of serialized representation using GZIP-KryoSerializer: 204
Speed of GZIP-KryoSerializer: 2000 iterations in 1497 ms; 5.3440 iterations/ms
Size of serialized representation using KryoCompressedSerializer: 212
Speed of KryoCompressedSerializer: 2000 iterations in 1467 ms; 5.4533 iterations/ms
Size of serialized representation using protostuff-default-id-strategy: 2597
Speed of protostuff-default-id-strategy: 2000 iterations in 1076 ms; 7.4349 iterations/ms
Size of serialized representation using protostuff-strict-num-id-strategy: 679
Speed of protostuff-strict-num-id-strategy: 2000 iterations in 733 ms; 10.9141 iterations/ms
Size of serialized representation using protostuff-auto-num-id-strategy: 677
Speed of protostuff-auto-num-id-strategy: 2000 iterations in 640 ms; 12.5 iterations/ms
Size of serialized representation using GZIP-protostuff-auto-num-id-strategy: 407
Speed of GZIP-protostuff-auto-num-id-strategy: 2000 iterations in 999 ms; 8.008 iterations/ms
Original comment by romixlev
on 31 Mar 2012 at 10:03
Can you attach the updated version? The size with the default id strategy is
already 2597, so that is already sufficiently bigger compared to the others.
Btw, try taking a look at https://github.com/eishay/jvm-serializers/wiki and
look for the deflate compression (last column). You'll see how much the
compression has taken off for all serializers.
With that dataset and compression algorithm, they're within 1% of each other.
(dfl-size/original-size; lesser is better.)
Original comment by david.yu...@gmail.com
on 31 Mar 2012 at 1:37
Hi David,
> Can you attach the updated version?
Sure, I attach the file to this message. I added one more serializer class for
GZIP compression. Since everything is in a single file, it is getting really,
really ugly ;-)
Original comment by romixlev
on 31 Mar 2012 at 4:37
Attachments:
David, I did some benchmarks using the same test, but using ProtostuffIOUtil
instead of GraphIOUtil. The interesting observation is:
Class.isAssignableFrom consumes almost 20% of the time, mostly in
ObjectSchema.writeObjectTo().
If we look at that method and take into account that we mostly use the same
small set of classes when serializing, we see that we perform the same expensive
check (isAssignableFrom) for the same class over and over again. Maybe it can
be optimized somehow? E.g. we compute it only once and then cache the result of
the check somewhere, e.g. in the schema, or in some sort of mapping from a
class to an integer mask representing the properties interesting for us.
But of course, this problem may look more serious than it is in
reality, because this test does only serialization/deserialization in a loop
and has no other business logic, i.e. it is an extreme case. In most real-world
scenarios, serialization is not the most time-consuming factor anyway...
Original comment by romixlev
on 2 Apr 2012 at 10:58
"If we look at that method and take into account that we mostly use the same
small set of classes when serializing"
What particular type of classes?
Note that this is necessary with dynamic objects (java.lang.Object/interface).
Even on a List<?> or Map<?,?>, any type can be assigned, therefore you
cannot tell whether the next element is still the same type as the previous one.
DerivativeSchema, on the other hand, is better optimized. To take advantage of
it, use abstract classes rather than interfaces.
Btw, in your test, try adding a java.lang.Serializable field and assigning an enum.
Or a new interface that the enum implements (similar to
AbstractObjectSchemaTest, bottom part).
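The abstract-class advice can be illustrated with a small sketch (all names below are hypothetical, not from the test): a field typed as an interface forces the generic ObjectSchema path, while a field typed as an abstract class can use DerivativeSchema.

```java
// Illustrates typing a polymorphic field as an abstract class instead of
// an interface, per the advice above. All names are hypothetical.
public class SchemaChoiceExample
{
    public interface Shape
    {
        double area();
    }

    public static abstract class AbstractShape implements Shape
    {
    }

    public static class Circle extends AbstractShape
    {
        double radius;

        public double area()
        {
            return Math.PI * radius * radius;
        }
    }

    public static class DrawingSlow
    {
        // interface-typed field: handled by the generic ObjectSchema,
        // which pays the isAssignableFrom checks discussed in this thread
        Shape shape;
    }

    public static class DrawingFast
    {
        // abstract-class-typed field: eligible for the better optimized
        // DerivativeSchema
        AbstractShape shape;
    }
}
```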
Original comment by david.yu...@gmail.com
on 2 Apr 2012 at 11:50
> What particular type of classes?
Sorry, I was not clear. They are the same classes that were used in my tests
attached before; no changes. So, there are, let's say, 10-15 classes (including
some of the JDK collection classes) that are really used to construct the Bean
objects and later used during serialization/deserialization.
There is no doubt we need to use isAssignableFrom during serialization. The
question is how often. In the performance test attached before, we essentially
serialize the same object many, many times in a loop. That means we perform
isAssignableFrom many times with the same classes as arguments, right? And
isAssignableFrom seems to be a rather expensive operation in such a setup. So,
I just asked whether the result of invoking x.isAssignableFrom(y) could be
cached somehow, to avoid computing it the next time, using the cached value
instead. I.e. some sort of predicate myCustomIsAssignableFrom(x, y) which
could be used instead of isAssignableFrom (after it was computed once) and is
cheaper than the standard isAssignableFrom. But maybe isAssignableFrom is the
cheapest possible implementation, or maybe this optimization is not worth the
effort.
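The kind of memoized predicate described above could look roughly like this (a sketch, not protostuff code; the class name is my own). The key point is that the cache is keyed by class, not by object, so its size stays bounded by the number of distinct classes seen:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Memoizes base.isAssignableFrom(c) per concrete class, so the reflective
// check runs once per class instead of once per serialized object.
public class AssignabilityCache
{
    private final Class<?> base;
    private final ConcurrentMap<Class<?>, Boolean> cache =
        new ConcurrentHashMap<Class<?>, Boolean>();

    public AssignabilityCache(Class<?> base)
    {
        this.base = base;
    }

    public boolean isAssignableFrom(Class<?> c)
    {
        Boolean cached = cache.get(c);
        if (cached == null)
        {
            // compute once; concurrent duplicate computes are harmless
            cached = Boolean.valueOf(base.isAssignableFrom(c));
            cache.put(c, cached);
        }
        return cached.booleanValue();
    }
}
```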
Original comment by romixlev
on 2 Apr 2012 at 12:06
The problem with caching is if there are too many runtime objects. The
overhead of resizing the cache map will be a problem.
I'm still open to improving the ObjectSchema performance.
Can you check if using "instanceof" yields better results? (It just might.)
Original comment by david.yu...@gmail.com
on 2 Apr 2012 at 4:53
Hi David,
> The problem with caching is if there are too many runtime objects. The
> overhead of resizing the cache map will be a problem.
I suggest caching classes and their properties, not objects. And the number of
classes is much smaller.
> I'm still open to improving the ObjectSchema performance.
> Can you check if using "instanceof" yields better results? (It just might.)
I did. Please find attached a small JUnit test. It measures the speed of
instanceof, isAssignableFrom, and a cached version of isAssignableFrom. The
cached version is faster than the others in my setup:
Speed of instanceof: 100000000 iterations in 9387 ms
Speed of isAssignableFrom: 100000000 iterations in 22308 ms
Speed of cached isAssignableFrom: 100000000 iterations in 6026 ms
Original comment by romixlev
on 3 Apr 2012 at 6:39
Attachments:
Results when run on java 1.6u26, ubuntu 10.04, intel core2quad 2.66ghz:
Speed of instanceof: 500000000 iterations in 23192 ms
Speed of isAssignableFrom: 500000000 iterations in 19044 ms
Speed of cached isAssignableFrom: 500000000 iterations in 13417 ms
What environment was your test run on?
Original comment by david.yu...@gmail.com
on 3 Apr 2012 at 1:48
Interesting!
This is what I have:
java version "1.6.0_30"
Java(TM) SE Runtime Environment (build 1.6.0_30-b12)
Java HotSpot(TM) Client VM (build 20.5-b03, mixed mode, sharing)
Windows Vista 32-bit, Intel Core i5 M520 2.40 GHz dual core, Hyper-Threading
But aside from the fact that the relative speed of instanceof and isAssignableFrom
is very different on different platforms, we see that caching actually helps in
either case ;-)
Original comment by romixlev
on 3 Apr 2012 at 3:55
The problem is that we have 3 isAssignableFrom checks (Message, Map,
Collection), in that order. To cache classes, we would need 3 maps.
I don't think it's worth it for the complexity it brings.
Btw, a bug was recently fixed that is related to NumericIdStrategy (Issue 111).
Original comment by david.yu...@gmail.com
on 11 Apr 2012 at 7:24
Hi David,
Just FYI, please have a look at https://github.com/jankotek/JDBM3/issues/71 and
https://github.com/jankotek/JDBM3/pull/72
I contributed a bit of code to JDBM3 to make its custom serialization a bit
better. They have implemented their own serialization by means of just 3-4
serialization-related classes. The interesting bit: they produce a very
compact representation (even smaller than Kryo), and with my changes (which
improved their serialization speed by almost a factor of 100) they are faster
than both Kryo and protostuff with the FQCN improvements. I used the same tests
that I ran here and just added JDBM3 into the mix.
Their framework also uses the FQCN trick and provides specialized serializers
for built-in data types, often-used collections, and arrays. Overall, they are
not as mature as protostuff, but their implementation is much smaller and
covers most typical usage scenarios.
I'm writing this here because you may be interested in having a look at their
implementation with my changes, and in seeing whether protostuff could benefit
from anything done there.
Original comment by romixlev
on 24 Apr 2012 at 1:59
I created a small standalone library from the JDBM3 serialization
implementation. It is called quickser and can be found here:
https://github.com/romix/quickser
It is now even faster than what I reported before. Have a look if you're
interested.
Original comment by romixlev
on 26 Apr 2012 at 10:57
Sorry, presumably a very stupid question: what does FQNC mean?
(PS. Thanks for making protostuff available. Just getting started, think I'm
going to like it.)
Original comment by rennie.p...@gmail.com
on 6 Aug 2013 at 1:08
Never mind, I see now that it's FQCN misspelled, and Google can find FQCN for
me.
Original comment by rennie.p...@gmail.com
on 6 Aug 2013 at 1:18
Original issue reported on code.google.com by
hpgisler...@gmail.com
on 2 Dec 2011 at 7:16