Closed GoogleCodeExporter closed 8 years ago
Attached is v5.diff
It's around 30 KB for the 3 files (I added a base class for both num strategies).
It'll be in a new module.
If you can do perf tests, that would be great.
I'll look to sweep the other issues and then make a release by the end of the
week.
Original comment by david.yu...@gmail.com
on 28 Mar 2012 at 10:21
Attachments:
@rev 1453
Original comment by david.yu...@gmail.com
on 29 Mar 2012 at 5:09
Hi David,
I did some quick-and-dirty benchmarking, though a very limited one. I attach the test
here. It is based on the ObjectSchemaTest from the protostuff-runtime JUnit tests.
I had to comment out some checks that assume proper serialization of
graphs with shared nodes. Kryo had problems with those (either because it cannot
handle them, or because I don't know how to configure Kryo properly).
The Kryo version used is 1.04:
<dependency>
  <groupId>com.googlecode</groupId>
  <artifactId>kryo</artifactId>
  <version>1.04</version>
</dependency>
Here are my numbers:
Size of serialized representation using JavaSerializer: 3348
Speed of JavaSerializer: 100000 iterations in 77885 ms; 1.2839 iterations/ms
Size of serialized representation using KryoSerializer: 2367
Speed of KryoSerializer: 100000 iterations in 79588 ms; 1.2565 iterations/ms
Size of serialized representation using protostuff-default-id-strategy: 2381
Speed of protostuff-default-id-strategy: 100000 iterations in 39516 ms; 2.5306 iterations/ms
Size of serialized representation using protostuff-strict-num-id-strategy: 679
Speed of protostuff-strict-num-id-strategy: 100000 iterations in 29536 ms; 3.3857 iterations/ms
Size of serialized representation using protostuff-auto-num-id-strategy: 677
Speed of protostuff-auto-num-id-strategy: 100000 iterations in 30702 ms; 3.2571 iterations/ms
So, the new implementation seems to win against the others in both speed and
size. But this of course depends on the user-defined classes, their
complexity, and the complexity of the object graphs.
The test itself can easily be applied to anything else by exchanging the
user-defined classes and the complex structures built from them. If you have
something more realistic that could be used for benchmarking other frameworks, try
it out or let me know.
Original comment by romixlev
on 30 Mar 2012 at 12:58
Attachments:
BTW, the Kryo problem with shared nodes is known. A patch was submitted on the
corresponding issue just today or yesterday ;-)
http://code.google.com/p/kryo/issues/detail?id=38
Original comment by romixlev
on 30 Mar 2012 at 2:43
Great stuff!
Note that you can also share the schemas from DefaultIdStrategy to
ExplicitIdStrategy.
// snip
public static class Share implements IdStrategy.Factory
{
    final ExplicitIdStrategy.Registry r = new ExplicitIdStrategy.Registry();
    final IdStrategy share = RuntimeEnv.ID_STRATEGY;

    <T> Share registerPojo(Class<T> clazz, int id)
    {
        HasSchema<T> wrapper = share.getSchemaWrapper(clazz, true);
        r.registerPojo(wrapper.getSchema(), wrapper.getPipeSchema(), id);
        return this;
    }

    public IdStrategy create()
    {
        return r.strategy;
    }

    public void postCreate()
    {
        registerPojo(Foo.class, 1)
            .registerPojo(Bar.class, 2);
    }
}
This will yield even better performance because the schema is no longer lazily
loaded.
I'll put some new docs (after the release) regarding the new features and
performance tips.
1.0.5 will be released today. Let me know if you have any other concerns.
Original comment by david.yu...@gmail.com
on 30 Mar 2012 at 4:04
Ignore what I said regarding sharing :-)
I forgot it was intentionally made isolated ...
Original comment by david.yu...@gmail.com
on 30 Mar 2012 at 4:28
I attach a slightly improved benchmark. It uses Kryo's ability to register
classes before serializing, and its compression support. In addition, it uses
multithreading to do serialization in parallel.
With these changes, Kryo does a bit better. It produces smaller results, and
with compression enabled even the smallest. But when it comes to performance,
protostuff-runtime still beats it. I guess protostuff would beat
Kryo+compression if protostuff supported compression as well.
Another interesting finding was that protostuff has no problems with
concurrency: the same serializer/strategy can be used from different
threads. Kryo, however, is very sensitive to multithreading, and each thread
should use its own copy of the serializers, otherwise problems occur.
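A common workaround for that kind of thread sensitivity is to give each thread its own serializer instance via ThreadLocal. A sketch of the pattern, with a hypothetical `KryoLike` class standing in for the real Kryo (which is not on the classpath here):

```java
// Per-thread-instance pattern for serializers that are not thread-safe.
public class PerThreadSerializer
{
    // Hypothetical stand-in for a non-thread-safe serializer such as Kryo 1.x.
    public static class KryoLike
    {
        public byte[] serialize(Object obj)
        {
            return String.valueOf(obj).getBytes();
        }
    }

    private static final ThreadLocal<KryoLike> SERIALIZER =
        new ThreadLocal<KryoLike>()
        {
            @Override
            protected KryoLike initialValue()
            {
                // class registrations, buffer sizes, etc. would go here,
                // repeated identically for every thread
                return new KryoLike();
            }
        };

    public static byte[] serialize(Object obj)
    {
        // each thread transparently gets (and reuses) its own instance
        return SERIALIZER.get().serialize(obj);
    }
}
```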
Original comment by romixlev
on 30 Mar 2012 at 4:55
Forgot to attach the file :-)
Original comment by romixlev
on 30 Mar 2012 at 4:56
Attachments:
The weird thing with Kryo is that it trades off forward-backward compatibility
for smaller serialized size. It's actually a lot like protobuf with variable-length
encoding, but it took away the biggest feature.
Compression is easily pluggable.
See
https://github.com/jboss-switchyard/core/blob/master/runtime/src/main/java/org/switchyard/internal/io/SerializerType.java
Original comment by david.yu...@gmail.com
on 30 Mar 2012 at 5:14
> Compression is easily pluggable.
> See
> https://github.com/jboss-switchyard/core/blob/master/runtime/src/main/java/org/switchyard/internal/io/SerializerType.java
Thanks. I quickly hacked something together along these lines. The result:
- protostuff with compression (gzip, highest compression level) is still faster
than Kryo without compression
- but the size of the compressed protostuff representation is about 2
times bigger than the size of Kryo's compressed representation. This probably
indicates that the protostuff format is not so well suited for compression, because
it already tries to pack bits as tightly as possible.
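Such a GZIP layer is conceptually just a wrapper around the bytes any of the serializers produce. A minimal sketch (class and method names are my own, not from the attached test):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Compresses/decompresses the byte[] produced by any serializer.
// BEST_COMPRESSION matches the "highest compression level" mentioned above.
public class GzipLayer
{
    public static byte[] gzip(byte[] raw) throws IOException
    {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gos = new GZIPOutputStream(bos)
        {
            {
                // def is the protected Deflater of DeflaterOutputStream
                def.setLevel(Deflater.BEST_COMPRESSION);
            }
        };
        gos.write(raw);
        gos.close();
        return bos.toByteArray();
    }

    public static byte[] gunzip(byte[] compressed) throws IOException
    {
        GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(compressed));
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int n; (n = gis.read(buf)) != -1;)
            bos.write(buf, 0, n);
        return bos.toByteArray();
    }
}
```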
Original comment by romixlev
on 31 Mar 2012 at 8:31
What's the dataset? If there are a lot of strings, the compression ratio will be
good. But if it's mostly numbers, you won't gain a lot from compression.
(This is also stated in the official protobuf docs.)
If you want a better comparison, use the same compression algorithm (and level)
for both. That would give you a better understanding once you see the results.
Original comment by david.yu...@gmail.com
on 31 Mar 2012 at 9:00
David, I used the same JUnit test that I attached before, so the dataset follows
that unit test and is the same in both cases. I just added GZIP compression for
protostuff.
This is what I get:
Size of serialized representation using JavaSerializer: 3438
Speed of JavaSerializer: 2000 iterations in 1794 ms; 4.4593 iterations/ms
Size of serialized representation using KryoSerializer: 1510
Speed of KryoSerializer: 2000 iterations in 1217 ms; 6.5735 iterations/ms
Size of serialized representation using GZIP-KryoSerializer: 204
Speed of GZIP-KryoSerializer: 2000 iterations in 1497 ms; 5.3440 iterations/ms
Size of serialized representation using KryoCompressedSerializer: 212
Speed of KryoCompressedSerializer: 2000 iterations in 1467 ms; 5.4533 iterations/ms
Size of serialized representation using protostuff-default-id-strategy: 2597
Speed of protostuff-default-id-strategy: 2000 iterations in 1076 ms; 7.4349 iterations/ms
Size of serialized representation using protostuff-strict-num-id-strategy: 679
Speed of protostuff-strict-num-id-strategy: 2000 iterations in 733 ms; 10.9141 iterations/ms
Size of serialized representation using protostuff-auto-num-id-strategy: 677
Speed of protostuff-auto-num-id-strategy: 2000 iterations in 640 ms; 12.5 iterations/ms
Size of serialized representation using GZIP-protostuff-auto-num-id-strategy: 407
Speed of GZIP-protostuff-auto-num-id-strategy: 2000 iterations in 999 ms; 8.008 iterations/ms
Original comment by romixlev
on 31 Mar 2012 at 10:03
Can you attach the updated version? The size with the default id strategy is
already 2597, so that is already sufficiently bigger compared to the others.
Btw, try taking a look at https://github.com/eishay/jvm-serializers/wiki and
look for the deflate compression (last column). You'll see how much the
compression has taken off for all serializers.
With that dataset and compression algorithm, they're within 1% of each other.
(dfl-size/original-size; lesser is better.)
Original comment by david.yu...@gmail.com
on 31 Mar 2012 at 1:37
Hi David,
> Can you attach the updated version?
Sure, I attach the file to this message. I added one more serializer class for
GZIP compression. Since everything is in a single file, it is getting really,
really ugly ;-)
Original comment by romixlev
on 31 Mar 2012 at 4:37
Attachments:
David, I did some benchmarks using the same test, but using ProtostuffIOUtil
instead of GraphIOUtil. The interesting observation is:
Class.isAssignableFrom consumes almost 20% of the time, mostly in
ObjectSchema.writeObjectTo().
If we look at that method and take into account that we mostly use the same
small set of classes when serializing, we see that we perform the same expensive
check (isAssignableFrom) for the same class over and over again. Maybe it can
be optimized somehow? E.g. we compute it only once and then cache the result of
the check somewhere, e.g. in the schema, or in some sort of mapping from a
class to an integer mask representing the properties interesting for us.
But of course, this problem may look more serious than it is in
reality, because this test does only serialization/deserialization in a loop
and has no other business logic, i.e. it is an extreme case. In most real-world
scenarios, serialization is not the most time-consuming factor anyway...
Original comment by romixlev
on 2 Apr 2012 at 10:58
"If we look at that method and take into account that we mostly use the same
small set of classes when serializing"
What particular type of classes?
Note that this is necessary with dynamic objects (java.lang.Object/interface).
Even on a List<?> or Map<?,?>, any type can be assigned, therefore you
cannot tell whether the next element is still the same type as the previous one.
DerivativeSchema, on the other hand, is better optimized. To take advantage of
it, use abstract classes rather than interfaces.
Btw, in your test, try adding a java.lang.Serializable field and assigning an enum.
Or a new interface that the enum implements (similar to
AbstractObjectSchemaTest, bottom part).
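The abstract-class advice can be illustrated with a small sketch (all names below are hypothetical, not from the test): a field typed as an interface forces the generic ObjectSchema path, while a field typed as an abstract class can use DerivativeSchema.

```java
// Illustrates typing a polymorphic field as an abstract class instead of
// an interface, per the advice above. All names are hypothetical.
public class SchemaChoiceExample
{
    public interface Shape
    {
        double area();
    }

    public static abstract class AbstractShape implements Shape
    {
    }

    public static class Circle extends AbstractShape
    {
        double radius;

        public double area()
        {
            return Math.PI * radius * radius;
        }
    }

    public static class DrawingSlow
    {
        // interface-typed field: handled by the generic ObjectSchema,
        // which pays the isAssignableFrom checks discussed in this thread
        Shape shape;
    }

    public static class DrawingFast
    {
        // abstract-class-typed field: eligible for the better optimized
        // DerivativeSchema
        AbstractShape shape;
    }
}
```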
Original comment by david.yu...@gmail.com
on 2 Apr 2012 at 11:50
> What particular type of classes?
Sorry, I was not clear. They are the same classes that were used in my tests
attached before; no changes. So, there are, let's say, 10-15 classes (including
some of the JDK collection classes) that are really used to construct the Bean
objects and later used during serialization/deserialization.
There is no doubt we need to use isAssignableFrom during serialization. The
question is how often. In the performance test attached before, we essentially
serialize the same object many, many times in a loop. That means we perform
isAssignableFrom many times with the same classes as arguments, right? And
isAssignableFrom seems to be a rather expensive operation in such a setup. So,
I just asked whether the result of invoking x.isAssignableFrom(y) could be
cached somehow, to avoid computing it the next time, using the cached value
instead. I.e. some sort of predicate myCustomIsAssignableFrom(x, y) which
could be used instead of isAssignableFrom (after it was computed once) and is
cheaper than the standard isAssignableFrom. But maybe isAssignableFrom is the
cheapest possible implementation, or maybe this optimization is not worth the
effort.
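The kind of memoized predicate described above could look roughly like this (a sketch, not protostuff code; the class name is my own). The key point is that the cache is keyed by class, not by object, so its size stays bounded by the number of distinct classes seen:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Memoizes base.isAssignableFrom(c) per concrete class, so the reflective
// check runs once per class instead of once per serialized object.
public class AssignabilityCache
{
    private final Class<?> base;
    private final ConcurrentMap<Class<?>, Boolean> cache =
        new ConcurrentHashMap<Class<?>, Boolean>();

    public AssignabilityCache(Class<?> base)
    {
        this.base = base;
    }

    public boolean isAssignableFrom(Class<?> c)
    {
        Boolean cached = cache.get(c);
        if (cached == null)
        {
            // compute once; concurrent duplicate computes are harmless
            cached = Boolean.valueOf(base.isAssignableFrom(c));
            cache.put(c, cached);
        }
        return cached.booleanValue();
    }
}
```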
Original comment by romixlev
on 2 Apr 2012 at 12:06
The problem with caching is if there are too many runtime objects. The
overhead of resizing the cache map will be a problem.
I'm still open to improving the ObjectSchema performance.
Can you check if using "instanceof" yields better results? (It just might.)
Original comment by david.yu...@gmail.com
on 2 Apr 2012 at 4:53
Hi David,
> The problem with caching is if there are too many runtime objects. The
> overhead of resizing the cache map will be a problem.
I suggest caching classes and their properties, not objects. And the number of
classes is much smaller.
> I'm still open to improving the ObjectSchema performance.
> Can you check if using "instanceof" yields better results? (It just might.)
I did. Please find attached a small JUnit test. It measures the speed of
instanceof, isAssignableFrom, and a cached version of isAssignableFrom. The
cached version is faster than the others in my setup:
Speed of instanceof: 100000000 iterations in 9387 ms
Speed of isAssignableFrom: 100000000 iterations in 22308 ms
Speed of cached isAssignableFrom: 100000000 iterations in 6026 ms
Original comment by romixlev
on 3 Apr 2012 at 6:39
Attachments:
Results when run on java 1.6u26, ubuntu 10.04, intel core2quad 2.66ghz:
Speed of instanceof: 500000000 iterations in 23192 ms
Speed of isAssignableFrom: 500000000 iterations in 19044 ms
Speed of cached isAssignableFrom: 500000000 iterations in 13417 ms
What environment was your test run on?
Original comment by david.yu...@gmail.com
on 3 Apr 2012 at 1:48
Interesting!
This is what I have:
java version "1.6.0_30"
Java(TM) SE Runtime Environment (build 1.6.0_30-b12)
Java HotSpot(TM) Client VM (build 20.5-b03, mixed mode, sharing)
Windows Vista 32-bit, Intel Core i5 M520 2.40 GHz dual core, Hyper-Threading
But aside from the fact that the relative speed of instanceof and isAssignableFrom
is very different on different platforms, we see that caching actually helps in
either case ;-)
Original comment by romixlev
on 3 Apr 2012 at 3:55
The problem is that we have 3 isAssignableFrom checks (Message, Map,
Collection), in that order. To cache classes, we would need 3 maps.
I don't think it's worth it for the complexity it brings.
Btw, a bug was recently fixed that is related to NumericIdStrategy (Issue 111).
Original comment by david.yu...@gmail.com
on 11 Apr 2012 at 7:24
Hi David,
Just FYI, please have a look at https://github.com/jankotek/JDBM3/issues/71 and
https://github.com/jankotek/JDBM3/pull/72
I contributed a bit of code to JDBM3 to make its custom serialization a bit
better. They have implemented their own serialization by means of just 3-4
serialization-related classes. The interesting bit: they produce a very
compact representation (even smaller than Kryo), and with my changes (which
improved their serialization speed by almost a factor of 100) they are faster
than both Kryo and protostuff with the FQCN improvements. I used the same tests
that I ran here and just added JDBM3 into the mix.
Their framework also uses the FQCN trick and provides specialized serializers
for built-in data types, often-used collections, and arrays. Overall, they are
not as mature as protostuff, but their implementation is much smaller and
covers most typical usage scenarios.
I'm writing this here because you may be interested in having a look at their
implementation with my changes, and in seeing whether protostuff could benefit
from anything done there.
Original comment by romixlev
on 24 Apr 2012 at 1:59
I created a small standalone library from the JDBM3 serialization
implementation. It is called quickser and can be found here:
https://github.com/romix/quickser
It is now even faster than what I reported before. Have a look if you're
interested.
Original comment by romixlev
on 26 Apr 2012 at 10:57
Sorry, presumably a very stupid question: what does FQNC mean?
(PS. Thanks for making protostuff available. Just getting started, think I'm
going to like it.)
Original comment by rennie.p...@gmail.com
on 6 Aug 2013 at 1:08
Never mind, I see now that it's FQCN misspelled, and Google can find FQCN for
me.
Original comment by rennie.p...@gmail.com
on 6 Aug 2013 at 1:18
Original issue reported on code.google.com by
hpgisler...@gmail.com
on 2 Dec 2011 at 7:16