heweitykc / protobuf-net

Automatically exported from code.google.com/p/protobuf-net

High CodeGen Overhead #339

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I tried the latest protobuf-net version (602) as a replacement for 
DataContractSerializer. The first performance tests were very good, but after 
the change everything got much slower. After digging deeper I found that the 
serialization performance is indeed very good, but for the first message of a 
complex type the code generation can take 200 ms. That is far too much if I 
serialize many different types over the wire only a few thousand times. 
The break-even point between the code gen cost and DataContractSerializer is 
around 20-40K objects. If I am serializing fewer than that, there is no point 
in using protobuf-net. 
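
A minimal sketch of how the first-call cost could be measured separately from the steady-state cost. The `Sample` type and the run count are placeholders, not the real model from this report, and the first-call figure will also include JIT time for protobuf-net itself:

```csharp
using System;
using System.Diagnostics;
using System.IO;
using ProtoBuf;

// Hypothetical stand-in for a real message type.
[ProtoContract]
public class Sample
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Name { get; set; }
}

public static class FirstCallTiming
{
    public static void Main()
    {
        var item = new Sample { Id = 1, Name = "demo" };

        // First call: includes type analysis and serializer generation.
        var sw = Stopwatch.StartNew();
        Serializer.Serialize(Stream.Null, item);
        sw.Stop();
        Console.WriteLine("first call: {0} ms", sw.Elapsed.TotalMilliseconds);

        // Later calls: the serializer for Sample already exists.
        const int runs = 10000;
        sw.Restart();
        for (int i = 0; i < runs; i++)
        {
            Serializer.Serialize(Stream.Null, item);
        }
        sw.Stop();
        Console.WriteLine("steady state: {0} ms per call",
            sw.Elapsed.TotalMilliseconds / runs);
    }
}
```

Dividing the first-call time by the per-call saving over DataContractSerializer gives a rough break-even object count for one type.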

Still, I want to become faster than DataContractSerializer. Is there a way to 
store the emitted code on disk and reuse it later when the types are actually 
used? XmlSerializer uses a similar mechanism, but it does not cache the temp 
assembly anywhere (for good reasons, I guess). 
Since assemblies are hard to cache, it could perhaps make sense to cache only 
the emitted IL instructions in a meta format to get faster, but I have not yet 
profiled whether that route makes any sense. 

Original issue reported on code.google.com by kraus.al...@gmail.com on 16 Nov 2012 at 8:41

GoogleCodeExporter commented 9 years ago
The google-code download includes the "precompile" tool, which does exactly 
this; it generates a dll which can be referenced by your app-tier, which 
doesn't need to do any analysis. It still needs access to the protobuf-net 
library for the reader/writer API, but it can use the "core only" dll if you 
want to be minimal (or the "full" dll is fine). Then instead of using 
`Serializer.Serialize(...)` etc, you can create an instance of the generated 
serializer:

    new MyCustomSerializer().Serialize(...);

Note that you can also store and re-use the serializer instance if you want to 
avoid allocations (it is fully thread-safe etc - you can use it concurrently 
from multiple threads).
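
As a rough illustration of the same idea without the precompile tool: on the full .NET Framework build, `RuntimeTypeModel.Compile(name, path)` also writes a serializer assembly to disk and returns an instance of it. The sketch below assumes a hypothetical `Order` type and reuses the `MyCustomSerializer` name from above. In a real setup the compile step would run at build time and the application would only reference the resulting dll.

```csharp
using System;
using System.IO;
using ProtoBuf;
using ProtoBuf.Meta;

// Hypothetical message type, used only for illustration.
[ProtoContract]
public class Order
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Customer { get; set; }
}

public static class PrecompileSketch
{
    public static void Main()
    {
        // Build-time step: analyse the root types once and emit a dll.
        RuntimeTypeModel model = RuntimeTypeModel.Create();
        model.Add(typeof(Order), true);
        TypeModel compiled = model.Compile("MyCustomSerializer", "MyCustomSerializer.dll");

        // Run-time step: keep one instance and reuse it for every call.
        using (var ms = new MemoryStream())
        {
            compiled.Serialize(ms, new Order { Id = 1, Customer = "demo" });
            ms.Position = 0;
            var copy = (Order)compiled.Deserialize(ms, null, typeof(Order));
            Console.WriteLine("{0} {1}", copy.Id, copy.Customer);
        }
    }
}
```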

However! I would also, if possible, be interested in your complex model - I 
wonder if there are some optimisations to be made in the analysis code. If it 
is something you can share with me, I'd be happy to investigate where the time 
is going, to see if we can make it faster.

Original comment by marc.gravell on 16 Nov 2012 at 12:12

GoogleCodeExporter commented 9 years ago
Thanks for the fast answer, Marc. A precompile tool is one way, but 
unfortunately our build process is already quite complex. The project is really 
large (>4000 dlls) and I need to restrict my changes to some central locations. 
I would then need to do some probing for the requested types to locate the 
correct serializer assembly at run time, which also sounds expensive. 

I am thinking that a way out of this would be to use DataContracts by default 
and precompile the type on another thread. The next time, I can use 
protobuf-net at full speed. That way I could keep the best of both worlds.
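
A minimal sketch of that fallback idea, under several assumptions: `WarmUpSerializer` is a hypothetical helper, the types carry attributes that both serializers understand, and modifying the default model on a background thread while other threads serialize is acceptable for the application (this has not been verified here). The returned flag matters because the two wire formats are not interchangeable, so the receiver has to be told which one was used.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Runtime.Serialization;
using System.Threading.Tasks;
using ProtoBuf.Meta;

// Hypothetical helper; not part of protobuf-net.
public static class WarmUpSerializer
{
    // Types whose protobuf-net serializer has been compiled.
    private static readonly ConcurrentDictionary<Type, bool> Ready =
        new ConcurrentDictionary<Type, bool>();

    // Background compile tasks per type, so a type is not queued on every call.
    private static readonly ConcurrentDictionary<Type, Task> Pending =
        new ConcurrentDictionary<Type, Task>();

    // Returns true if protobuf-net wrote the payload, false if the
    // DataContractSerializer fallback did.
    public static bool Serialize(Stream destination, object value)
    {
        Type type = value.GetType();
        if (Ready.ContainsKey(type))
        {
            RuntimeTypeModel.Default.Serialize(destination, value);
            return true;
        }

        // First sighting of this type: compile in the background.
        // Under a race the factory may run more than once; the work is
        // only repeated, not corrupted.
        Pending.GetOrAdd(type, t => Task.Run(() =>
        {
            RuntimeTypeModel model = RuntimeTypeModel.Default;
            model.Add(t, true);
            model.CompileInPlace();
            Ready[t] = true;
        }));

        // Meanwhile, answer this call with the slower-but-ready path.
        new DataContractSerializer(type).WriteObject(destination, value);
        return false;
    }
}
```

If the background compile throws, the type simply stays on the fallback path; a real implementation would want to observe and log that failure.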

I fear that merely reducing the code generation overhead is not enough, since 
the effect accumulates across the types. Even if we get the overhead down to, 
let's say, 20 ms per type, 50 types still waste a full second of startup time. 
That is time the DataContractSerializer does not need, because it seems to work 
without much code gen. 

Original comment by kraus.al...@gmail.com on 17 Nov 2012 at 12:50

GoogleCodeExporter commented 9 years ago
Well, you can cause an in-place compile just by adding a few key types (the 
root objects) and calling CompileInPlace():

    var model = RuntimeTypeModel.Default;
    model.Add(someType, true);
    // ... More
    model.CompileInPlace();

However, I genuinely suspect I can make some pretty significant optimisations 
to the meta-programming layer. Would you be at all interested in me looking at 
that? I can try to simulate something here, but looking at your objects would 
be more accurate (under NDA or whatever).

Original comment by marc.gravell on 17 Nov 2012 at 2:43

GoogleCodeExporter commented 9 years ago
what is the number of types here, btw? 

Original comment by marc.gravell on 17 Nov 2012 at 2:46

GoogleCodeExporter commented 9 years ago
I have 100 different types with a total volume of 1.7 MB of serialized data 
with DataContractSerializer. I will check if I can get an NDA for you; 
otherwise I will need to prepare some synthetic classes to give you something 
to play with. 

Original comment by kraus.al...@gmail.com on 19 Nov 2012 at 8:50