What approach could be taken to improve the performance for a large object graph?

origfla commented 8 years ago

I have a relatively large object graph, potentially 500 MB+ in size, and deserialization is dominating the performance of the application.

FsPickler binary serialization is already performing admirably well compared to a few other alternatives (so well done for that) but I was wondering what avenues may be open to squeezing any more performance out of FsPickler?

dsyme commented 8 years ago

Profile the serialization/deserialization using some kind of .NET profiler?

palladin commented 8 years ago

@origfla If you want to contribute you can start by running PerfView on your scenario and try to identify potential performance hot spots.

origfla commented 8 years ago

Thanks @palladin and @dsyme.

Please see attached screenshot from my CPU profiling session in VS2015 below. All activity is spent inside FsPickler and I'm struggling to interpret things definitively (as good or bad) once it's inside this library.

I can glean the following:

1. It is very busy with one of my "core" types, CellReference, which is effectively of type int * T2, where T2 in turn is a struct of int * int. This is not unexpected given that what I am serializing contains many instances of this type. However, if something can be done to a type to improve performance then this is an easy place to look.
2. It is spending 17% of the time overall in JIT_TailCall. My client application is in C# (because I expect my clients to want to use C#) and the entire library (where I call FsPickler from) is in F#. I have gotten rid of the poor performance from excessive JIT_TailCall activity elsewhere in the program by turning off the "Generate Tail Call" switch in all of my library projects. The C# client project does not have such a switch?
3. It is also spending 17% of the time overall in ReflectionSerialization::GetUninitializedObject. I am pretty sure that this is mostly with respect to the type in (1) above. Is there a way to make this easier for FsPickler? An answer on SO by user Nawfal may be relevant: http://stackoverflow.com/questions/390578/creating-instance-of-type-without-default-constructor-in-c-sharp-using-reflectio

Other relevant metrics are as follows: My disk is an SSD with a theoretical read capability of 725 MB/s. I am reading an FsPickler-serialized file of size 177 MB and it is taking ~6.5 s. This implies a read rate of 27 MB/s - not close to the theoretical limit. On a simpler test dataset (a simple collection that is more of a "table" and less of a "graph") I have seen FsPickler achieve a read rate of 40 MB/s - still well short of the theoretical limit.
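For reference, the arithmetic behind those figures can be checked with a few lines. This is only a sketch in Python using the numbers quoted above; the variable names are illustrative and "MB" is assumed to mean 10^6 bytes.

```python
# Figures quoted in the comment above (MB assumed to be 10**6 bytes).
file_size_mb = 177.0       # size of the FsPickler-serialized file
elapsed_s = 6.5            # observed deserialization time
disk_limit_mb_s = 725.0    # theoretical SSD read rate

# Effective read rate and its share of the theoretical disk limit.
observed_mb_s = file_size_mb / elapsed_s
fraction_of_limit = observed_mb_s / disk_limit_mb_s

print(f"observed rate: {observed_mb_s:.1f} MB/s")
print(f"share of disk limit: {fraction_of_limit:.1%}")
```

The observed rate comes out at roughly 27 MB/s, under 4% of the quoted disk limit, which is consistent with the profile showing the time going to CPU work inside the deserializer rather than to disk reads.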

[Screenshot: CPU profiling session in VS2015]

palladin commented 8 years ago

Hmm, interesting... I found this: http://stackoverflow.com/questions/31433605/how-to-eliminate-time-spent-in-jit-tailcall-for-functions-that-are-genuinely-non It seems that there is some performance correlation between structs, tail calls and the 64-bit RyuJIT. I don't know if @dsyme knows something about this.

eiriktsarpalis commented 8 years ago

Serialization in general is a very intensive computation, particularly deserialization. FsPickler is a generic serialization library, so it always checks the object graph for cycles, which can be slow. If you have full control over the data types that you are serializing, I would recommend writing your own persistence logic to squeeze out optimal serialization performance.

origfla commented 8 years ago

@eiriktsarpalis That seems to be where my journey has taken me to at the moment. In particular, using standard .NET BinaryWriter and BinaryReader.

That said, in a simpler testing scenario, where issues (1)-(3) above do not arise, FsPickler is coming closer than I would have thought to the raw binary serialization and deserialization performance - quite impressive given its generic nature.
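To illustrate what hand-written persistence logic of this kind might look like, here is a minimal, language-neutral sketch in Python. It is not the code used in this thread: the record layout (three little-endian 32-bit integers standing in for an int * (int * int) value) and the names `write_cells` and `read_cells` are assumptions made for the example. In .NET the same idea would be expressed with BinaryWriter and BinaryReader.

```python
import io
import struct

# Hypothetical layout: each cell reference is modelled as (int, (int, int)),
# stored as three little-endian 32-bit integers with no per-object metadata.
RECORD = struct.Struct("<iii")
COUNT = struct.Struct("<i")

def write_cells(stream, cells):
    """Write a 32-bit length prefix followed by fixed-size records."""
    stream.write(COUNT.pack(len(cells)))
    for a, (b, c) in cells:
        stream.write(RECORD.pack(a, b, c))

def read_cells(stream):
    """Read back the records produced by write_cells."""
    (n,) = COUNT.unpack(stream.read(COUNT.size))
    data = stream.read(n * RECORD.size)
    return [(a, (b, c)) for a, b, c in RECORD.iter_unpack(data)]

# Round trip through an in-memory buffer.
buf = io.BytesIO()
cells = [(1, (2, 3)), (4, (5, 6))]
write_cells(buf, cells)
buf.seek(0)
roundtrip = read_cells(buf)
```

Because the layout is fixed and known in advance, the reader needs no type information, no reflection and no cycle tracking. That is the trade-off being made against a generic serializer: less flexibility in exchange for less work per object.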

eiriktsarpalis commented 8 years ago

Closing this then?

mbraceproject / FsPickler

What approach could be taken to improve the performance for a large object graph? #65