mathnet / mathnet-numerics

Math.NET Numerics
http://numerics.mathdotnet.com
MIT License

Reconsider basic serialization support #350

Closed: cdrnet closed this issue 8 years ago

cdrnet commented 8 years ago

Math.NET Numerics currently does not provide any serialization support out of the box. This is by design, as serialization is very application-specific, with numerous options. We could never satisfy all use cases, and would rather not limit ourselves with design constraints based on the lowest common denominator (and have our code crippled with all kinds of attributes and helper logic).

However, supporting just one approach on data-representing types such as RunningStatistics, Histogram, and vectors and matrices could make integration much more feasible. Quite a few serializers, including FsPickler, have a fallback for ISerializable types, so this could be a way forward.

Any thoughts on this?

eiriktsarpalis commented 8 years ago

Here's a rough guide of how you could make a type serializable in FsPickler: http://nessos.github.io/FsPickler/overview.html#Serializable-Types

In C# code, adding SerializableAttribute to the class definition should suffice to make the type serializable by libraries such as BinaryFormatter and FsPickler, provided, of course, that all of its fields are also of serializable types.

In general, I would recommend against using ISerializable, since it's a very error-prone pattern. My preferred pattern is DataContract serialization, since it gives you easy and explicit control over the serialization format. It is also very capable, particularly when combined with the OnSerializing/OnSerialized and OnDeserializing/OnDeserialized callbacks. Here's a sample implementation that combines the two in F#.
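A minimal C# sketch of the DataContract-plus-callback pattern described above, assuming only the standard `System.Runtime.Serialization` APIs (`DataContractSerializer`, `[DataContract]`, `[DataMember]`, `[OnDeserialized]`). `RunningTotal` and `DataContractDemo.RoundTrip` are hypothetical names for illustration; they are not Math.NET Numerics types, and this is not the F# sample mentioned above. Only fields marked `[DataMember]` become part of the wire format, and a derived field is rebuilt in an `[OnDeserialized]` callback.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical type for illustration only; not part of Math.NET Numerics.
[DataContract(Name = "RunningTotal", Namespace = "urn:example")]
public sealed class RunningTotal
{
    // Only members marked [DataMember] are written, so the format is explicit.
    [DataMember(Order = 1)] private long _count;
    [DataMember(Order = 2)] private double _sum;

    // Derived state: deliberately not serialized, rebuilt after deserialization.
    private double _mean;

    public long Count => _count;
    public double Mean => _mean;

    public void Push(double value)
    {
        _count++;
        _sum += value;
        _mean = _sum / _count;
    }

    // DataContractSerializer does not run constructors or field initializers
    // when deserializing, so non-contract state is restored in this callback.
    [OnDeserialized]
    private void OnDeserialized(StreamingContext context)
    {
        _mean = _count == 0 ? 0.0 : _sum / _count;
    }
}

public static class DataContractDemo
{
    // Serializes the value to an in-memory XML buffer and reads it back.
    public static T RoundTrip<T>(T value)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, value);
            stream.Position = 0;
            return (T)serializer.ReadObject(stream);
        }
    }
}
```

For example, pushing 1, 2 and 6 into a `RunningTotal` and passing it through `RoundTrip` should give a copy with `Count` equal to 3 and `Mean` equal to 3. The callback is what keeps `Mean` correct, since `_mean` itself never goes over the wire.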

tibel commented 8 years ago

Math.NET is a library for doing calculations in your logic layer. A Matrix, with its Storage class, is a quite complex object to serialize, in contrast to a DTO.

There is MathNet.Numerics.Data for writing (serializing) your matrices and vectors, and most serializers support surrogate types for serializing complex object graphs.

Also keep versioning in mind when doing serialization; see https://twitter.com/terrajobst/status/662690170425094145

cdrnet commented 8 years ago

Thanks @tibel for reminding me - yes, this is essentially the reasoning behind the current design.

Maybe for context: this came up in a distributed computing scenario where some data structures need to be transferred over the wire between nodes in a simple and efficient way (assuming the same version on all nodes).

dsyme commented 8 years ago

Yes, I think serialization for basic storage types for ephemeral purposes in version-homogeneous distributed computations seems entirely reasonable.

redknightlois commented 8 years ago

Hi @cdrnet, did you consider Bond? https://microsoft.github.io/bond/why_bond.html

It supports multiple ways to serialize, it is lightning fast (it beat the crap out of our structs/unmanaged-based solution by more than 4x), it could eventually allow talking to software written in Python or C++, and it shouldn't be too difficult to provide external bindings for the objects themselves.

If you are considering providing a solution for serialization, it is a must-evaluate.

dsyme commented 8 years ago

@redknightlois I don't know of any general-purpose .NET libraries (of the same kind as Math.NET Numerics) that take dependencies on specific serialization technologies like Bond.

As @eiriktsarpalis describes, for the purposes being discussed here, the appropriate way to specify serialization independently of the serialization format is to use the DataContract attributes (ISerializable can also be used if assembly homogeneity between producer and consumer can be assumed).

redknightlois commented 8 years ago

@dsyme That's why I said that it can be provided as an external binding.

As a user of MathNet myself, when we needed serialization we had to write custom providers. We are currently using Bond because performance is important for us, but I also see that its flexibility in protocol transcoding could be useful in general. ISerializable and DataContract aren't really that generic either, so I doubt the real value they can provide; I know for a fact that they wouldn't have added much value for us, at least, because they are pretty basic and rigid.

dsyme commented 8 years ago

@redknightlois OK, that makes sense. The email chain that led to this issue was about serialization of inputs, intermediate results and outputs suitable for use in homogeneous distributed computation of the kind supported by http://mbrace.io, which uses FsPickler under the hood.

I agree that in other settings you want to use other serialization formats.

cdrnet commented 8 years ago

So a way to move forward could be to:

DataContract instead of Serializable, because to my understanding the latter is not available in any of the portable profiles, nor in the vNext stuff.

dsyme commented 8 years ago

@cdrnet That's a great plan

cdrnet commented 8 years ago

DataContracts for RunningStatistics, DescriptiveStatistics and Histogram have been released in v3.9.0.

cdrnet commented 8 years ago

@redknightlois you mentioned you have already written custom providers for Bond. Assuming we provided such providers out of the box (as extra packages, like the data packages), would such a generic provider be useful in practice in your very specific case, or would you need or want to implement your own providers anyway, in order to tune them to your specific application concerns and also to keep full control over them?

redknightlois commented 8 years ago

We built very custom schemas for it, but it would have been far easier if schemas for the basic data types had already been there. We did go the extra mile and also incorporated some of the ideas that allow Bond to be insanely fast, but that is another story.