mbraceproject / FsPickler

A fast multi-format message serializer for .NET
http://mbraceproject.github.io/FsPickler/
MIT License

CustomPickler fallback on default generated pickler #29

Closed Bananas-Are-Yellow closed 9 years ago

Bananas-Are-Yellow commented 9 years ago

I want to serialize a subset of a graph. When I deserialize later, I will reconnect with the external objects. To do that, I use my own tag-to-object table. When I first encounter an external object, I can write out the tag followed by the object. If I encounter it again, I can just write out the tag. I can use the StreamingContext to pass this state through the pickling, so that part is fine.
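The tag-to-object scheme described above could be sketched roughly like this (all names are made up for illustration; an instance of the table would travel in the StreamingContext.Context property so picklers can reach it during a write):

```fsharp
open System.Collections.Generic

/// A sketch of the tag-to-object table described above (hypothetical names).
/// Uses reference equality, since tags identify object instances, not values.
type ExternalTagTable () =
    let tags = Dictionary<obj, int> (HashIdentity.Reference)

    /// Returns the tag for an object, plus whether this is the first
    /// encounter (in which case the object itself must also be written).
    member _.GetTag (o : obj) : int * bool =
        match tags.TryGetValue o with
        | true, tag -> tag, false
        | _ ->
            let tag = tags.Count
            tags.[o] <- tag
            tag, true
```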

I can use the [<CustomPickler>] attribute and supply a pickler that does all this, but having written the tag, I then have to manually pickle the object. What I'd like to do is have some way for the custom pickler to obtain the pickler that would have been generated (as if the [<CustomPickler>] attribute had not been applied), so that I can use that to write out the object. Likewise when reading.

Is there some way to do this? The only way I could think of is for all my references to use a Wrapper<'T> struct, which holds the reference, so that the wrapper can have the custom pickler while the referenced type 'T inside it does not. But I don't really want to complicate all my types and code just for this.
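The Wrapper<'T> idea might look something like this. This is a sketch only: the FromPrimitives/Write/Read signatures are based on the FsPickler custom-pickler documentation and may differ between versions, and the custom logic itself is elided:

```fsharp
open MBrace.FsPickler

// Sketch of the Wrapper<'T> workaround: the wrapper carries the custom
// pickler, while 'T keeps its ordinary generated one.
[<Struct; CustomPickler>]
type Wrapper<'T> =
    { Item : 'T }

    static member CreatePickler (resolver : IPicklerResolver) =
        // 'T has no [<CustomPickler>] attribute, so this resolves to the
        // generated pickler for 'T.
        let inner = resolver.Resolve<'T> ()
        let writer (w : WriteState) (x : Wrapper<'T>) =
            // tag handling via the StreamingContext would go here
            inner.Write w "item" x.Item
        let reader (r : ReadState) =
            { Item = inner.Read r "item" }
        Pickler.FromPrimitives(reader, writer)
```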

Thanks,

David.

eiriktsarpalis commented 9 years ago

Hi David,

This is not a possibility in the current version of FsPickler. In general, the library follows the discipline of type definitions being the sole authority on pickling semantics, so in that sense they cannot be parameterised externally. Adding this feature would probably introduce interference in recursive type patterns, so it is best avoided. Keep in mind that picklers resolving themselves is actually possible and idiomatic, as evidenced in the RecursiveClass example here.

Perhaps you need something like this for your problem? Methods with OnSerialized/Deserialized attributes are thoroughly supported by FsPickler.
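The serialization callbacks mentioned above are the standard .NET attributes from System.Runtime.Serialization. A minimal illustration (the type and field names are made up; the callback bodies are stubs):

```fsharp
open System.Runtime.Serialization

// Hypothetical node type that reconnects an external reference after
// deserialization, using state carried in the StreamingContext.
type Node () =
    member val ExternalTag = -1 with get, set
    member val External : obj = null with get, set

    [<OnSerializing>]
    member this.OnSerializing (ctx : StreamingContext) =
        // record a tag for the external reference before fields are written,
        // e.g. via a table stored in ctx.Context
        ()

    [<OnDeserialized>]
    member this.OnDeserialized (ctx : StreamingContext) =
        // look this.ExternalTag up in a table carried in ctx.Context
        // and restore this.External
        ()
```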

Bananas-Are-Yellow commented 9 years ago

My code is F# and I don't want mutable fields that get fixed up after deserialization, especially if the object needs to store a temporary tag, which I don't need when I'm finished. That would be even more intrusive than the wrapper idea.

I understand that a pickler can resolve itself, but how does that help in my situation? I'm trying to make use of your pickler generation so that I don't have to manually write a pickler for my fields.

I don't really understand your objection. Yes, type definitions drive the generated picklers, but you can also completely override this with a custom pickler. It doesn't seem unreasonable (to me) for the custom pickler to do something and then decide to call the generated pickler to do its thing too. How would that interfere with recursive types? The custom pickler would decide what to do for each instance of the type. It's a little bit like the Pickler.alt idea of the tag being used to decide what to do next. I don't suppose there is a way to get Pickler.alt to help in my case?

I could always fall back on the wrapper idea. It's no worse than (or just as bad as) using lazy values everywhere and having to use the Value property, I suppose.

eiriktsarpalis commented 9 years ago

Well, a custom pickler is in essence part of the type definition, hence the requirement for the attribute and the impossibility of, say, registering the custom pickler elsewhere. The library needs deterministic pickler-generation semantics for a reason: pickler generation is expensive (reflection, dynamic method generation, etc.), so instances are created once per type per runtime and cached. Also, for performance reasons, pickler dependencies are captured at generation time rather than dereferenced at runtime.

Introducing parametric pickler generation would introduce ambiguity and timing conditions, where different pickling semantics would be bound depending on the order of pickler generation. For example, if your custom-pickled type Foo were in a mutually recursive relationship with type Bar, generating the 'default' pickler for Foo beforehand would also result in Bar being bound to that 'default' version, which is almost certainly undesirable. On the other hand, if Bar were generated before Foo, this would not occur. So the deserializing party could either fail or succeed depending on minute runtime behaviour.

Bananas-Are-Yellow commented 9 years ago

In what I am describing (perhaps not all that well), there is no such problem. The type would only automatically generate one pickler on demand for your cache, which is the custom pickler, since the type has the [<CustomPickler>] attribute. So nothing changes in terms of your design philosophy.

What I'm suggesting is that in addition, there is a way for me to ask for the 'default' pickler to be generated for me to use privately. This would only be generated if and when I ask for it, and I would keep a reference to it inside my custom pickler. It would not live in your pickler cache, so this should have no side effects.

Referring to your code, if I understand it correctly, I think I would want a way to bypass the cache and call PicklerFactory.Create passing in an adjusted TypeShape, which means I would need to tell TypeShape.resolve to ignore any [<CustomPickler>] attribute. The resolver passed to PicklerFactory.Create would be the usual resolver, since picklers for any other types that it depends on would still come from the cache.

Does that make sense?

eiriktsarpalis commented 9 years ago

I hear what you're saying; I'm just pointing out a few issues that may arise from this pattern. I'm convinced that what you're saying might well work, but it requires careful re-engineering of the internals. In particular, I would like to bring to your attention this piece of code. Pickler generation makes use of an external (global) cache and an internal (disposable) cache. The latter is essential for recursive picklers to work, and care must be taken to prevent partial or failed state from contaminating the global cache, lest it lose its 'referential transparency'. For recursive types, your suggestion would almost certainly result in infinite regression and stack overflow, unless the local cache logic is somehow rewritten. It is not exactly clear to me how this could be done.

That said, I recognize this is an extreme corner case, so users should conjure the 'default pickler' at their own risk. On the other hand, reverting to the 'default pickling semantics' sounds a bit ambivalent. What about users wanting to use field-based pickling inside ISerializable or DataContract types? I'm pointing this out since there could be an interesting way to make this a proper general-purpose feature.

Bananas-Are-Yellow commented 9 years ago

I'm not sure what the issue is with the local cache, but I might be missing something.

If I call PicklerFactory.Create somehow, won't this generate a pickler for the supplied shape without touching the cache for that pickler itself? It will use the supplied resolver (and then generatePickler and then resolvePickler) to obtain picklers for other types it depends on, but that's what I want. Even if my type depends on itself, I still want that pickler to come from the cache. It's only the 'root' pickler that should be 'default', if you see what I mean.

The scheme is just like overriding a method, where the code does something and then might also call the base implementation. In this case the override is the custom pickler and the base implementation is the 'default' pickler. If my method traverses to another instance of my type (e.g. a tree structure), this will call the override method on that instance. Same with picklers, i.e. it would call the custom pickler.

Can you explain what you are thinking of with your general-purpose feature? I didn't understand that last part about field-based pickling.

eiriktsarpalis commented 9 years ago

Yeah, but if the type depends on itself, it must surely have been generated before being cached. Since the call occurs during pickler generation, this cannot have happened. This means that the custom pickler logic would eventually call itself, resulting in infinite regression. You either aggressively cache everything, risking inconsistencies, or face potential stack overflow conditions.

To keep with your method-overriding analogy, this means that the 'base implementation' must be clearly defined and included in the core logic of pickler generation. However, this notion of base implementation is in itself ambiguous: CustomPickler does not override a 'default' implementation, it is simply one of a multitude of serialization schemes, like field-based serialization, DataContract-based serialization, or ISerializable types. Perhaps this might make things more apparent.

Bananas-Are-Yellow commented 9 years ago

When my CreatePickler method is called, it is passed a resolver, which I think is the temporary resolver that looks in the local cache first. An 'uninitialized' pickler for the result that I will return has already been added to the local cache at this point in time, is that right?

In my proposed scheme, my CreatePickler method will ask for the 'default' pickler, and I would supply the resolver I have, which will look in the local cache first for any types I depend on, which might include my own type if I depend on that recursively. The 'uninitialized' pickler algorithm already protects against recursion, doesn't it?

I agree that custom picklers are not in general to be thought of as overrides. The facility that I am exploring is for the case where this analogy does make sense. The 'default' implementation is defined as what you would have if you didn't have the [<CustomPickler>] attribute. If that means the type would not be serializable, then that's my fault and I'd expect an exception.

Although my example is serializing a subset of a graph, the facility seems like a general purpose way to make savings in the stream. If I have a complex object that I can see represents a common case value, instead of writing out the entire object, I could write out a simple value to indicate this common case, which I can reconstruct into that complex object again when reading. If not, then I want the 'default' pickler. Of course, I could reorganize my types so that the common value is represented, e.g. as a discriminated union, but that might make my code more complicated just for serialization. In my case, such a reorganization would be to use the Wrapper<'T> struct.
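The common-case saving described above could be sketched as follows. Note that this is precisely the facility under discussion, so it cannot be compiled: ResolveDefault does not exist in FsPickler and stands for "the pickler that would have been generated without [<CustomPickler>]". The other calls follow the documented custom-pickler API, whose signatures may differ between versions:

```fsharp
open MBrace.FsPickler

// Hypothetical example type with one very common value.
[<CustomPickler>]
type Temperature =
    { Celsius : float }

    static member CreatePickler (resolver : IPicklerResolver) =
        // HYPOTHETICAL: the 'default' generated pickler being requested
        // in this thread; no such member exists today.
        let defaultP : Pickler<Temperature> = resolver.ResolveDefault<Temperature> ()
        let boolP = resolver.Resolve<bool> ()
        let writer (w : WriteState) (x : Temperature) =
            let isCommon = (x.Celsius = 0.0)          // the common-case value
            boolP.Write w "isCommon" isCommon
            if not isCommon then defaultP.Write w "value" x
        let reader (r : ReadState) =
            if boolP.Read r "isCommon" then { Celsius = 0.0 }
            else defaultP.Read r "value"
        Pickler.FromPrimitives(reader, writer)
```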

eiriktsarpalis commented 9 years ago

Your proposed scheme should work, in principle. This of course means that the notion of 'default pickler' must be 'hardcoded' in the internal resolution logic.

Still, I find the notion of a 'default pickler' to be problematic. Consider a type that implements ISerializable and carries both the DataContract and CustomPickler attributes. What is the 'default' pickling semantics under the definition that you provide? Will it use ISerializable? DataContract? Or will it just serialize its fields? What if somebody wants to override ISerializable rather than CustomPickler? My point is that thinking of CustomPickler as 'overriding the default semantics' is wrong; it is simply one of many different supported serialization schemes. Perhaps its choice of name is a bit misleading.

I think there is merit to your suggestion, but these ideas need to be ironed out before writing out a plan to address it.

Bananas-Are-Yellow commented 9 years ago

TypeShape.resolve already tests possibilities in an order, e.g. CustomPickler beats DataContract, which beats ISerializable. So the 'default' pickler could be defined by what it would do in the absence of the [<CustomPickler>] attribute. However, I take your point - it would be more useful if there was more control over 'alternative' picklers.

Perhaps the alternative pickler type could be specified explicitly when I ask for the private pickler to be created? It could just be an enum.

eiriktsarpalis commented 9 years ago

Perhaps an enum acting as a blacklist, say.

Bananas-Are-Yellow commented 9 years ago

Looking at TypeShape.resolve, there are not many values this enum would need.

I think I'd want to specify what I want, rather than what I don't want. In my case I'd specify Structural. If you ask for something you can't have, such as DataContract for a delegate type, this would be a coding error, so you'd get an exception.

eiriktsarpalis commented 9 years ago

Reading this issue again, it seems clear to me that your initial requirement can be solved without having to change the pickler generation logic. It could possibly be addressed by adding support for serialising object graphs using holes. For example:

FsPickler.SerializeWithHoles<Foo>(stream, foo, [x : obj ; y : obj ; z : obj])

This would serialise the graph, annotating any occurrence of x, y or z as a hole in the serialization. Its dual method

FsPickler.DeserializeWithHoles<Foo>(stream, [x' : obj ; y' : obj ; z' : obj])

would deserialize, patching any hole occurrence with the object of the indicated index. Note that this cannot be type-safe and will not work well with cyclic graphs. Importantly, this addition is unrelated to picklers per se; rather, it pertains to augmenting the graph-traversal logic.

Bananas-Are-Yellow commented 9 years ago

In the end, I wrote my own pickler library. I hope you are not offended. It took me 2 weeks: one week to write the pickling logic itself, and another week to figure out the code generation. It was a good exercise and I learned a lot from doing it.

My requirements are different from yours.

I am focused on data storage, rather than transmitting data. This requires persistent links to external objects already stored, as we have discussed. The caller creates a pickler table, providing a function to override the default pickler. Typically this function will wrap the default pickler with an outer pickler that makes some decisions before calling the default pickler. When pickling is performed, the write state provides access to a caller-supplied context, which can store tables used by this outer pickler.

I am only interested in binary format, because I need the fastest and most compact data storage. I have implemented a text format for writing only, which is useful for visualizing and debugging the output. I don't need JSON for data storage, although I use Json.NET for data transmission over web sockets, which is a separate part of my application.

My data is a tree structure and I store child nodes in arrays. I generally use arrays a lot, since they are a compact representation. I wanted to avoid writing out an ObjectFlags field before every item in the array, even if it is only a single byte. If the type is not sealed, I generate a polymorphic pickler, which wraps the pickler for the type and records the actual type.
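The polymorphic-pickler idea above might look roughly like this. This is a sketch of my description of that library, not FsPickler code; the Pickler record shape and the resolve function are invented for illustration:

```fsharp
open System
open System.IO

/// Hypothetical minimal pickler shape, mirroring the record style
/// described in this thread.
type Pickler<'T> =
    { Write : BinaryWriter -> 'T -> unit
      Read  : BinaryReader -> 'T }

/// For a non-sealed element type, record the concrete runtime type before
/// the payload and dispatch to that type's pickler. Array elements of a
/// sealed type skip this wrapper, so no per-item flags byte is needed.
let polymorphic (resolve : Type -> Pickler<obj>) : Pickler<obj> =
    { Write = fun w (x : obj) ->
        let t = x.GetType ()
        w.Write t.AssemblyQualifiedName     // record the actual type
        (resolve t).Write w x
      Read = fun r ->
        let t = Type.GetType (r.ReadString ())
        (resolve t).Read r }
```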

I am only interested in F#, so I have no interest in reference cycles or closures or legacy serialization methods. If I am using an existing type, which has internal cycles (e.g. Dictionary<'k,'v>), I'd rather have that serialized in a compact form.

The hardest part was the code generation. I didn't want to teach myself incomprehensible IL and Reflection Emit. So I use F# quotations and compile them using FSharp.Quotations.Evaluator. However, this has the limitation that it does not support Expr.FieldSet expressions, so I fall back on LINQ expressions in that area.

Lastly, my Pickler<'t> is a record, like your documentation, but unlike your code. I guess that's how you started out.

eiriktsarpalis commented 9 years ago

No worries; the idea of picklers admits many hugely different implementations in principle, and I doubt one library could work well in every conceivable application.

eiriktsarpalis commented 9 years ago

This has been partially addressed as of 1.2.5, closing.