taoensso / nippy

The fastest serialization library for Clojure
https://www.taoensso.com/nippy
Eclipse Public License 1.0

Difficulty trying to de/serialize clojure.lang.Var #94

Closed mikub closed 7 years ago

mikub commented 7 years ago

It seems to me that the current custom-type thaw macro does not allow serializing a selected (even existing) type into a literal or non-Clojure type. For example, if I want to turn all Vars (which cannot be serialized) into Strings, it is not possible, since the thaw macro thaw-data will try to add metadata to the String and fail in the process:

(nippy/extend-freeze clojure.lang.Var :clojure.core/var [x data-output]
  (.writeUTF data-output (str x)))

(nippy/extend-thaw :clojure.core/var [data-input]
  (.readUTF data-input))

(def some-var "a")
=> #'clojure.user/some-var
(nippy/thaw (nippy/freeze #'clojure.user/some-var))
ClassCastException java.lang.String cannot be cast to clojure.lang.IObj  clojure.core/with-meta--4146 (core.clj:217)

Is there any way around it (apart from sanitizing the data prior to the freezing)?

Thank you!

ptaoussanis commented 7 years ago

Hi there!

> does not allow for serialization of some selected (even existing) type into a literal or non-clojure type

Sorry, not sure I understand this sentence, but I'll try to work from your example.

The problem here is that you're trying to define a serializer that serializes Var->String, two completely different types.

Your input type (Var) supports and has metadata, but your output type (String) does not.

Nippy tries to preserve metadata, so it's grabbing your Var's metadata during freezing and trying to reattach it to a different type (which doesn't support metadata) during thawing.

In general you always want to thaw to the same type as (or a type equivalent to) the input type being serialized. Put another way:

String <freezes to> bytes <thaws to> String
Vector <freezes to> bytes <thaws to> Vector
...
[Type] <freezes to> bytes <thaws to> [Same or equivalent type]

But in your case you have: [Type1] <freezes to> bytes <thaws to> [Incompatible type2]

Specifically: [Type 1 with metadata] <freezes to> bytes <thaws to> [Type 2 without metadata support]
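The mismatch can be checked at a REPL without Nippy at all. The following is a minimal sketch using only clojure.core; the var name `example-var` and the other def names are hypothetical and exist only for illustration. It shows that a Var carries metadata, that a vector accepts metadata, and that a String does not.

```clojure
;; Hypothetical var used only for illustration
(def example-var "a")

;; A Var carries metadata (name, namespace, ...)
(def var-meta-name (:name (meta #'example-var)))

;; with-meta needs clojure.lang.IObj; vectors implement it, Strings do not
(def vector-ok? (instance? clojure.lang.IObj [1 2 3]))
(def string-ok? (instance? clojure.lang.IObj "a"))

;; Attaching metadata to a String fails with the same exception class
;; as in the report above
(def string-error
  (try (with-meta "a" {:k 1})
       (catch ClassCastException e :class-cast-exception)))
```

Any thaw result that is not an IObj will hit the same failure if metadata has to be reattached to it.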

What you'd actually want to do is to thaw back to a Var. Admittedly, that's not really possible/easy, since Vars are pretty low level and not something you'd normally try to create at runtime.

Ultimately the fundamental notion here is wrong: you shouldn't be trying to "serialize" a Var. What's your motivation?

If for some reason you're receiving input arguments for freezing that might be Vars and all you care about is de/serializing the actual value that the Vars are pointing at, then what you probably want is to just make sure that you deref any Var inputs:

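(require '[taoensso.nippy :as nippy])

;; Deref Vars so that only the value they point at gets frozen
(defn deref-var [x] (if (instance? clojure.lang.Var x) (deref x) x))
(def some-var "a")
(nippy/thaw (nippy/freeze (deref-var #'some-var))) ; => "a"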
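If the Vars may be nested inside larger data rather than passed at the top level, the same idea can be applied with a walk before freezing. This is only a sketch under assumptions: `deref-vars` is a hypothetical helper, not part of Nippy, and it relies on clojure.walk/postwalk from Clojure's standard library.

```clojure
(require '[clojure.walk :as walk])

(def some-var "a")

(defn deref-vars
  "Hypothetical helper: replaces every Var found in `data` with its value."
  [data]
  (walk/postwalk (fn [x] (if (var? x) (deref x) x)) data))

;; The nested Var is replaced by the value it points at
(def cleaned (deref-vars {:k [#'some-var 1 2]}))
;; `cleaned` can then be passed to freeze as usual
```

The cost is one extra traversal of the data before freezing, which may or may not matter depending on the size of the input.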

Hope that helps!

mikub commented 7 years ago

Thanks Peter! Yes, your understanding is correct: I am actually trying to convert a Var to a String (hence to an incompatible type). In general, the reasoning behind this is to provide a fallback method of serializing (potentially all) non-serializable types. That is actually what I am after; the Var was just an example (maybe not an ideal one). Is there a way to do this in Nippy? E.g. specified (or maybe all) non-serializable types (in this case a Var) would be serialized using some particular fn (e.g. str). Since Nippy is already traversing the data graph to serialize it, it seems wrong to me to perform additional data sanitizing before calling freeze.

Thanks for help! Miro

ptaoussanis commented 7 years ago

No problem Miro :-)

Yeah, so clojure.lang.Var is definitely not what you're looking for as a fallback type.

Is this just a hypothetical problem you're trying to solve, or do you have a specific case you're running into problems with? Nippy already supports 2 kinds of automatic fallback:

  1. It uses Java's Serializable interface when possible
  2. Otherwise it uses Clojure's reader when possible

If neither of these works, you can set nippy/*freeze-fallback*, which will be used any time an argument can't be serialized in any other way.

(In case you're curious, this works by extending Nippy's freezer to the JVM's base Object type.)

Hope that helps!

mikub commented 7 years ago

Oh, excellent! nippy/*freeze-fallback* is pretty much exactly what I was after! Thanks heaps!

mikub commented 7 years ago

Hey Peter, one more thing: I am trying to use *freeze-fallback* (or rather set-freeze-fallback!) and for some reason cannot make it work. With the following:

(nippy/set-freeze-fallback!
  (fn [data-output x]
    (let [s (str x)
          ba (.getBytes s "UTF-8")
          len (alength ba)]
      (.writeByte data-output (byte 13))
      (.writeInt data-output (int len))
      (.write data-output ba 0 len))))

I would still get the same exception:

(nippy/thaw (nippy/freeze #'some-var))
ClassCastException java.lang.String cannot be cast to clojure.lang.IObj  clojure.core/with-meta

though with different stack trace:

      :trace [[clojure.core$with_meta__4146 invoke "core.clj" 217]
               [taoensso.nippy$thaw_from_in_BANG_ invoke "nippy.clj" 1126]
               [taoensso.nippy$thaw$thaw_data__18383 invoke "nippy.clj" 1330]

Anything obviously wrong with my code?

I also tried a more concise version:

(nippy/set-freeze-fallback!
  (fn [data-output x]
      (.writeByte data-output (byte 16))
      (.writeUTF data-output (str x))))

Thanks for help!

ptaoussanis commented 7 years ago

Hey there :-)

So you're running into the same problem for the same reasons I described before. You're trying to serialize a Var (which has metadata) into a String (which cannot take metadata).

Why are you trying to serialize a Var? A Var is a special low-level type generally concerned with supporting Clojure's dynamic (REPL) features. It's basically a pointer object. It's not normally something you'd need or want to serialize at the application level.

What specific effect are you trying to achieve? Could you provide a concrete example?

mikub commented 7 years ago

The use case is quite simple: a fallback function for non-serializable objects. Var serves as an example of such an unserializable object. Instead of throwing a runtime Exception, I want Nippy to use this fallback function (str x) if an object cannot be serialized. I'd guess this is quite a standard feature / use case in serialization libraries.

This can happen, e.g., if you get data which you have no control over (in terms of data structure design) and which may contain arbitrary (and potentially non-serializable) objects. In such cases you don't want your application to crash; you want to handle the situation gracefully.

It seems I misunderstood the use of *freeze-fallback*; I thought it was meant for exactly such purposes. It appears to me that there is a constraint in Nippy that you can only serialize into custom protocol records (which may not equal the original Object anyway), but not into a String...

Thanks!

ptaoussanis commented 7 years ago

> The use case is quite simple - fallback function for non-serializable objects. Var serves as an example of such an unserializable object.

As I mentioned, Var is not an appropriate type for your use case as I understand it. Vars have a specific purpose related to Clojure's dynamic/REPL environment and aren't the kind of thing you can/should encounter as general user input.

Trying to de/serialize a Var isn't something that inherently makes much sense (which is why you're seeing an exception).

If for some reason you're encountering Vars and just want to de/serialize their dereferenced values, you can use the deref solution I provided above.

If you really want to just push through and get the behaviour that you're looking for, you can directly extend the nippy/IFreezable2 protocol to clojure.lang.Var. Though would recommend against this approach for the reasons already discussed :-)

Hope that helps! Cheers!

mikub commented 7 years ago

Hi Peter, it seems you are focusing only on the Var. It was just an example. Instead, think any-non-serializable object. In general this would often be a reference to local address space. I agree it should not get into the data graph in the first place. What I am talking about here is that when it does, I don't want my application to crash; instead I want it to use the *freeze-fallback* function. Now it works perfectly for objects without metadata:

(nippy/set-freeze-fallback!
  (fn [data-output x]
    (let [s (str x)
          ba (.getBytes s "UTF-8")
          len (alength ba)]
      (.writeByte data-output (byte 13))
      (.writeInt data-output (int len))
      (.write data-output ba 0 len))))

(nippy/thaw (nippy/freeze (java.lang.Object.)))
=> "java.lang.Object@46857f71"

But it errors out for non-serializable Clojure objects when the final fallback method produces a literal that does not bear metadata (does not implement clojure.lang.IObj), e.g. a String:

(nippy/thaw (nippy/freeze #'some-var))
ClassCastException java.lang.String cannot be cast to clojure.lang.IObj  clojure.core/with-meta

Imho this is a bug and an inconsistent behavior. If *freeze-fallback* is used, metadata should not be broadcasted if the resulting object is not clojure.lang.IObj.

I agree this is a rare use case, but I think it is valid nevertheless. I just want to make a foolproof application and not rely on users always knowing what they are doing. The fallback functionality should imho serve exactly this purpose: to prevent an exception from occurring regardless of what data is there. This is currently not the case.

I am able to implement this use case without issues with Transit as well as with EDN.

ptaoussanis commented 7 years ago

Hi Miro,

> it seems you are focusing only on the Var. It was just an example. Instead, think any-non-serializable object.

I am focusing on the Var because, as I've been suggesting, the specific (and only) example you keep citing is causing an atypical and (it is my contention) unrealistic problem.

If we're getting bogged down on the specific type, then may I request that you suggest a different type? I.e. provide a concrete example of a type that:

  1. Has attached metadata (so will trigger automatic metadata retention).
  2. Doesn't already have automatic support for de/serialization through Nippy's own protocol, Java's Serializable interface, or the Clojure EDN reader.
  3. You could conceivably encounter through some sort of real-world unsanitized system (i.e. an example that isn't entirely synthetic).

My point is (and has been): your objective seems to be to do something fairly unusual. I've provided two reasonable ways to address the objective, including extending the underlying protocol which should give you as much flexibility as you like.

> What I am talking about here is that when it does, I don't want my application to crash, instead I want it to use the *freeze-fallback* function.

Software, in general, may throw an exception when given input that doesn't make sense. This can be a particularly useful/important property for software that deals with data storage - since if a program's being asked to store or do something with data that doesn't make sense, we risk data loss.

> But it errors out for non-serializable Clojure objects, in case that the final fallback method produces a literal that does not bear metadata

You keep stating this, but it's not accurate: you can serialize whatever you like. But if the thing being serialized has metadata then, by default, there's an expectation that the deserialized output also supports metadata. Otherwise there's potentially easy-to-miss data loss.

You're asking Nippy to serialize something. That thing has metadata attached. But you're giving Nippy no way to reconstruct the metadata on deserialization. So it's throwing to prevent you from accidentally losing data.

This is for convenience and safety, and a useful property for 99% of use cases (indeed all of the ones I can think of).

And I like that this design is explicit. If you specifically don't care about metadata during serialization and want to just drop it, you can easily make that explicit: (defn without-meta [x] (if (meta x) (with-meta x nil) x)).
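For illustration, here is how that helper behaves on ordinary Clojure values, as a sketch using only clojure.core (the def names other than `without-meta` are hypothetical). One caveat that is an observation about Clojure rather than something stated in this thread: clojure.lang.Var does not implement IObj, so with-meta cannot be called on a Var, and this helper would not cover the Var case specifically.

```clojure
(defn without-meta [x] (if (meta x) (with-meta x nil) x))

;; A value with metadata attached
(def tagged (with-meta [1 2 3] {:source "user-input"}))

(def stripped (without-meta tagged))

;; The values are equal; only the metadata differs
(def same-value? (= tagged stripped))
(def stripped-meta (meta stripped))

;; Values without metadata pass through untouched, including Strings
(def plain (without-meta "a"))
```

Calling such a helper on inputs before freezing makes the decision to drop metadata visible at the call site.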

Otherwise, again, you're risking easy-to-miss data loss. When in doubt, Nippy will try to protect your data.

When you give it an unusual request that would silently drop data, it throws an exception. If you want to catch+ignore it, you're welcome to. Or you can strip metadata before serialization, or you can dereference Vars (since that's the only example we've been able to think of where this discussion is relevant), or you can drop to the level of extending the Nippy protocol yourself and (in effect) drop metadata there.

> Imho this is a bug and an inconsistent behavior. If *freeze-fallback* is used, metadata should not be broadcasted if the resulting object is not clojure.lang.IObj.

The problem is that Nippy has no way of knowing at freeze time, what the intended output type will be, and if it will support metadata. That's by design, and has numerous important benefits.

We could check this at thaw time, but then we reach the crux of this debate again. What should Nippy do if an object was serialized with metadata, and we're now deserializing to an object type that doesn't support metadata? Silently drop the metadata?

I.e. risk losing critical data to make it a little more convenient to support a rare (if not unique) and synthetic use case?

> I just want to make a foolproof application and not rely on users always knowing what they are doing.

Foolproof means two things to me:

  1. Things don't throw ("crash") unnecessarily.
  2. Things do throw when there's a good reason to (e.g. potential unexpected data loss).

If you understand the risks, prefer not to do something explicit (like strip metadata on inputs), and don't want to bother catching this particular safety - Nippy doesn't get in your way: you can directly extend the necessary protocol.

> I am able to implement this use case without issues with Transit as well as with EDN.

Nippy offers a particular set of considered tradeoffs and design priorities that I think make sense for its intended applications :-)

Anyway, hope that was useful and makes some sense!

Please note that this'll need to be my last response on the current topic - am pretty time constrained and have some other things I'd like to be working on :-)

Cheers!

mikub commented 7 years ago

Thanks Peter, I absolutely get where you are coming from.

> If we're getting bogged down on the specific type, then may I request that you suggest a different type? I.e. provide a concrete example of a type that:

You are making a good point here. I thought all Clojure references to local address space would behave in a similar way to Var, but after some testing it really seems that Var is quite specific. E.g. with the str fallback this works (I didn't think it would; I thought there would be some metadata as well):

(nippy/thaw (nippy/freeze (async/chan)))
=> "clojure.core.async.impl.channels.ManyToManyChannel@6e8a1cda"
(nippy/thaw (nippy/freeze (atom {})))
=> "clojure.lang.Atom@67ec669b"

If it is just the Var, then I can have a custom record for serializing it, and the rest would be handled by the str fallback.

> What should Nippy do if an object was serialized with metadata, and we're now deserializing to an object type that doesn't support metadata? Silently drop the metadata?

In an ideal world this would be configurable and up to the user to decide. However, since it seems that this really is specific to Vars, I will be able to work around it, e.g. create a specific record and handler for Vars.

Thanks for your help, appreciate the time you took to respond to this somewhat fringe use case!

P.S. It would be really helpful if the serialization fns (i.e. write-*) in Nippy weren't private, since they could all be reused for the fallback fn (you can see in my example that I had to re-implement write-str from scratch).

ptaoussanis commented 7 years ago

> In an ideal world this would be configurable and up to the user to decide. However, since it seems that this really is specific to Vars, I will be able to work around it, e.g. create a specific record and handler for Vars.

That's certainly a possibility, and an idea I was feeling out in our discussion. It'd be fairly straightforward to mod extend-freeze to optionally+explicitly ignore metadata.

At the moment am leaning toward saying that the gains wouldn't be worth the extra conceptual complexity (users needing to understand what option they want for this). The (very) rare user who really wants to ignore metadata and understands the implications can just extend the underlying protocol without too much effort.

As a middle ground, I think I might just document the underlying protocols a little more.

> P.S. It would be really helpful if the serialization fns (i.e. write-*) in Nippy weren't private, since they could all be reused for the fallback fn (you can see in my example that I had to re-implement write-str from scratch).

That's indeed on the cards. Erred on the side of making these private for now since:

  1. Nippy's concerned with micro optimizations, so may occasionally benefit from breaking changes to these kinds of low-level utils.
  2. Nippy has unusually strong requirements for stability, including in its API.

The two are in opposition, so decided again to err on the side of making tradeoffs that benefit the largest group of users. The rare advanced users who write their own de/serializers also tend to be in the best position to understand the low-level platform tooling (i.e. have the least need/desire for convenience fns).

Do intend to open up more of the low-level utils as/when I'm confident that I'll never want them to change again.

> Thanks for your help, appreciate the time you took to respond to this somewhat fringe use case!

No problem, thanks for saying so. I appreciated the discussion.

Good luck with your project, cheers!