taoensso / nippy

The fastest serialization library for Clojure
https://www.taoensso.com/nippy
Eclipse Public License 1.0
1.04k stars 60 forks source link

Nippy doesn't know how to freeze metadata from next.jdbc #171

Closed slipset closed 2 months ago

slipset commented 2 months ago

This is most likely me not reading the docs well enough, but: next.jdbc attaches metadata to its results which nippy doesn't know how to serialize, and nippy then throws.

clojure.lang.ExceptionInfo: Unfreezable type: class next.jdbc.result_set$navize_row$fn__31316 {:type next.jdbc.result_set$navize_row$fn__31316, :as-str "#function[next.jdbc.result-set/navize-row/fn--31316]"}
    at taoensso.nippy$throw_unfreezable.invokeStatic(nippy.clj:1005)

I would be more than happy if I could instruct nippy to just disregard this metadata.

ptaoussanis commented 2 months ago

@slipset Hi Erik, I'm sorry but I'm not 100% sure what you're asking exactly.

If you want to drop the metadata from an object in Clojure, you can use (with-meta <object> nil). So in your case something like (nippy/freeze (with-meta <my-jdbc-result> nil)).

Alternatively, you could disable nippy/*incl-metadata?* for a particular freeze call (using binding), or for all freeze calls (using alter-var-root). That'll affect all metadata though - not just the next.jdbc stuff, so it might be better to just use (with meta <object> nil) on your jdbc results. Just let me know if you'd like examples.

I hope that helps?

slipset commented 2 months ago

It might be because I'm not 100% sure what I'm asking :) So, the troublesome metadata comes from next.jdbc: We have the protocol DatafiableRow which says:

...just adds metadata so that datafy can be called on the row...

As for the implementation of this protocol, we see it here

So, to what I'm asking for, envisioning: 1) We're already doing (with-meta bla nil), but we're doing it as we get results from jdbc, not when we freeze. This is the "easiest" for us, because as we freeze, we might be freezing a deeper structure, so traversing that and setting (with meta bla nil) is a bit cumbersome, but there is a slight performance cost of clearing the metadata for everything we fetch from the database in case it should ever be frozen. 2) I think disabling nippy/*incl-metadata?* is a viable option, but we'd have to do it globally since we rely quite a bit on carmine, which uses nippy (as you know, I guess), but it feels a bit heavy handed. 3) What I would like I think, is to be able to register a "skipping-deserializer" for the jdbc metadata stuff, which just ignores it, or 4) Instruct nippy to silently (or log/warn or log/error) ignore stuff it doesn't know how to serialize.

Again, sorry if this should have been obvious to me :)

slipset commented 2 months ago

Reading about nippy/*incl-metadata?* made me find nippy/*freeze-fallback* which might just be what I'm looking for, setting it to :write-unfreezable ?

ptaoussanis commented 2 months ago

Okay, gotcha - thanks for clarifying.

And I guess next.jdbc doesn't allow you to disable this metadata from its side (or you sometimes do use this metadata)?

I'd avoid using nippy/*freeze-fallback* for this since that could have unintended consequences, may affect performance, and would probably unnecessarily bloat your serialized output.

If you're looking for a very easy solution, it'd probably be better to globally turn off nippy/*incl-metadata?* since most folks don't actually need/use metadata serialization.

What I would like I think, is to be able to register a "skipping-deserializer" for the jdbc metadata stuff, which just ignores it

This is where I'd lean. There's a few options for how to approach this, depending on the details of what next.jdbc is doing. I'm unfortunately not familiar with the library, but basically, you'll want to use nippy/extend-freeze and nippy/extend-thaw to set up custom handling of the relevant types.

E.g. if the troublesome metadata consists of a limited number of specific types, you could configure Nippy to no-op on these types, e.g.:

(nippy/extend-freeze AMetaDataType
  :next.jdbc/a-metadata-type
  [x out]
  (nippy/freeze-to-out! out nil) ; Write metadata value as nil
  )

(nippy/extend-thaw :next.jdbc/a-metadata-type
  [in]
  (nippy/thaw-from-in! in) ; Read back the nil
  )

That'll basically intercept the relevant metadata as it's being serialized, so that it gets serialized as nil instead.

Having said that - I'd actually be keen to understand better whether it might be useful for Nippy to automatically skip the relevant types out-the-box. Would be happy to take a look at a PR.

Or I can take a look at doing this myself if you (or someone else) could point me to (and/or provide a simple example I can play with) that shows the relevant info re: what specific result and metadata types are involved.

Is the result metadata always unserializable? What's it actually used for?

slipset commented 2 months ago

Thanks again for coming back to me. I'll loop in @seancorfield so he can shed some light on this as well. The metadata in question should (I believe) not be serialized, as it's used for interacting with tools like REBL (which I believe is now called something else) to visually browse result sets.

I do think the extend-freeze is the way to go in this particular case, just need to figure out the types.

Having said that, I don't think this next.jdbc metadata ever makes sense to serialize over nippy, as they are basically closures, so if I'm not mistaken, it would be impossible to thaw them in any meaningful way.

slipset commented 2 months ago

Here's some more info: So, printing out the metadata on the result I get from the database gives me:

#:clojure.core.protocols{datafy
                #function[next.jdbc.result-set/navize-row/fn--31316],
                nav
                 #function[next.jdbc.result-set/navable-row/fn--31319]}

So I guess it's the clojure.core.protocols.Datafy which is the problem? And, I guess it doesn't make sense to ever serde that as it's just two fns, meaning that nippy could safely do nothing about it and also be happy knowing about it since it's a core protocol?

slipset commented 2 months ago

So, being slow, I'm sorry. But it just occurs to me. Protocol implementations doesn't make sense to serialize, because, well functions. And I guess what we're seeing here is an effect of protocols being attachable to metadata as of Clojure 1.10/1.11? Again, I guess one can be even more general than what I wrote in the previous comment ie, don't freeze clojure.core.protocols.Datafy, but rather be so general as to say "Don't freeze any protocol implementation in metadata"?

seancorfield commented 2 months ago

Functions are never going to be serializable which means Nippy is never going to be able to handle any objects that implement protocols via metadata. I'm not at my computer yet to check, but do functions implement Serializable? Could that be a useful "flag" for Nippy to use to avoid trying to freeze underislzable data in general?

slipset commented 2 months ago

Should be sufficient to check if something is a function, and skip it if it is?

ptaoussanis commented 2 months ago

Thanks for your input on this Sean 🙏

So I guess it's the clojure.core.protocols.Datafy which is the problem?

Interesting, I'm not familiar with Datafy nor metadata protocol extension - both had slipped by my radar.

Could that be a useful "flag" for Nippy to use to avoid trying to freeze underislzable data in general? / Should be sufficient to check if something is a function, and skip it if it is?

The problem is that we don't want to just silently skip arbitrary fns in general. If someone's trying to serialize a fn, it may be by ignorance or an accident - in which case the current behaviour (throwing) is appropriate to indicate that an unfreezable type was encountered.

It might be reasonable to make an exception in the specific case though of top-level fn vals in metadata due to metadata protocol extension.

Will look into this properly next time I'm on batched Nippy work (certainly before end of this month) - want to understand better what problem metadata protocol extension is meant to solve, how widely it's used, etc.

@slipset What's your level of urgency on this?

slipset commented 2 months ago

None :) We've worked around this for some time and can deffo do so for some more time

seancorfield commented 2 months ago

The ability to extend protocols via metadata was added in Clojure 1.10, about six years ago, so that users could make arbitrary data satisfy specific protocols without needing to specifically extend the protocol. It's primarily used to allow certain hash maps to participate in protocol functions -- rather than trying to extend the protocol to all hash map data types -- and you'll see it used quite a bit in applications that use Component (Sierra's lib).

I don't know how widespread its use is with datafy and nav, but those were also introduced in Clojure 1.10, and there are quite a few tools relying on the protocols behind those (REBL -- now Morse, Reveal, Portal). next.jdbc included protocol metadata to support datafy and nav from the beginning and its first version was released shortly after Clojure 1.10.

Note that, with next.jdbc, the metadata is on elements of the result set vector, not on the vector itself, so the metadata is inside the collection: individual result set rows (hash maps) are Datafiable and Navigable.

In all of these cases, the metadata will include keys that are fully-qualified symbols (identifying a function from some protocol) and a value that is a function to be used as the implementation of that protocol, and that could occur at any nesting "level" within any data structure.

And, now I'm at a computer, in answer to my own question: yes, Clojure functions are marked as Serializable in the Java sense -- so I guess that's not helpful for Nippy, since it can't serialize some things that Java thinks are serializable.

ptaoussanis commented 2 months ago

@seancorfield Hi Sean, thanks for the detailed info - that's very helpful and much appreciated! 🙏

I've modified Nippy master to auto strip {<qualified-sym> <fn>} kvs from metadata maps 👍

and that could occur at any nesting "level" within any data structure.

Thanks for clarifying this. That won't be a problem, Nippy's freezing is recursive - so we need only define the behaviour for metadata maps in general, and that'll automatically apply to metadata at any nesting level.

Clojure functions are marked as Serializable in the Java sense -- so I guess that's not helpful for Nippy, since it can't serialize some things that Java thinks are serializable.

Correct, though that's not a problem - Nippy already tests for fn? as a special case for this reason.

ptaoussanis commented 2 months ago

@slipset Hi Erik- this should now be addressed in v3.4.0-RC3.

If you get an opportunity, please let me know if that works for you?

I'm aiming to cut the final release by end of this month.

Thanks again for pinging about this! Cheers :-)

slipset commented 2 months ago

Works like a charm, it seems :) Thanks a lot!

We seem to have nippy as a transitive dep on carmine (at 3.2.0) Would it be okay to just explicitly depend on nippy 3.4.0?

ptaoussanis commented 2 months ago

You're very welcome, thanks for the fast confirmation. And again apologies for the hassle on this!

We seem to have nippy as a transitive dep on carmine (at 3.2.0) Would it be okay to just explicitly depend on nippy 3.4.0?

Yes, that would be fine. Any time I update Carmine, I'll always bump to the latest stable Nippy version.

Just be aware that if you're freezing data with Nippy vX, you should always thaw with Nippy >= vX otherwise you may run into exceptions while thawing.

I.e. if you do update Nippy on nodes that write to Redis, make sure you also update it on nodes that read from Redis.

Also, depending on your risk profile - you may want to wait for v3.4.0 stable at the end of this month.