taoensso / nippy

The fastest serialization library for Clojure
https://www.taoensso.com/nippy
Eclipse Public License 1.0

*freeze-fallback* name misleading? #141

Closed behrica closed 2 years ago

behrica commented 3 years ago

My use case is to ignore all errors from non-freezable values.

I thought I could do that by setting freeze-fallback to an empty function, but this does not work as expected.

I also saw in the code that if freeze-fallback is set, it is always called, not only for unfreezable types as the name suggests.

This seems to contradict its name, "fallback".

behrica commented 3 years ago

Maybe another way to ask about my use case:

How can I "see" whether a complex map contains things that are not freezable? I thought I could use freeze-fallback for this, with a no-op implementation that just prints what it gets, for debugging purposes.

ptaoussanis commented 3 years ago

Hi Carsten,

I'm sorry, I'm not sure that I understand your question.

> My use case is to ignore all errors from non-freezable values. I thought I could do that by setting freeze-fallback to an empty function, but this does not work as expected.

Can you please give an example of the input and behaviour/output that you're hoping for?

> I also saw in the code that if freeze-fallback is set, it is always called, not only for unfreezable types as the name suggests.

I'm not sure what you mean here by "called always". freeze-fallback is called when trying to freeze types that don't have a more specific implementation registered. It is in this sense a "fallback" implementation.

behrica commented 3 years ago

This is not what I see. In my tests, I had the impression that it is called "always".

behrica commented 3 years ago

It seems to happen with a specific data type:

  (def frozen-broken
    (with-redefs [nippy/*freeze-fallback* (fn [data-output x]
                                            (println "could no freeze: " x))]
      (nippy/freeze-to-string (tech.v3.dataset/->>dataset {:a [1 2]}))))

  (nippy/thaw-from-string frozen-broken)

This prints a "could no freeze" line for some primitive arrays inside the dataset, and the thaw fails with NPEs. But without the redef of freeze-fallback, everything works fine: the dataset can be frozen and thawed without issue:

  (def working-x
    (nippy/freeze-to-string (tech.v3.dataset/->>dataset {:a [1 2]})))
  (nippy/thaw-from-string working-x)

The dataset library extends Nippy to support datasets like this: https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/io/nippy.clj

This is probably related.

behrica commented 3 years ago

This is why I said the function in freeze-fallback gets called even though the full object can be frozen without issue.

behrica commented 3 years ago

So my assumption is that if freezing an object works without error, freeze-fallback would never be called, even if it is set to a non-nil value.

behrica commented 3 years ago

> Hi Carsten,

> I'm sorry, I'm not sure that I understand your question.

> > My use case is to ignore all errors from non-freezable values. I thought I could do that by setting freeze-fallback to an empty function, but this does not work as expected.

> Can you please give an example of the input and behaviour/output that you're hoping for?

I have a complex, deeply nested map where some values are Clojure fns, which are not freezable. I would like to see "where" those unfreezable values are.

I thought I could use freeze-fallback to "log" all these places for me.

behrica commented 3 years ago

> Hi Carsten,

> I'm sorry, I'm not sure that I understand your question.

> > My use case is to ignore all errors from non-freezable values. I thought I could do that by setting freeze-fallback to an empty function, but this does not work as expected.

> Can you please give an example of the input and behaviour/output that you're hoping for?

> > I also saw in the code that if freeze-fallback is set, it is always called, not only for unfreezable types as the name suggests.

Line https://github.com/ptaoussanis/nippy/blob/ba8827708ea9da828735562244d151b59eb692a7/src/taoensso/nippy.clj#L1237 seems to call ff (freeze-fallback) before anything else, but maybe I am misunderstanding it.

ptaoussanis commented 3 years ago

> This is not what I see. In my tests, I had the impression that it is called "always".

freeze-fallback is called only as a fallback for types that don't have a more specific implementation. So it won't be called for most of the types in the stress data example.

It will be called for unknown types, as in your example with tech.v3.dataset. That's a type that Nippy doesn't know anything about out of the box, so the fallback implementation will kick in.

> So my assumption is that if freezing an object works without error, freeze-fallback would never be called, even if it is set to a non-nil value.

This isn't accurate. freeze-fallback is called for all types that don't have a specific freeze implementation. In other words, you can think of freeze-fallback as the generic catch-all for when Nippy doesn't have a specific freeze implementation registered for the type.

Does that make sense?

> I have a complex, deeply nested map where some values are Clojure fns, which are not freezable. I would like to see "where" those unfreezable values are.

I'm sorry, I'm not sure what you mean by "where". If you set a non-nil *freeze-fallback* fn, that will indeed be called on all objects (nested or not) that you are attempting to freeze, and for which there is not a type-specific implementation already registered.

Can you please provide a specific example of the input and output/behaviour that you'd like to see? In that case I might be able to suggest something.

ptaoussanis commented 3 years ago

> seems to call ff before anything else, but maybe I am misunderstanding it.

The context is important here: https://github.com/ptaoussanis/nippy/blob/ba8827708ea9da828735562244d151b59eb692a7/src/taoensso/nippy.clj#L1233

That whole code block only runs when attempting to freeze a type that doesn't have a more specific freeze implementation. That block is a freeze implementation for the Object class, which is the parent of all other classes.

If you have a protocol (like Nippy's freeze protocol) that targets both Object and a more specific child class (e.g. String) - then the more specific implementation will be used when available. So the Object implementation only kicks in when a better (more specific) one isn't available. I.e. that whole code block is already in the "fallback" case.

Does that make sense?

When you call freeze on something (let's say class X) - Nippy will effectively check if there is a freeze implementation for the specific class X.

  • If yes, use it.
  • If no, then fall back to the freeze implementation of the nearest applicable parent.

Since Object is the ultimate parent of every class, this is the final catch-all ("fallback") case.
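The dispatch rule above can be illustrated with a plain Clojure protocol. This is a minimal sketch. The protocol Describe and its method are hypothetical stand-ins, not Nippy's actual freeze protocol. The point is that a protocol extended to both Object and String uses the String implementation for strings and falls back to the Object implementation for everything else. That includes a primitive double array, whose class is named [D.

```clojure
;; Hypothetical protocol standing in for a serializer's per-type freeze protocol.
(defprotocol Describe
  (describe [x] "Returns a string naming which implementation handled x."))

(extend-protocol Describe
  ;; Catch-all: Object is the ultimate parent of every class.
  Object
  (describe [x] (str "fallback (Object) impl for " (.getName (class x))))

  ;; More specific implementation: used for String instances.
  String
  (describe [x] (str "specific String impl for " x)))

(describe "hi")                  ;; => "specific String impl for hi"
(describe 42)                    ;; => "fallback (Object) impl for java.lang.Long"
(describe (double-array [1 2]))  ;; => "fallback (Object) impl for [D"
```

A Long has no implementation of its own, so dispatch walks up the class hierarchy to Object. An array's direct superclass is Object, so it lands there too.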

behrica commented 3 years ago

Yes, things are clearer now.

Thanks for that.

Let me try to formulate my "doubts" more clearly.

behrica commented 3 years ago

My use case: I'm sitting in front of a big nested map that I cannot print for inspection because of its size. I want to serialize it to disk with Nippy. I could very likely live with removing parts of it.

I "try", and I get an error on an unknown type: a function is referenced somewhere in the nested map.

Then I want to "explore" this issue. In particular, I would like to know "where" the problematic objects are in the nested map, i.e. the "path" to each of them, because I want to write some code that removes them.

Removing them means writing "dissoc" statements, which need a "path".
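Those paths can be collected in one pass, without involving Nippy at all. The sketch below walks nested maps and vectors and returns the get-in style path of every value that matches a predicate. paths-to is a hypothetical helper name. fn? is only a stand-in for "not freezable": it catches plain Clojure functions, the culprit described above. Lists, sets and other collections are not descended into.

```clojure
(defn paths-to
  "Hypothetical helper: returns a vector of key paths (usable with get-in,
  update-in, etc.) to every value inside nested maps and vectors for which
  (pred value) is truthy."
  [pred data]
  (letfn [(walk [path x]
            (cond
              (pred x)    [path]
              (map? x)    (mapcat (fn [[k v]] (walk (conj path k) v)) x)
              (vector? x) (mapcat (fn [i v] (walk (conj path i) v)) (range) x)
              :else       []))]
    (vec (walk [] data))))

(def data {:name  "model"
           :opts  {:loss-fn inc :rate 0.1}
           :steps [{:f str} {:g 1}]})

(paths-to fn? data)
;; => [[:opts :loss-fn] [:steps 0 :f]]
```

Each returned path can be passed directly to get-in or update-in, or used to write the dissoc calls described above.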

behrica commented 3 years ago

For now I don't know if I can do that in a more intelligent way than:

  1. Call freeze -> see the error
  2. Carefully inspect the data structure until I find the exact place, then write code to remove it (or register a freezer for it)
  3. Go to 1

And I repeat this until all errors are gone (but maybe there are hundreds of them, which would mean I should do something else).
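If the goal is simply to get rid of every offending value, a single recursive pass can replace the loop above. In the sketch below, prune is a hypothetical helper name, and fn? again stands in for "not freezable". It drops every map entry and vector element whose value matches the predicate, and it only descends into plain maps and vectors.

```clojure
(defn prune
  "Hypothetical helper: returns data with every map entry and vector element
  whose value satisfies pred removed, recursing into nested maps and vectors."
  [pred data]
  (cond
    (map? data)
    (reduce-kv (fn [m k v]
                 (if (pred v)
                   (dissoc m k)                  ; drop the offending entry
                   (assoc m k (prune pred v))))  ; keep it, cleaned recursively
               data
               data)

    (vector? data)
    (into [] (comp (remove pred) (map #(prune pred %))) data)

    :else data))

(prune fn? {:name  "model"
            :opts  {:loss-fn inc :rate 0.1}
            :steps [{:f str} {:g 1} inc]})
;; => {:name "model", :opts {:rate 0.1}, :steps [{} {:g 1}]}
```

This silently discards data, so it is worth reviewing what gets removed before writing the result to disk.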

behrica commented 3 years ago

> > This is not what I see. In my tests, I had the impression that it is called "always".

> freeze-fallback is called only as a fallback for types that don't have a more specific implementation. So it won't be called for most of the types in the stress data example.

> It will be called for unknown types, as in your example with tech.v3.dataset. That's a type that Nippy doesn't know anything about out of the box, so the fallback implementation will kick in.

So the fact that tech.ml.dataset calls nippy/extend-freeze to handle Dataset does not change this? https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/io/nippy.clj

So it is still "unknown"?

behrica commented 3 years ago

> > seems to call ff before anything else, but maybe I am misunderstanding it.

> The context is important here:

> https://github.com/ptaoussanis/nippy/blob/ba8827708ea9da828735562244d151b59eb692a7/src/taoensso/nippy.clj#L1233

> That whole code block only runs when attempting to freeze a type that doesn't have a more specific freeze implementation. That block is a freeze implementation for the Object class, which is the parent of all other classes.

> If you have a protocol (like Nippy's freeze protocol) that targets both Object and a more specific child class (e.g. String) - then the more specific implementation will be used when available. So the Object implementation only kicks in when a better (more specific) one isn't available. I.e. that whole code block is already in the "fallback" case.

> Does that make sense?

> When you call freeze on something (let's say class X) - Nippy will effectively check if there is a freeze implementation for the specific class X.

>   • If yes, use it.
>   • If no, then fall back to the freeze implementation of the nearest applicable parent.

> Since Object is the ultimate parent of every class, this is the final catch-all ("fallback") case.

This I don't see: the "fallback method" gets executed even though there is a freeze implementation registered for Dataset.

My code calls this before trying to freeze:

(nippy/extend-freeze
 Dataset :tech.ml/dataset
 [ds out]
 (nippy/-freeze-without-meta! (ds-base/dataset->data ds) out))

Nevertheless, I see that the function in freeze-fallback gets executed for something that is inside the Dataset record.
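One possible explanation can be sketched as a toy model. This is not Nippy's real code. A registered freezer like the Dataset one above converts the value to plain data and then freezes that data. Each nested value, such as the primitive arrays inside, is therefore dispatched again by its own type.

The model also assumes the Object branch consults a non-nil fallback before trying Java serialization. That is how behrica reads the linked code. Under these assumptions, a nested long array reaches the fallback only when a fallback is set. Everything here is hypothetical: toy-freeze, *fallback*, the Dataset record and the vector output format.

```clojure
;; Hypothetical stand-in for nippy/*freeze-fallback*: nil, or (fn [x] ...).
(def ^:dynamic *fallback* nil)

;; Hypothetical stand-in for tech.ml.dataset's Dataset type.
(defrecord Dataset [columns])

(defn toy-freeze
  "Toy serializer that returns nested vectors instead of bytes."
  [x]
  (cond
    ;; Type-specific implementations, checked first. The Dataset branch mimics
    ;; an extend-freeze registration: convert to plain data, then recurse, so
    ;; nested values are dispatched by their own types.
    (instance? Dataset x) [:dataset (toy-freeze (:columns x))]
    (map? x)     (into [:map] (map (fn [[k v]] [(toy-freeze k) (toy-freeze v)])) x)
    (keyword? x) [:kw x]
    (integer? x) [:long x]
    ;; Object ("fallback") branch: a non-nil *fallback* takes precedence ...
    *fallback*   (*fallback* x)
    ;; ... otherwise Serializable values (Java arrays included) are serialized.
    (instance? java.io.Serializable x) [:serializable (.getName (class x))]
    :else (throw (ex-info "Unfreezable type" {:type (type x)}))))

(def ds (->Dataset {:a (long-array [1 2])}))

(toy-freeze ds)
;; => [:dataset [:map [[:kw :a] [:serializable "[J"]]]]

(binding [*fallback* (fn [_] [:fallback])]
  (toy-freeze ds))
;; => [:dataset [:map [[:kw :a] [:fallback]]]]
```

In this model the Dataset freezer itself never touches the fallback. Only the nested array, which has no type-specific branch, does. That is consistent with the [I and [J arrays reported in this thread, but the actual dispatch details live in nippy.clj.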

behrica commented 3 years ago

Doing this:

(def frozen-broken
    (with-redefs [nippy/*freeze-fallback* (fn [data-output x]
                                            (println "could no freeze: " x))]
      (nippy/freeze-to-string (tech.v3.dataset/->>dataset {:a [1 2]}))))

prints in the REPL:

could no freeze:  #object[[I 0x7b64cde2 [I@7b64cde2]
could no freeze:  #object[[J 0x71bcb48f [J@71bcb48f]

and the resulting Base64 string is invalid (thawing it fails).

behrica commented 3 years ago

Doing the same without with-redefs works perfectly: freeze and thaw work without issues.

behrica commented 3 years ago

These arrays are how Dataset stores its data internally: as primitive arrays.

behrica commented 3 years ago

I found the same issue without using dataset:

(with-redefs [nippy/*freeze-fallback* (fn [data-output x]
                                        (println "could no freeze: " x))]
  (nippy/freeze-to-string {:a (double-array [1 2])}))

prints:

could no freeze: #object[[D 0x4b065008 [D@4b065008]
could no freeze: #object[[D 0x41bcf035 [D@41bcf035]

behrica commented 3 years ago

Put another way:

I expect both of the following pieces of code to freeze and thaw successfully:

(nippy/thaw-from-string (nippy/freeze-to-string {:a (double-array [1 2])}))

(nippy/thaw-from-string
 (with-redefs [nippy/*freeze-fallback* (fn [data-output x]
                                         (println "could no freeze: " x))]
   (nippy/freeze-to-string {:a (double-array [1 2])})))

but the second fails with:

 2. Caused by clojure.lang.ExceptionInfo
   Thaw failed against type-id: 112
   {:type-id 112}
                 nippy.clj: 1773  taoensso.nippy/thaw-from-in!
                 nippy.clj: 1583  taoensso.nippy/thaw-from-in!
                 nippy.clj: 1883  taoensso.nippy/thaw/fn/thaw-data
                 nippy.clj: 1909  taoensso.nippy/thaw/fn
                 nippy.clj: 1274  taoensso.nippy/call-with-bindings
                 nippy.clj: 1270  taoensso.nippy/call-with-bindings
                 nippy.clj: 1852  taoensso.nippy/thaw
                 nippy.clj: 1828  taoensso.nippy/thaw
                 nippy.clj: 2155  taoensso.nippy/thaw-from-string
                 nippy.clj: 2149  taoensso.nippy/thaw-from-string
                 nippy.clj: 2152  taoensso.nippy/thaw-from-string
                 nippy.clj: 2149  taoensso.nippy/thaw-from-string
                      REPL:  447  scicloj.metamorph.ml-test/eval39886
                      REPL:  447  scicloj.metamorph.ml-test/eval39886
             Compiler.java: 7181  clojure.lang.Compiler/eval
             Compiler.java: 7136  clojure.lang.Compiler/eval
                  core.clj: 3202  clojure.core/eval
                  core.clj: 3198  clojure.core/eval
    interruptible_eval.clj:   87  nrepl.middleware.interruptible-eval/evaluate/fn/fn
                  AFn.java:  152  clojure.lang.AFn/applyToHelper
                  AFn.java:  144  clojure.lang.AFn/applyTo
                  core.clj:  667  clojure.core/apply
                  core.clj: 1977  clojure.core/with-bindings*
                  core.clj: 1977  clojure.core/with-bindings*
               RestFn.java:  425  clojure.lang.RestFn/invoke
    interruptible_eval.clj:   87  nrepl.middleware.interruptible-eval/evaluate/fn
                  main.clj:  437  clojure.main/repl/read-eval-print/fn
                  main.clj:  437  clojure.main/repl/read-eval-print
                  main.clj:  458  clojure.main/repl/fn
                  main.clj:  458  clojure.main/repl
                  main.clj:  368  clojure.main/repl
               RestFn.java:  137  clojure.lang.RestFn/applyTo
                  core.clj:  667  clojure.core/apply
                  core.clj:  662  clojure.core/apply
                regrow.clj:   20  refactor-nrepl.ns.slam.hound.regrow/wrap-clojure-repl/fn
               RestFn.java: 1523  clojure.lang.RestFn/invoke
    interruptible_eval.clj:   84  nrepl.middleware.interruptible-eval/evaluate
    interruptible_eval.clj:   56  nrepl.middleware.interruptible-eval/evaluate
    interruptible_eval.clj:  152  nrepl.middleware.interruptible-eval/interruptible-eval/fn/fn
                  AFn.java:   22  clojure.lang.AFn/run
               session.clj:  202  nrepl.middleware.session/session-exec/main-loop/fn
               session.clj:  201  nrepl.middleware.session/session-exec/main-loop
                  AFn.java:   22  clojure.lang.AFn/run
               Thread.java:  829  java.lang.Thread/run

 1. Caused by java.io.EOFException
   (No message)

      DataInputStream.java:  272  java.io.DataInputStream/readByte
                 nippy.clj: 1590  taoensso.nippy/thaw-from-in!
                 nippy.clj: 1583  taoensso.nippy/thaw-from-in!
                 nippy.clj: 1439  taoensso.nippy/read-kvs-into/fn

ptaoussanis commented 2 years ago

> I expect both of the following pieces of code to freeze and thaw successfully:

> (nippy/thaw-from-string (nippy/freeze-to-string {:a (double-array [1 2])}))

> (nippy/thaw-from-string
>  (with-redefs [nippy/*freeze-fallback* (fn [data-output x]
>                                          (println "could no freeze: " x))]
>    (nippy/freeze-to-string {:a (double-array [1 2])})))

The second case will fail while thawing because:

  1. You're trying to freeze a double array which is a type without a dedicated Nippy freezer implementation registered.
  2. Which means *freeze-fallback* (if provided) will be used instead.
  3. You've overridden *freeze-fallback* with a broken custom implementation that doesn't actually write anything to data-output.
  4. So Nippy's output basically indicates that there's a map with a single kv pair, but the data for the key's value is missing.
  5. This causes the thaw to throw an EOF exception: thaw's still expecting data for the missing key value, but no data is available.
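Steps 3 to 5 can be reproduced with a toy length-prefixed format built directly on java.io. This is not Nippy's wire format. The names write-map!, read-map, roundtrip and both fallbacks are hypothetical.

A map is written as its entry count, followed by each key and value. A fallback that writes nothing leaves the reader expecting a value that is not there, so the reader runs off the end of the stream. A fallback that writes a placeholder keeps the stream readable.

```clojure
(import '(java.io ByteArrayInputStream ByteArrayOutputStream
                  DataInputStream DataOutputStream))

;; Toy format: entry count, then for each entry a UTF key and a value.
;; Values are written by the supplied write-value! fn (our "fallback").
(defn write-map! [^DataOutputStream out m write-value!]
  (.writeInt out (count m))
  (doseq [[k v] m]
    (.writeUTF out (name k))
    (write-value! out v)))

(defn read-map [^DataInputStream in]
  (let [n (.readInt in)]
    ;; into is eager, so all reads happen while the stream is still open.
    (into {} (repeatedly n (fn [] [(keyword (.readUTF in)) (.readUTF in)])))))

(defn roundtrip [m write-value!]
  (let [baos (ByteArrayOutputStream.)]
    (with-open [out (DataOutputStream. baos)]
      (write-map! out m write-value!))
    (with-open [in (DataInputStream. (ByteArrayInputStream. (.toByteArray baos)))]
      (read-map in))))

;; Like the fallback in the issue: it may log, but writes nothing.
(defn broken-fallback [_out _x]
  nil)

;; Writes a placeholder string that the reader can consume.
(defn placeholder-fallback [^DataOutputStream out x]
  (.writeUTF out (str "<unfreezable " (.getName (class x)) ">")))

(roundtrip {:a (double-array [1 2])} placeholder-fallback)
;; => {:a "<unfreezable [D>"}

(roundtrip {:a (double-array [1 2])} broken-fallback)
;; throws java.io.EOFException: the reader still expects the value for :a
```

The takeaway is that a custom fallback has to write some data for the value it receives. Logging alone leaves a hole in the output.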

Closing for now. If there's still an issue/question you need assistance with, it'd be helpful to have a clear restatement and ideally a short working example. Thanks!