techascent / tech.ml.dataset

A Clojure high performance data processing system
Eclipse Public License 1.0

Roundtrip to nippy fails on this ds #433

Closed: harold closed this issue 4 days ago

harold commented 5 days ago
> d
[{:sha "4e37a59bd3a3c9ea80bfa51ed8e6466ca40daa27",
  :date #inst "2023-08-16T16:57:59.000-00:00",
  :message "`[logo]` Place lower on README",
  :parents ["2ea634aea12b702c1fb2692feaed37e638f202a1"]}
 {:sha "bde26e8c8bed36f0bb0c17a62da7df6336cc4ea5",
  :date #inst "2021-08-14T20:48:33.000-00:00",
  :message "Add positional renaming of columns #262 (#264)",
  :parents ["cc8942647aae7644f00af63129fa9fa4e1cf6860"]}
 {:sha "23624f28ea422a7630374289530ab104601bd6c6",
  :date #inst "2021-11-07T16:52:07.000-00:00",
  :message "6.029",
  :parents ["243daf0cc66cf2900b2473f4c26a8f10a372682c"]}
 {:sha "3ab4216093bba9499107725ef623063ba351f787",
  :date #inst "2021-06-24T19:37:09.000-00:00",
  :message "Updating dtype-next with vectorized code.",
  :parents ["4a1a30871af42e490c628debb567738ae973ac98"]}
 {:sha "be706e114cd13921a264f40b21ebf7c21b29c897",
  :date #inst "2021-04-07T15:55:23.000-00:00",
  :message
  "fix(print): check if value is not nil before running .toString (#226)",
  :parents ["37aa95c704adbfdded0e73b6ed95c252003a6b6a"]}]
> (ds/->dataset d)
_unnamed [5 4]:

|                                     :sha |                        :date |                                                              :message |                                     :parents |
|------------------------------------------|------------------------------|-----------------------------------------------------------------------|----------------------------------------------|
| 4e37a59bd3a3c9ea80bfa51ed8e6466ca40daa27 | Wed Aug 16 10:57:59 MDT 2023 |                                        `[logo]` Place lower on README | ["2ea634aea12b702c1fb2692feaed37e638f202a1"] |
| bde26e8c8bed36f0bb0c17a62da7df6336cc4ea5 | Sat Aug 14 14:48:33 MDT 2021 |                        Add positional renaming of columns #262 (#264) | ["cc8942647aae7644f00af63129fa9fa4e1cf6860"] |
| 23624f28ea422a7630374289530ab104601bd6c6 | Sun Nov 07 09:52:07 MST 2021 |                                                                 6.029 | ["243daf0cc66cf2900b2473f4c26a8f10a372682c"] |
| 3ab4216093bba9499107725ef623063ba351f787 | Thu Jun 24 13:37:09 MDT 2021 |                             Updating dtype-next with vectorized code. | ["4a1a30871af42e490c628debb567738ae973ac98"] |
| be706e114cd13921a264f40b21ebf7c21b29c897 | Wed Apr 07 09:55:23 MDT 2021 | fix(print): check if value is not nil before running .toString (#226) | ["37aa95c704adbfdded0e73b6ed95c252003a6b6a"] |
> (map meta (ds/columns (ds/->dataset d)))
({:categorical? true, :name :sha, :datatype :string, :n-elems 5}
 {:categorical? true, :name :date, :datatype :object, :n-elems 5}
 {:categorical? true, :name :message, :datatype :string, :n-elems 5}
 {:categorical? true, :name :parents, :datatype :persistent-vector, :n-elems 5})
> (ds/write! (ds/->dataset d) "d.nippy")
nil
> (ds/->dataset "d.nippy")
Execution error at tech.v3.dataset.io.nippy/eval60200$fn$fn (nippy.clj:45).
Unthawed data is not a dataset: class clojure.lang.Symbol

Here's the stacktrace:

1. Unhandled java.lang.Exception
   Unthawed data is not a dataset: class clojure.lang.Symbol

                 nippy.clj:   45  tech.v3.dataset.io.nippy/eval60200/fn/fn
                    io.clj:   44  tech.v3.dataset.io/wrap-stream-fn
                    io.clj:   35  tech.v3.dataset.io/wrap-stream-fn
                 nippy.clj:   41  tech.v3.dataset.io.nippy/eval60200/fn
              MultiFn.java:  234  clojure.lang.MultiFn/invoke
                    io.clj:  243  tech.v3.dataset.io/->dataset
                    io.clj:  115  tech.v3.dataset.io/->dataset
                    io.clj:  254  tech.v3.dataset.io/->dataset
                    io.clj:  115  tech.v3.dataset.io/->dataset
               dataset.clj:  127  tech.v3.dataset/->dataset
               dataset.clj:   22  tech.v3.dataset/->dataset
                      REPL:28509  eval75009
                      REPL:28509  eval75009
             Compiler.java: 7176  clojure.lang.Compiler/eval
    interruptible_eval.clj:  106  nrepl.middleware.interruptible-eval/evaluator/run/fn
    interruptible_eval.clj:  101  nrepl.middleware.interruptible-eval/evaluator/run
               session.clj:  229  nrepl.middleware.session/session-exec/session-loop
        SessionThread.java:   21  nrepl.SessionThread/run

harold commented 4 days ago

After investigating further, this appears to depend on the nippy and Clojure versions in use.

Upgrading both in my project (to nippy 3.4.2 and Clojure 1.12.0) made the problem go away.
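
For anyone who hits the same error, the upgrade might look like the following in a tools.deps project. This is only a minimal sketch: it assumes a `deps.edn` build, and the `org.clojure/clojure` and `com.taoensso/nippy` coordinates are the standard Maven ones rather than anything quoted in this issue. The tech.ml.dataset entry is left out because the thread does not say which release was in use.

```clojure
;; deps.edn (sketch; assumes the project is built with tools.deps)
{:deps
 {;; Versions reported in this thread as making the roundtrip succeed.
  org.clojure/clojure {:mvn/version "1.12.0"}
  ;; Declared at the top level so this version takes precedence over
  ;; whatever nippy version arrives transitively.
  com.taoensso/nippy  {:mvn/version "3.4.2"}
  ;; ... keep the project's existing techascent/tech.ml.dataset entry here
  }}
```

Running `clj -Stree` afterwards prints the resolved dependency tree, which is one way to check which nippy version actually ends up on the classpath.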

Closing this.