scicloj / tablecloth

Dataset manipulation library built on top of tech.ml.dataset
https://scicloj.github.io/tablecloth
MIT License
305 stars, 27 forks

JSON parsing error with tc/dataset :parser-fn option #157

Open jasalt opened 5 months ago

jasalt commented 5 months ago

The :parser-fn option works as expected when parsing a Clojure data structure:

  (-> (tc/dataset [{"test" 1 "time-period" "2024-06-21"}
                   {"test" 2 "time-period" "2024-06-22"}
                   {"test" 3 "time-period" "2024-06-23"}]
                  {:key-fn keyword :parser-fn {:time-period :local-date}}))

  ;; | :test | :time-period |
  ;; |------:|--------------|
  ;; |     1 |   2024-06-21 |
  ;; |     2 |   2024-06-22 |
  ;; |     3 |   2024-06-23 |
  ;;                      ^- :local-date datatype

However, it returns an error if the same data is read from a .json file:

  (spit "test.json" "[
    {\"test\": 1, \"time-period\": \"2024-06-20\"},
    {\"test\": 2, \"time-period\": \"2024-06-21\"},
    {\"test\": 3, \"time-period\": \"2024-06-22\"}]")

  (tc/dataset "test.json" {:key-fn keyword :parser-fn {:time-period :local-date}})

  ;; |   :$value |                                                             :$error |
  ;; |-----------|---------------------------------------------------------------------|
  ;; | test.json | Wrong number of args (0) passed to: clojure.lang.PersistentArrayMap |

A separate tc/convert-types call with the same type mapping does work, however:

  (-> (tc/dataset "test.json" {:key-fn keyword})
      (tc/convert-types {:time-period :local-date}))

  ;; | :time-period | :test |
  ;; |--------------|------:|
  ;; |   2024-06-20 |     1 |
  ;; |   2024-06-21 |     2 |
  ;; |   2024-06-22 |     3 |
  ;;              ^- :local-date datatype

Using:

genmeblog commented 5 months ago

Thanks for the report. Could you tell me which JVM you are using and attach the stack trace, if there is one?

genmeblog commented 5 months ago

Looks like it's a tech.ml.dataset error:

  (spit "test.json" "[
    {\"test\": 1, \"time-period\": \"2024-06-20\"},
    {\"test\": 2, \"time-period\": \"2024-06-21\"},
    {\"test\": 3, \"time-period\": \"2024-06-22\"}]")

  (tc/dataset "test.json" {:key-fn keyword :parser-fn {:time-period :local-date}})

  clojure.lang.ArityException: Wrong number of args (0) passed to: clojure.lang.PersistentArrayMap
      at clojure.lang.AFn.throwArity(AFn.java:429)
      at clojure.lang.AFn.invoke(AFn.java:28)
      at charred.api$json_reader_fn$fn__47171.invoke(api.clj:508)
      at charred.api$read_json_supplier.invokeStatic(api.clj:606)
      at charred.api$read_json_supplier.doInvoke(api.clj:560)
      at clojure.lang.RestFn.invoke(RestFn.java:426)
      at charred.api$read_json.invokeStatic(api.clj:618)
      at charred.api$read_json.doInvoke(api.clj:614)
      at clojure.lang.RestFn.applyTo(RestFn.java:142)
      at clojure.core$apply.invokeStatic(core.clj:669)
      at clojure.core$apply.invoke(core.clj:662)
      at tech.v3.dataset.io$eval47389$fn__47390.invoke(io.clj:66)
      at clojure.lang.MultiFn.invoke(MultiFn.java:234)
      at tech.v3.dataset.io$__GT_dataset.invokeStatic(io.clj:239)
      at tech.v3.dataset.io$__GT_dataset.invoke(io.clj:111)
      at tech.v3.dataset$__GT_dataset.invokeStatic(dataset.clj:125)
      at tech.v3.dataset$__GT_dataset.invoke(dataset.clj:22)
      at tablecloth.api.dataset$dataset.invokeStatic(dataset.clj:91)
      at tablecloth.api.dataset$dataset.invoke(dataset.clj:52)
      at tablecloth.api$dataset.invokeStatic(api.clj:912)
      at tablecloth.api$dataset.invoke(api.clj:790)
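
The top frames of this trace show the ArityException being raised inside charred (the JSON reader that tech.ml.dataset calls here) when a clojure.lang.PersistentArrayMap is invoked with no arguments. The sketch below only reproduces that message in plain Clojure; it assumes, without confirmation from the thread, that the map being invoked is the :parser-fn map handed down to the JSON reader. The names parser-fn and call-result are hypothetical.

```clojure
;; Hypothetical stand-in for the :parser-fn map from the report.
(def parser-fn {:time-period :local-date})

;; Maps implement IFn with a 1-arity (key lookup) and a 2-arity
;; (lookup with default), so calling a map with a key works:
(parser-fn :time-period)
;; => :local-date

;; Calling a map with zero arguments has no matching arity, so
;; AFn.throwArity raises clojure.lang.ArityException.
(def call-result
  (try
    (parser-fn)
    (catch clojure.lang.ArityException e
      (.getMessage e))))

call-result
;; => "Wrong number of args (0) passed to: clojure.lang.PersistentArrayMap"
```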

jasalt commented 5 months ago

Yes, it seemed like that could be the case. I stepped through the execution but didn't fully understand how it all fits together. I'm new to Tablecloth and the overall TMD ecosystem.

$ clj -version
Clojure CLI version 1.11.1.1429

$ java -version
openjdk version "17.0.10" 2024-01-16
OpenJDK Runtime Environment (build 17.0.10+7-Debian-1deb12u1)
OpenJDK 64-Bit Server VM (build 17.0.10+7-Debian-1deb12u1, mixed mode, sharing)

genmeblog commented 5 months ago

No worries. I've already addressed the problem.