replikativ / konserve

A clojuresque key-value/document store protocol with core.async.
Eclipse Public License 1.0

custom serializers #109

Open awb99 opened 10 months ago

awb99 commented 10 months ago

Hi! I need to add a few more types so that konserve can store them. I would like to use incognito to write simple conversion routines.

I got this to work:

(require '[tick.core :as t]
         '[incognito.base :refer [incognito-writer]])

(incognito-writer
 {'java.time.LocalDateTime (fn [r] (str r))}
 (-> (t/now) t/date-time))
;; returns a map with :tag and :value

However, I am struggling to create a konserve store that can write LocalDateTime values.

This does not work:

(defrecord Bar [a b])
(def write-handlers {'notebook.encoding.Bar (fn [bar] bar)})
(def read-handlers (atom {'notebook.encoding.Bar map->Bar}))
(def write-handlers-incognito
  {'java.time.LocalDateTime (fn [r] (str r))})

(def store (<!! (filestore/connect-fs-store
                 "/tmp/bongo64"
                 {:write-handlers (atom (merge write-handlers
                                               (incognito-write-handlers write-handlers-incognito)))
                  :read-handlers  read-handlers})))

Any ideas?

whilo commented 10 months ago

Incognito only works for record types. Unfortunately, you need to use the low-level serializers of the specific serialization library you are using, e.g. Fressian. Note: maybe it should support LocalDateTime directly.
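To make the distinction concrete, here is a minimal sketch using only clojure.core and java.time (no incognito on the classpath). It assumes, as stated above, that incognito only handles defrecord instances. `Bar` mirrors the record from the question, and `incognito-candidate?` is a hypothetical helper, not part of incognito's API.

```clojure
(ns example.record-check
  (:import (java.time LocalDateTime)))

;; Mirrors the record type from the question.
(defrecord Bar [a b])

;; Hypothetical helper (not part of incognito): true only for Clojure records,
;; the only kind of value incognito's handlers apply to.
(defn incognito-candidate? [x]
  (record? x))

(incognito-candidate? (->Bar 1 2))
;; => true
(incognito-candidate? (LocalDateTime/of 2023 10 27 15 38))
;; => false, so a LocalDateTime needs a serializer-level (e.g. Fressian) handler
```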

awb99 commented 10 months ago

Thanks @whilo!

awb99 commented 10 months ago

The below code works. It extends fressian serialization for LocalDateTime and Instant. Perhaps it is useful to someone.

(ns crb.db.konserve-types
  (:require
   [clojure.core.async :as async :refer [<!! <!]]
   [konserve.filestore :as filestore]
   [konserve.serializers :refer [fressian-serializer]]
   [konserve.core :as k]
   [tick.core :as t]
   [cljc.java-time.instant :as ti]
   [cljc.java-time.local-date-time :as ldt]
   [cljc.java-time.zone-offset :refer [utc]])
  (:import [org.fressian.handlers WriteHandler ReadHandler]
           (java.util Date)))

;; epoch conversion

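(defn datetime->epoch-second [dt]
  ;; interprets dt as UTC; sub-second precision is dropped
  (ldt/to-epoch-second dt utc))

(defn epoch-second->datetime [es]
  ;; nano-of-second must be 0, otherwise every value read back gains 1 ns
  (ldt/of-epoch-second es 0 utc))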

;; read-write-handlers

(def custom-read-handler
  {"java.util.Date" (reify ReadHandler
                      (read [_ reader _tag _component-count]
                        ;; stored as epoch milliseconds
                        (Date. (long (.readObject reader)))))
   "java.time.LocalDateTime" (reify ReadHandler
                               (read [_ reader _tag _component-count]
                                 ;; stored as epoch seconds (UTC)
                                 (epoch-second->datetime (long (.readObject reader)))))
   "java.time.Instant" (reify ReadHandler
                         (read [_ reader _tag _component-count]
                           ;; stored as epoch seconds
                           (ti/of-epoch-second (long (.readObject reader)))))})

(def custom-write-handler
  {Date {"java.util.Date" (reify WriteHandler
                            (write [_ writer date]
                              ;; tag with one component: epoch milliseconds
                              (.writeTag    writer "java.util.Date" 1)
                              (.writeObject writer (.getTime ^Date date))))}
   java.time.LocalDateTime {"java.time.LocalDateTime" (reify WriteHandler
                                                        (write [_ writer dt]
                                                          ;; tag with one component: epoch seconds (UTC)
                                                          (.writeTag    writer "java.time.LocalDateTime" 1)
                                                          (.writeObject writer (datetime->epoch-second dt))))}
   java.time.Instant {"java.time.Instant" (reify WriteHandler
                                            (write [_ writer instant]
                                              ;; tag with one component: epoch seconds
                                              (.writeTag    writer "java.time.Instant" 1)
                                              (.writeObject writer (ti/get-epoch-second instant))))}})

(defn create-store [path]
  (<!! (filestore/connect-fs-store
        path
        :serializers
        {:FressianSerializer
         (fressian-serializer custom-read-handler
                              custom-write-handler)})))

(comment
  ;; find out the class of the types I am interested in
  (-> (Date.) class)
  ;; => java.util.Date
  (-> (t/now) class)
  ;; => java.time.Instant
  (-> (t/now) (t/date-time) (class))
  ;; => java.time.LocalDateTime

  ;; find out how to convert the type to something that is serializable
  (.getTime (Date.))
  (-> (t/now) (ti/get-epoch-second))
  ;; => 1698418718
  (ti/of-epoch-second 1698418718)
  (-> (t/now) (t/date-time) datetime->epoch-second)

  ;; create store with custom serializers
  (def store (create-store "/tmp/willy1"))

  (<!! (k/assoc-in store [:demo] 15))
  (<!! (k/get-in store [:demo]))

  (<!! (k/assoc-in store [:inst] (t/now)))
  (<!! (k/get-in store [:inst]))

  (<!! (k/assoc-in store [:ldt] (-> (t/now) (t/date-time))))
  (<!! (k/get-in store [:ldt]))
  )

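
The epoch-second helpers in the code above keep only whole seconds, so a value round-tripped through the store loses its sub-second part. A minimal sketch with plain java.time interop (instead of the cljc.java-time wrappers, so it runs without extra dependencies) shows what survives. The function names mirror the ones above, but this is an illustration, not a drop-in replacement.

```clojure
(ns example.epoch-roundtrip
  (:import (java.time LocalDateTime ZoneOffset)))

;; Same idea as datetime->epoch-second above: interpret dt as UTC and keep
;; whole seconds only.
(defn datetime->epoch-second [^LocalDateTime dt]
  (.toEpochSecond dt ZoneOffset/UTC))

;; Inverse: nano-of-second is 0 because nothing finer than seconds was stored.
(defn epoch-second->datetime [es]
  (LocalDateTime/ofEpochSecond (long es) 0 ZoneOffset/UTC))

(def dt (LocalDateTime/of 2023 10 27 15 38 38 123456789))

;; The round trip returns dt with its nanoseconds zeroed.
(= (.withNano ^LocalDateTime dt 0)
   (epoch-second->datetime (datetime->epoch-second dt)))
;; => true
```

If full precision matters, one option is to write the seconds and the nano-of-second as two components, or the ISO string from (str dt); both are assumptions about what the application needs, not something konserve requires.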
whilo commented 10 months ago

Cool! I don't know much about Fressian itself, but maybe upstream would be interested in covering this. As far as I know, date support on the JVM has expanded to these newer types, and ideally Fressian would cover them too.

awb99 commented 10 months ago

I would like to add a few comments: I am using datahike and konserve for a production app to manage workflows in my company. Both datahike and konserve are stable. A big thanks to @whilo and all the others.

I dove into serializer hell because konserve was throwing errors when serializing LocalDate values.

What made it difficult is that there is (almost) no documentation for serializers; the best example is a unit test in konserve.

The default filestore expects the serialization/deserialization handlers to be wrapped in atoms, but when I set up the serializer manually, there are no atoms. Why are some of the parameters passed in when starting konserve atoms?

I wanted a quick and dirty solution that serializes all date types to strings with (str d), because tick.core can construct most date types from a string. That way I could support all types quickly and optimize later.
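The string-based idea can be sketched without tick: java.time's toString output is ISO-8601, and each class's static parse method reads it back. In this minimal sketch, `parsers`, `->tagged` and `<-tagged` are hypothetical names; the {:tag :value} shape only mimics what incognito-writer returned in the first snippet. It is not konserve's or incognito's API, and wiring it into a Fressian handler is left out.

```clojure
(ns example.string-roundtrip
  (:import (java.time Instant LocalDate LocalDateTime)))

;; Hypothetical table: class -> function that parses the ISO-8601 string
;; produced by (str x) back into that class.
(def parsers
  {LocalDateTime #(LocalDateTime/parse %)
   LocalDate     #(LocalDate/parse %)
   Instant       #(Instant/parse %)})

;; Encode a supported value as a tagged string map; fail loudly otherwise.
(defn ->tagged [x]
  (when-not (contains? parsers (class x))
    (throw (ex-info "no string parser for type" {:class (class x)})))
  {:tag (.getName (class x)) :value (str x)})

;; Decode by looking up the parser for the stored class name.
(defn <-tagged [{:keys [tag value]}]
  ((get parsers (Class/forName tag)) value))

(->tagged (LocalDateTime/of 2023 10 27 15 38 38 123456789))
;; => {:tag "java.time.LocalDateTime", :value "2023-10-27T15:38:38.123456789"}
```

Unlike the epoch-second encoding, this keeps nanoseconds, at the cost of larger stored values and a parse on every read.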

Another confusing point is the relationship between the default serializers and incognito. The konserve code has two different kinds of serialization parameters. I guess one adds types to incognito and the other adds types that implement custom serialization functions.

awb99 commented 10 months ago

From konserve.serializers:

(defrecord FressianSerializer [custom-read-handlers custom-write-handlers]
  #?@(:cljs (INamed ;clojure.lang.Named
             (-name [_] "FressianSerializer")
             (-namespace [_] "konserve.serializers")))
  PStoreSerializer
  (-deserialize [_ read-handlers bytes]
    (let [handlers #?(:cljs (merge custom-read-handlers (incognito-read-handlers read-handlers))
                      :clj (-> (merge fress/clojure-read-handlers
                                      custom-read-handlers
                                      (incognito-read-handlers read-handlers))

There are both read-handlers and custom-read-handlers, and it is completely unclear what the difference is.
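One concrete consequence of the excerpt is the precedence between the handler sources: they are combined with `merge`, and `merge` lets later maps win on duplicate keys. The excerpt is cut off after the merge, so this minimal sketch covers only that step. The maps, tag names and keyword "handlers" are made-up placeholders, not real Fressian or incognito handlers.

```clojure
(ns example.handler-precedence)

;; Placeholder maps (tag -> handler) standing in for the three sources merged
;; in the :clj branch of the excerpt.
(def clojure-read-handlers {"inst" :fressian-default
                            "set"  :fressian-default})
(def custom-read-handlers  {"inst" :custom})
(def incognito-handlers    {"record" :incognito})

;; Same argument order as the excerpt: on a duplicate tag the later map wins,
;; so a custom handler replaces a default one registered under the same tag.
(def combined
  (merge clojure-read-handlers custom-read-handlers incognito-handlers))

combined
;; => {"inst" :custom, "set" :fressian-default, "record" :incognito}
```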

pkpkpk commented 10 months ago

I will add docs for this to my PR #109 #108 #93 #71

whilo commented 10 months ago

@awb99 Incognito was my best-effort attempt to abstract serialization of custom Clojure defrecords across different serialization formats. I put the handlers into atoms so that you could install new handlers during operation, a feature that might make sense in replikativ but is not really that important. Unfortunately, many custom JVM types are not defrecords, which is why the custom serializers exist: they let you install handlers directly for the specific serialization library you use (currently Fressian for us). I agree that this is not well documented. @pkpkpk Thanks for looking into this!
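The point about atoms can be illustrated with a small sketch: a reader that dereferences the handler atom on every call picks up handlers added later with `swap!`, which a plain map captured at setup time would not. `read-value`, the `'example.Bar` tag and the tagged-map format are hypothetical and only mimic the idea; konserve and incognito look handlers up through their own code.

```clojure
(ns example.handler-atom)

(defrecord Bar [a b])

;; Handlers live in an atom so they can be changed while the store is running.
(def read-handlers (atom {}))

;; Hypothetical reader: looks up the handler at read time (deref on each call)
;; and returns the raw value when no handler is installed for the tag.
(defn read-value [{:keys [tag value]}]
  (if-let [handler (get @read-handlers tag)]
    (handler value)
    value))

(def stored {:tag 'example.Bar :value {:a 1 :b 2}})

(read-value stored)
;; => {:a 1, :b 2}  (no handler yet, so a plain map comes back)

;; Install a handler while "running"; the next read uses it.
(swap! read-handlers assoc 'example.Bar map->Bar)
(read-value stored)
;; => a Bar record with :a 1 and :b 2
```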