replikativ / datahike

A fast, immutable, distributed & compositional Datalog engine for everyone.
https://datahike.io
Eclipse Public License 1.0

[Bug]: datahike.migrate has a problem with schema/double (which cbor converts to float) #633

Open awb99 opened 1 year ago

awb99 commented 1 year ago

What version of Datahike are you using?

0.6

What version of Java are you using?

17

What operating system are you using?

guix

What database EDN configuration are you using?

irrelevant to this ticket

Describe the bug

I started my Datahike database on version 0.4 with schema-on-write, and I ONLY used double in the schema definition. I now had to migrate my 0.4 EDN dumps so they can be imported into 0.6, which uses CBOR dumps. In 0.4 I exported EDN dumps and imported them on another machine without a problem. With 0.6, when the EDN dumps are read via the EDN parser and imported, I had to convert all float values to double values first. Interestingly, this conversion was not needed before.

What is the expected behaviour?

Document in the README that :double has a different meaning in 0.6 than in 0.4.

How can the behaviour be reproduced?

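;; Coerce floating-point values to java.lang.Double; leave everything else alone.
;; Note: float? is true for both Float and Double, so Doubles pass through unchanged.
(defn float->double [v]
  (if (float? v)
    (double v)
    v))

;; load-cbor, stats, datom-schema?, assoc-schema-ids, print-tx-stats, schema
;; and the crbdb namespace are helpers from my own project (not shown here).
(defn db-migrate []
  (warn "migrating cbor db..")
  (let [txs-old (load-cbor crbdb/conn "data/datahike-dump/eavt-dump")
        s (stats txs-old)
        max-eid (:max-eid s)
        max-tx (:max-tx s)
        ;; the schema is transacted separately below, so drop schema datoms
        tx-old-no-schema (remove datom-schema? txs-old)
        ;; workaround: make sure every value typed as double really is a Double
        tx-old-safe (map #(update % :v float->double)
                         tx-old-no-schema)
        schema-with-eids (assoc-schema-ids max-eid schema)]
    (warn "dump stats: " s)
    (warn "transacting schema with eid above: " max-eid)
    (let [result-schema (crbdb/transact schema-with-eids)]
      (print-tx-stats result-schema))
    (let [result-import (api/transact crbdb/conn (vec tx-old-safe))]
      (warn "import dump result:")
      (print-tx-stats result-import))
    ;(assoc crbdb/conn :max-tx max-tx)
    (warn "db migration finished!")))

awb99 commented 1 year ago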

After more testing, this is a bug in the CBOR persistence layer. Just calling datahike.migrate/export-db followed by datahike.migrate/import-db results in import errors for doubles, which CBOR seems to store as floats:

[datahike.db.transaction:45] - Bad entity value 0.0 at [:db/add 2199133 :lineitem/price 0.0 536871102] , value does not match schema definition. Must be conform to: double? {:error :transact/schema, :value 0.0, :attribute :lineitem/price, :schema #:db{:valueType :db.type/double, :cardinality :db.cardinality/one, :ident :lineitem/price}}
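
The error above looks contradictory because a 32-bit float and a 64-bit double both print as 0.0. The following is only a minimal sketch of why a schema check based on `double?` would reject such a value, assuming the imported value arrives as a `java.lang.Float`, which is what the report suggests but which has not been confirmed against the Datahike source. `coerce-float` is a hypothetical helper, not part of Datahike.

```clojure
;; Assumption: the CBOR import hands back a java.lang.Float where the
;; schema expects a java.lang.Double.
(def as-double (double 0.0)) ; boxed as java.lang.Double
(def as-float  (float 0.0))  ; boxed as java.lang.Float

(println (pr-str as-double) (pr-str as-float)) ; both print the same text

;; clojure.core/double? is an instance check for java.lang.Double only,
;; while clojure.core/float? accepts both Float and Double.
(println (double? as-double)) ; true
(println (double? as-float))  ; false
(println (float? as-float))   ; true

;; Hypothetical helper: widen only Float values, leave everything else as-is.
(defn coerce-float [v]
  (if (instance? Float v)
    (double v)
    v))

(println (double? (coerce-float as-float))) ; true
```

Widening a Float back to a Double satisfies the type check, but if a value really was narrowed to 32 bits on export, the precision lost at that point cannot be restored by the coercion.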

TimoKramer commented 1 year ago

Hey @awb99, thanks for reporting this issue, and great find! We value contributions and are happy to help with it. If you find some time to fix this, please don't hesitate. We are all pretty busy lately, and I really hope this issue can be resolved soon.