Export needs sorting schema first to be importable

replikativ / datahike

A fast, immutable, distributed & compositional Datalog engine for everyone.

https://datahike.io

Eclipse Public License 1.0

Issue #262 (open), opened by markusalbertgraf 3 years ago

markusalbertgraf commented 3 years ago

Hi, when schema is added later in the life of a database, the exported data cannot be imported again unless the schema entries are first moved to the top of the export file. As a short-term workaround, there is a script at https://markusgraf.net/2020-12-03-Datahike-export-schema-sort.html that does this, along with some thoughts on what I think is going on.
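The reordering can be sketched in Clojure. This is not the script from the linked post. It is a minimal sketch assuming that each exported datom supports keyword lookup of `:e` and `:a` (as the import code in this thread relies on) and that attribute-defining entities can be recognised by their `:db/valueType` datom. The name `schema-first` and the sample data are hypothetical.

```clojure
(defn schema-first
  "Hypothetical helper: returns datoms with every datom of an
  attribute-defining entity (one that has a :db/valueType datom) moved
  to the front. Relative order inside each group is preserved."
  [datoms]
  (let [schema-eids (into #{}
                          (comp (filter #(= :db/valueType (:a %)))
                                (map :e))
                          datoms)
        schema?     (fn [datom] (contains? schema-eids (:e datom)))]
    (concat (filter schema? datoms)
            (remove schema? datoms))))

;; Hypothetical export in :eavt order: the entity (e 2) sits between
;; the definition of :person/name (e 1) and that of :person/email (e 1005).
(def eavt-export
  [{:e 1    :a :db/ident       :v :person/name}
   {:e 1    :a :db/valueType   :v :db.type/string}
   {:e 1    :a :db/cardinality :v :db.cardinality/one}
   {:e 2    :a :person/name    :v "Ann"}
   {:e 2    :a :person/email   :v "ann@example.com"}
   {:e 1005 :a :db/ident       :v :person/email}
   {:e 1005 :a :db/valueType   :v :db.type/string}
   {:e 1005 :a :db/cardinality :v :db.cardinality/one}])

(map :e (schema-first eavt-export))
;; => (1 1 1 1005 1005 1005 2 2)
```

Using `filter`/`remove` instead of a full sort keeps the original :eavt order within the schema group and within the data group, so only the schema datoms move.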

markusalbertgraf commented 3 years ago

I think Datomic solves this with partitions. Partitions are:

:db.part/db
:db.part/tx
:db.part/user

1) :db/id values are Longs
2) schema entries start at :db/id 0
3) tx entries start at :db/id hex 0c 00 00 00 03 f0
4) data entries start at :db/id hex 10 00 00 00 03 eb

In a new database there have already been a few transactions at creation time to preload the schema. Hex 3e8 equals decimal 1000, and the low bytes of the ids above lie just past that (hex 3f0 = 1008, hex 3eb = 1003). So there seems to be an offset at which the tx and data counters start, reserving space for preloading and "hardcoding" the schema.
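The two ids can be split into a partition part and a counter part. The sketch below assumes (it is not confirmed in this thread) that Datomic stores the partition index in the bits above bit 42 of an entity id and a per-partition counter in the low 42 bits. `decode-eid` and `counter-bits` are hypothetical names.

```clojure
;; Assumption: partition index in the bits above bit 42,
;; per-partition counter in the low 42 bits.
(def counter-bits 42)

(defn decode-eid
  "Hypothetical helper: split an entity id into partition and counter."
  [eid]
  {:partition (bit-shift-right eid counter-bits)
   :counter   (bit-and eid (dec (bit-shift-left 1 counter-bits)))})

(decode-eid 0x0c00000003f0) ;; => {:partition 3, :counter 1008}
(decode-eid 0x1000000003eb) ;; => {:partition 4, :counter 1003}
```

Under that assumption the tx and user ids differ only in their partition bits, and both counters sit just above 1000, which matches the reserved-offset reading.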

mainej commented 3 years ago

I'll second this. After exporting my db, I can't import it. The sequence of steps to reproduce the issue:

1. Transact schema for an attribute
2. Transact an entity with that attribute
3. Transact 1000 more datoms
4. Transact schema for a second attribute
5. Transact a value for the second attribute onto the entity created in step 2
6. Export
7. Import

This fails because the export happens in :eavt order. The entity has a lower :db/id than the second attribute, so the entity's data, including its values for both the first and the second attribute, is exported first, and the schema definition for the second attribute is exported only afterwards. The import (which transacts in batches of 1000) then fails because the attribute is used before it is defined.
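The failure condition can be checked before importing. This is a minimal sketch, assuming datoms support keyword lookup of `:a` and `:v`, that attributes are introduced by `:db/ident` datoms, and that every `:db/*` attribute is built in. It checks plain sequential order and ignores the batch size. `first-undefined-use`, `builtin-attr?` and the sample data are hypothetical.

```clojure
(defn builtin-attr?
  "Treat :db/* attributes as built in, i.e. needing no definition."
  [a]
  (= "db" (namespace a)))

(defn first-undefined-use
  "Walks datoms in order and returns the first one whose attribute has
  not been introduced by an earlier :db/ident datom, or nil if none."
  [datoms]
  (loop [defined           #{}
         [d & more :as ds] datoms]
    (when (seq ds)
      (let [defined (if (= :db/ident (:a d))
                      (conj defined (:v d))
                      defined)]
        (if (or (builtin-attr? (:a d)) (contains? defined (:a d)))
          (recur defined more)
          d)))))

;; Hypothetical :eavt-ordered export shaped like the steps above.
(def eavt-export
  [{:e 1    :a :db/ident       :v :person/name}
   {:e 1    :a :db/valueType   :v :db.type/string}
   {:e 1    :a :db/cardinality :v :db.cardinality/one}
   {:e 2    :a :person/name    :v "Ann"}
   {:e 2    :a :person/email   :v "ann@example.com"}
   {:e 1005 :a :db/ident       :v :person/email}
   {:e 1005 :a :db/valueType   :v :db.type/string}
   {:e 1005 :a :db/cardinality :v :db.cardinality/one}])

(first-undefined-use eavt-export)
;; => {:e 2, :a :person/email, :v "ann@example.com"}
```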

mainej commented 3 years ago

FWIW, here's some import code which seems to work with the format created by datahike.migrate/export-db, though it is slow:

(require '[clojure.java.io :as io]
         '[datahike.api :as d])

;; make-conn is a placeholder for connecting to the fresh target database.
(with-open [rdr (io/reader "/tmp/eavt-dump")]
  (let [conn (make-conn)]
    (->> (line-seq rdr)
         (map read-string)
         ;; The extra datoms that hold :db/txInstant throw off the
         ;; ordering of entity creation and cause errors, so drop them.
         (remove (comp #{:db/txInstant} :a))
         (sort-by :tx)
         (partition-by :tx)
         ;; Re-create the database one transaction at a time. This is slow,
         ;; but safe, because attributes are created before they are used,
         ;; and entities are created before they are referred to.
         (map #(d/transact conn %))
         ;; Force every transaction before with-open closes the reader.
         doall)))