juji-io / datalevin

A simple, fast and versatile Datalog database
https://github.com/juji-io/datalevin
Eclipse Public License 1.0
1.07k stars 60 forks source link

attributes with unthawable values won't retract or overwrite #208

Closed den1k closed 1 year ago

den1k commented 1 year ago

When an attribute value is unthawable :db/add or :db/retract on the same attribute cease to work. This ends up causing a tricky situation where the entire entity must be retracted using :db/retractEntity. It's fine behavior for the db to not serialize/deserialize or thaw/unthaw everything but to not allow overwriting and retraction of the attribute is a bug IMO.

Repro:


(defonce test-conn (d/create-conn))

(d/transact! test-conn [{:foo 1}])
; =>
{:datoms-transacted 1}

(d/touch (d/entity @test-conn 1))
; =>
{:foo 1, :db/id 1}

(d/transact! test-conn [{:db/id 1
                         :foo   (type [])}])
; =>
{:datoms-transacted 2}

(d/touch (d/entity @test-conn 1))
; =>
{:foo   #:nippy{:unthawable {:type       :serializable,
                             :cause      :quarantined,
                             :class-name "java.lang.Class",
                             :content    #datalevin/bytes"rO0ABXZyAB1jbG9qdXJlLmxhbmcuUGVyc2lzdGVudFZlY3RvcpJrsYGlVq0zAgAFSQADY250SQAFc2hpZnRMAAVfbWV0YXQAHUxjbG9qdXJlL2xhbmcvSVBlcnNpc3RlbnRNYXA7TAAEcm9vdHQAJExjbG9qdXJlL2xhbmcvUGVyc2lzdGVudFZlY3RvciROb2RlO1sABHRhaWx0ABNbTGphdmEvbGFuZy9PYmplY3Q7eHIAHmNsb2p1cmUubGFuZy5BUGVyc2lzdGVudFZlY3RvckDGjt5Zq+ubAgACSQAFX2hhc2hJAAdfaGFzaGVxeHA"}},
 :db/id 1}

;; retract attribute :foo
(d/transact! test-conn [[:db/retract 1 :foo]])
; =>
{:datoms-transacted 1}

;; :foo is not retracted!

(d/touch (d/entity @test-conn 1))
; =>
{:foo   #:nippy{:unthawable {:type       :serializable,
                             :cause      :quarantined,
                             :class-name "java.lang.Class",
                             :content    #datalevin/bytes"rO0ABXZyAB1jbG9qdXJlLmxhbmcuUGVyc2lzdGVudFZlY3RvcpJrsYGlVq0zAgAFSQADY250SQAFc2hpZnRMAAVfbWV0YXQAHUxjbG9qdXJlL2xhbmcvSVBlcnNpc3RlbnRNYXA7TAAEcm9vdHQAJExjbG9qdXJlL2xhbmcvUGVyc2lzdGVudFZlY3RvciROb2RlO1sABHRhaWx0ABNbTGphdmEvbGFuZy9PYmplY3Q7eHIAHmNsb2p1cmUubGFuZy5BUGVyc2lzdGVudFZlY3RvckDGjt5Zq+ubAgACSQAFX2hhc2hJAAdfaGFzaGVxeHA"}},
 :db/id 1}

;; overwriting also does not work

(d/transact! test-conn [[:db/add 1 :foo :bar]])
;=> 
{:datoms-transacted 2}

(d/touch (d/entity @test-conn 1))
;=>
{:foo   #:nippy{:unthawable {:type       :serializable,
                             :cause      :quarantined,
                             :class-name "java.lang.Class",
                             :content    #datalevin/bytes"rO0ABXZyAB1jbG9qdXJlLmxhbmcuUGVyc2lzdGVudFZlY3RvcpJrsYGlVq0zAgAFSQADY250SQAFc2hpZnRMAAVfbWV0YXQAHUxjbG9qdXJlL2xhbmcvSVBlcnNpc3RlbnRNYXA7TAAEcm9vdHQAJExjbG9qdXJlL2xhbmcvUGVyc2lzdGVudFZlY3RvciROb2RlO1sABHRhaWx0ABNbTGphdmEvbGFuZy9PYmplY3Q7eHIAHmNsb2p1cmUubGFuZy5BUGVyc2lzdGVudFZlY3RvckDGjt5Zq+ubAgACSQAFX2hhc2hJAAdfaGFzaGVxeHA"}},
 :db/id 1}
huahaiy commented 1 year ago

If you retract with the value, it works. (d/transact! test-conn [[:db/retract 1 :foo (type [])]])

The methods you tried are difficult to make them work, because the only way to get the exact form of the saved wrong bytes is to use the original data when retracting it.

Added a test to show this. https://github.com/juji-io/datalevin/blob/master/test/datalevin/test/transact.cljc#L800-L814

So this is a wontfix.

den1k commented 1 year ago

good to have a workaround!

the only way to get the exact form of the saved wrong bytes is to use the original data when retracting it

I’m not sure I’m understanding. Why does retraction for other values work providing EA and not V but not in this case?

Since V is queryable using EA and the attribute is not a ref but arbitrary EDN/bytes maybe V could be implicit in the retract to achieve consistent behavior?

huahaiy commented 1 year ago

It is possible, but it is difficult, as there are many issues to consider, such as multi-value, single-value, other data types, and so on. Since it is retractable with the value given, I don't see a need to change the behavior.

den1k commented 1 year ago

I see. (type []) was an example and is easily reproducible. But in a real application where arbitrary values which fail to serialize can not be retracted this remains an issue especially because the following does not work:

(d/transact! test-conn [[:db/retract 1 :foo (:foo (d/entity @test-conn 1))]])
; => {:datoms-transacted 0} ;; should be 1

Removing everything under the attribute also does not work:

(d/transact! test-conn [[:db.fn/retractAttribute 1 :foo]])

I guess the only way around this is to either retract the entire entity or to try to serialize and deserialize the value and not write it to the DB on fail.

huahaiy commented 1 year ago

This all stems from the asymmetry of nippy serialization, allow serialization of unknown but failed to deserialize. If we make it symmetric, i.e. don't allow serialization of unknown and throw exception instead, this problem disappears.

So we can change it such that un-thawable things would not be able to be transacted, would this be a better resolution?

den1k commented 1 year ago

Yes 100%. There's no point to transact values that can't be read (unthawed). Thanks @huahaiy

den1k commented 1 year ago

just saw the fix. It's not just classes that have this issue:

(d/transact! test-conn [{:foo (defn foo [] :foo)}])
(d/touch (d/entity @test-conn 2))
{:foo #:nippy{:unthawable {:type :serializable,
                           :cause :quarantined,
                           :class-name "clojure.lang.Var",
                           :content #datalevin/bytes"rO0ABXNyABtjbG9qdXJlLmxhbmcuVmFyJFNlcmlhbGl6ZWS2pCcBU/2jLQIAAkwABm5zTmFtZXQAFUxjbG9qdXJlL2xhbmcvU3ltYm9sO0wAA3N5bXEAfgABeHBzcgATY2xvanVyZS5sYW5nLlN5bWJvbBCHbBnxuSwjAgAESQAHX2hhc2hlcUwABV9tZXRhdAAdTGNsb2p1cmUvbGFuZy9JUGVyc2lzdGVudE1hcDtMAARuYW1ldAASTGphdmEvbGFuZy9TdHJpbmc7TAACbnNxAH4ABXhwUE0/xnB0AAt0ZXNzZXJhZS5kYnBzcQB+AAOtak+bcHQAA2Zvb3A"}},
 :db/id 2}

(d/transact! test-conn [[:db/retract 2 :foo]])
=> {:datoms-transacted 1}

(d/touch (d/entity @test-conn 2))
{:foo #:nippy{:unthawable {:type :serializable,
                           :cause :quarantined,
                           :class-name "clojure.lang.Var",
                           :content #datalevin/bytes"rO0ABXNyABtjbG9qdXJlLmxhbmcuVmFyJFNlcmlhbGl6ZWS2pCcBU/2jLQIAAkwABm5zTmFtZXQAFUxjbG9qdXJlL2xhbmcvU3ltYm9sO0wAA3N5bXEAfgABeHBzcgATY2xvanVyZS5sYW5nLlN5bWJvbBCHbBnxuSwjAgAESQAHX2hhc2hlcUwABV9tZXRhdAAdTGNsb2p1cmUvbGFuZy9JUGVyc2lzdGVudE1hcDtMAARuYW1ldAASTGphdmEvbGFuZy9TdHJpbmc7TAACbnNxAH4ABXhwUE0/xnB0AAt0ZXNzZXJhZS5kYnBzcQB+AAOtak+bcHQAA2Zvb3A"}},
 :db/id 2}

(d/transact! test-conn [[:db/retract 2 :foo (:foo (d/entity @test-conn 2))]])
(d/touch (d/entity @test-conn 2))
den1k commented 1 year ago

The issue might be the following in nippy. It freezes everything but thaws only some types.

taoensso.nippy/*freeze-serializable-allowlist*
=> #{"*"}
taoensso.nippy/*thaw-serializable-allowlist*
=>
#{"java.lang.NullPointerException"
  "java.time.YearMonth"
  "clojure.lang.ExceptionInfo"
  "java.time.ZoneOffset"
  "[I"
  "java.time.ZoneId"
  "clojure.lang.ArityException"
  "java.lang.ArithmeticException"
  "java.lang.Throwable"
  "java.net.URI"
  "java.lang.RuntimeException"
  "java.time.Year"
  "[B"
  "java.time.ZonedDateTime"
  "java.time.DateTimeException"
  "[D"
  "java.lang.Exception"
  "[S"
  "java.time.MonthDay"
  "java.lang.IndexOutOfBoundsException"
  "[J"
  "java.lang.IllegalArgumentException"
  "java.time.LocalTime"
  "java.time.OffsetTime"
  "java.time.LocalDate"
  "org.joda.time.DateTime"
  "java.util.UUID"
  "[F"
  "java.time.OffsetDateTime"
  "java.util.Date"
  "java.time.LocalDateTime"
  "[Z"
  "java.time.Clock"
  "[C"}

after running

(alter-var-root
  (var taoensso.nippy/*freeze-serializable-allowlist*)
  (constantly taoensso.nippy/*thaw-serializable-allowlist*))

Now this throws properly:

(d/transact! test-conn [{:bar (defn bar [] :bar)}])
Execution error (ExceptionInfo) at taoensso.nippy/throw-unfreezable (nippy.clj:1005).
Unfreezable type: class clojure.lang.Namespace

While other transactions work fine. I think de7a85d can be rolled back. I will use the above fix in all of my apps. Seems sensible to me to enforce this for the DB (only freeze what can be thawed) since the thawed values are useless and storing them leads to bugs in the DB api.

huahaiy commented 1 year ago

That's what https://github.com/juji-io/datalevin/commit/de7a85d2ea8a52f355b154efc19ba63f520ab6f5 is supposed to do. We bind it instead of alter-var-root, because we don't want to disrupt other people's use of nippy.

The current code actually does throw on {:bar (defn bar [] :bar)}.

java.lang.Class is a special case that we have to single it out. Others are fine.

den1k commented 1 year ago

ah okay. Great!

den1k commented 1 year ago

Seems like there would be a performance hit since the binding and the into run each time serialize is invoked.

den1k commented 1 year ago

there's also https://clojure.atlassian.net/browse/DJSON-32

huahaiy commented 1 year ago

I wouldn't worry about performance hit of dynamic binding for now, there are many other places that have much greater impact.

If one worry about performance, it is better to specify a typed data instead of using an Edn blob.

den1k commented 1 year ago

Got it. It's probably a non issue then. In my case I can't type the data because it's an arbitrary return value of code running in cells in https://github.com/lumberdev/tesserae