replikativ / datahike

A fast, immutable, distributed & compositional Datalog engine for everyone.
https://datahike.io
Eclipse Public License 1.0
1.62k stars 95 forks source link

[Bug]: delete-database throws NPE for :file backend #616

Open jjttjj opened 1 year ago

jjttjj commented 1 year ago

What version of Datahike are you using?

io.replikativ/datahike {:mvn/version "0.6.1539"}

What version of Java are you using?

17

What operating system are you using?

Ubuntu

What database EDN configuration are you using?

(def cfg {:store              {:backend :file
                                  :id "db"
                                  :path    "datahike-cfg"}
             :name               "db"
             :schema-flexibility :write
             :keep-history?      true
             :attribute-refs?    true})

Describe the bug

delete-database throws a null pointer exception. This seems to happen regardless of if the database was previously created or not, or transacted against and used.

{
 :cause nil
 :via
 [{:type clojure.lang.ExceptionInfo
   :message "java.lang.NullPointerException"
   :data {}
   :at [superv.async$throw_if_exception_ invokeStatic "async.cljc" 93]}
  {:type java.lang.NullPointerException
   :message nil
   :at [java.util.Objects requireNonNull "Objects.java" 208]}]
 :trace
 [[java.util.Objects requireNonNull "Objects.java" 208]
  [sun.nio.fs.UnixFileSystem getPath "UnixFileSystem.java" 263]
  [konserve.filestore$sync_base invokeStatic "filestore.clj" 34]
  [konserve.filestore$sync_base invoke "filestore.clj" 31]
  [konserve.filestore$delete_store invokeStatic "filestore.clj" 60]
  [konserve.filestore$delete_store invoke "filestore.clj" 51]
  [datahike.store$eval47429$fn__47431 invoke "store.cljc" 116]
  [clojure.lang.MultiFn invoke "MultiFn.java" 229]
  [datahike.writing$eval52657$fn__52666 invoke "writing.cljc" 207]
  [datahike.writing$eval52608$fn__52609$G__52597__52614 invoke "writing.cljc" 136]
  [datahike.writing$delete_database invokeStatic "writing.cljc" 223]
  [datahike.writing$delete_database invoke "writing.cljc" 217]
  [clojure.lang.AFn applyToHelper "AFn.java" 154]
  [clojure.lang.AFn applyTo "AFn.java" 144]
  [clojure.core$apply invokeStatic "core.clj" 667]
  [clojure.core$apply invoke "core.clj" 662]
  [datahike.writer$eval52982$fn__52983 doInvoke "writer.cljc" 99]
  [clojure.lang.RestFn invoke "RestFn.java" 408]
  [clojure.lang.MultiFn invoke "MultiFn.java" 229]
  [clojure.lang.AFn applyToHelper "AFn.java" 154]
  [clojure.lang.AFn applyTo "AFn.java" 144]
  [clojure.core$apply invokeStatic "core.clj" 667]
  [clojure.core$apply invoke "core.clj" 662]
  [datahike.api$delete_database invokeStatic "api.cljc" 130]
  [datahike.api$delete_database doInvoke "api.cljc" 129]
  [clojure.lang.RestFn invoke "RestFn.java" 408]
...

What is the expected behaviour?

The database should be deleted,

How can the behaviour be reproduced?


  (def cfg {:store              {:backend :file
                                  :id "db"
                                  :path    "datahike-cfg"}
             :name               "db"
             :schema-flexibility :write
             :keep-history?      true
             :attribute-refs?    true})

  (d/create-database cfg)
  (d/delete-database cfg) ;; NPE 
whilo commented 1 year ago

As discussed on Slack, I think it is best for us to require absolute paths in general, since we want to be able to address stores globally if possible. Currently we infer the hostname for the file store backend automatically, but we could also require to set it manually to fully identify the filesystem. That might be a bit less comfortable and still leaves room to use local hostnames that are not uniquely globally identified.

jsmassa commented 1 year ago

Sorry that this happened. There should be at least a proper exception message that describes the problem and some documentation.

whilo commented 1 year ago

We just need to check whether the user has provided an absolute path to the file store and otherwise throw an informative error message.

jsmassa commented 1 year ago

I think it would be nicer though to accept relative paths as this is what I would expect and apparently I am not alone with this. Btw, ./store-dir works already for deletion, but even if we leave it like this, we should check the path on store creation.

So I would say, we have following options:

1) Disallow creation of relative paths 2) Accept relative paths prefixed with ./ 3) Accept relative paths generally 4) Accept relative paths but throw a warning if a relative path is being used

None of the solutions above require much work, it is merely a design decision.

Which one would you prefer?

TimoKramer commented 1 year ago

I don't get why we need absolute paths to globally address stores. But I would expect to be able to use relative paths.

So I tend to option 3 because relative paths for java.io.File are not prefixed.

I wouldn't mind only using absolute paths with a proper error message though I currently don't see an easy way to know if a path is absolute or not. We would have to check if the first char of the path is either a slash or a letter and if it's a letter then it needs to be followed by a colon. And what if someone wants to use a network drive like a nas or something, is that possible?

But maybe you've got a better way to solve that depending on the implementation details of the filestore.

jsmassa commented 1 year ago

The filestore simply uses the Java classes File and Path, so there is actually a method to identify relative paths, but it is just as simple getting the absolute path from a relative path. The big question, which we talked about today, is where it would be best to perform the translation.

whilo commented 1 year ago

The problem is that the database stores the store configuration in its store and it identifies itself through the path for the filestore backend (see store-identity). Relative paths are not uniquely determined even on the same machine and hence can not be disambiguated unless the path is resolved first. This can create problems now if you try to connect to the relative path database from another relative or absolute path configuration. A simple solution is what @jsmassa described, to replace the relative path in the config before Datahike operates on it.

In general we should aim for having DB configurations that allow to connect from anywhere to the database if you have credentials, then we can store the config in Datahike itself and run recursive queries with Datalog over the databases.

whilo commented 1 year ago

The filestore simply uses the Java classes File and Path, so there is actually a method to identify relative paths, but it is just as simple getting the absolute path from a relative path. The big question, which we talked about today, is where it would be best to perform the translation.

I think it is best to resolve the paths when the config is loaded, e.g. in load-config. konserve can still support relative paths as is, but Datahike should internally operate on fully resolved configurations.