juji-io / datalevin

A simple, fast and versatile Datalog database
https://github.com/juji-io/datalevin
Eclipse Public License 1.0

Environment mapsize reached error after transacting #196

Closed. Gnurdle closed this issue 1 year ago.

Gnurdle commented 1 year ago

utilizing the following snippet:

(defn find-bug []
  (let [path (io/file "datalevin" "datoms")
        cn   (d/create-conn (.toString path) {} {:kv-opts {:mapsize 4096}})]
    (d/update-schema cn {:buggy/key {:db/type :string :db.unique :db.identity}})
    (doseq [ix (range 1000000)]
      (let [m {:buggy/key  (format "%20d" ix)
               :buggy/val  (format "bubba-%d" ix)
               :buggy/time (System/currentTimeMillis)}]
        (d/transact cn [m])))
    (d/clear cn)
    (println "yippee")))

running this yields the following:

; Execution error (ExceptionInfo) at datalevin.binding.java.LMDB/clear_dbi (java.clj:356).
; Fail to clear DBI: "datalevin/eav" "Environment mapsize reached (-30792)"

huahaiy commented 1 year ago

This is a case where the map is enlarged more than once because the size of the data is more than two orders of magnitude larger than the initial map size. A fix is incoming.
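
Until the fix lands, one workaround is to open the connection with a map already sized well above the expected data volume, so the enlargement path is never taken. A minimal sketch, using the same :kv-opts option as the snippet above and assuming :mapsize is given in MiB:

(def cn
  ;; start with a 4 GiB map up front, leaving enough headroom
  ;; that automatic enlargement is never triggered
  (d/create-conn "datalevin/datoms" {} {:kv-opts {:mapsize 4096}}))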

mhuebert commented 1 year ago

using v0.8.9 I still get this error. I can't reliably reproduce it; it sometimes happens when I clear the db (not kv) and re-populate, but not always. And after "mucking about" with various transactions it will start working again and transact all the data that failed the first time.

huahaiy commented 1 year ago

OK, this time clear should be fixed.

Gnurdle commented 1 year ago

hi, pulled current master and built. Ran into what I think may be a related issue, whereby if I transact a bunch of datoms, close the connection, later reopen, and attempt the d/clear, we get a similar upset:

(defn find-bug []
  (let [path   (io/file "datalevin" "datoms")
        get-cn (fn [] (d/create-conn (.toString path) {} {:kvopts {:mapsize 4096}}))
        cn     (get-cn)]
    (d/update-schema cn {:buggy/key {:db/type :string :db.unique :db.identity}})
    (println "adding a bunch of datoms...")
    (doseq [ix (range 1000000)]
      (let [m {:buggy/key  (format "%20d" ix)
               :buggy/val  (format "bubba-%d" ix)
               :buggy/time (System/currentTimeMillis)}]
        (d/transact cn [m])))
    (d/close cn)
    (println "datoms added, closing connection")
    (let [cn2 (get-cn)]
      (println "trying to clear freshly re-opened cn")
      (d/clear cn2)
      (println "that worked"))
    (println "yippee")))

Thanks.

huahaiy commented 1 year ago

OK, 0.8.10 will have a fix. It turned out that we need to set a larger :mapsize when opening a DB.

Gnurdle commented 1 year ago

currently running more annoying abuse tests (;->

a variation of the preceding, adding fewer datoms.

ran this in the repl, and observed the file sizes:

{gni}paris:/d/gni/ace2/datalevin/datoms$ rm -rf *
{gni}paris:/d/gni/ace2/datalevin/datoms$ ls -lh && du -sh
total 472K
-rw-r--r-- 1 chopper chopper 1000M Apr  3 04:47 data.mdb
-rw-r--r-- 1 chopper chopper  8.0K Apr  3 04:47 lock.mdb
476K    .
{gni}paris:/d/gni/ace2/datalevin/datoms$ ls -lh && du -sh
total 472K
-rw-r--r-- 1 chopper chopper  98G Apr  3 04:48 data.mdb
-rw-r--r-- 1 chopper chopper 8.0K Apr  3 04:48 lock.mdb
476K    .
{gni}paris:/d/gni/ace2/datalevin/datoms$ ls -lh && du -sh
total 472K
-rw-r--r-- 1 chopper chopper 9.6T Apr  3 04:48 data.mdb
-rw-r--r-- 1 chopper chopper 8.0K Apr  3 04:48 lock.mdb

and after a few iterations, got this error:

; Execution error (ExceptionInfo) at datalevin.binding.java/eval10189$fn (java.clj:746).
; Fail to open database: "Platform constant error code: EFBIG File too large (27)"

trace fragment:

clj꞉ace2.store.datalevin꞉>
datalevin.binding.java/eval10189 (java.clj:746)
clojure.lang.MultiFn/invoke (MultiFn.java:234)
datalevin.storage/open (storage.cljc:676)
datalevin.storage/open (storage.cljc:660)
datalevin.db/open-store (db.cljc:385)
datalevin.db/open-store (db.cljc:381)
datalevin.db/empty-db (db.cljc:409)
datalevin.db/empty-db (db.cljc:402)
datalevin.core/create-conn (core.cljc:608)
datalevin.core/create-conn (core.cljc:567)
ace2.store.datalevin/find-bug (form-init7619755554288279791.clj:268)

code snippet (note the reduction of mapsize, which didn't change the outcome, and that we are now doing 1K datoms rather than 1M):

(defn find-bug []
  (dotimes [lap 1]
    (println "lap: " lap)
    (let [path   (io/file "datalevin" "datoms")
          get-cn (fn [] (d/create-conn (.toString path) {} {:kvopts {:mapsize 1024}}))
          cn     (get-cn)]
      (d/update-schema cn {:buggy/key {:db/type :string :db.unique :db.identity}})
      (println "adding a bunch of datoms...")
      (doseq [ix (range 1000)]
        (let [m {:buggy/key  (format "%20d" ix)
                 :buggy/val  (format "bubba-%d" ix)
                 :buggy/time (System/currentTimeMillis)}]
          (d/transact cn [m])))
      (d/close cn)
      (println "datoms added, closing connection")
      (let [cn2 (get-cn)]
        (println "trying to clear freshly re-opened cn")
        (d/clear cn2)
        (println "that worked"))
      (println "yippee"))))

Thanks. C

huahaiy commented 1 year ago

Oh, < should be <=. Will release a new version later today.

Thanks for testing!

huahaiy commented 1 year ago

0.8.12 should fix this.

Gnurdle commented 1 year ago

this looks good. I also ran a subsequent test where I inserted, then retracted (in random order), 100K datoms, and it looks to be stable from the storage point of view. It seems to completely recycle storage, as I'd expect.
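
A rough sketch of such an insert-then-retract test (the original test code is not shown in the thread, so the schema and names here are illustrative; the lookup-ref retraction assumes :buggy/key is declared unique):

(defn recycle-test []
  (let [cn (d/create-conn "datalevin/recycle"
                          {:buggy/key {:db/valueType :db.type/string
                                       :db/unique    :db.unique/identity}})]
    ;; insert 100K datoms keyed by a unique string attribute
    (doseq [ix (range 100000)]
      (d/transact cn [{:buggy/key (format "%20d" ix)}]))
    ;; retract them again in random order, via lookup refs
    (doseq [ix (shuffle (range 100000))]
      (d/transact cn [[:db/retractEntity [:buggy/key (format "%20d" ix)]]]))
    (d/close cn)))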

mhuebert commented 1 year ago

using 0.8.12 I still got this error when re-populating my db.

In this case I was able to simply re-run the same transactions again and it worked (I am transacting the entities one at a time, so some of them would already have been in the db the second time):

(doseq [e entities]
  (try (db/transact! [e])
       (catch Exception e!
         (prn e)      ; print the entity that failed to transact
         (throw e!))))
Execution error (ExceptionInfo) at datalevin.binding.java.LMDB/close_transact_kv (java.clj:460).
Fail to commit read/write transaction in LMDB: "Environment mapsize reached (-30792)"
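
A minimal sketch of the re-run approach described above, reusing the db/transact! wrapper and entities seq from the snippet (one retry per entity; beyond those names everything is illustrative):

(doseq [e entities]
  (try (db/transact! [e])
       (catch Exception _
         ;; one retry; if the second attempt also fails, let it propagate
         (db/transact! [e]))))
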
huahaiy commented 1 year ago

I would appreciate a reproducible test case.

huahaiy commented 1 year ago

Setting a large :mapsize should alleviate some of the problems, as you likely know how big your data set is. The default size is 100 MiB, and we automatically enlarge it only 4 times. So if you have a huge data set, you may have gone beyond that.
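
For example, a minimal sketch of sizing the map up front (option names as used earlier in this thread; :mapsize assumed to be in MiB):

(def cn
  ;; 10240 MiB = 10 GiB, comfortably above the expected data set,
  ;; so the limited automatic enlargement is never exhausted
  (d/create-conn "datalevin/datoms" {} {:kv-opts {:mapsize 10240}}))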

huahaiy commented 1 year ago

OK, I can reproduce this error. Will address this.

huahaiy commented 1 year ago

The cases that I can reproduce are fixed in 0.8.14 for the JVM.

Graal native will segfault in these cases. Since the native command line is not used for long-running processes, it should be rare to encounter these situations. We will get around to this in the future.