Closed by rhettinger 8 months ago
See also #48033
IIRC, I've got a fairly up-to-date branch for this, based on your previous efforts, Raymond.
@rhettinger, would you like to continue the discussion here or on #48033? There is no use in keeping two identical issues open; let's close one of them.
Here are my two cents about this.

- `dbm.dumb` is not slower than SqliteDict.
- `dbm.dumb` is not fail safe, so try to avoid using it until it is fixed.

`dbm.dumb` is not so slow. According to my measurements the existing SqliteDict can be as fast as `dbm.dumb`, but only with some tweaks (using WAL mode). The measurements with code are in this pull request: https://github.com/Dobatymo/lmdb-python-dbm/pull/2

It would also be nice to extend the `shelve` module to allow JSON encoding as well, instead of only having the option to set the pickle protocol version.

The biggest problem with `dbm.dumb` (at least for me) is that it is not fail safe, see the example below. Moreover, this is not mentioned in the documentation. If you kill the Python process, the database will be corrupted every time. (This is because the key file is updated only on adding and deleting keys, and on close.)
```python
import dbm.dumb
import shelve
import itertools

database = dbm.dumb.open("test", "c")
shelf = shelve.Shelf(database)

if "key" in shelf:
    print("The stored value is: " + shelf["key"])
    input("Press ENTER to continue")
    del shelf["key"]

try:
    for index in itertools.count():
        print(f"Persist value {index} to database")
        shelf["key"] = f"value {index} " + ("x" * index)
except:  # e.g. KeyboardInterrupt (Ctrl+C)
    print("Exiting gracefully. Try killing the Python process.")
```
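As a side note, the WAL tweak mentioned above is a one-line pragma. A minimal sketch (the file path and table name here are made up for the demo):

```python
import os
import sqlite3
import tempfile

# Switch the journal mode to write-ahead logging, the tweak that made
# SqliteDict competitive with dbm.dumb in the linked benchmarks.
path = os.path.join(tempfile.mkdtemp(), "bench.db")
conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode
mode = conn.execute("PRAGMA journal_mode = wal").fetchone()[0]
assert mode == "wal"

# A small write loop; with WAL, each statement no longer rewrites the
# rollback journal, which is where the speedup comes from.
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value BLOB)")
for i in range(1000):
    conn.execute("REPLACE INTO kv VALUES (?, ?)", (str(i), b"x"))
count = conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
conn.close()
```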
Sorry, on Windows I do not have dbm.gnu. Copied from https://github.com/Dobatymo/lmdb-python-dbm/pull/2#issuecomment-1345534594 The numbers are seconds required to do the mentioned amount of work.
items | lmdbm | vedis | unqlite | rocksdict | dbm.gnu | semidbm | pysos | dbm.dumb | sqlite-wal | sqlite-autocommit | sqlite-batch | dummypickle | dummyjson |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | 0.015 | 0.000 | 0.000 | 0.031 | - | 0 | 0 | 0 | 0.015 | 0.015 | 0 | 0.015 | 0 |
100 | 0.141 | 0.000 | 0.000 | 0.047 | - | 0 | 0 | 0.031 | 0.031 | 0.109 | 0.094 | 0.156 | 0.172 |
1000 | 1.609 | 0.000 | 0.016 | 0.062 | - | 0.016 | 0.031 | 0.328 | 0.328 | 1.015 | 0.937 | 1.781 | 4.250 |
10000 | 15.359 | 0.062 | 0.062 | 0.172 | - | 0.140 | 0.250 | 3.219 | 3.125 | 17.922 | 19.453 | 95.188 | - |
100000 | - | 0.719 | 0.672 | 1.390 | - | 1.562 | 2.687 | - | 31.172 | - | - | - | - |
1000000 | - | 9.234 | 9.140 | 16.594 | - | 17.484 | 27.390 | - | - | - | - | - | - |
items | lmdbm | vedis | unqlite | rocksdict | dbm.gnu | semidbm | pysos | dbm.dumb | sqlite-wal | sqlite-autocommit | sqlite-batch | dummypickle | dummyjson |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | 0.000 | 0.000 | 0.000 | - | - | - | - | 0 | 0.016 | 0 | 0 | 0 | 0 |
100 | 0.000 | 0.000 | 0.000 | - | - | - | - | 0.032 | 0.031 | 0.078 | 0 | 0 | 0 |
1000 | 0.000 | 0.015 | 0.000 | - | - | - | - | 0.312 | 0.109 | 0.719 | 0.109 | 0 | 0 |
10000 | 0.031 | 0.047 | 0.046 | - | - | - | - | 3.140 | 1.046 | 16.156 | 1.047 | 0 | 0.047 |
100000 | 0.422 | 0.609 | 0.594 | - | - | - | - | - | 10.875 | - | 10.985 | 0.375 | 2.859 |
1000000 | 4.188 | 8.500 | 8.406 | - | - | - | - | - | - | - | - | 41.760 | - |
items | lmdbm | vedis | unqlite | rocksdict | dbm.gnu | semidbm | pysos | dbm.dumb | sqlite-wal | sqlite-autocommit | sqlite-batch | dummypickle | dummyjson |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | 0.000 | 0.000 | 0.000 | 0.031 | - | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
100 | 0.000 | 0.000 | 0.000 | 0.031 | - | 0 | 0 | 0 | 0.031 | 0.016 | 0.031 | 0 | 0 |
1000 | 0.000 | 0.000 | 0.000 | 0.046 | - | 0.015 | 0 | 0.078 | 0.218 | 0.297 | 0.282 | 0 | 0 |
10000 | 0.047 | 0.047 | 0.047 | 0.093 | - | 0.109 | 0.141 | 0.765 | 2.079 | 2.890 | 3.013 | 0.031 | 0.015 |
100000 | 0.610 | 0.641 | 0.640 | 0.735 | - | 1.422 | 1.797 | - | 23.797 | - | 29.375 | 0.187 | 0.218 |
1000000 | 6.297 | 8.609 | 8.765 | 20.391 | - | 15.891 | 20.485 | - | - | - | - | 2.141 | - |
items | lmdbm | vedis | unqlite | rocksdict | dbm.gnu | semidbm | pysos | dbm.dumb | sqlite-wal | sqlite-autocommit | sqlite-batch | dummypickle | dummyjson |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | 0.156 | 0.000 | 0.015 | 0.047 | - | 0 | 0 | 0.078 | 0.235 | 0.390 | 0.406 | 0.156 | 0.156 |
100 | 0.156 | 0.015 | 0.000 | 0.047 | - | 0.016 | 0 | 0.078 | 0.235 | 0.390 | 0.390 | 0.156 | 0.203 |
1000 | 0.156 | 0.000 | 0.000 | 0.047 | - | 0.015 | 0.015 | 0.094 | 0.235 | 0.406 | 0.390 | 0.188 | 0.672 |
10000 | 0.156 | 0.015 | 0.015 | 0.047 | - | 0.015 | 0.031 | 0.203 | 0.234 | 0.469 | 0.406 | 0.594 | 5.141 |
100000 | 0.171 | 0.000 | 0.000 | 0.047 | - | 0.140 | 0.250 | - | 0.266 | - | 0.484 | 5.547 | 49.750 |
1000000 | 0.172 | 0.015 | 0.016 | 0.047 | - | 1.297 | 2.407 | - | - | - | - | 90.718 | - |
> @rhettinger, would you like to continue the discussion here
If you want to run with this, do whatever makes sense for you. The other issue is mostly stale and likely needs a fresh start. A new first draft is likely just an afternoon project. If you don't want to run with it, let me know and I'll work on it.
I don't know that there needs to be much discussion because we have `collections.abc.MutableMapping` to structure the implementation and because the APIs for shelve and dbm are already established. This is a clearly desirable buildout that addresses the known issues with `dumbdbm`. With Sqlite3, we get down to a single file in a standard format, we get ACID guarantees, concurrency and compaction, and we get a dbm that will be present in every build.
If you're interested, here's some code that can be used as a starting point:
```python
import sqlite3
from contextlib import suppress
from collections.abc import MutableMapping

BUILD_TABLE = "CREATE TABLE Dict (key TEXT NOT NULL, value BLOB NOT NULL, PRIMARY KEY (key));"
GET_SIZE = "SELECT COUNT (*) FROM Dict;"
LOOKUP_KEY = "SELECT value FROM Dict WHERE key = ?;"
STORE_KV = "REPLACE INTO Dict (key, value) VALUES (?, ?);"
DELETE_KEY = "DELETE FROM Dict WHERE key = ?;"
ITER_KEYS = "SELECT key FROM Dict;"

class SQLiteDict(MutableMapping):

    def __init__(self, dbname=':memory:'):
        self.conn = sqlite3.connect(dbname, isolation_level=None)
        self.conn.execute('PRAGMA journal_mode = wal')
        with suppress(sqlite3.OperationalError):
            self.conn.execute(BUILD_TABLE)

    def __len__(self):
        return self.conn.execute(GET_SIZE).fetchone()[0]

    def __getitem__(self, key):
        value_tuple = self.conn.execute(LOOKUP_KEY, (key,)).fetchone()
        if value_tuple is None:
            raise KeyError(key)
        return value_tuple[0]

    def __setitem__(self, key, value):
        self.conn.execute(STORE_KV, (key, value))
        self.conn.commit()

    def __delitem__(self, key):
        if key not in self:
            raise KeyError(key)
        self.conn.execute(DELETE_KEY, (key,))
        self.conn.commit()

    def __iter__(self):
        curs = self.conn.execute(ITER_KEYS)
        while (value_tuple := curs.fetchone()):
            yield value_tuple[0]
        curs.close()

    def close(self):
        self.conn.close()
        self.conn = None

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()
```
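A quick usage sketch of the class above, showing how much behaviour `MutableMapping` supplies for free (`update`, `get`, `items`, and so on). The class body here is a condensed copy so the snippet runs on its own:

```python
import sqlite3
from contextlib import suppress
from collections.abc import MutableMapping

class SQLiteDict(MutableMapping):
    # Condensed copy of the sketch above, so this demo is self-contained.
    def __init__(self, dbname=':memory:'):
        self.conn = sqlite3.connect(dbname, isolation_level=None)
        self.conn.execute('PRAGMA journal_mode = wal')
        with suppress(sqlite3.OperationalError):
            self.conn.execute('CREATE TABLE Dict '
                              '(key TEXT PRIMARY KEY, value BLOB NOT NULL)')
    def __len__(self):
        return self.conn.execute('SELECT COUNT(*) FROM Dict').fetchone()[0]
    def __getitem__(self, key):
        row = self.conn.execute('SELECT value FROM Dict WHERE key = ?',
                                (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]
    def __setitem__(self, key, value):
        self.conn.execute('REPLACE INTO Dict (key, value) VALUES (?, ?)',
                          (key, value))
    def __delitem__(self, key):
        if key not in self:
            raise KeyError(key)
        self.conn.execute('DELETE FROM Dict WHERE key = ?', (key,))
    def __iter__(self):
        yield from (row[0] for row in self.conn.execute('SELECT key FROM Dict'))

d = SQLiteDict()                         # in-memory database by default
d['alpha'] = b'one'
d.update(beta=b'two', gamma=b'three')    # update() comes from MutableMapping
assert len(d) == 3
assert d.get('missing') is None          # get() comes from Mapping
del d['beta']
assert sorted(d) == ['alpha', 'gamma']
```

Only the five abstract methods talk to SQLite; everything else is inherited mixin behaviour, which is exactly the decoupling the comment above describes.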
Other thoughts:

- The `SQLiteDict` class could be kept separate from the `dbm` API. That would nicely decouple the core dictionary logic from the needs of `dbm`.

> If you want to run with this, do whatever makes sense for you. The other issue is mostly stale and likely needs a fresh start.
I prefer closing the old issue; we definitely need a fresh start here. I'll take care of it.
> A new first draft is likely just an afternoon project. If you don't want to run with it, let me know and I'll work on it.
I'll give it a shot; it'll be a nice chance for me to get to know the shelve and dbm modules better.
> If you're interested, here's some code that can be used as a starting point:
Thanks!
Finally got back to this; I added @rhettinger's patch, amended it so the generic `test_dbm` test suite passed, and created PR #114481. Go nuts.
For `flag='n'`, if there is an existing database file, should we remove the existing database file (current behaviour) and create a new one, or should we try to re-use the existing database file and instead remove only the existing `Dict` table if there is one (`"DROP TABLE Dict"` then `"CREATE TABLE ..."`)? The docs say to "always create a new database", but that may be understood as "always create a new [dbm] database", not necessarily "always create a new database [file]".

Let's keep this simple and remove the file (current behaviour of #114481); the added complexity is not worth it.
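To make the two options concrete, here is a hypothetical helper (`open_sqlite_dbm` is made up for illustration and is not the PR's actual code):

```python
import os
import sqlite3
import tempfile

def open_sqlite_dbm(path, flag="r"):
    # Hypothetical sketch of the question above, not the code in #114481.
    if flag == "n" and os.path.exists(path):
        # Chosen behaviour: remove the whole file and start fresh.
        # The rejected alternative would keep the file and only drop
        # the table:  conn.execute("DROP TABLE IF EXISTS Dict")
        os.remove(path)
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE IF NOT EXISTS Dict "
                 "(key TEXT PRIMARY KEY, value BLOB NOT NULL)")
    return conn

# Demo: data written before a second open with flag='n' is discarded.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = open_sqlite_dbm(path, "n")
conn.execute("REPLACE INTO Dict VALUES (?, ?)", ("a", b"1"))
conn.close()
conn = open_sqlite_dbm(path, "n")
count = conn.execute("SELECT COUNT(*) FROM Dict").fetchone()[0]
assert count == 0
```

Removing the file also discards any non-`Dict` tables a user may have added, which is one way to read "always create a new database [file]".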
This seems to have broken the beaker library: https://github.com/bbangert/beaker/issues/242 . Its test suite, which passes fine on Python 3.12, fails on Python 3.13 in dbm-based tests, with errors about sqlite3 thread violations. Isn't it a bit bad if code using dbm which worked fine with the previous default backend fails with the new one?
Of course, I guess it's possible beaker is just not handling threads correctly, but the other backends weren't smart enough to notice, and beaker's tests aren't good enough to catch it either. I'm not sure which is the case.
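For context, this is the thread check that sqlite3 enforces by default and that the older dbm backends never had; a minimal reproduction:

```python
import sqlite3
import threading

# By default (check_same_thread=True), a sqlite3 connection may only be
# used from the thread that created it; any other thread gets a
# ProgrammingError. This is the "thread violation" seen in beaker's tests.
conn = sqlite3.connect(":memory:")
errors = []

def use_from_other_thread():
    try:
        conn.execute("SELECT 1")
    except sqlite3.ProgrammingError as exc:
        errors.append(str(exc))

t = threading.Thread(target=use_from_other_thread)
t.start()
t.join()
assert errors and "thread" in errors[0]
```

Passing `check_same_thread=False` disables the check, but then the caller is responsible for serializing access to the connection.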
Right now we support `ndbm` and `dbm.gnu` which might or might not be part of a given build. The fallback is the super slow `dumbdbm`. Now that sqlite3 is part of the standard build, we can do better.

The module docstring says:

The future is now. Let's provide a fast, stable, robust, always available alternative.

This can be done with pure Python calls to the existing `sqlite3` module, or there can be a C extension that calls the Sqlite3 C API directly.

This would automatically be available to the `shelve` module, giving us a high quality, persistent key-value store.

Linked PRs