wbolster / plyvel

Plyvel, a fast and feature-rich Python interface to LevelDB
https://plyvel.readthedocs.io/
Other
530 stars 75 forks source link

Request: __contains__ and __getitem__ interfaces #26

Closed bjmgeek closed 4 months ago

bjmgeek commented 10 years ago

Assuming db is instance of plyvel.DBInterface, it would be useful to do something like:

db['key']

or

if 'key' in db

wbolster commented 10 years ago

I'm not really sure what that would add over the current interface. Yes, a LevelDB database is a mapping, but it's not the same as a dictionary.

The key in db encourages constructs like if key in db: do_something(db[key]), which is actually bad practice: LevelDB doesn't provide a way to check for a key's existence other than retrieving the value, so this does two lookups whereas one would have sufficed.

If you really want such an interface, you can easily use something like this (untested):

class PlyvelDictLike(object):
    def __init__(self, db):
        self.db = db

    def __getitem__(self, key):
        value = self.db.get(key)
        if value is not None:
            return value
        raise KeyError("no such key: %r" % key)

    def __setitem__(self, key, value):
        self.db.put(key, value)

    def __delitem__(self, key):
        self.db.delete(key)
bjmgeek commented 10 years ago

Thanks. I didn't realize that about checking for a key's existence. In my particular case, I'm trying to come up with a fast on-disk equivalent of python's set type, so I am only interested in checking existence.

On Sun, Dec 8, 2013 at 8:18 AM, Wouter Bolsterlee notifications@github.comwrote:

I'm not really sure what that would add over the current interface. Yes, a LevelDB database is a mapping, but it's not the same as a dictionary.

The key in db encourages constructs like if key in db: do_something(db[key]), which is actually bad practice: LevelDB doesn't provide a way to check for a key's existence other than retrieving the value, so this does two lookups whereas one would have sufficed.

If you really want such an interface, you can easily use something like this (untested):

class PlyvelDict(object): def init(self, db): self.db = db

def __getitem__(self, key):
    value = self.db.get(key)
    if value is not None:
        return value
    raise KeyError("no such key: %r" % key)

def __setitem__(self, key, value):
    self.db.put(key, value)

def __delitem__(self, key):
    self.db.delete(key)

— Reply to this email directly or view it on GitHubhttps://github.com/wbolster/plyvel/issues/26#issuecomment-30081357 .

wbolster commented 10 years ago

In that case you can try something like this (again, untested):

from collections.abc import MutableSet

class PlyvelSet(MutableSet):
    def __init__(self, db):
        self.db = db

    def __contains__(self, key):
        return self.db.get(key) is not None

    def __iter__(self):
        return self.db.iterator(include_value=False)

    def __len__(self):
        raise NotImplementedError(
            "Cannot efficiently determine size of LevelDB databases")

    def add(self, item):
        self.db.put(item, b"")

    def discard(self, item):
        self.db.delete(item)
wiegerw commented 4 months ago

I just lost an hour because I did not expect that the syntax key in db doesn't work the same as with dictionaries. To me this was so unexpected, that it was the last thing that I looked at when debugging my code. The argument that this check promotes bad practice (if key in db: do_something(db[key])) is debatable: this equally applies to dictionaries.

wbolster commented 4 months ago

ouch, at first i thought that your comment didn't make any sense, since thing in db is not supposed to work at all since plyvel does not offer a dict-like interface…

… until it occurred to me that ‘dict-like’ involves more than __contains__, __getitem__, __setitem__, and __delitem__, etc.

notably, the ‘in’ operator has fallback behaviour for objects that do not specify __contains__, see the section about the in operator in the language reference:

the in operator accidentally triggers the __iter__ behaviour!

this means that thing in db actually iterates over the db, which yields (key, value) pairs, none of which will match thing. for comparison, (key, value) in db actually works… accidentally! for non-matches such as an innocent looking key lookup the whole database is iterated (extremely slow for larger databases), and will always return False in the end.

not good 😨

that said, i stand by my 10+ year old opinion that plyvel.DB (and iterators, and prefixed DB, etc.) should not offer __contains__ and __getitem__ to suggest a dict-like interface. it's not just the performance anti-pattern, but also other expectations like .keys(), thing in db, iter(db) yielding just keys (not the case in plyvel!), etc.

my proposed solution is to explicitly break thing in db by adding a __contains__ method to each relevant class that will simply raise TypeError.

wdyt? @wiegerw

wiegerw commented 4 months ago

@wbolster I agree that it is best to explicitly forbid the syntax key in db, since the current behavior is quite unexpected.

Personally I would like to see an interface that is as close to dictionaries as possible. But I understand now that there are important differences between the two, and then it makes sense to be careful with this.