wbolster / plyvel

Plyvel, a fast and feature-rich Python interface to LevelDB
https://plyvel.readthedocs.io/

Weird behavior with RawIterator.seek #32

Closed k4nar closed 10 years ago

k4nar commented 10 years ago

Hi, I'm trying to implement a function that checks whether a given key exists in the database without fetching the value. With the native API this can be done by creating an iterator, seeking to the key, and checking the return value of valid(). However, I can't get this to work with Plyvel, because I'm seeing this strange behavior:

>>> import plyvel
>>> db = plyvel.DB('test', create_if_missing=True)
>>> db.put('foo', 'bar')
>>> list(db.iterator())
[('foo', 'bar')]
>>> it = db.raw_iterator()
>>> it.seek('foo')
>>> it.valid()
True
>>> it = db.raw_iterator()
>>> it.seek('toto')
>>> it.valid()
False
>>> it = db.raw_iterator()
>>> it.seek('aaa')
>>> it.valid()
True
>>> it.key()
'foo'
>>> it.value()
'bar'

If I seek to a key that sorts before an existing one, the iterator actually lands on the first existing key. If the given key sorts after every existing key, the behavior is as expected.

I'm using Plyvel 0.8 with LevelDB 1.15.0. I didn't check if I get the same kind of result in C++, but I don't think it'll be the case.

wbolster commented 10 years ago

I don't think this works differently if you do it in C++. .seek() will seek to the key at or after the one specified:

  // Position at the first key in the source that at or past target
  // The iterator is Valid() after this call iff the source contains
  // an entry that comes at or past target.
  virtual void Seek(const Slice& target) = 0;

See https://code.google.com/p/leveldb/source/browse/include/leveldb/iterator.h
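
To make the "at or past target" semantics concrete, here is a minimal sketch that models the key space as a sorted Python list and uses `bisect` as a stand-in for Seek(). The helper names (`seek`, `valid`, `contains`) are hypothetical and are not part of Plyvel or LevelDB; the point is only that an existence check has to compare the key the iterator landed on, not just test validity.

```python
import bisect

# Hypothetical stand-in for a LevelDB key space: a sorted list of keys.
keys = sorted([b"foo"])


def seek(sorted_keys, target):
    """Return the index of the first key >= target (len(sorted_keys) if none)."""
    return bisect.bisect_left(sorted_keys, target)


def valid(sorted_keys, pos):
    """Mirror Valid(): true only if the position points at an entry."""
    return pos < len(sorted_keys)


def contains(sorted_keys, target):
    """Exact-match check: seek, then compare the key that was landed on."""
    pos = seek(sorted_keys, target)
    return valid(sorted_keys, pos) and sorted_keys[pos] == target


# Seeking to b"aaa" lands on b"foo", so the position is valid.
pos = seek(keys, b"aaa")
print(valid(keys, pos), keys[pos])

# Seeking to b"toto" runs past the last key, so the position is not valid.
print(valid(keys, seek(keys, b"toto")))

# Only the exact-match comparison distinguishes b"aaa" from b"foo".
print(contains(keys, b"aaa"), contains(keys, b"foo"))
```

With a real raw iterator the same idea would presumably be a seek followed by checking both valid() and that key() equals the requested key.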

k4nar commented 10 years ago

Actually yes, you're right. The following code illustrates it:

#include <cassert>
#include <iostream>
#include <string>

#include "leveldb/db.h"

int main(int argc, char** argv)
{
  leveldb::DB* db;
  leveldb::Options options;
  options.create_if_missing = true;
  leveldb::Status status = leveldb::DB::Open(options, "./test", &db);
  assert(status.ok());

  db->Put(leveldb::WriteOptions(), std::string("foo"), std::string("bar"));

  std::string value;
  db->Get(leveldb::ReadOptions(), std::string("foo"), &value);

  leveldb::Iterator* it = db->NewIterator(leveldb::ReadOptions());
  // "abc" sorts before "foo", so Seek() lands on "foo" and Valid() prints 1.
  it->Seek(std::string("abc"));
  std::cout << it->Valid() << std::endl;
  delete it;

  delete db;
}

However, do you know of a way to check whether a key exists without fetching the value? I was following this one, but I don't see how it could work given the behavior of Seek.

wbolster commented 10 years ago

Well, db.get(key) is not None will tell you. That will load the value (and immediately discard it), but I don't think the extra allocation (for the value string) will have a noticeable performance impact, unless your values are huge.
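
As a rough sketch of that check wrapped in a helper: `key_exists` is a hypothetical name, not a Plyvel function. With Plyvel you would pass a `plyvel.DB` instance (keys are bytes on Python 3); a plain dict stands in here so the snippet runs without a database, since its `get()` also returns None for a missing key.

```python
def key_exists(db, key):
    """Return True if `key` is present, using the `db.get(key) is not None` check.

    `db` is anything with a get(key) method that returns None for a missing key.
    """
    return db.get(key) is not None


# Stand-in for a database (an assumption for illustration only).
fake_db = {b"foo": b"bar", b"empty": b""}

print(key_exists(fake_db, b"foo"))
print(key_exists(fake_db, b"aaa"))

# Comparing against None (rather than testing truthiness) means a key
# stored with an empty value still counts as existing.
print(key_exists(fake_db, b"empty"))
```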

k4nar commented 10 years ago

Yes, that's how I'm doing it at the moment. Well, thanks for your time!

wbolster commented 10 years ago

You're welcome. My guess is that db.get() will be slightly faster than creating an iterator and seeking, and that the cost of creating the value (Python memory allocation and bytes object construction) is negligible for most real-world scenarios.

wbolster commented 10 years ago

See also https://github.com/wbolster/plyvel/issues/26#issuecomment-30081357 for an earlier comment of mine on why Plyvel doesn't have db.exists() (or db.__contains__() as in that issue).