liuis / leveldb

Automatically exported from code.google.com/p/leveldb
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Add a Get API which allows for caller-allocated result buffer storage ( e.g. on stack ) #135

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
The current API provides:
virtual Status Get(const ReadOptions& options,
                     const Slice& key, std::string* value) = 0;

A downside of this API is that it uses dynamic memory allocation in std::string

It would be nice to have an additional API call:
virtual Status Get(const ReadOptions& options,
                     const Slice& key, char* buf, size_t* bufsize ) = 0;

This would allow a stack-based buffer. It would require a new status code 
"buffer too small", the bufsize parameter is a pointer to receive the size of 
the data copied / required

Original issue reported on code.google.com by jvb...@gmail.com on 3 Jan 2013 at 5:35

GoogleCodeExporter commented 9 years ago
Yes, this is something I've found problematic when using LevelDb. Not only does 
it force a memory allocation in many cases, but it also makes it awkward to 
portably mutate the value which was read. For example, imagine a struct value:

struct Foo {
   int counter;
   int offset;
   int size;
};

...

std::string value;

db.Get(leveldb::ReadOptions(), key, &value);

// Does not work since we're not allowed to mutate the std::string buffer
Foo* fooPtr = reinterpret_cast<Foo*>(value.c_str()); 

// So we have to make a copy first, which is annoying and wasteful

Foo foo;
memcpy(&foo, value.data(), sizeof foo);

This use-case would be much simplified if it was possible to pass in our own 
buffer pointer instead.

Original comment by ste...@boberg.org on 6 Jan 2013 at 12:28

GoogleCodeExporter commented 9 years ago
The right way to do this would be to have a version of get that returns 
"locking" slice. Something like:

struct LockingSlice : private Slice, private {
    inline LockingSlice(Slice& aSlice, port::Mutex aMutex)
       : Slice(aSlice), lock(aMutex) {}
    inline operator Slice& { return static_cast<Slice>(*this); }

private:
    MutexLock lock;
};

Now you can have:

LockingSlice DB::Get(const ReadOptions& options, const Slice& key);

and released the lock on DB::mutex_ in its destructor.

The next best would be to have a templated version of DB::Get which worked with 
anything that implemented operator=(const Slice&).

However, I think the main advice is that if you are concerned about that 
overhead, the API provides a perfectly fine way for you to avoid it. You can 
use slices right now, just not using the Get method (because of the locking).

You just have to use an Iterator. With a bit of syntactic sugar you can make it 
pretty clean.

struct Foo {
    int counter;
    int offset;
    int size;

    using Slice = leveldb::Slice;

    Foo(int aCounter, int anOffset, int aSize)
        : counter(aCounter), offset(anOffset), size(aSize) {}

    Foo(const leveldb::Slice& aSlice) {
        assert (aSlice.size() == sizeof(Foo));
        memcpy(this, aSlice.data(), aSlice.size());
    }

    Foo(const std::string& aString) {
        assert (aString.size() == sizeof(Foo));
        memcpy(this, aString.data(), aString.size());
    }

    operator const leveldb::Slice() const {
        return leveldb::Slice(reinterpret_cast<const char*>(this), sizeof(Foo));
    }

    operator leveldb::Slice() {
        return Slice(reinterpret_cast<const char*>(this), sizeof(Foo));
    }

    operator std::string() const {
        return std::string(reinterpret_cast<const char*>(this), sizeof(Foo));
    }
};

Foo GetRecord(DB& db, const ReadOptions& options, const Slice& key) {
    std::auto_ptr<Iterator> iter(db.NewIterator(options));
    return GetRecord(*iter, key);
}

Foo GetRecord(Iterator& iter, const Slice& key) {
    iter->Seek(key);
    if (!iter)
        throw std::runtime_error("No record found");
    return iter.value();
}

Now if you do something like:

Foo foo = GetRecord<Foo>(*db, ReadOptions(), key);

You get a fairly clean syntax and exactly one copy from what's in the DB, and 
no locking issues to speak of.

I usually go one step further and make a templated DB wrapper (template 
<typename key_t, typename value_t> Db;) to make sure keys & values are of 
consistent types, at which point you can just say "Foo foo = 
db->GetRecord(ReadOptions(), someKeyType)", which is even more convenient.

Original comment by cbsm...@gmail.com on 30 Sep 2013 at 9:21