Did I use 'Each' correctly?

roblatham00 commented 6 years ago

Wanted to get familiar with the pmemkv API, so I wrote a writer and reader pair. The writer reads words out of a dictionary (like /usr/share/dict/words) and stores them in pmemkv. Then the reader tries to read them back. Except my reader isn't reporting anything:

Writer:

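#include <cstdio>
#include <cstring>
#include <libpmemkv.h>

using namespace pmemkv;

#define MAX_WORD 256

int main(int argc, char **argv)
{
    FILE *dict;
    char buffer[MAX_WORD];

    // Expect a pool path and a dictionary path on the command line.
    if (argc < 3) {
        fprintf(stderr, "usage: %s <pool> <dictionary>\n", argv[0]);
        return 1;
    }

    KVEngine *kv = KVEngine::Open("kvtree2", argv[1], PMEMOBJ_MIN_POOL);

    dict = fopen(argv[2], "r");
    if (dict == NULL) {
        perror("fopen");
        delete kv;
        return 1;
    }

    while (fgets(buffer, MAX_WORD, dict) != NULL) {
        // fgets keeps the trailing newline; strip it so it isn't stored in the key.
        buffer[strcspn(buffer, "\n")] = '\0';
        kv->Put(buffer, "");
    }
    fclose(dict);
    delete kv;

    return 0;
}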

Reader:


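#include <iostream>
#include <string>
#include <libpmemkv.h>

using namespace pmemkv;

int main(int argc, char **argv)
{
    // Expect the pool path on the command line.
    if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " <pool>" << std::endl;
        return 1;
    }

    KVEngine *kv = KVEngine::Open("kvtree2", argv[1], PMEMOBJ_MIN_POOL);

    // Print every pair, using the reported lengths instead of assuming
    // that the key and value buffers are NUL-terminated.
    auto cb = [](void *context, int32_t keybytes, int32_t valuebytes, const char *key, const char *value) {
        std::cout << "HI" << std::endl;
        std::cout << std::string(key, keybytes) << " " << std::string(value, valuebytes) << std::endl;
    };
    kv->Each(cb);

    delete kv;
    return 0;
}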

robfromboulder commented 6 years ago

Hi @roblatham00, please try the btree engine in place of kvtree2, and I bet this will work.

The kvtree2 engine doesn't implement Each (as you've seen), but supports arbitrarily large values.
The btree engine implements Each, but only supports small values (<500 bytes) by default; you can recompile with a larger limit.
We're working on a kvtree3 engine that supports both Each and large values, and it will take over as our new default engine.

Please post back if you need any more help getting this running, happy to help!

Thanks, RobD

roblatham00 commented 6 years ago

Switched to btree, but now I get a std::bad_alloc exception from make_persistent_atomic when storing the 64th word. I am only storing an empty string along with the keys, so I don't think I'm hitting any 500-byte limit.

Or did you mean btree only supports 500 bytes' worth of anything?

robfromboulder commented 6 years ago

Hmm, please make sure you delete your persistent pool that's already around. I suspect this is due to opening the kvtree2 binary format using the btree engine, which is not something we actively prevent at this time.

With btree the values are limited to 500 bytes each, so you should be fine there.

roblatham00 commented 6 years ago

File is definitely zapped. I am faking this with a plain old file on a file system, so perhaps that is complicating things.

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
zsh: abort (core dumped)  ./writer kv.db words

robfromboulder commented 6 years ago

Oh duh, it's the pool filling up. Increase PMEMOBJ_MIN_POOL to a larger value. :smile:

roblatham00 commented 6 years ago

#define PMEMOBJ_MIN_POOL ((size_t)(1024 * 1024 * 8)) /* 8 MiB */

Just so I understand clearly: each key/value pair in pmemkv's btree engine consumes 128 KiB?

roblatham00 commented 6 years ago

I doubled the pool size and put values in a loop until breaking out once std::bad_alloc is caught. Reader reads back 100 keys now.

Thanks for the help.

robfromboulder commented 6 years ago

I don't think btree will be very space-efficient for this case: it's not variable-length, so it always writes out a full entry even for an empty value. That said, I agree with you that space usage looks rather high here for the number of keys. I'm curious to dig into this a bit more and understand it better.

Anyway, great to hear that things are working, and thanks for trying out the API!

robfromboulder commented 6 years ago

Oh, I also wanted to mention -- using plain old files is totally ok for prototyping, but a few tips if you're going to do any benchmarking with larger pools:

put your pool on /dev/shm to bypass the kernel page cache
set PMEM_IS_PMEM_FORCE=1 in your environment so PMDK treats your file like persistent memory
if you don't care about power-fail safety, additionally set PMEM_NO_FLUSH=1 to speed up writes by skipping all the flushes that are normally done for strict consistency

Thanks!

robfromboulder commented 6 years ago

Hey @roblatham00, closing out this issue with a few follow-up notes:

Our new kvtree3 engine supports the Each operation, so you don't have to use the more experimental btree engine for this kind of test.
I logged a defect against the btree engine for the core dump that you saw during this testing when the engine ran out of space. (#126)
I confirmed that the storage efficiency of the btree engine is pretty bad in its current form, but even the kvtree engine is showing low storage efficiency for very small values as you were testing in this case. Storage efficiency also drops as the size of the pool is reduced. It's fair to admit that much of our testing centers on larger pools (>1GB) and larger values (800 bytes is typical for our benchmarks). I'm curious if we can add an optimization to improve storage efficiency for small values. (#127)

pmem / pmemkv

Did I use 'Each' correctly? #123