yahoo / mdbm

MDBM a very fast memory-mapped key/value store.
BSD 3-Clause "New" or "Revised" License
993 stars 111 forks source link

partition locking problem (N threads writing to the same DB) #41

Closed zed-0xff closed 4 years ago

zed-0xff commented 9 years ago

Hi there. I'm trying to make N threads write to the same DB, and I keep getting some strange errors. I'm using latest MDBM from github on Ubuntu 14 x86_64.

I use following code:

#include <string.h>
#include <sys/fcntl.h>
#include <mdbm/mdbm.h>
#include <stdlib.h>
#include <pthread.h>
#include <thread>
#include <iostream>
#include <vector>

struct mdbm_key_t {
    uint32_t k1;
};

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

int nthreads    = 2;
int niterations = 100000;

void threadfunc(MDBM* db, int idx) {
    std::thread::id tid = std::this_thread::get_id();

    srand(time(NULL));

    // main processing here
    mdbm_key_t key;

    uint32_t data1, data2;
    datum d_key, d_val;

    d_key.dptr  = (char*)&key;
    d_key.dsize = sizeof(key);
    d_val.dptr  = (char*)&key;
    d_val.dsize = sizeof(key);

    for(int i=0; i<niterations; i++) {
        key.k1 = rand(); //%10000;
        data2++;

        //std::cout << "thread " << tid << " locking key " << key.k1 << " .." << std::endl;

        if( mdbm_plock(db, &d_key, 0) != 1 ){
            perror("mdbm_plock");
            exit( EXIT_FAILURE );
        }

        //std::cout << "thread " << tid << " storing key " << key.k1 << " .." << std::endl;

        if( mdbm_store(db,d_key,d_val,MDBM_INSERT) == -1 ) {
            perror("mdbm_store");
            exit(1);
        }

        //std::cout << "thread " << tid << " unlocking key " << key.k1 << " .." << std::endl;

        if( mdbm_punlock(db, &d_key, 0) != 1 ) {
            perror("mdbm_punlock");
            exit( EXIT_FAILURE );
        }
    }
    // end main processing

    std::cout << "thread " << tid << " done" << std::endl;
}

int main(int argc, char **argv)
{
    char fn[2048];
    char buf[2048];
    int ndups;
    std::vector <std::thread> threads;
    std::vector <MDBM*>       pdbms;

    if( argc == 3 ){
        nthreads = atoi(argv[1]);
        niterations = atoi(argv[2]);
    }

    std::cout << "nthreads = " << nthreads << ", niterations = " << niterations << std::endl;

    snprintf(fn,sizeof(fn),"%s","example.mdbm");
    MDBM* db = mdbm_open(fn,MDBM_O_RDWR|MDBM_O_CREAT|MDBM_SINGLE_ARCH|MDBM_PARTITIONED_LOCKS,0666,0,0);
    if (!db) {
        perror("Unable to create database");
        exit( EXIT_FAILURE );
    }

    pthread_mutex_lock(&mutex);
    for( int i=0; i<nthreads; i++ ){
        MDBM* clone = mdbm_dup_handle(db, 0);
        if( clone == NULL ){
            perror("mdbm_dup_handle");
            exit( EXIT_FAILURE );
        }
        pdbms.push_back( clone );
    }
    pthread_mutex_unlock(&mutex);

    for( int i=0; i<nthreads; i++ ){
        threads.emplace_back( threadfunc, pdbms[i], i );
        //threads.emplace_back( threadfunc, db );
    }

    for( auto& thread : threads )
        thread.join();

    std::cout << "all threads done" << std::endl;

    mdbm_close(db);
    return 0;

and in 9 out of 10 runs I get errors like: 1:

3:55708863:528fc:07df8 mdbm_lock.cc:201 /home/adfox/adfox/mdbm/example.mdbm: another partition locking conflict (locked=? want=1) excl_nest=0, part_nest=1, part_count=128   : Operation not permitted
Begin Lock state (for non-zero locks) (pid:32244, tid:32248 self:7ffff65b5700 uuid:32248 ):
  BaseLocks:1 Core:1 Shared/Partitioned:128 mode:MLOCK_INDEX (2)
End Lock state
mdbm_store: Operation not permitted

2: SEGFAULT in MDBM_PAGE_PTR (pagenum=268439516, db=0x60dee0)

Am I doing anything wrong? Or is it a bug? Thanks!

areese commented 9 years ago

There is a handle pool for making it easier to use handles from multiple threads safely: https://github.com/yahoo/mdbm/blob/c8f098b77fee8e644fd0d127ba7a511ab19e411e/include/mdbm_handle_pool.h

timrc-git commented 9 years ago

Hi Zed, I see at least a couple problems: a) you use MDBM_INSERT which randomly returns an error on duplicate numbers b) you exit on that store failure, which bypasses the punlock The exit also may interrupt the other thread mid-write.

In general, for C++, I would recommend using a locking class that unlocks automatically when it goes out of scope. But you should also just return from the affected thread, flagging or signalling the other threads to shut-down if needed. And if you want to be able to change existing keys (vs only inserting new), then you need to use MDBM_REPLACE instead of MDBM_INSERT.

Thanks! .timrc

zed-0xff commented 9 years ago

Thank you for your replies! I updated my example using MDBM_REPLACE and handle pool: https://gist.github.com/zed-0xff/8f1db57c4d5127f9f282 Nevertheless, results are the same :( Any suggestions?

zed-0xff commented 9 years ago

Guys, please help me, or tell me that multithread MDBM write is impossible. Thanks.

timrc-git commented 9 years ago

Hi Zed- It's definitely possible, we have lots of products using it. There's something funny going on in your example, and I'm still trying to understand the exact cause. I got rid of the C++11 features, and removed the compiler line for that, and the problem occurs less often, but still happens.

Because you're seeding random based on the time (1 sec resolution), both threads almost always go thru the same set of numbers. And because you don't presize, a lot of splits happen. When a split happens, the partitioned lock that's being held has to be upgraded to an exclusive lock. Because the two threads running the same set of numbers, they interfere a lot, and some deadlock avoidance code kicks in.

If you pre-size your MDBM, last argument to mdbm_open(), the problem goes away, lending credence to my theory. (It also runs much faster, because it's not doing all the incremental splits.)

So, as a temporary workaround, presize your MDBM.

Thanks! .timrc

zed-0xff commented 9 years ago

What db size should guarantee absence of this kind of collisions, even if I'll use 32 threads ?

ejc3 commented 9 years ago

You'll want a size large enough to hold all you values, so there is no need to ever resize. That makes the partitioned locking much simpler.

zed-0xff commented 9 years ago

But what if my DB grows constantly each day? May be I need to compare some value (e.g. number of pages) before and after the mdbm_plock call?

ejc3 commented 9 years ago

Can you set it the maximum value it ever will be?

zed-0xff commented 9 years ago

It might me several terabytes. But it also might grow after the year or two. btw, I've got a stacktrace from SEGFAULT-ed app:

#0  check_db_header (db=db@entry=0x25af8a0, h=0x7f4ed6882010, verbose=verbose@entry=1) at mdbm.c:550
550         if (h->h_magic != _MDBM_MAGIC_NEW2) {
(gdb) p h->h_magic
Cannot access memory at address 0x7f4ed6882010

(gdb) bt
#0  check_db_header (db=db@entry=0x25af8a0, h=0x7f4ed6882010, verbose=verbose@entry=1) at mdbm.c:550
#1  0x00007f4edaabc031 in mdbm_internal_remap (db=db@entry=0x25af8a0, dbsize=4419584, flags=flags@entry=0) at mdbm.c:1769
#2  0x00007f4edaabc3cc in resize_db (npages=1079, db=0x25af8a0) at mdbm.c:1826
#3  resize (db=db@entry=0x25af8a0, new_dirshift=10, new_dirshift@entry=0, new_num_pages=new_num_pages@entry=1079) at mdbm.c:2527
#4  0x00007f4edaabcb9f in alloc_chunk (db=db@entry=0x25af8a0, type=type@entry=1, npages=4, n0=n0@entry=0, n1=n1@entry=0,
    lock=lock@entry=1, map=1) at mdbm.c:1891
#5  0x00007f4edaac3182 in expand_page (pagenum=33, page=0x7f4ed6bf7000, db=0x25af8a0) at mdbm.c:2963
#6  mdbm_store_r (db=0x25af8a0, key=key@entry=0x7f4ed76b8d90, val=val@entry=0x7f4ed76b8d80, flags=1, iter=iter@entry=0x0) at mdbm.c:5089
#7  0x00007f4edaac567a in mdbm_store (db=<optimized out>, key=..., val=..., flags=<optimized out>) at mdbm.c:5278
zed-0xff commented 9 years ago

Could you introduce some special flag for mdbm_store, say, MDBM_DO_NOT_SPLIT, which will return -1 if db split is necessary, so the calling thread may obtain an exclusive db lock, split the database, release exclusive lock, and continue multithreaded insertion ?

timrc-git commented 9 years ago

Hi Zed-

Sorry for the delay in responding, I was out on vacation in the wilderness. The functionality you're looking for already exists ( mdbm_limit_size_v3): http://yahoo.github.io/mdbm/api/group__ConfigurationGroup.html#gafb7b88fcd40a3971136dc88ecfea0cab Note: you have to call it every time you open the MDBM.

Thanks! .timrc

zed-0xff commented 9 years ago

Thank you. But mdbm_limit_size_v3 alone does not help. mdbm_pre_split helps, but.. it can only be called once and only on empty database :(

ejc3 commented 9 years ago

The box your database is running on has a certain capacity, so if you set it the limits of the mdbm to the machine + the other software running on it, that should be able to avoid resizing? Also, if you don't have several terabytes of ram, some of the advantages of mdbm will go away as compared to other key / value stores.

zed-0xff commented 9 years ago

it has 128GB RAM. Should I limit_size + pre_split to 100GB db size ?

timrc-git commented 9 years ago

Hi Zed- As long as everything else on the box uses less than the remaining 28G, that should be fine. You want to make sure you're not swapping, or performance will suffer noticeably.