Closed — zed-0xff closed this issue 4 years ago
There is a handle pool for making it easier to use handles from multiple threads safely: https://github.com/yahoo/mdbm/blob/c8f098b77fee8e644fd0d127ba7a511ab19e411e/include/mdbm_handle_pool.h
Hi Zed, I see at least a couple of problems: a) you use MDBM_INSERT, which returns an error whenever a duplicate number comes up; b) you exit on that store failure, which bypasses the punlock. The exit may also interrupt the other thread mid-write.
In general, for C++, I would recommend using a locking class that unlocks automatically when it goes out of scope. But you should also just return from the affected thread, flagging or signalling the other threads to shut down if needed. And if you want to be able to change existing keys (vs. only inserting new ones), then you need to use MDBM_REPLACE instead of MDBM_INSERT.
Thanks! .timrc
Thank you for your replies!
I updated my example to use MDBM_REPLACE and the handle pool: https://gist.github.com/zed-0xff/8f1db57c4d5127f9f282
Nevertheless, the results are the same :(
Any suggestions?
Guys, please help me, or tell me that multithreaded MDBM writes are impossible. Thanks.
Hi Zed- It's definitely possible; we have lots of products using it. There's something funny going on in your example, and I'm still trying to understand the exact cause. I got rid of the C++11 features and removed the compiler flag for them; the problem occurs less often, but still happens.
Because you're seeding the random generator from the time (1-second resolution), both threads almost always go through the same set of numbers. And because you don't presize, a lot of splits happen. When a split happens, the partitioned lock being held has to be upgraded to an exclusive lock. Because the two threads run through the same set of numbers, they interfere a lot, and some deadlock-avoidance code kicks in.
If you pre-size your MDBM (last argument to mdbm_open()), the problem goes away, lending credence to my theory. (It also runs much faster, because it's not doing all the incremental splits.)
So, as a temporary workaround, presize your MDBM.
Thanks! .timrc
What DB size would guarantee the absence of this kind of collision, even if I use 32 threads?
You'll want a size large enough to hold all your values, so there is never a need to resize. That makes the partitioned locking much simpler.
But what if my DB grows constantly, each day? Maybe I need to compare some value (e.g. the number of pages) before and after the mdbm_plock call?
Can you set it to the maximum value it will ever be?
It might be several terabytes, and it might also grow after a year or two. btw, I've got a stack trace from the SEGFAULT-ed app:
#0 check_db_header (db=db@entry=0x25af8a0, h=0x7f4ed6882010, verbose=verbose@entry=1) at mdbm.c:550
550 if (h->h_magic != _MDBM_MAGIC_NEW2) {
(gdb) p h->h_magic
Cannot access memory at address 0x7f4ed6882010
(gdb) bt
#0 check_db_header (db=db@entry=0x25af8a0, h=0x7f4ed6882010, verbose=verbose@entry=1) at mdbm.c:550
#1 0x00007f4edaabc031 in mdbm_internal_remap (db=db@entry=0x25af8a0, dbsize=4419584, flags=flags@entry=0) at mdbm.c:1769
#2 0x00007f4edaabc3cc in resize_db (npages=1079, db=0x25af8a0) at mdbm.c:1826
#3 resize (db=db@entry=0x25af8a0, new_dirshift=10, new_dirshift@entry=0, new_num_pages=new_num_pages@entry=1079) at mdbm.c:2527
#4 0x00007f4edaabcb9f in alloc_chunk (db=db@entry=0x25af8a0, type=type@entry=1, npages=4, n0=n0@entry=0, n1=n1@entry=0, lock=lock@entry=1, map=1) at mdbm.c:1891
#5 0x00007f4edaac3182 in expand_page (pagenum=33, page=0x7f4ed6bf7000, db=0x25af8a0) at mdbm.c:2963
#6 mdbm_store_r (db=0x25af8a0, key=key@entry=0x7f4ed76b8d90, val=val@entry=0x7f4ed76b8d80, flags=1, iter=iter@entry=0x0) at mdbm.c:5089
#7 0x00007f4edaac567a in mdbm_store (db=<optimized out>, key=..., val=..., flags=<optimized out>) at mdbm.c:5278
Could you introduce some special flag for mdbm_store, say MDBM_DO_NOT_SPLIT, which would return -1 if a DB split is necessary, so the calling thread could obtain an exclusive DB lock, split the database, release the exclusive lock, and continue multithreaded insertion?
Hi Zed-
Sorry for the delay in responding; I was out on vacation in the wilderness. The functionality you're looking for already exists (mdbm_limit_size_v3): http://yahoo.github.io/mdbm/api/group__ConfigurationGroup.html#gafb7b88fcd40a3971136dc88ecfea0cab Note: you have to call it every time you open the MDBM.
Thanks! .timrc
Thank you. But mdbm_limit_size_v3 alone does not help. mdbm_pre_split helps, but it can only be called once, and only on an empty database :(
The box your database is running on has a certain capacity, so if you set the MDBM's limits to the machine's capacity minus the other software running on it, you should be able to avoid resizing. Also, if you don't have several terabytes of RAM, some of the advantages of MDBM will go away compared to other key/value stores.
It has 128GB of RAM. Should I limit_size + pre_split to a 100GB DB size?
Hi Zed- As long as everything else on the box uses less than the remaining 28GB, that should be fine. You want to make sure you're not swapping, or performance will suffer noticeably.
Hi there. I'm trying to make N threads write to the same DB, and I keep getting strange errors. I'm using the latest MDBM from GitHub on Ubuntu 14 x86_64.
I use the following code:
and in 9 out of 10 runs I get errors like: 1:
2: SEGFAULT in MDBM_PAGE_PTR (pagenum=268439516, db=0x60dee0)
Am I doing anything wrong? Or is it a bug? Thanks!