vmxdev / tkvdb

Trie key-value database
ISC License

Does tkvdb lock the db file when open? #7

Closed by canadaduane 5 years ago

canadaduane commented 5 years ago

I see that there is a note on tkvdb not being multithreaded, but can it be multi-process when writing to disk? i.e. if several processes open the same "data.tkvdb" file, is it guaranteed that they will wait on each other to put, commit & close?

canadaduane commented 5 years ago

To be more specific: I've created a command-line tool that opens a tkvdb database, writes to it, and exits. When I run 4 of these processes in parallel, it seems to be working; however, that surprises me because of the multithreading caveat, and I'm wondering if it isn't really working as I intend.

vmxdev commented 5 years ago

No, there are no guarantees about that. File locking is highly OS-specific, and it is hard to implement locks correctly even on Linux alone; see http://0pointer.de/blog/projects/locking.html for an example.

It looks like the writes came out correct by accident. tkvdb writes each transaction to disk as one chunk, using a single write syscall, so the OS probably happened to schedule the writes in a workable order and performed each of them atomically.

If you use your OS's synchronization primitives, everything should be fine. tkvdb does not use global variables, so a mutex in shared memory should be enough for multi-process locking. For POSIX it might look like this:

    /* Set up the mutex with the pthread_mutexattr_setpshared(attr, PTHREAD_PROCESS_SHARED) attribute. */

    pthread_mutex_lock(...);
    transaction->begin(transaction);
    /* transaction->get / put / cursor operations */
    transaction->commit(transaction);
    pthread_mutex_unlock(...);

There is no need for locks around open() and close().
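To make the mutex setup step concrete, here is a minimal sketch of a process-shared mutex placed in shared memory. It is only an illustration under stated assumptions: the names `shared_state`, `shared_state_create` and `run_workers` are hypothetical, the workers are forked children sharing an anonymous mapping, and a plain counter stands in for the database so the example does not depend on tkvdb. The place where the begin/put/commit calls would go is marked with a comment.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout of the shared region. */
struct shared_state {
    pthread_mutex_t lock; /* lives in shared memory, visible to every process */
    long counter;         /* stand-in for the shared database file */
};

/* Map one shared_state into anonymous shared memory and initialize its
 * mutex with the PTHREAD_PROCESS_SHARED attribute. Returns NULL on failure. */
static struct shared_state *shared_state_create(void)
{
    struct shared_state *st = mmap(NULL, sizeof(*st), PROT_READ | PROT_WRITE,
                                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (st == MAP_FAILED)
        return NULL;

    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc == 0) {
        rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        if (rc == 0)
            rc = pthread_mutex_init(&st->lock, &attr);
        pthread_mutexattr_destroy(&attr);
    }
    if (rc != 0) {
        munmap(st, sizeof(*st));
        return NULL;
    }
    st->counter = 0;
    return st;
}

/* Fork nprocs children; each one enters the critical section iters times.
 * Returns the final counter value, or -1 if the shared state could not be
 * set up. */
static long run_workers(int nprocs, int iters)
{
    assert(nprocs > 0 && iters >= 0);

    struct shared_state *st = shared_state_create();
    if (st == NULL)
        return -1;

    int started = 0;
    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break; /* could not start more workers */
        if (pid == 0) {
            for (int j = 0; j < iters; j++) {
                pthread_mutex_lock(&st->lock);
                /* The transaction begin / put / commit calls would go here. */
                st->counter++;
                pthread_mutex_unlock(&st->lock);
            }
            _exit(0);
        }
        started++;
    }

    /* Wait for every child that was actually started. */
    for (int i = 0; i < started; i++)
        wait(NULL);

    long total = st->counter;
    pthread_mutex_destroy(&st->lock);
    munmap(st, sizeof(*st));
    return total;
}
```

Calling `run_workers(4, 1000)` from a `main` function should return 4000 if no increment is lost; build with `-pthread`. Independently started processes, such as separate command-line invocations, cannot share an anonymous mapping, so they would need a named shared memory object (for example via `shm_open`) and some agreement on which process initializes the mutex. A process that dies while holding the lock leaves it locked unless a robust mutex is used.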

vmxdev commented 5 years ago

If you want to increase DB performance by using multiple processes, locks will not help.

There is one resource (the disk file) shared between the processes, so access will even be slightly slower than with a single process.

It is possible to speed up the DB part of the program if each process (or thread) fills its own small database and a master process then merges them into one bigger database.

We have long-term plans for multithreading and multiprocessing, but for now they are just plans.