yahoo / mdbm

MDBM is a very fast memory-mapped key/value store.
BSD 3-Clause "New" or "Revised" License

Partition Locking Performance Improvement #62

Closed mnagaya closed 7 years ago

mnagaya commented 8 years ago

Hello,

I’m Masakazu Nagaya from Yahoo! JAPAN.

This patch will improve partition locking performance when we have a lot of CPU cores.

Currently, partition locking provides only 128 mutexes, but I believe this lock count is too small for today's many-core systems.
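To make the contention concrete, here is a minimal, self-contained sketch of the lock-striping idea behind partition locking: each key hashes to one of a fixed pool of mutexes, so with only 128 locks and 32 hardware threads hitting random keys, unrelated operations regularly collide on the same lock. The class and constant names below are illustrative, not mdbm internals.

```cpp
#include <array>
#include <functional>
#include <mutex>
#include <string>

// Illustrative lock striping: each key maps to one of LOCK_COUNT mutexes.
// Raising LOCK_COUNT (e.g. to cpu_count * 16) spreads keys across more
// stripes and reduces the chance that unrelated operations serialize.
constexpr std::size_t LOCK_COUNT = 128;

class StripedLocks {
public:
    std::mutex& lock_for(const std::string& key) {
        return locks_[std::hash<std::string>{}(key) % LOCK_COUNT];
    }
private:
    std::array<std::mutex, LOCK_COUNT> locks_;
};

// Usage: guard the per-key critical section with the stripe that owns the key.
//   StripedLocks striped;
//   std::lock_guard<std::mutex> g(striped.lock_for("some-key"));
//   /* read or modify the value stored under "some-key" */
```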

Benchmark Host

2 x Xeon E5-2630L v3 1.80GHz (HT enabled, 16 cores, 32 threads)

Benchmark Command

mdbm_bench -y -d 128 -l 2 -h 10 -k 128 -v 1024 -w 10.0 -n 512:512

Benchmark Result

(benchmark result image: mdbm_bench_result)

Thanks,

zwoop commented 8 years ago

These are pretty impressive results. Are there any reasons not to do this? Should the x16 multiplier be configurable?

areese commented 8 years ago

@zwoop There is an internal discussion about the impact that's not on this PR. @steve-carney can provide details here. :)

IIRC, the conclusion was that we should be able to disable this and configure the multiplier, as well as test it on some of our more intensive processes.

timrc-git commented 8 years ago

Perhaps the best option is to have one Makefile variable for the multiplier, and another variable that, if set, forces the lock count to a constant.

I can imagine cases (e.g. virtual machines/containers that misrepresent the number of cores) where flexibility may be needed for tuning optimal performance.
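As a rough sketch of what that could look like (the define names are hypothetical, not existing mdbm build settings, and get_cpu_count() here is a stand-in for mdbm's internal helper), the lock-count selection from this PR could be wrapped in build-time knobs:

```cpp
#include <unistd.h>

// Hypothetical Makefile knobs, e.g.:
//   CXXFLAGS += -DMDBM_PLOCK_MULTIPLIER=16     (scale locks by CPU count)
//   CXXFLAGS += -DMDBM_PLOCK_FORCE_COUNT=256   (or pin to a constant)
#ifndef MDBM_PLOCK_MULTIPLIER
#define MDBM_PLOCK_MULTIPLIER 16
#endif

static int get_cpu_count() {
    long n = sysconf(_SC_NPROCESSORS_ONLN);   // stand-in for mdbm's helper
    return n > 0 ? (int)n : 1;
}

static int partition_lock_count() {
#ifdef MDBM_PLOCK_FORCE_COUNT
    return MDBM_PLOCK_FORCE_COUNT;            // forced constant wins outright
#else
    int count = get_cpu_count() * MDBM_PLOCK_MULTIPLIER;
    return count < 128 ? 128 : count;         // never drop below the old default
#endif
}
```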

timrc-git commented 8 years ago

Also, if there's nothing proprietary in the discussion, perhaps you could loop-in mdbm-users@yahoogroups.com. Thanks!

areese commented 8 years ago

https://bugs.openjdk.java.net/browse/JDK-6515172 is a good example of how the JDK is dealing with containers.

steve-carney commented 8 years ago

If OSS users are happy with building binaries with a static setting, Tim’s suggestion is fine. I’m happy with this as the solution for now.

However, if we want the behavior to be dynamically configurable, we'd have to fan it out to the various utilities (i.e., add a command-line option). I don't see enough real benefit to doing this.

Increasing the number of locks does increase mdbm_open time for creating/opening lock files. For some apps (e.g., those that do per-query mdbm_opens), this is noticeable. We can consider bumping the default up from 128, but that should probably be based on a real application rather than a synthetic benchmark.

Steve
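To make the open-latency concern measurable, here is a minimal timing sketch. It assumes the usual mdbm_open(path, flags, mode, pagesize, presize) signature and the MDBM_O_RDWR / MDBM_O_CREAT / MDBM_PARTITIONED_LOCKS flags; check the exact flag set against mdbm.h before relying on it.

```cpp
#include <mdbm.h>
#include <chrono>
#include <cstdio>

int main() {
    const char* path = "/tmp/open_latency_test.mdbm";
    auto t0 = std::chrono::steady_clock::now();
    // Open (creating if needed) with partitioned locking enabled.
    MDBM* db = mdbm_open(path, MDBM_O_RDWR | MDBM_O_CREAT | MDBM_PARTITIONED_LOCKS,
                         0644, 0 /* default page size */, 0 /* no presize */);
    auto t1 = std::chrono::steady_clock::now();
    if (!db) { std::perror("mdbm_open"); return 1; }
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
    std::printf("mdbm_open took %lld us\n", (long long)us);
    mdbm_close(db);
    return 0;
}
```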

steve-carney commented 7 years ago

I'm uneasy with the idea that the number of locks is strictly tied to the number of CPUs. Because we don't have any experience with this, it would be a good idea to have it disabled by default until we have experience or someone does a conclusive investigation across load types and CPU counts. Maybe this should be a configurable number to mdbm_open (or a separate call).

An immediate downside is that increasing the number of locks will increase mdbm_open latency to create the additional locks. Even if the locks are already created, the latency is still higher. We got incredible blowback a few years ago from users that did an mdbm_open per user query (yes, a bad design, but a common one up in web services land). They were angered at having to refactor their apps to have a persistent process to keep the mdbms open.

Anything that has to iterate over all locks will be impacted. If there are lock upgrades/downgrades (e.g., from some mode to exclusive mode), those might have higher latency. I think this happens during a file expansion, or possibly when a shared (read) lock wants to write.

If there aren't any functional tests for tracking mdbm locking latency against a baseline, they need to be added. We need to see latency results for configurations with 128, 256, 384 (a typical 24-CPU host in production), and 512 locks.

Steve
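If such functional tests get added, a per-operation probe might look roughly like the sketch below. It assumes the mdbm_plock / mdbm_punlock partition-lock calls and the dbm-style datum key type; the exact signatures should be verified against mdbm.h, and the binary would be run against builds configured with 128, 256, 384, and 512 locks to produce the comparison.

```cpp
#include <mdbm.h>
#include <chrono>
#include <cstdio>

int main() {
    MDBM* db = mdbm_open("/tmp/plock_latency.mdbm",
                         MDBM_O_RDWR | MDBM_O_CREAT | MDBM_PARTITIONED_LOCKS,
                         0644, 0, 0);
    if (!db) { std::perror("mdbm_open"); return 1; }

    const int iters = 100000;
    char buf[32];
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        int n = std::snprintf(buf, sizeof(buf), "key-%d", i);
        datum key = { buf, n };
        mdbm_plock(db, &key, 0);    // acquire the partition lock owning this key
        mdbm_punlock(db, &key, 0);  // release it immediately
    }
    auto t1 = std::chrono::steady_clock::now();
    double us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
    std::printf("avg plock+punlock: %.3f us over %d iterations\n", us / iters, iters);
    mdbm_close(db);
    return 0;
}
```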

@edmund-troche commented on this pull request.

In src/lib/mdbm_lock.cc:

@@ -352,7 +352,10 @@ struct mdbm_locks* open_locks_inner(const char* dbname, int flags, int do_lock,
         count = get_cpu_count() * 2;
     } else if (flags & MDBM_PARTITIONED_LOCKS) {
         type = MLOCK_INDEX;
-        count = 128;
+        count = get_cpu_count() * 16;
+        if (count < 128) {
+            count = 128;
+        }

minor thing, but maybe it could look a bit cleaner if it was: count = max(128, get_cpu_count() * 16)


zwoop commented 7 years ago

Agreed. So reiterating, why not make the multiplier (and/or fixed lock count) configurable?

steve-carney commented 7 years ago

Because you don't know (and haven't specified) all of the actual use cases, an initial implementation could skip the multiplier and directly set the number of locks (which should probably be a multiple of 16, or whatever size fits well into system/mdbm pages). This forces the decision back onto the app that knows its resource needs and access profile. The default should remain at 128 locks to avoid backward-compatibility issues, as Allen points out.

An example of when you might not want to use a multiplier is when an app has limited concurrent readers and writers, but is hosted on a large multi-tenant box.

Steve
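A sketch of that "set the count directly" idea (the helper below is hypothetical; nothing like it exists in mdbm today): the application passes an explicit lock count, the library rounds it up to a multiple of 16, and zero keeps the current 128-lock default.

```cpp
// Hypothetical helper illustrating the proposal: the app requests an explicit
// partition lock count; 0 keeps the backward-compatible default of 128, and
// any other value is rounded up to a multiple of 16 so locks pack into pages.
static int choose_partition_lock_count(int requested) {
    if (requested <= 0) {
        return 128;                           // historical default, unchanged
    }
    return ((requested + 15) / 16) * 16;      // round up to a multiple of 16
}

// Examples:
//   choose_partition_lock_count(0)   -> 128
//   choose_partition_lock_count(300) -> 304
//   choose_partition_lock_count(512) -> 512
```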


timrc-git commented 7 years ago

Can this PR be closed now? Or do folks think we need to make mdbm_open and tools take an (optional) lock-count argument? If so, I'd suggest making it an Issue (vs PR).

areese commented 7 years ago

@timrc-git I think there is merit in rebasing this off of the changes you have made with the ifdef, and allowing the # of locks to be specified during the open.

Does that seem reasonable?

timrc-git commented 7 years ago

Oh, I thought PR#78 completely covered this PR's functionality.

Passing the lock-count to mdbm_open is a separate thing, but it's also a much larger change. (I was assuming it would be a separate PR).

areese commented 7 years ago

@timrc-git yes you are correct. #78 added an ifdef to add this code in, instead of hard coding it here.

Thanks, I hadn't looked closely and for some reason thought this was passing it in to mdbm_open, which would also be really nice: in many cases we know more ahead of time when calling open and can make informed decisions.

Thanks!