kohler / masstree-beta

Beta release of Masstree.

mtd: add an option for configuring epoch interval #14

Closed mitake closed 8 years ago

mitake commented 8 years ago

Hi @kohler , nice to meet you.

Currently, mtd advances the epoch at a fixed interval (1s) and doesn't provide a way to configure it. However, the interval is an important parameter for Masstree's epoch-based reclamation. For example, it affects the size of the limbo lists, and therefore the space efficiency and latency of Masstree.

This commit adds a new --epoch-interval option to mtd to make experiments with various epoch intervals easy. Users can pass a custom epoch interval, in milliseconds, with this option.

Example usage (a 100ms interval): $ ./mtd --epoch-interval 100
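For context, here is a rough sketch of how such an option might be wired up. This is only an illustration under assumed names (globalepoch and epoch_advancer are stand-ins), not the actual mtd code:

```cpp
// Hypothetical sketch: parse --epoch-interval in milliseconds and use it
// to drive the thread that advances the global epoch.
#include <getopt.h>
#include <unistd.h>
#include <atomic>
#include <cstdlib>

static double epoch_interval_ms = 1000;       // default: advance the epoch every 1s
static std::atomic<unsigned> globalepoch{1};  // stand-in for mtd's global epoch counter

static const struct option long_options[] = {
    {"epoch-interval", required_argument, 0, 'e'},
    {0, 0, 0, 0}
};

void parse_args(int argc, char** argv) {
    int opt;
    while ((opt = getopt_long(argc, argv, "", long_options, 0)) != -1)
        if (opt == 'e')
            epoch_interval_ms = strtod(optarg, 0);
}

void* epoch_advancer(void*) {
    // An interval of 0 would effectively disable epoch advancement (no GC).
    while (epoch_interval_ms > 0) {
        usleep((useconds_t) (epoch_interval_ms * 1000));
        ++globalepoch;                        // threads reap limbo objects as epochs pass
    }
    return 0;
}
```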

Currently I don't have meaningful results for this option because I can't use manycore machines for now. I'd be really glad to hear your opinion about this commit.

kohler commented 8 years ago

Hi @mitake, nice to meet you too, and thanks for this!

This seems like a useful option, but I have a couple of line notes. I'll add them shortly.

kohler commented 8 years ago

Also, did you run into a problem with the larger epoch interval in some test? If so, could you describe that test? Maybe we should merge something into mttest.

mitake commented 8 years ago

Hi @kohler , thanks a lot for your review!

I updated this PR to declare epoch_interval_ms as a double. Could you take a look?

Currently, I'm not seeing real problems caused by the interval, so mttest doesn't need to be enhanced. I'm interested in understanding how the epoch interval affects the performance of Masstree (and also Silo), especially latency and the amount of garbage objects. I added the option just for this purpose :)

If I understand correctly, every worker thread that processes requests from clients must call threadinfo::rcu_stop() and reap the objects in its limbo list. If the interval is configured to be larger, the limbo list grows larger and the pause time increases because of the cost of free(). If the interval is shorter, the limbo list stays smaller and the pause time is shorter as well, but throughput would be worse in this case.
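To illustrate the tradeoff, here is a minimal sketch of epoch-based reclamation with a per-thread limbo list. The names and structure are simplified assumptions, not Masstree's actual threadinfo implementation:

```cpp
// Sketch only: a thread defers frees onto a limbo list tagged with the epoch
// at which each object was retired, and reaps entries once every thread has
// moved past that epoch.
#include <atomic>
#include <cstdlib>
#include <vector>

static std::atomic<unsigned> globalepoch{1};  // stand-in for the global epoch

struct limbo_entry {
    void* ptr;
    unsigned epoch;      // epoch at which the object was retired
};

struct thread_info {
    std::vector<limbo_entry> limbo;

    // Called instead of free(): defer reclamation until it is safe.
    void deallocate_rcu(void* p) {
        limbo.push_back({p, globalepoch.load()});
    }

    // Called at the end of a request (cf. threadinfo::rcu_stop()): reap
    // everything retired before the oldest epoch any thread may still be in.
    // With a long epoch interval the limbo list grows large and this loop
    // becomes a long pause; with a short interval the list stays small but
    // the epoch bookkeeping costs more throughput.
    void rcu_stop(unsigned min_active_epoch) {
        size_t kept = 0;
        for (limbo_entry& e : limbo) {
            if (e.epoch < min_active_epoch)
                free(e.ptr);
            else
                limbo[kept++] = e;
        }
        limbo.resize(kept);
    }
};
```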

The above relation would be an essential tradeoff introduced by epoch-based reclamation, and I'm wondering whether it is possible to improve latency and spatial efficiency without sacrificing multicore scalability. For example, if we can assume that key access is distributed enough, using a different GC scheme at each depth level might be effective: RCU for deeper nodes (frequently accessed by many threads) and reference counting for shallower nodes (not accessed so frequently)?

But... I'm not sure the latency problem is realistic :) I'd be really glad to hear your opinion about this point too!

mitake commented 8 years ago

I measured the latency of put under the same mtclient workload with two different epoch intervals, because it produces a large amount of garbage values and stresses the RCU GC process. The results were as follows:

default (1s) interval

minimum put latency: 0.000000 sec
maximum put latency: 0.034582 sec
minimum put latency: 0.000000 sec
maximum put latency: 0.035371 sec
minimum put latency: 0.000000 sec
maximum put latency: 0.038241 sec

0s (no GC) interval

minimum put latency: 0.000000 sec
maximum put latency: 0.016929 sec
minimum put latency: 0.000000 sec
maximum put latency: 0.008216 sec
minimum put latency: 0.000000 sec
maximum put latency: 0.007510 sec

I'm still not fully sure that the difference between the above latencies comes from GC, but there seem to be differences that shouldn't be ignored.

(I measured latency with the change in this branch: https://github.com/mitake/masstree-beta/tree/latency )
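(The measurement amounts to something like the following sketch; this is an assumed illustration of the approach, not the code in the branch above. The client.put() call stands in for whatever request mtclient actually issues.)

```cpp
// Sketch: time each put and track the minimum and maximum observed latency.
#include <time.h>
#include <stdio.h>
#include <algorithm>

static double now_sec() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

struct put_latency_tracker {
    double min_lat = 1e9, max_lat = 0;

    template <typename Client, typename K, typename V>
    void timed_put(Client& client, const K& key, const V& value) {
        double t0 = now_sec();
        client.put(key, value);      // assumed client interface
        double t1 = now_sec();
        min_lat = std::min(min_lat, t1 - t0);
        max_lat = std::max(max_lat, t1 - t0);
    }

    void report() const {
        printf("minimum put latency: %f sec\n", min_lat);
        printf("maximum put latency: %f sec\n", max_lat);
    }
};
```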

kohler commented 8 years ago

Merged, thanks!

Those are interesting latency numbers. How does a different epoch interval perform?

Mtd's setup also isn't designed to preserve low latency during GC. In particular, every thread listens on its own port, and during a GC pause the GCing thread's port sits idle. There are likely better ways to handle this: during GC, a “buddy” thread could take over the GCing thread's port, or the freeing could happen on a separate cleanup thread entirely.
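As a rough illustration of the cleanup-thread variant (an assumption, not existing mtd code), the freeing work could be pushed to a background thread so the worker returns to its port immediately:

```cpp
// Sketch: workers hand whole limbo batches to a background thread that does
// the expensive free() loop off the request-handling path.
#include <condition_variable>
#include <cstdlib>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

class cleanup_thread {
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<std::vector<void*>> batches_;
    bool stop_ = false;

    void run() {
        std::unique_lock<std::mutex> lk(mu_);
        while (!stop_ || !batches_.empty()) {
            if (batches_.empty()) {
                cv_.wait(lk);
                continue;
            }
            std::vector<void*> batch = std::move(batches_.front());
            batches_.pop_front();
            lk.unlock();
            for (void* p : batch)   // the costly free() loop runs here,
                free(p);            // not on the thread serving a port
            lk.lock();
        }
    }

    std::thread worker_{&cleanup_thread::run, this};

public:
    // A worker calls this at epoch change with its batch of retired objects,
    // then immediately goes back to serving requests on its port.
    void submit(std::vector<void*> batch) {
        std::lock_guard<std::mutex> lk(mu_);
        batches_.push_back(std::move(batch));
        cv_.notify_one();
    }

    ~cleanup_thread() {
        {
            std::lock_guard<std::mutex> lk(mu_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }
};
```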

I also took a look at the latency branch. Cool! Something like that would be quite interesting to merge in! There are a couple of changes I'd like to work through with you first.

  1. It's important that we be able to configure whether and how latency testing happens. Latency testing will hurt throughput, particularly if done on every request, as you do now. Test results should also remain comparable across versions (so rw1 numbers could be compared). I could imagine adding support for latency tests directly to the client interface, and then using templates to compile latency tests away unless they're specifically requested (a rough sketch follows this list).
  2. We should change how latency testing results are reported.
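Here is a rough sketch of the template idea in point 1 (hypothetical names, not the actual mtclient interface): the latency policy is a template parameter, so a build that doesn't request latency testing pays no per-request cost.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>

struct no_latency {                    // default policy: everything compiles to nothing
    void start() {}
    void stop() {}
    void report() const {}
};

struct track_latency {
    using clock = std::chrono::steady_clock;
    clock::time_point t0;
    double min_lat = 1e9, max_lat = 0;

    void start() { t0 = clock::now(); }
    void stop() {
        double d = std::chrono::duration<double>(clock::now() - t0).count();
        min_lat = std::min(min_lat, d);
        max_lat = std::max(max_lat, d);
    }
    void report() const {
        std::printf("minimum latency: %f sec\nmaximum latency: %f sec\n",
                    min_lat, max_lat);
    }
};

template <typename LatencyPolicy = no_latency>
struct client {
    LatencyPolicy lat;

    template <typename F>
    void run_request(F&& do_request) {
        lat.start();
        do_request();                  // the actual put/get/scan goes here
        lat.stop();
    }
};
```

A plain client<> would then behave like the current client, while client<track_latency> would only be built for latency runs.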

But those are just some random thoughts. :)

mitake commented 8 years ago

Thanks for merging!

Those are interesting latency numbers. How does a different epoch interval perform?

I'll work on a more detailed benchmark (including various workloads, not only single-key updates) and would like to share the results in the near future.

To exclude the overhead of latency testing, I'll follow the template approach as you suggest. But changing the reporting might not be required: if the latency-related members are not set in the result JSON object, the main process of mtclient can simply ignore those metrics. What do you think? Of course I might be missing something :)

mitake commented 8 years ago

Sorry, I misunderstood your strategy for latency testing. If I add the testing directly to the client class, the reporting method should be changed ;) I'll consider it.