pisa-engine / pisa

PISA: Performant Indexes and Search for Academia
https://pisa-engine.github.io/pisa/book
Apache License 2.0

Support QPS/TPS reporting #400

Open mpetri opened 4 years ago

mpetri commented 4 years ago

Describe the solution you'd like

The BitFunnel paper contained some additions to the partitioned_elias_fano codebase which allowed measuring queries per second with multiple threads. See here: https://github.com/BitFunnel/partitioned_elias_fano/blob/master/Runner/QueryLogRunner.cpp#L56

In addition to reporting per-query metrics, it would be nice if PISA could provide such a multi-threaded QPS benchmark tool.

Additional context

In practice, one would be interested in metrics such as: "if my p95 requirement is 100 ms, how much QPS can I achieve with this index and this hardware configuration?"
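As a rough illustration of the kind of report being asked for, here is a minimal sketch (not part of PISA) of a hypothetical `report` helper that takes per-query latencies from a multi-threaded run plus the wall-clock duration of that run and derives overall QPS and the p95 latency. The placeholder numbers in `main` are assumptions for illustration only.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

using namespace std::chrono;

// Given per-query latencies and the wall-clock duration of the whole run,
// print overall throughput and the 95th-percentile latency (nearest rank).
void report(std::vector<microseconds> latencies, duration<double> wall_clock) {
    if (latencies.empty()) { return; }
    std::sort(latencies.begin(), latencies.end());
    auto rank = static_cast<std::size_t>(std::ceil(0.95 * latencies.size()));
    std::cout << "QPS: " << latencies.size() / wall_clock.count() << '\n'
              << "p95 latency (us): " << latencies[rank - 1].count() << '\n';
}

int main() {
    // Placeholder measurements standing in for a real benchmark run.
    std::vector<microseconds> latencies{
        microseconds(900), microseconds(1200), microseconds(80000)};
    report(latencies, duration<double>(0.5));
}
```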

JMMackenzie commented 4 years ago

@amallia @elshize @mpetri

I am thinking about this at the moment. Do we want to try to use TBB to facilitate the multi-threading, or should we try to follow the example from the BitFunnel experiments? (They seem to roll their own thread synchronization so they avoid timing the overhead of spooling up threads prior to processing.) I suspect we can make a simple one-function addition to the queries binary to allow threads.

I know that evaluate_queries does parallel processing, but it uses the tbb::parallel_for approach, which might add some overhead to the timings? Not entirely sure...
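For reference, a hedged sketch of what timing around `tbb::parallel_for` looks like: the whole batch is wrapped in a single wall-clock measurement, so whatever TBB spends on scheduling and work distribution is included in the reported throughput. `run_query` here is a hypothetical stand-in, not a PISA function.

```cpp
#include <tbb/parallel_for.h>

#include <chrono>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for evaluating a query against the index.
void run_query(std::string const& query) { (void)query; }

int main() {
    std::vector<std::string> queries = {"q1", "q2", "q3"};

    // Time the whole parallel loop: any scheduling and work-distribution
    // overhead ends up inside the measured interval.
    auto start = std::chrono::steady_clock::now();
    tbb::parallel_for(std::size_t(0), queries.size(), [&](std::size_t i) {
        run_query(queries[i]);
    });
    auto elapsed = std::chrono::duration<double>(std::chrono::steady_clock::now() - start);

    std::cout << "QPS (including scheduling overhead): "
              << queries.size() / elapsed.count() << '\n';
}
```

Comparing a number measured this way against a worker-queue variant would show empirically how much overhead the `tbb::parallel_for` approach actually adds.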

elshize commented 4 years ago

@JMMackenzie When you're talking about overhead, do you mean the overhead of setting up thread pool and such, or also overhead of starting each thread?

If I understand correctly, the goal is this: given a set of queries and a number of threads, run the queries in the specified order and report the query throughput. Is this correct?

If this is the case, we could spin up a number of workers that would wait on a concurrent queue for a query in a loop. This way we could even measure in different modes:

* begin to end,
* each execution separately,
* report data for each query: time of arrival, time finished (so also latency)

I personally think that measuring just the query processing without any synchronization is a bit unfair, because concurrent execution will always have some overhead over sequential execution. But as I said, we can have different ways of measuring if we really want.
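A minimal sketch of this worker model, assuming TBB's `tbb::concurrent_queue` and standard threads: a fixed pool of workers pulls queries from the queue, and the harness records each query's enqueue and finish times, so begin-to-end throughput and per-query latency can both be derived. The `Job` struct, `run_query`, and the bookkeeping are illustrative, not existing PISA code.

```cpp
#include <tbb/concurrent_queue.h>

#include <atomic>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

using clock_type = std::chrono::steady_clock;

// One unit of work: the query text plus the time it entered the queue.
struct Job {
    std::string query;
    clock_type::time_point arrival;
};

// Hypothetical stand-in for evaluating a query against the index.
void run_query(std::string const& query) { (void)query; }

int main() {
    std::vector<std::string> queries = {"q1", "q2", "q3", "q4"};
    std::size_t const num_threads = 2;

    tbb::concurrent_queue<Job> queue;
    std::atomic<bool> done{false};
    std::vector<std::chrono::microseconds> latencies(queries.size());
    std::atomic<std::size_t> next_slot{0};

    // Run a query and record its latency (finish time minus arrival time).
    auto process = [&](Job const& job) {
        run_query(job.query);
        auto finished = clock_type::now();
        latencies[next_slot++] = std::chrono::duration_cast<std::chrono::microseconds>(
            finished - job.arrival);
    };

    // Workers spin on the queue until the producer signals completion,
    // then drain whatever is left.
    auto worker = [&] {
        Job job;
        while (!done) {
            while (queue.try_pop(job)) { process(job); }
            std::this_thread::yield();
        }
        while (queue.try_pop(job)) { process(job); }
    };

    auto begin = clock_type::now();
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < num_threads; ++t) { pool.emplace_back(worker); }
    for (auto const& q: queries) { queue.push(Job{q, clock_type::now()}); }
    done = true;
    for (auto& t: pool) { t.join(); }
    auto end = clock_type::now();

    double wall = std::chrono::duration<double>(end - begin).count();
    std::cout << "begin-to-end QPS: " << queries.size() / wall << '\n';
    for (auto const& l: latencies) { std::cout << "latency (us): " << l.count() << '\n'; }
}
```

Starting the pool before timing and draining the queue after the `done` flag is set keeps shutdown race-free without including thread start-up in the per-query numbers, which is the overhead concern raised above.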

JMMackenzie commented 4 years ago

> @JMMackenzie When you're talking about overhead, do you mean the overhead of setting up thread pool and such, or also overhead of starting each thread?

Right, I mean starting each thread.

> If I understand correctly, the goal is this: given a set of queries and a number of threads, run the queries in the specified order and report the query throughput. Is this correct?

I agree, this is the goal.

> If this is the case, we could spin up a number of workers that would wait on a concurrent queue for a query in a loop. This way we could even measure in different modes:
>
> * begin to end,
> * each execution separately,
> * report data for each query: time of arrival, time finished (so also latency)
>
> I personally think that measuring just the query processing without any synchronization is a bit unfair, because concurrent execution will always have some overhead over sequential execution. But as I said, we can have different ways of measuring if we really want.

Okay, I agree and I am in favor of your suggested approach. I think we have capacity to do this via TBB, right?

elshize commented 4 years ago

Yes, I believe TBB should do just fine. There are two concurrent queues: one unbounded and one bounded. Since currently all queries are loaded to memory at the beginning anyway, we don't need to bound the queue.
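For contrast, a small illustration of the two TBB queue types mentioned here; since all queries are loaded up front, the unbounded `tbb::concurrent_queue` suffices, and the bounded variant is shown only to highlight the difference. This is a sketch, not code from PISA.

```cpp
#include <tbb/concurrent_queue.h>

#include <iostream>
#include <string>

int main() {
    // Unbounded queue: push never blocks; consumers poll with try_pop.
    tbb::concurrent_queue<std::string> unbounded;
    unbounded.push("q1");
    std::string q;
    if (unbounded.try_pop(q)) { std::cout << "popped " << q << '\n'; }

    // Bounded queue: a capacity limits how far producers can run ahead;
    // push() blocks when full and pop() blocks when empty.
    tbb::concurrent_bounded_queue<std::string> bounded;
    bounded.set_capacity(128);
    bounded.push("q2");
    bounded.pop(q);
    std::cout << "popped " << q << '\n';
}
```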