timescale / tsbs

Time Series Benchmark Suite, a tool for comparing and evaluating databases for time series data
MIT License

Benchmark workflow #164

Open jonatas opened 3 years ago

jonatas commented 3 years ago

The objective here is to bring a wide discussion about the ideal model to run multiple benchmarks and allow us to correlate them later.

Discussing the idea with @ryanbooz and @zseta , generally, we start with a simple plan, like:

Let's test database A against database B with scenarios X and Z. Let's also run the same benchmark in different machine sizes to compare throughput and efficiency. Plus, let's see how the performance goes with different parallelization levels and so on.

And then we start our journey in a few steps:

  1. Provide the machines - set up the tsbs machine that will send data and queries to the targets, with a different configuration per run
  2. Set up the target machines, installing the OS/database and providing common auth to connect
  3. Set up the initial configuration we want to benchmark: how many rows of data, how dense the time-series is (rows/day), and the type of data (IoT or DevOps).
  4. Later, we adapt the initial config into derived configurations targeting different machine sizes and parallelization levels.

With the machines ready, we can start using tsbs_load with --config, synchronizing each run so that parallel benchmarks on the master machine don't interfere with each other's performance.

To keep every command detached from our ssh connection, we should run it inside a screen session.
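The sequential, synchronized runs described above could be driven by a small script instead of by hand. This is only a sketch: the driver function, the `dry_run` flag, and the config file names are hypothetical; the `tsbs_load --config` invocation is the one the issue already mentions.

```python
# Hypothetical driver: run each tsbs_load benchmark one at a time so that
# concurrent runs never compete for resources on the load machine.
import subprocess


def run_benchmarks(config_paths, binary="tsbs_load", dry_run=True):
    """Run each benchmark config sequentially; return the commands built."""
    commands = []
    for path in config_paths:
        cmd = [binary, "--config", path]
        commands.append(cmd)
        if not dry_run:
            # Blocks until the run finishes, so runs never overlap.
            subprocess.run(cmd, check=True)
    return commands
```

Launching this inside a single screen session would give the same detachment from ssh while removing the manual step between runs.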

Later, we need to collect all the text reports and manually capture and move the data into a spreadsheet so we can correlate the benchmarks and better understand which params work best in which context.

The most important info we collect is the throughput, in rows/sec and metrics/sec, of each scenario: IoT/DevOps.
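The manual copy-to-spreadsheet step for rows/sec and metrics/sec could be scripted. A minimal sketch, assuming the report contains "mean rate ... rows/sec" / "mean rate ... metrics/sec" summary lines; the exact wording of the real tsbs_load report may differ, so the pattern would need adjusting:

```python
# Scrape throughput figures out of a tsbs_load text report so they can be
# appended to a CSV instead of copied by hand. The summary-line format
# here is an assumption, not the verified tsbs output.
import re

RATE_RE = re.compile(r"mean rate ([\d.]+) (rows|metrics)/sec")


def extract_rates(report_text):
    """Return e.g. {'rows': 41464.36, 'metrics': 829287.12} from a report."""
    return {unit: float(value) for value, unit in RATE_RE.findall(report_text)}
```

Collecting these dicts per run, tagged with the config that produced them, would give the correlation table directly.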

This first part covers the load phase, but we also have run_queries, which likewise accepts different sets of configs, such as parallelization levels and the types of queries to run.

Each query has an identifier, and we can also run several queries in parallel with variants of the initial configuration, for example before and after compressing the data.

After getting metadata from all the queries, we need to manually capture the performance information for each configuration set and move it into a spreadsheet to correlate the data later.

So, several of these steps are done manually, and we also need to keep an eye on the pipeline in order to reuse the same tsbs machine to push the data.

We don't have a specific issue here, but rather an open space to discuss as a community how we can approach the problem and improve the way we work: a better flow that also lets us reuse previous benchmarks without rerunning everything.

jonatas commented 3 years ago

Configuration ideas

One idea I have for running multiple configurations is to expand our regular config file so that some options can take multiple values, multiplying the configuration into a series of runs to execute.

Example of config:

data-source:
  type: SIMULATOR
  ... # skipping details here
loader:
  db-specific:
    # ...
    host: "some-ip-here"
    partition-index: true
    partitions: 1
    time-partition-index: false
    use-hypertable: true
    use-jsonb-tags: false
    # ... a lot more configs here
  runner:
    batch-size: 10000
    channel-capacity: "0"
    db-name: benchmark
    do-abort-on-exist: false
    do-create-db: true
    do-load: true
    flow-control: false
    hash-workers: true
    limit: 2016000000
    reporting-period: 30s
    seed: 135
    workers: 24

Now let's imagine we would like to test several scenarios at once: multiple target hosts, batch sizes, and worker counts.

I could do that with the following changes, simply turning those options into arrays of values:

data-source:
  type: SIMULATOR
  ... # skipping details here
loader:
  db-specific:
    # ...
    host:
      - "some-ip-here"
      - "second-machine-ip-here"
      - "third-machine-ip-here"
    # ... a lot more configs here
  runner:
    batch-size: [1000, 5000, 10000]
    workers: [8, 16, 24]

And then tsbs_load could prepare the plan of configs and run them sequentially against the target machines.
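The expansion itself is just a cartesian product over the list-valued options. A sketch of that planning step, assuming a helper that doesn't exist in tsbs today (the key names mirror the YAML above):

```python
# Hypothetical expansion of list-valued config options into concrete runs:
# every combination of the listed values becomes one config to execute.
from itertools import product


def expand(config):
    """Yield one concrete config per combination of list-valued options."""
    list_keys = [k for k, v in config.items() if isinstance(v, list)]
    if not list_keys:
        yield dict(config)
        return
    for combo in product(*(config[k] for k in list_keys)):
        concrete = dict(config)
        concrete.update(zip(list_keys, combo))
        yield concrete


runner = {"batch-size": [1000, 5000, 10000], "workers": [8, 16, 24], "seed": 135}
plans = list(expand(runner))  # 3 batch sizes x 3 worker counts = 9 configs
```

Serializing each entry of `plans` back to YAML would give the sequential run list, with scalar options like `seed` carried through unchanged.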

Collecting and storing results

From the storage perspective, I'd love it if we set up a TSBS server where people around the world could push their benchmarks. We could store the different time-series related to each benchmark, even tracking how CPU and IO behave while executing a specific benchmark task.

It could be the first step toward a tsbs website with all the database benchmarks exposed: a rich source for learning which technology works best in which scenario.