
Load generator tests on the last cache #25127


hiltontj commented 2 weeks ago

Once the last cache has been implemented, we will want to run a series of load generator tests to see how it performs compared to the SQL queries that would be used in its absence.

The only setup required in the load generator should be to create the specs that exercise the queries below. We could have the load generator create the last cache itself, but it will probably be easy enough to write some data in and create the cache using the CLI before running the load gen tests, along the lines of the sketch below.
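
For illustration, that manual setup might look something like the following - note that these influxdb3 subcommands and flags are assumptions for the sketch, not verified against the actual CLI:

# Hypothetical CLI invocations; check `influxdb3 --help` for the real flags.

# Write some seed data (line protocol) into the target database.
influxdb3 write --database mydb --file seed.lp

# Create a last cache on the `cpu` table, keyed on the tag hierarchy.
influxdb3 create last_cache --database mydb --table cpu --key-columns region,host,cpu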

Scenarios

Here are some scenarios we want to test.

1. Basic last value queries

In this case, the cache does not need to be keyed on any columns.

Using SQL

SELECT * FROM 'table' ORDER BY time DESC LIMIT 1

Using Last Cache

SELECT * FROM last_cache('table')

2. Multi-level tag/key hierarchy

In this case, the data has a hierarchical tag set, e.g., region/host/cpu. The last cache is keyed using the hierarchy region -> host -> cpu, and we want to compare query performance when using different combinations of predicates.

Using SQL

In general, in situations where we are attempting to pull the N most recent values for a set of time series, we can use a combination of a ranking function, e.g., ROW_NUMBER(), and PARTITION BY, like so:

WITH ranked AS (
   SELECT
      *,
      ROW_NUMBER() OVER (PARTITION BY host ORDER BY time DESC) AS rn
   FROM cpu
   WHERE <predicate>
)
SELECT * FROM ranked
WHERE rn <= N

Here, <predicate> can be, e.g., host IN ('_', '_', ...), region IN ('_', '_', ...), host = '_' AND cpu = '_', host = '_', or region = '_' - mirroring the last cache queries below.
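
For instance, with one of those predicates substituted in (the tag value here is just a placeholder), pulling the last 5 values per host within a given region would look like:

WITH ranked AS (
   SELECT
      *,
      ROW_NUMBER() OVER (PARTITION BY host ORDER BY time DESC) AS rn
   FROM cpu
   WHERE region = 'us-west' -- placeholder tag value
)
SELECT * FROM ranked
WHERE rn <= 5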

Using Last Cache

Here, the last cache is doing the work for us, so we really just need to provide the above predicates like so:

SELECT * FROM last_cache('cpu') WHERE host IN ('_', '_', ...)
SELECT * FROM last_cache('cpu') WHERE region IN ('_', '_', ...)
SELECT * FROM last_cache('cpu') WHERE host = '_' AND cpu = '_'
SELECT * FROM last_cache('cpu') WHERE host = '_'
SELECT * FROM last_cache('cpu') WHERE region = '_'
pauldix commented 2 weeks ago

I'm guessing that the amount of data you write into the table before running the query test is going to have a significant impact on the performance of the SQL queries. It would be interesting to run it with some different amounts of historical data.

hiltontj commented 2 weeks ago

> It would be interesting to run it with some different amounts of historical data.

Yes, definitely. I would also like to compare performance/memory usage between the two when using higher cardinality key columns.

hiltontj commented 1 week ago

Ran a preliminary test yesterday to compare query performance between the following two queries:

  1. No Cache (yellow line):
    SELECT * FROM data WHERE t1 = $t1_id ORDER BY time DESC LIMIT 1
  2. With Cache (blue line):
    SELECT * FROM last_cache('data') WHERE t1 = $t1_id
[image: query latency and queries-per-second graph for the two cases]


The query latency for the (2.) With Cache case doesn't show up on the graph because it is rounded down to 0 ms, but since the querier fires queries sequentially, latency is roughly the reciprocal of queries per second, and from that metric you can see it is hovering around the 400-500 µs mark.

hiltontj commented 1 week ago

Oh, one detail omitted:

pauldix commented 1 week ago

Will be interesting to see what it looks like if you have 100 values you're pulling back like WHERE t1 in [...]. That query is going to be gnarly on the SQL side. And then an example that uses the last cache hierarchy to pull back an entire group of last values.

hiltontj commented 6 days ago

I filed https://github.com/influxdata/influxdb/issues/25174 to add support for more filter Exprs - we can definitely support the IN clause.

> That query is going to be gnarly on the SQL side.

💯 not looking forward to composing that.

hiltontj commented 5 days ago

Update: I figured out the general query structure to select the N most recent values from multiple time series. Note that there is an open issue in DataFusion to optimize such queries (see https://github.com/apache/datafusion/issues/6899), so we should re-run this analysis when that optimization is implemented.

The issue description was updated with the relevant details.

hiltontj commented 3 days ago

Test: Last 5 Values - 1M Cardinality - Grouping by tag/key

Setup

In this test I compared the following two queries under the same write load.

In each case, the query test was run for 60 minutes with the same write load running in parallel. Each query test uses a single querier that fires queries sequentially, one after the other, as fast as it can.

1. No Cache (Yellow)

WITH ranked AS (
   SELECT
      *,
      ROW_NUMBER() OVER (PARTITION BY t3 ORDER BY time DESC) AS rn
   FROM data
   WHERE t2 = $t2_id
   AND time > now() - INTERVAL '5 minutes'
)
SELECT * FROM ranked
WHERE rn <= 5

2. With Cache (Blue)

SELECT * FROM last_cache('data') WHERE t2 = $t2_id

Write Load / Data Spec

| Parameter | Detail |
| --- | --- |
| Tags: name (cardinality) | t1 (10), t2 (1k), t3 (1M) |
| Fields: name (type) | f1 (random string), f2 (random float) |
| Lines per sample | 10,000 |
| Writer Count | 5 |
| Writes per second | 50,000 |
| Segment Duration | 5 minutes |
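
For context, 5 writers each sending a 10,000-line sample presumably accounts for the 50,000 writes per second above, and each generated line would look something like this line protocol (the exact tag/field values are illustrative, inferred from the spec and the t3-id-... values used later):

data,t1=t1-id-3,t2=t2-id-512,t3=t3-id-123456 f1="a1b2c3",f2=0.42 1721836800000000000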

Results

[image: results graph, No Cache (yellow) vs. With Cache (blue)]

Discussion

Because of the data layout, there should be ~5k t3 values for each unique t2, so there should be ~5k rows returned per query.

The result above isn't awesome, but I don't want to over-analyze it yet; here are some of the next tests I plan to try:

hiltontj commented 3 days ago

Test: Last Value - 1M Cardinality - Grouping by tag/key

Setup

This test setup is almost the same as the previous one (https://github.com/influxdata/influxdb/issues/25127#issuecomment-2247968491), except that the SQL query was changed to return a single value, and the cache was configured to store a count of only one value.

1. No Cache (Yellow)

WITH ranked AS (
   SELECT
      *,
      ROW_NUMBER() OVER (PARTITION BY t3 ORDER BY time DESC) AS rn
   FROM data
   WHERE t2 = $t2_id
   AND time > now() - INTERVAL '5 minutes'
)
SELECT * FROM ranked
WHERE rn <= 1 -- <- changed to 1

2. With Cache (Blue)

SELECT * FROM last_cache('data') WHERE t2 = $t2_id

Write Load / Data Spec

| Parameter | Detail |
| --- | --- |
| Tags: name (cardinality) | t1 (10), t2 (1k), t3 (1M) |
| Fields: name (type) | f1 (random string), f2 (random float) |
| Lines per sample | 10,000 |
| Writer Count | 5 |
| Writes per second | 50,000 |
| Segment Duration | 5 minutes |

Results

[image: results graph, No Cache (yellow) vs. With Cache (blue)]

This is a zoom in on the (2.) With Cache scenario to see the range of query latencies a bit more clearly:

[image: zoomed-in query latencies for the (2.) With Cache case]

Discussion

It is still difficult to see, but the query latency when using the cache is almost a square wave, toggling between ~5 ms and ~100 ms. It would be worth profiling to see what might be causing the slowness during the ~100 ms periods.
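
One way to do that would be to attach a sampling profiler to the server process during a test run, e.g. with Linux perf (a generic sketch; the process name here is an assumption):

# Sample stacks from the running server for 60s; process name assumed.
perf record -F 99 -g -p "$(pgrep influxdb3)" -- sleep 60

# Inspect where time is spent, looking for work that lines up with the ~100 ms stretches.
perf report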

Although it's gradual, the (1.) No Cache query latency is degrading over time and did not saturate during the test, while the (2.) With Cache queries look stable and also reduce the load on the CPU by a factor of 5-8x.

pauldix commented 3 days ago

The SQL query you're executing is only looking back 5 minutes, which is almost never what users do when they're looking for last values. That only works if they've actually written a value in the last 5 minutes. Also, your SQL query isn't grouping by the actual tag, so I don't believe you're actually pulling in the last value for every unique t3 in the given t2 group.

pauldix commented 3 days ago

Oh, maybe the partition by t3 accomplishes the grouping?

hiltontj commented 3 days ago

> The SQL query you're executing is only looking back 5 minutes, which is almost never what users do when they're looking for last values. That only works if they've actually written a value in the last 5 minutes.

Yeah, the 5 minute look-back might be a bit optimistic in the general sense; I figured it was acceptable given that the write load should have values written in that window. I could try a more conservative value - I originally was going to use the same as the cache TTL, which is 4 hours, but wanted to give the SQL a fighting chance 😅.

> Also, your SQL query isn't grouping by the actual tag, so I don't believe you're actually pulling in the last value for every unique t3 in the given t2 group.

Yeah, it is a bit odd, but the PARTITION BY t3 is what does the grouping: WHERE t2 = $t2_id fixes on a given t2, for which there are many t3 values; PARTITION BY t3 then creates a partition for each unique t3, and ROW_NUMBER() ranks the rows within each partition, as in the toy illustration below.
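
As a toy illustration with made-up rows, for a fixed t2 = 'a' the window function assigns ranks like this:

-- Rows matching WHERE t2 = 'a', with rn assigned by
-- ROW_NUMBER() OVER (PARTITION BY t3 ORDER BY time DESC):
--
--   t2 | t3 | time | rn
--   a  | x  |  40  |  1   <- kept by rn <= 1
--   a  | x  |  20  |  2
--   a  | y  |  30  |  1   <- kept by rn <= 1
--   a  | y  |  10  |  2
--
-- One partition per distinct t3 ('x' and 'y'); ORDER BY time DESC
-- ranks newest-first within each partition.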


I am currently running a similar test to the above, but using an IN predicate that selects a fixed set of 100 t3 values. Once I am done with that, I can try using a longer look-back in the SQL queries.

hiltontj commented 2 days ago

Test: Last Value - 1M Cardinality - WHERE t3 IN (...)

This test setup is similar to the previous one, but uses an IN clause to select a specific subset of 100 t3 values (out of the total 1M t3 values in the data spec).

1. No Cache (Yellow)

WITH ranked AS (
   SELECT
      *,
      ROW_NUMBER() OVER (PARTITION BY t3 ORDER BY time DESC) AS rn
   FROM data
   WHERE t3 IN ('t3-id-10000', 't3-id-20000', ..., 't3-id-999999') -- 100 specific values total
   AND time > now() - INTERVAL '5 minutes'
)
SELECT * FROM ranked
WHERE rn <= 1

2. With Cache (Blue)

SELECT * FROM last_cache('data') WHERE t3 IN ('t3-id-10000', 't3-id-20000', ..., 't3-id-999999') -- 100 specific values total

Write Load / Data Spec

| Parameter | Detail |
| --- | --- |
| Tags: name (cardinality) | t1 (10), t2 (1k), t3 (1M) |
| Fields: name (type) | f1 (random string), f2 (random float) |
| Lines per sample | 10,000 |
| Writer Count | 5 |
| Writes per second | 50,000 |
| Segment Duration | 5 minutes |

Results

[image: results graph, No Cache (yellow) vs. With Cache (blue)]

This is a zoom in on the (2.) With Cache scenario to see the range of query latencies a bit more clearly:

[image: zoomed-in query latencies for the (2.) With Cache case]

Discussion

Looking at the actual measured latencies in the (2.) With Cache case, there is a similar trend to before, where the latencies form a square wave, this time toggling between ~3 ms and ~100 ms. The graph doesn't really capture this because of the windowing the analysis app applies to the latencies when producing the graph.

So, again, it is probably worth profiling to see what is going on during those ~100 ms stretches.

pauldix commented 2 days ago

Definitely worth a look on the profiling. I think the new WAL and write buffer refactor might have a big impact here because the write locking behavior is going to change quite a bit.