danxmoran commented 3 years ago

The window-aggregate queries use the data from the iot use case. I've structured the generation so that every aggregate is its own query "type", so each one can be benchmarked and tracked independently. I'm not sure if that's helpful or more complicated than it needs to be; I know the Flux team tracked each aggregate separately, and there have been some single-aggregate bugs in the past.
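For readers who haven't seen the two query languages side by side, here is a rough sketch of what one query per aggregate could look like. The measurement and field names come from the generated data; the exact query text, the `bench` bucket name, and the `1m` window size are assumptions for illustration and are not necessarily what bulk_query_gen emits.

```shell
#!/bin/sh
# Hypothetical sketch: these query shapes are assumptions, not the exact
# strings produced by bulk_query_gen.

# Print an InfluxQL window-aggregate query.
# Args: aggregate, start (RFC 3339), end (RFC 3339), window (e.g. 1m)
influxql_query() {
  printf "SELECT %s(temperature) FROM air_condition_room WHERE time >= '%s' AND time < '%s' GROUP BY time(%s)\n" "$1" "$2" "$3" "$4"
}

# Print a Flux counterpart built on aggregateWindow.
# "bench" is a placeholder bucket name.
flux_query() {
  printf 'from(bucket: "bench") |> range(start: %s, stop: %s) |> filter(fn: (r) => r._measurement == "air_condition_room" and r._field == "temperature") |> aggregateWindow(every: %s, fn: %s)\n' "$2" "$3" "$4" "$1"
}

# One query "type" per aggregate, over an assumed 3h range and 1m window.
for agg in mean min max first last count sum; do
  influxql_query "$agg" 2018-01-06T21:00:00Z 2018-01-07T00:00:00Z 1m
  flux_query "$agg" 2018-01-06T21:00:00Z 2018-01-07T00:00:00Z 1m
done
```

Keeping the aggregate as the only varying argument is what lets each one be benchmarked on its own.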

williamhbaker commented 3 years ago

Was the 1.x influxql different at all from the 2.x flux performance with this? Just wondering because the query is very similar to the one being worked on in #182, except that one has a keep(...) in it before the aggregate function, and it showed a noticeable difference in performance between 1.x and 2.x.

danxmoran commented 3 years ago

> Was the 1.x influxql different at all from the 2.x flux performance with this?

My impression from reviewing the results on Friday was that the two were roughly identical. I'll take another pass over it and write up the results as a table here.

danxmoran commented 3 years ago

Summarization of results

Setup

Data was generated using this command:

bulk_data_gen -use-case window-agg -scale-var 1000 -timestamp-end 2018-01-07T00:00:00Z

The output contained 55615680 records for the temperature field in the air_condition_room measurement, spanning a total of 1 week's time.

Queries were generated using:

for format in influx-http influx-flux-http; do
  for agg in mean min max first last count sum; do
    bulk_query_gen -timestamp-end 2018-01-07T00:00:00Z -use-case window-agg -format ${format} -query-type ${agg} -query-interval 3h
  done
done

I ran 1000 test queries per format/agg pair.
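Each format/agg pair is reported under Results as a min, mean, and max. As a minimal sketch of that reduction, assuming the per-query latencies are available one value per line (the input format here is an assumption, not the benchmarker's actual output):

```shell
#!/bin/sh
# Reduce a stream of latencies (one number per line) to min/mean/max.
summarize() {
  awk '
    { v = $1 + 0 }                      # force numeric interpretation
    NR == 1 { min = v; max = v }
    { if (v < min) min = v; if (v > max) max = v; sum += v }
    END {
      if (NR > 0)
        printf "min=%.3f mean=%.3f max=%.3f n=%d\n", min, sum / NR, max, NR
    }
  '
}

# Toy input, not real benchmark data.
printf '%s\n' 10 20 60 | summarize
```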

Results

count

Query format	Min	Mean	Max
influxql	472.141628	1357.6695292420025	1926.343201
flux	828.459959	985.2083041890005	1224.295411

first

Query format	Min	Mean	Max
influxql	543.244747	1378.0107262420004	1981.510501
flux	855.365037	1009.241931350001	1269.175665

last

Query format	Min	Mean	Max
influxql	498.179682	1389.1785750859965	2034.104212
flux	940.718645	1068.5060996840034	1137.785102

max

Query format	Min	Mean	Max
influxql	490.856071	1383.3908101379996	1959.479736
flux	893.805569	1026.1591959339985	1092.8559

mean

Query format	Min	Mean	Max
influxql	561.930075	1365.015655534	1978.11617
flux	910.039721	1038.9072154770006	1094.484401

min

Query format	Min	Mean	Max
influxql	581.910006	1388.8716602459997	2335.457797
flux	891.629088	1038.1678828050008	1116.738726

sum

Query format	Min	Mean	Max
influxql	505.987293	1386.5963802960011	2123.065591
flux	949.914877	1050.6677056470014	1146.576404

Summary

On average, Flux outperformed InfluxQL (though not by much): across the seven aggregates, the Mean column ranged from roughly 985 to 1069 for Flux versus 1358 to 1389 for InfluxQL. InfluxQL seemed less consistent overall, with bigger outliers in both directions. Since I ran these on my laptop, I wouldn't definitively say Flux is "better" than InfluxQL for these operations, but I am confident in saying the two appear to have roughly equivalent performance for the window-aggregate case. I'd want to see it reproduced in our CI environment before declaring victory, though.
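To put a number on the gap, the sketch below divides the InfluxQL mean by the Flux mean for each aggregate, using the means copied from the tables above. The ratios come out at roughly 1.3 to 1.4. The `compare_means` helper is just an illustration, not part of the tooling.

```shell
#!/bin/sh
# Input columns: aggregate, InfluxQL mean, Flux mean.
# Prints the InfluxQL/Flux ratio; values above 1 mean Flux was faster.
compare_means() {
  awk '$3 > 0 { printf "%s influxql/flux=%.2f\n", $1, $2 / $3 }'
}

# Means copied from the Results tables.
compare_means <<'EOF'
count 1357.6695292420025 985.2083041890005
first 1378.0107262420004 1009.241931350001
last 1389.1785750859965 1068.5060996840034
max 1383.3908101379996 1026.1591959339985
mean 1365.015655534 1038.9072154770006
min 1388.8716602459997 1038.1678828050008
sum 1386.5963802960011 1050.6677056470014
EOF
```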

influxdata / influxdb-comparisons

feat: add window aggregate query-gen logic (and alias for data gen) #183
