influxdata / influxdb-comparisons

Code for comparison write ups of InfluxDB and other solutions
MIT License

feat: add window aggregate query-gen logic (and alias for data gen) #183

Status: Closed (danxmoran closed this 3 years ago)

danxmoran commented 3 years ago

The window-aggregate queries use the iot case's data. I've structured the generation so that every aggregate is its own query "type", so each one can be benchmarked and tracked independently. I'm not sure if that's helpful or more complicated than it needs to be; I know the Flux team tracked each aggregate separately, and there have been some single-aggregate bugs in the past.

williamhbaker commented 3 years ago

Was the 1.x influxql different at all from the 2.x flux performance with this? I'm asking because the query is very similar to the one being worked on in #182, except that the #182 query has a keep(...) before the aggregate function, and it showed a noticeable difference in performance between 1.x and 2.x.

danxmoran commented 3 years ago

> Was the 1.x influxql different at all from the 2.x flux performance with this?

My impression from reviewing the results on Friday was that the results were roughly identical. I'll take another pass over it and write up the results as a table here.

danxmoran commented 3 years ago

Summarization of results

Setup

Data was generated using this command:

bulk_data_gen -use-case window-agg -scale-var 1000 -timestamp-end 2018-01-07T00:00:00Z

The output contained 55615680 records for the temperature field in the air_condition_room measurement, spanning a total of 1 week's time.

Queries were generated using:

for format in influx-http influx-flux-http; do
  for agg in mean min max first last count sum; do
    bulk_query_gen -timestamp-end 2018-01-07T00:00:00Z -use-case window-agg -format ${format} -query-type ${agg} -query-interval 3h
  done
done

I ran 1000 test queries per format/agg pair.
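As a rough illustration of how each results table reduces a run to three numbers, here is a minimal Python sketch that computes min, mean, and max per format/aggregate pair. The `summarize` helper and the sample latencies are hypothetical and are not taken from the benchmark tooling or from the actual run.

```python
from statistics import mean


def summarize(latencies):
    """Return (min, mean, max) for a non-empty sequence of per-query latencies."""
    if not latencies:
        raise ValueError("need at least one latency sample")
    return min(latencies), mean(latencies), max(latencies)


# Hypothetical samples keyed by (format, aggregate); NOT real benchmark data.
samples = {
    ("influxql", "count"): [500.0, 1400.0, 1900.0],
    ("flux", "count"): [850.0, 1000.0, 1200.0],
}

# One (min, mean, max) tuple per format/aggregate pair.
summary = {pair: summarize(values) for pair, values in samples.items()}

for (fmt, agg), (lo, avg, hi) in summary.items():
    print(f"{agg:<6} {fmt:<9} min={lo:.1f} mean={avg:.1f} max={hi:.1f}")
```

With 1000 queries per pair, each list would hold 1000 values instead of three.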

Results

count

| Query format | Min | Mean | Max |
|---|---|---|---|
| influxql | 472.141628 | 1357.6695292420025 | 1926.343201 |
| flux | 828.459959 | 985.2083041890005 | 1224.295411 |

first

| Query format | Min | Mean | Max |
|---|---|---|---|
| influxql | 543.244747 | 1378.0107262420004 | 1981.510501 |
| flux | 855.365037 | 1009.241931350001 | 1269.175665 |

last

| Query format | Min | Mean | Max |
|---|---|---|---|
| influxql | 498.179682 | 1389.1785750859965 | 2034.104212 |
| flux | 940.718645 | 1068.5060996840034 | 1137.785102 |

max

| Query format | Min | Mean | Max |
|---|---|---|---|
| influxql | 490.856071 | 1383.3908101379996 | 1959.479736 |
| flux | 893.805569 | 1026.1591959339985 | 1092.8559 |

mean

| Query format | Min | Mean | Max |
|---|---|---|---|
| influxql | 561.930075 | 1365.015655534 | 1978.11617 |
| flux | 910.039721 | 1038.9072154770006 | 1094.484401 |

min

| Query format | Min | Mean | Max |
|---|---|---|---|
| influxql | 581.910006 | 1388.8716602459997 | 2335.457797 |
| flux | 891.629088 | 1038.1678828050008 | 1116.738726 |

sum

| Query format | Min | Mean | Max |
|---|---|---|---|
| influxql | 505.987293 | 1386.5963802960011 | 2123.065591 |
| flux | 949.914877 | 1050.6677056470014 | 1146.576404 |

Summary

On average, Flux outperformed InfluxQL (though not by much): its mean was lower for every aggregate, roughly 985 to 1070 versus 1360 to 1390. InfluxQL was less consistent overall, with bigger outliers in both directions; it had both the lowest minimum and the highest maximum for every aggregate. Since I ran these on my laptop, I wouldn't definitively say Flux is "better" than InfluxQL for these operations, but I am confident in saying the two appear to have roughly equivalent performance for the window-aggregate case. I'd want to see it reproduced in our CI environment before declaring victory, though.
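To put a number on the gap between the two formats, the following Python sketch computes the ratio of the Flux mean to the InfluxQL mean for each aggregate. The mean values are copied from the result tables above. The script itself is only an illustration and is not part of the benchmark tooling.

```python
# Mean values copied from the result tables: aggregate -> (influxql, flux).
means = {
    "count": (1357.6695292420025, 985.2083041890005),
    "first": (1378.0107262420004, 1009.241931350001),
    "last": (1389.1785750859965, 1068.5060996840034),
    "max": (1383.3908101379996, 1026.1591959339985),
    "mean": (1365.015655534, 1038.9072154770006),
    "min": (1388.8716602459997, 1038.1678828050008),
    "sum": (1386.5963802960011, 1050.6677056470014),
}

# A ratio below 1.0 means the Flux mean was lower than the InfluxQL mean.
ratios = {agg: flux / influxql for agg, (influxql, flux) in means.items()}

for agg, ratio in sorted(ratios.items()):
    pct_lower = (1 - ratio) * 100
    print(f"{agg:<5} flux/influxql = {ratio:.2f} ({pct_lower:.0f}% lower mean)")
```

Every ratio falls between 0.70 and 0.80. Given that the run was on a laptop, the spread between minimum and maximum in the tables is worth weighing alongside these ratios.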