opensearch-project / opensearch-benchmark-workloads

Official workloads used by OpenSearch Benchmark (OSB)
https://opensearch.org/docs/latest/benchmark/

[Big5 Update] Update grouping and order in which operations run #298

Open IanHoang opened 1 month ago

IanHoang commented 1 month ago

Since Big5 covers the big 5 areas of search, we should group its operations by area and run them group by group. That way, when users export the results to a spreadsheet or CSV, they won't need to reorder the rows to see related operations together.

For example, instead of the current order:

default
desc_sort_timestamp
asc_sort_timestamp
desc_sort_with_after_timestamp
asc_sort_with_after_timestamp
desc_sort_timestamp_can_match_shortcut
desc_sort_timestamp_no_can_match_shortcut
asc_sort_timestamp_can_match_shortcut
asc_sort_timestamp_no_can_match_shortcut
term
multi_terms-keyword
keyword-terms
keyword-terms-low-cardinality
composite-terms
composite_terms-keyword
composite-date_histogram-daily
range
range-numeric
keyword-in-range
date_histogram_hourly_agg
date_histogram_minute_agg
scroll
query-string-on-message
query-string-on-message-filtered
query-string-on-message-filtered-sorted-num
sort_keyword_can_match_shortcut
sort_keyword_no_can_match_shortcut
sort_numeric_desc
sort_numeric_asc
sort_numeric_desc_with_match
sort_numeric_asc_with_match
range_field_conjunction_big_range_big_term_query
range_field_disjunction_big_range_small_term_query
range_field_conjunction_small_range_small_term_query
range_field_conjunction_small_range_big_term_query
range-auto-date-histo
range-auto-date-histo-with-metrics

We can run them in groups, like this:

# Text Querying
default
scroll
query-string-on-message
query-string-on-message-filtered
query-string-on-message-filtered-sorted-num
term

# Sorts
desc_sort_timestamp
asc_sort_timestamp
desc_sort_with_after_timestamp
asc_sort_with_after_timestamp
desc_sort_timestamp_can_match_shortcut
desc_sort_timestamp_no_can_match_shortcut
asc_sort_timestamp_can_match_shortcut
asc_sort_timestamp_no_can_match_shortcut
sort_keyword_can_match_shortcut
sort_keyword_no_can_match_shortcut
sort_numeric_desc
sort_numeric_asc
sort_numeric_desc_with_match
sort_numeric_asc_with_match
multi_terms-keyword

# Terms Aggregations
keyword-terms
keyword-terms-low-cardinality
composite-terms
composite_terms-keyword

# Range Queries
range
range-numeric
keyword-in-range
range_field_conjunction_big_range_big_term_query
range_field_disjunction_big_range_small_term_query
range_field_conjunction_small_range_small_term_query
range_field_conjunction_small_range_big_term_query

# Date Histograms
date_histogram_hourly_agg
date_histogram_minute_agg
composite-date_histogram-daily
range-auto-date-histo
range-auto-date-histo-with-metrics
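
Until the run order changes, the same grouping can be applied to exported results after the fact. Below is a minimal Python sketch of that idea, not part of OSB itself: the GROUPS mapping simply copies the proposal above, and the "operation" column name and the helper names (grouped_order, sort_rows) are assumptions made for illustration.

```python
# Hypothetical helper, not an OSB API: reorder exported result rows so that
# operations appear in the grouped order proposed in this issue.
GROUPS = {
    "Text Querying": [
        "default", "scroll", "query-string-on-message",
        "query-string-on-message-filtered",
        "query-string-on-message-filtered-sorted-num", "term",
    ],
    "Sorts": [
        "desc_sort_timestamp", "asc_sort_timestamp",
        "desc_sort_with_after_timestamp", "asc_sort_with_after_timestamp",
        "desc_sort_timestamp_can_match_shortcut",
        "desc_sort_timestamp_no_can_match_shortcut",
        "asc_sort_timestamp_can_match_shortcut",
        "asc_sort_timestamp_no_can_match_shortcut",
        "sort_keyword_can_match_shortcut", "sort_keyword_no_can_match_shortcut",
        "sort_numeric_desc", "sort_numeric_asc",
        "sort_numeric_desc_with_match", "sort_numeric_asc_with_match",
        "multi_terms-keyword",
    ],
    "Terms Aggregations": [
        "keyword-terms", "keyword-terms-low-cardinality",
        "composite-terms", "composite_terms-keyword",
    ],
    "Range Queries": [
        "range", "range-numeric", "keyword-in-range",
        "range_field_conjunction_big_range_big_term_query",
        "range_field_disjunction_big_range_small_term_query",
        "range_field_conjunction_small_range_small_term_query",
        "range_field_conjunction_small_range_big_term_query",
    ],
    "Date Histograms": [
        "date_histogram_hourly_agg", "date_histogram_minute_agg",
        "composite-date_histogram-daily", "range-auto-date-histo",
        "range-auto-date-histo-with-metrics",
    ],
}


def grouped_order(groups):
    """Flatten the groups into one list, keeping group order and in-group order."""
    return [op for ops in groups.values() for op in ops]


def sort_rows(rows, groups, key="operation"):
    """Sort result rows (dicts) into grouped order; unknown operations go last.

    The "operation" column name is an assumption about the exported file.
    """
    position = {op: i for i, op in enumerate(grouped_order(groups))}
    return sorted(rows, key=lambda row: position.get(row[key], len(position)))
```

For example, sort_rows(list(csv.DictReader(f)), GROUPS) would reorder the rows of an exported CSV, provided it has a column with the operation name. Selecting a single category is then a matter of filtering on GROUPS["Sorts"] or similar.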

Open to other alternatives as well.

gkamat commented 1 month ago

I think this is a good idea. Splitting the test procedure into parts and including them with the benchmark.collect Jinja2 macro should help with organizing the queries. It would also make it possible to run only one category of queries, if desired.