The OpenSearch Piped Processing Language (PPL) currently lacks some advanced statistical aggregation capabilities similar to those provided by the eventstats command in Splunk Search Processing Language (SPL).
This feature request proposes adding new functions and syntax to PPL to enable statistical calculations and aggregations on event data.
Proposed Functionality:
Aggregate statistical calculations:
Calculate common statistical measures like sum, count, min, max, avg, etc., on specific fields or expressions.
Support grouping events by one or more fields and performing statistical calculations within each group.
Allow renaming the calculated fields with custom names.
Conditional aggregations:
Perform statistical calculations based on conditional expressions or filters.
Evaluate conditional expressions for each event and aggregate the results (e.g., sum of a conditional expression).
Chaining and nesting:
Enable chaining and nesting of statistical calculations, similar to how eventstats commands can be chained in SPL.
Allow performing multiple levels of aggregations and calculations in a single query.
Integration with existing PPL syntax:
Seamlessly integrate the new statistical aggregation capabilities with the existing PPL syntax and functions.
Ensure compatibility with other PPL features and maintain the overall usability and readability of the language.
Examples:
Calculate the sum of a conditional expression grouped by a field:
stats sum(if(field1 = "value" and field2 like "%pattern%", 1, 0)) as conditional_sum by group_field
Calculate minimum and maximum values of a field grouped by another field:
stats min(latency_field) as min_latency, max(latency_field) as max_latency by operation_id
Chain multiple statistical calculations:
stats sum(count) as total_count by client_id | stats sum(total_count) as overall_total
Support for PPL extended stats functionality is required for both:
High level Review
The OpenSearch Piped Processing Language (PPL) currently lacks some advanced statistical aggregation capabilities similar to those provided by the
eventstats
command in Splunk Search Processing Language (SPL). This feature request proposes adding new functions and syntax to PPL to enable statistical calculations and aggregations on event data.Proposed Functionality:
Aggregate statistical calculations:
sum
,count
,min
,max
,avg
, etc., on specific fields or expressions.Conditional aggregations:
Chaining and nesting:
eventstats
commands can be chained in SPL.Integration with existing PPL syntax:
Examples:
Support for PPL extended
stats
functionality is required for both:OpenSearch based PPL engine
Spark based PPL engine