opensearch-project / sql

Query your data using familiar SQL or intuitive Piped Processing Language (PPL)
https://opensearch.org/docs/latest/search-plugins/sql/index/
Apache License 2.0
121 stars 140 forks source link

[META]Add PPL `eval` CASE funcitonality support #3008

Open YANG-DB opened 2 months ago

YANG-DB commented 2 months ago

Is your feature request related to a problem? Current CASE functionality syntax is only available in SQL query, this functionality is important for actual use cases for Observability and in addition more general cases where eval CASE add helpful functionality to the query.

Examples:

The next example shows the case function evaluates the HTTP error codes stored in the error field:

 ... | eval error_msg = case(error == 404, "Not found", error == 500, "Internal Server Error", error == 200, "OK")

The next example shows the case function evaluates the sort_field wd field type stored in the field:

 ... | eval sort_field=case(wd=="SUPPORT",1, wd=="APPLICATION",2, wd=="STORAGE",3)

Existing SQL syntax support has the following definition:

specificFunction
   : CASE expression caseFuncAlternative+ (ELSE elseArg = functionArg)? END     # caseFunctionCall
   | CASE caseFuncAlternative+ (ELSE elseArg = functionArg)? END                # caseFunctionCall
   | CAST '(' expression AS convertedDataType ')'                               # dataTypeFunctionCall
   ;

caseFuncAlternative
   : WHEN condition = functionArg THEN consequent = functionArg
   ;

We need the same support for PPL as shown in the above example.

Here is an example based on the existing SQL case syntax:

source = my_index
| eval status_category = CASE status_code
    WHEN status_code >= 200 AND status_code < 300 THEN 'Success'
    WHEN status_code >= 300 AND status_code < 400 THEN 'Redirection'
    WHEN status_code >= 400 AND status_code < 500 THEN 'Client Error'
    WHEN status_code >= 500 THEN 'Server Error'
    ELSE 'Unknown'
  END
| stats count() by status_category

What solution would you like?

We need support for PPL eval CASE functionality for both:

- OpenSearch based PPL engine

Do you have any additional context?

YANG-DB commented 2 months ago

@LantaoJin I'll be happy to get your feedback here

LantaoJin commented 2 months ago

The CASE function is very useful whatever in SQL or PPL. I think we need it in PPL. Simple is one of important design concepts of PPL, it would be redundant to completely copy the SQL syntax in PPL. Instead of adding the WHEN, THEN, ELSE and END keywords, I prefer to replace the branches with list of predicate-value pairs. My proposal of CASE function is:

CASE(<predicate1>, <value1>, [<predicate2>, <value2>, ...] [ELSE] <default>)

ELSE could be optional, removing it will save an additional keyword in PPL. It is also more like a function than a clause. The example in description could be change to

source = my_index
| eval status_category =
    case(status_code >= 200 AND status_code < 300, 'Success',
         status_code >= 300 AND status_code < 400, 'Redirection'
         status_code >= 400 AND status_code < 500, 'Client Error'
         status_code >= 500, 'Server Error',
         'Unknown')
| stats count() by status_category
YANG-DB commented 2 months ago

@LantaoJin thanks for your comments - I agree this is more pipeline oriented and looks in corespondent to other PPL commands @penghuo @dai-chen - LMK what you think ...