opensearch-project / opensearch-spark

Spark Accelerator framework; it enables secondary indices to remote data stores.
Apache License 2.0
22 stars, 33 forks

[FEATURE] Add PPL support to unnest arrays #644

Open A-Gray-Cat opened 2 months ago

A-Gray-Cat commented 2 months ago

Is your feature request related to a problem? Many log sources include arrays in a log line. To extract and analyze this data efficiently, it would be very helpful to have a function that splits one log line containing an n-element array into n log lines, each containing one element of the array.
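To make the requested behavior concrete, here is a minimal pure-Python sketch of the row expansion described above. It is an illustration only, not PPL or Spark code; the helper name `unnest` and the fields `id` and `resources` are hypothetical. It assumes that rows with an empty or missing array produce no output, which mirrors how Spark's `explode` behaves.

```python
from typing import Any, Dict, Iterable, Iterator


def unnest(rows: Iterable[Dict[str, Any]], field: str) -> Iterator[Dict[str, Any]]:
    """Yield one output row per element of row[field]; other fields are copied unchanged."""
    for row in rows:
        # Assumption: an empty or missing array yields no output rows.
        for element in row.get(field) or []:
            yield {**row, field: element}


# Hypothetical log records: the first carries a 3-element array, the second an empty one.
logs = [
    {"id": 1, "resources": ["a", "b", "c"]},
    {"id": 2, "resources": []},
]

expanded = list(unnest(logs, "resources"))
for row in expanded:
    print(row)
```

The first record becomes three rows, each keeping `id` and holding a single element in `resources`; the second record is dropped under the stated assumption.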

What solution would you like? It's similar to the explode function in Spark SQL, and the expand function in Splunk.

YANG-DB commented 2 months ago

@A-Gray-Cat thanks for your request. Could you add some context here if possible, such as an example command syntax or examples from other languages that offer this functionality?

A-Gray-Cat commented 2 months ago

It's similar to the explode function in Spark:

`explode(expr)`: Separates the elements of array `expr` into multiple rows, or the elements of map `expr` into multiple rows and columns. Unless specified otherwise, uses the default column name `col` for elements of the array or `key` and `value` for the elements of the map.

Link: https://spark.apache.org/docs/latest/sql-ref-functions-builtin.html#generator-functions
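
As a rough illustration of the output shape described in that documentation excerpt, here is a small pure-Python emulation. It is only a sketch: the `explode` function below is a hypothetical stand-in written for this comment, not Spark's implementation, and it models rows as plain dictionaries.

```python
from typing import Any, Dict, List, Union


def explode(expr: Union[List[Any], Dict[Any, Any]]) -> List[Dict[str, Any]]:
    """Emulate the default output columns: `col` for arrays, `key` and `value` for maps."""
    if isinstance(expr, dict):
        # A map produces one row per entry, with two columns.
        return [{"key": k, "value": v} for k, v in expr.items()]
    # An array produces one row per element, with a single column.
    return [{"col": element} for element in expr]


array_rows = explode([10, 20, 30])
map_rows = explode({"a": 1, "b": 2})

print(array_rows)
print(map_rows)
```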

In Spark SQL, it is typically used together with LATERAL VIEW:

```sql
SELECT r
FROM securitylake.amazon_security_lake_glue_db_us_east_1.amazon_security_lake_table_us_east_1_sh_findings_2_0 a
LATERAL VIEW EXPLODE(a.resources) AS r
```