opensearch-project / opensearch-spark

Spark Accelerator framework; it enables secondary indices to remote data stores.
Apache License 2.0

[FEATURE] Support PPL Query as Projection #928

Open YANG-DB opened 3 days ago

YANG-DB commented 3 days ago

Is your feature request related to a problem?

PPL currently supports only read-based queries against the underlying datastore.

In some cases a query's results need to be materialized as a projection view. Such a projection can later serve as the source for further queries that slice and dice the data. It can also be saved as an MV table that is pushed into OpenSearch and used for visualization and for faster queries.

The command can also function as an ETL process, in which the original datasource is transformed and ingested into the output projected view using the PPL transformation and aggregation operators.

What solution would you like?

A new PPL command, project, should take the following shape:

project newTableName as |
   source = table | where fieldA > value | stats count(fieldA) by fieldB

project ipRanges as |
       source = table | where isV6 = true | eval inRange = case(cidrmatch(ipAddress, '2003:db8::/32'), 'in' else 'out') | fields ip, inRange

project avgBridgesByCountry as |
       source = table | fields country, bridges | flatten bridges | fields country, length | stats avg(length) as avg by country

project ageDistribByCountry as |
       source = table | stats avg(age) as avg_city_age by country, state, city | eval new_avg_city_age = avg_city_age - 1 | stats avg(new_avg_city_age) as avg_state_age by country, state | where avg_state_age > 18 | stats avg(avg_state_age) as avg_adult_country_age by country
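
To make the proposed shape concrete, here is a minimal sketch of how a front end might split a project command into its target name and its inner PPL query. The grammar is inferred only from the examples above. The names ProjectCommand and parse_project are hypothetical, and nothing here reflects the actual opensearch-spark parser.

```python
import re
from typing import NamedTuple


class ProjectCommand(NamedTuple):
    """Parsed form of the proposed `project <name> as | <query>` command."""
    target: str  # name of the projected view/table
    query: str   # inner PPL query, whitespace-normalized


# Hypothetical grammar, inferred only from the examples in this issue:
#   project <identifier> as | <ppl query>
_PROJECT_RE = re.compile(
    r"^\s*project\s+(?P<target>[A-Za-z_][A-Za-z0-9_]*)\s+as\s*\|\s*(?P<query>\S.*)$",
    re.IGNORECASE | re.DOTALL,
)


def parse_project(command: str) -> ProjectCommand:
    """Split a project command into (target, query); raise ValueError otherwise."""
    match = _PROJECT_RE.match(command)
    if match is None:
        raise ValueError("expected: project <name> as | <ppl query>")
    # Collapse line breaks and indentation so wrapped commands compare equal.
    # Assumption: this naive step would also collapse repeated spaces inside
    # quoted string literals, which a real parser must not do.
    query = " ".join(match.group("query").split())
    return ProjectCommand(match.group("target"), query)
```

For example, parse_project applied to the first command above yields the target newTableName and the query starting at source = table. A real implementation would presumably extend the existing PPL grammar rather than use a regular expression.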

Do you have any additional context?