add project table command to allow materializing queries into a concrete table / view that can later be efficiently queried or stored in OpenSearch MV
PPL project command
Overview
Using project command to materialize a query into a dedicated view:
In some cases it is required to construct a projection view (materialized into a view) of the query results.
This projection can be later used as a source of continued queries for further slicing and dicing the data, in addition such tables can be also saved into a MV table that are pushed into OpenSearch and can be used for visualization and enhanced performant queries.
The command can also function as an ETL process where the original datasource will be transformed and ingested into the output projected view using the ppl transformation and aggregation operators
Syntax
project <viewName> [using datasource] As <query>
viewName
Specifies a view name, which may be optionally qualified with a database name.
USING datasource
Data Source is the input format used to create the table. Data source can be CSV, TXT, ORC, JDBC, PARQUET, etc.
AS query
The table is populated using the data from the select statement.
Usage Guidelines
The project command produces a view based on the resulting rows returned from the query.
Any query can be used in the AS <query> statement and attention must be used to the volume and compute that may incur due to such queries.
As a precautions an explain cost | source = table | ... can be run prior to the project statement to have a better estimation.
Examples:
project newTableName as |
source = table | where fieldA > value | stats count(fieldA) by fieldB
project ipRanges as |
source = table | where isV6 = true | eval inRange = case(cidrmatch(ipAddress, '2003:db8::/32'), 'in' else 'out') | fields ip, inRange
project avgBridgesByCountry as |
source = table | fields country, bridges | flatten bridges | fields country, length | stats avg(length) as avg by country
project ageDistribByCountry as |
source = table | stats avg(age) as avg_city_age by country, state, city | eval new_avg_city_age = avg_city_age - 1 | stats
avg(new_avg_city_age) as avg_state_age by country, state | where avg_state_age > 18 | stats avg(avg_state_age) as
avg_adult_country_age by country
Effective SQL push-down query
The project command is translated into an equivalent SQL create table <viewName> [Using <datasuorce>] As <statement> as shown here:
[x] Implemented tests for combination with other commands
[x] New added source code should include a copyright header
[x] Commits are signed per the DCO using --signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Description
add project table command to allow materializing queries into a concrete table / view that can later be efficiently queried or stored in OpenSearch MV
PPL
project
commandOverview
Using
project
command to materialize a query into a dedicated view: In some cases it is required to construct a projection view (materialized into a view) of the query results. This projection can be later used as a source of continued queries for further slicing and dicing the data, in addition such tables can be also saved into a MV table that are pushed into OpenSearch and can be used for visualization and enhanced performant queries.The command can also function as an ETL process where the original datasource will be transformed and ingested into the output projected view using the ppl transformation and aggregation operators
Syntax
project <viewName> [using datasource] As <query>
viewName Specifies a view name, which may be optionally qualified with a database name.
USING datasource Data Source is the input format used to create the table. Data source can be CSV, TXT, ORC, JDBC, PARQUET, etc.
AS query
The table is populated using the data from the select statement.
Usage Guidelines
The project command produces a view based on the resulting rows returned from the query. Any query can be used in the
AS <query>
statement and attention must be used to the volume and compute that may incur due to such queries.As a precautions an
explain cost | source = table | ...
can be run prior to theproject
statement to have a better estimation.Examples:
Effective SQL push-down query
The project command is translated into an equivalent SQL
create table <viewName> [Using <datasuorce>] As <statement>
as shown here:References
Related Issues
928
Check List
--signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.