Piped Processing Language (PPL) lets users explore and discover their data, and find search patterns in data stored in multiple locations (S3, OpenSearch, Prometheus), using a set of commands delimited by pipes (|).
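To make the "commands delimited by pipes" idea concrete, here is a minimal Python sketch that breaks a PPL-style query into its individual commands. It is an illustration only: the real language is defined by an ANTLR grammar, and the `split_pipeline` helper, the `accounts` index, and the field names are assumptions made up for this example.

```python
def split_pipeline(query: str) -> list[str]:
    """Split a PPL-style query into its pipe-delimited commands.

    Pipes inside single- or double-quoted string literals are left alone.
    Escaped quotes are not handled; this is a sketch, not the PPL grammar.
    """
    stages, current, quote = [], [], None
    for ch in query:
        if quote:
            # Inside a string literal: copy characters until the closing quote.
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == "|":
            # A top-level pipe ends the current command.
            stages.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    stages.append("".join(current).strip())
    return [s for s in stages if s]


# Hypothetical query: the index and field names are illustrative.
query = "source=accounts | where age > 30 | fields firstname, lastname"
for stage in split_pipeline(query):
    print(stage)
```

Each printed line is one command in the pipeline: a source, a filter, and a projection, with each command consuming the output of the one before it.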
Over the past year, the SQL/PPL team has focused on the following tasks:
Making PPL the default OpenSearch query language (specifically for log, trace, and metric signals)
Promoting PPL as a viable candidate for the proposed CNCF Observability universal query language
Interacting seamlessly with different data sources such as S3, Prometheus, and data lakes by leveraging Spark execution
Using Spark's federation capabilities as a general-purpose query engine to facilitate complex queries, including joins
Improving and promoting PPL as an extensible, general-purpose query language that the community can adopt
For historical reasons, the PPL language specification currently lives in the OpenSearch SQL repository.
In addition, the PPL specification code and documents are not published as an independent (jar) artifact; they are bundled with the SQL plugin as an OpenSearch zip file.
History of PPL
PPL has become a general-purpose pipeline language that is gaining traction and usage in many places in the log analytics ecosystem.
It originated as a language with a dedicated OpenSearch driver, which was the only execution engine that could run the language inside OpenSearch.
PPL has since evolved and can now run on top of Spark as a fully qualified query language.
This PR aims to decouple the PPL language specification and documentation from the OpenSearch SQL plugin and move them (back) into the dedicated PPL repository.
This repository should contain the following:
ANTLR specifications
Documentation
Planned changes and general language issues
The repository should release a jar artifact that is not coupled to the OpenSearch release cadence and has its own versioning.
A major advantage of this approach is that different execution engines (drivers) such as OpenSearch, Spark, Prometheus, and others would be decoupled from the SQL repository, and each could maintain an independent track of support for PPL commands and features.
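One way to picture the independent track of support is a versioned specification that each driver is checked against. The Python sketch below is only an illustration under stated assumptions: `LanguageSpec`, `Driver`, the version string, and the command sets are all hypothetical and do not describe what any engine actually supports today.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LanguageSpec:
    """Hypothetical stand-in for a released, independently versioned PPL spec."""
    version: str
    commands: frozenset[str]


@dataclass
class Driver:
    """Hypothetical execution engine that implements part of a spec version."""
    name: str
    spec_version: str
    supported: set[str] = field(default_factory=set)

    def unsupported(self, spec: LanguageSpec) -> set[str]:
        # Commands defined by the spec that this driver does not yet implement.
        return set(spec.commands) - self.supported


# Illustrative data only: not a statement of real engine capabilities.
spec = LanguageSpec("1.0.0", frozenset({"search", "where", "fields", "stats", "join"}))
drivers = [
    Driver("opensearch", "1.0.0", {"search", "where", "fields", "stats"}),
    Driver("spark", "1.0.0", {"search", "where", "fields", "stats", "join"}),
]

for driver in drivers:
    print(driver.name, driver.spec_version, sorted(driver.unsupported(spec)))
```

Because the spec carries its own version, a driver can pin the version it targets and report its gaps without waiting on an OpenSearch or SQL plugin release.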
Do you have any additional context?
https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/index.rst
https://github.com/opensearch-project/piped-processing-language
https://github.com/opensearch-project/sql/issues/1222
https://github.com/opensearch-project/opensearch-spark/issues/30