[META] API Based Spark Connector and Benchmarks

Is your feature request related to a problem? Machines that generate operational data keep growing in number and storage size, and they keep presenting new use cases. Companies rely on logs to troubleshoot and to better understand how their digital offerings are being used. Not all data is equal, though. As their data grows, community members have to decide which high-value data to store in OpenSearch and which to store in object stores because of cost and scale. Customers currently need separate tools to search OpenSearch and object stores, yet log analytics use cases such as observability and security analytics still require low query latency. Running complex join queries with filtered analytics at scale is expensive or impossible, let alone providing a unified view across use cases. Customers are left paying for data stored elsewhere that they cannot easily put to use.

What solution would you like? Enable OpenSearch to query object stores using Apache Spark and to build indexes on object store data, so that search query latency is low enough for users troubleshooting live problems in their systems.

What alternatives have you considered? Apache Spark provides the compute needed to power complex queries and aggregations that OpenSearch lacks, and it has a large open source community. The team believes compute should become modular in the future, but is starting with Apache Spark.

opensearch-project / sql

[META] API Based Spark Connector and Benchmarks #1746