opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

OpenSearch on Spark (without an OpenSearch cluster) - has this been contemplated? #8566

Open schenksj opened 1 year ago

schenksj commented 1 year ago

Is your feature request related to a problem? Please describe.

I am contemplating partnering with some folks to deliver rich support for Lucene on Spark (EMR, Databricks, etc.) as a cost-effective alternative to running a separate OpenSearch/Elasticsearch cluster for fast search against large quantities of log data. Ideally this would include indexing (on dataframe.write), retrieval (with partition-filter-based pre-scan searching), and eventually ACID transactions and optimization (compacting shards?) supported by the delta-io log protocol. Extending the applicability of a solution like OpenSearch UltraWarm to this use case could be an exciting alternative to starting from scratch with something like plain Lucene.

I think a solution like this would have significant applicability in the security domain as well as in application observability and support.

I'm curious whether anything like this has been contemplated by the community to date, and whether any prior art or POC work exists that could serve to catalyze the effort.

Describe the solution you'd like

I would appreciate community feedback as to whether there has been existing research/work that could be leveraged in this effort, or if this is truly novel.

Describe alternatives you've considered

Plain Lucene file providers...

Additional context

penghuo commented 1 year ago

MaxKsyunz commented 1 year ago

@penghuo I can't access the opensearch-spark repo. Is it private?

schenksj commented 1 year ago

> @penghuo I can't access the opensearch-spark repo. Is it private?

@penghuo I have the same issue! Is there someone we can reach out to on this?

penghuo commented 1 year ago

Fixed the link: https://github.com/opensearch-project/sql/issues/1875. opensearch-spark is a private repo for now; we transferred the issue from the SQL repo to the opensearch-spark repo by accident.