opensearch-project / index-management

🗃 Automate periodic data operations, such as deleting indices at a certain age or performing a rollover at a certain size
https://opensearch.org/docs/latest/im-plugin/index/
Apache License 2.0

[FEATURE] Transforms should handle bucket script #1205

Open nicerloop opened 3 months ago

nicerloop commented 3 months ago

Is your feature request related to a problem? I want to have some computation done inside OpenSearch in a continuous manner, using transforms. My use case is very similar to the one described for Elasticsearch in this blog post: https://xeraa.net/blog/2021_elasticsearch-transforms-duration-status-updates/ The query without a transform produces the expected result, but I want this result updated and stored inside OpenSearch every minute. When I try to declare the transform with a bucket script aggregation, I get the error message described in https://github.com/opensearch-project/index-management/issues/671

What solution would you like? OpenSearch index transforms should support bucket_script aggregations: https://opensearch.org/docs/latest/aggregations/pipeline-agg/#bucket_script-bucket_selector I could then port the use case in https://xeraa.net/blog/2021_elasticsearch-transforms-duration-status-updates/ to OpenSearch.
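For illustration, here is a minimal sketch of the kind of search body involved, built as a Python dictionary. It assumes a duration computed as the difference between the latest and earliest timestamp per entity, in the spirit of the linked blog post. The field names (`id`, `@timestamp`) and aggregation names (`by_id`, `first_update`, `last_update`, `duration_ms`) are hypothetical placeholders, not taken from this issue.

```python
import json


def build_duration_query(id_field="id", time_field="@timestamp", size=100):
    """Build a search body that computes last - first timestamp per entity.

    All field and aggregation names are illustrative assumptions.
    """
    return {
        "size": 0,  # only aggregation results are needed, no hits
        "aggs": {
            "by_id": {
                "terms": {"field": id_field, "size": size},
                "aggs": {
                    "first_update": {"min": {"field": time_field}},
                    "last_update": {"max": {"field": time_field}},
                    # Pipeline aggregation: runs on the sibling metrics above
                    "duration_ms": {
                        "bucket_script": {
                            "buckets_path": {
                                "first": "first_update",
                                "last": "last_update",
                            },
                            "script": "params.last - params.first",
                        }
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_duration_query(), indent=2))
```

A body like this works as a plain search request; the feature request is for the equivalent `bucket_script` step to be accepted inside a transform definition.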

What alternatives have you considered? We can query OpenSearch externally each minute and update the target index with the retrieved result. This requires an external component and superfluous network exchanges: the data is extracted from OpenSearch and then stored back into OpenSearch, with no support for narrowing the scope of data to process. It is a crude brute-force workaround that will put a heavy load on our OpenSearch storage.
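To make the cost of this workaround concrete, here is a rough sketch of the glue code the external component would need: it turns an aggregation response into a bulk indexing payload (newline-delimited JSON) for the target index. The aggregation names `by_id` and `duration_ms`, the target index name, and the sample response are hypothetical; the HTTP calls that would run the search and post the payload every minute are left out.

```python
import json


def buckets_to_bulk(response, target_index):
    """Convert terms buckets into a bulk API payload (NDJSON).

    Assumes a response with a hypothetical terms aggregation `by_id` whose
    buckets each carry a hypothetical `duration_ms` pipeline value.
    """
    lines = []
    for bucket in response["aggregations"]["by_id"]["buckets"]:
        # Action line: the bucket key is reused as document ID, so each
        # run overwrites the previous result for that entity.
        action = {"index": {"_index": target_index, "_id": str(bucket["key"])}}
        doc = {"id": bucket["key"], "duration_ms": bucket["duration_ms"]["value"]}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API expects the payload to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # Made-up sample response, for illustration only.
    sample = {
        "aggregations": {
            "by_id": {
                "buckets": [
                    {"key": "a1", "duration_ms": {"value": 1500.0}},
                    {"key": "b2", "duration_ms": {"value": 0.0}},
                ]
            }
        }
    }
    print(buckets_to_bulk(sample, "durations"), end="")
```

Every run re-reads and re-writes all entities, which is the lack of scope optimization mentioned above.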

Do you have any additional context? I can code in Java, so if the necessary guidance is provided, I could envision doing the work myself, provided it is not too broad for a single person.

dblock commented 2 months ago

[Catch All Triage - 1, 2, 3, 4]