opensearch-project / index-management

🗃 Automate periodic data operations, such as deleting indices at a certain age or performing a rollover at a certain size
https://opensearch.org/docs/latest/im-plugin/index/
Apache License 2.0
52 stars 107 forks

[FEATURE] Search Optimization for Index Transform #1134

Open sarthakaggarwal97 opened 3 months ago

sarthakaggarwal97 commented 3 months ago

Is your feature request related to a problem? Currently, whenever a transform job runs, the search phase executes before any compute or indexing work begins.

The job scheduler runs the job at fixed intervals; once an interval elapses, the job is initiated again. If the search phase takes a long time, possibly longer than the interval itself, the search continues across every restart of the transform job until all the checkpoints / buckets / documents have been visited.

For time series data, where the transform job cannot keep up with indexing into the source index, the transform job keeps searching without computing or indexing anything into the target index.

Since the queried data is loaded into memory, the node can hit circuit breaker exceptions, which cause the job to fail. Without circuit breakers, the node can run out of memory (OOM) as well.

What solution would you like? This proposes a change to the way a transform job is currently executed. Instead of waiting for the search phase to complete, we should compute the aggregations as search results arrive and index them into the target index as we go. This would let us release already-computed buckets, freeing up memory periodically.
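
To make the proposal concrete, here is a minimal Python sketch contrasting the two execution strategies as this issue describes them. It is not the plugin's actual code: `search_pages`, `transform`, `run_search_first`, `run_interleaved`, and the `index_fn` callback are all hypothetical names, and a plain list stands in for the source index. Each function returns the largest number of buckets held in memory at once.

```python
from typing import Callable, Dict, Iterator, List

Bucket = Dict[str, int]
IndexFn = Callable[[List[Bucket]], None]


def search_pages(source: List[Bucket], page_size: int) -> Iterator[List[Bucket]]:
    """Yield one page of buckets at a time (stand-in for a paginated search)."""
    for start in range(0, len(source), page_size):
        yield source[start:start + page_size]


def transform(bucket: Bucket) -> Bucket:
    """Hypothetical compute step: derive a target document from a bucket."""
    return {"key": bucket["key"], "doubled": bucket["value"] * 2}


def run_search_first(source: List[Bucket], page_size: int, index_fn: IndexFn) -> int:
    """Search everything first, then compute and index (behaviour described in the issue)."""
    held: List[Bucket] = []
    for page in search_pages(source, page_size):
        held.extend(page)  # every page stays in memory until the search ends
    peak = len(held)
    index_fn([transform(b) for b in held])
    return peak


def run_interleaved(source: List[Bucket], page_size: int, index_fn: IndexFn) -> int:
    """Compute and index each page as it arrives (the proposed behaviour)."""
    peak = 0
    for page in search_pages(source, page_size):
        peak = max(peak, len(page))  # only the current page is held
        index_fn([transform(b) for b in page])
        # the page goes out of scope here, so its buckets can be released
    return peak
```

In this sketch both strategies write the same documents to the target, but the interleaved version never holds more than `page_size` buckets at a time, which is the memory relief the proposal is after.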

dblock commented 2 weeks ago

Catch All Triage - 1 2 3 4 5