opensearch-project / index-management

🗃 Automate periodic data operations, such as deleting indices at a certain age or performing a rollover at a certain size
https://opensearch.org/docs/latest/im-plugin/index/
Apache License 2.0

[FEATURE] Share global shard checkpoints across Index Transform #1135

Open sarthakaggarwal97 opened 3 months ago

sarthakaggarwal97 commented 3 months ago

Is your feature request related to a problem? Currently, each transform job is independent of the others. Transform jobs have no way to interact or share information with one another.

However, there are scenarios where we would want a transform job to share its already processed shard checkpoints with other transform jobs.

For example, we might want to split the current transform job (which may process multiple indices at once) into new transform jobs that each process an individual index. Right now, if we create the new transform jobs, they re-process the buckets already computed by the old transform job.

There is currently no way to continue the work of the old/parent transform job.

What solution would you like? This issue tracks the ability to share global checkpoints across transform jobs so that new jobs can continue the work done by the old/parent transform job.

Transform metadata internally maintains the global shard checkpoints to track which documents the job needs to process on each run. If we can share this metadata from one transform job to another, we should be able to continue or split the work of the old transform job into new ones without worrying about data duplication or consistency.
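To make the idea concrete, here is a rough sketch of how a parent job's checkpoints could be handed to per-index child jobs. It is only an illustration, not the plugin's actual code or metadata format: the `(index, shard)` key layout, the function names `split_checkpoints_by_index` and `pending_range`, and the use of `-1` to mean "nothing processed yet" are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

# Hypothetical shape: (index name, shard number) -> last processed checkpoint.
ShardKey = Tuple[str, int]


def split_checkpoints_by_index(
    parent: Dict[ShardKey, int],
) -> Dict[str, Dict[int, int]]:
    """Group a parent job's shard checkpoints by index, one map per child job."""
    per_index: Dict[str, Dict[int, int]] = defaultdict(dict)
    for (index, shard), checkpoint in parent.items():
        per_index[index][shard] = checkpoint
    return dict(per_index)


def pending_range(
    inherited: Dict[int, int], shard: int, current_checkpoint: int
) -> Optional[Tuple[int, int]]:
    """Return the inclusive checkpoint range a child job still has to process.

    A shard missing from the inherited map is treated as unprocessed (-1),
    which is an assumption of this sketch.
    """
    start = inherited.get(shard, -1) + 1
    if start > current_checkpoint:
        return None  # the parent already covered everything on this shard
    return (start, current_checkpoint)


# Parent job that processed two indices at once.
parent_checkpoints = {("logs-a", 0): 120, ("logs-a", 1): 98, ("logs-b", 0): 45}
children = split_checkpoints_by_index(parent_checkpoints)

# The child job for logs-a resumes shard 0 after checkpoint 120.
resume = pending_range(children["logs-a"], 0, 150)
```

With inherited checkpoints, the child job for `logs-a` would only look at work after checkpoint 120 on shard 0 instead of starting from the beginning, which is what avoids re-processing the buckets the parent job already computed.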

dblock commented 2 weeks ago

Catch All Triage - 1 2 3 4 5