opensearch-project / opensearch-spark

Spark Accelerator framework; it enables secondary indices to remote data stores.
Apache License 2.0

[FEATURE] Enhance handling of Flint index data deletion to prevent dangling metadata log entry #356

Open dai-chen opened 1 month ago

dai-chen commented 1 month ago

Is your feature request related to a problem?

Currently, users can delete an OpenSearch index directly, without using the DROP and VACUUM index statements. When that happens, the index can no longer be recreated because its metadata log entry persists. This scenario occurs frequently with indices created with auto-refresh enabled, and it surfaces as errors when the index is recreated.

Although the recent pre-validation improvements (see PR#297) have reduced these incidents, a metadata log entry can still be left dangling after an OpenSearch index is forcefully deleted.

What solution would you like?

Enhance the handling of direct Flint index data deletions, ideally with a mechanism that can intelligently manage or clean up metadata log entries so that none are left dangling. This could involve additional checks or cleanup processes when an index is deleted.
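
To make the idea concrete, here is a minimal sketch of one possible check, written as a toy in-memory model in Python rather than against the actual Flint code base. Everything in it is an assumption for illustration: the `FlintModel` class, its method names, and the `"active"` state string are hypothetical and do not correspond to real Flint or OpenSearch APIs. It only shows the shape of the check: treat a metadata log entry whose index data is gone as dangling, and clean it up before recreating the index.

```python
from dataclasses import dataclass, field


@dataclass
class FlintModel:
    """Toy stand-in for an OpenSearch cluster plus the Flint metadata log."""

    indices: set = field(default_factory=set)         # names of existing index data
    metadata_log: dict = field(default_factory=dict)  # index name -> entry state

    def dangling_entries(self) -> list:
        # A log entry is dangling when its index data no longer exists.
        return sorted(n for n in self.metadata_log if n not in self.indices)

    def delete_index_data_directly(self, name: str) -> None:
        # Models a user deleting the OpenSearch index without DROP and VACUUM:
        # the index data goes away, the metadata log entry stays behind.
        self.indices.discard(name)

    def create_index(self, name: str) -> None:
        # Hypothetical extra check: clean up a dangling entry before creating.
        if name in self.metadata_log and name not in self.indices:
            del self.metadata_log[name]
        if name in self.metadata_log:
            raise ValueError(f"index {name} already exists")
        self.indices.add(name)
        self.metadata_log[name] = "active"  # placeholder state name
```

In this model, recreating an index after a direct deletion succeeds because `create_index` drops the stale entry first. Without that first check, the second `create_index` call would fail on the leftover entry, which mirrors the problem described above.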

What alternatives have you considered?

One alternative is to improve user education regarding the management of indices. Users could be encouraged to use Flint SQL for index manipulations and avoid direct interactions with the OpenSearch index. However, this approach relies heavily on user compliance.

Do you have any additional context?

We've already added similar cleanup logic to the recoverIndex API in https://github.com/opensearch-project/opensearch-spark/pull/241.