opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[Feature Request] Rename index API #5949

Open msfroh opened 1 year ago

msfroh commented 1 year ago

Is your feature request related to a problem? Please describe.

There are many reasons to want to rename an index. Maybe you had a typo. Maybe you decided to change naming conventions and want to update old indices to follow the new convention. Maybe you want to reuse the index name for some new data, but want to keep the old data under a different name.

Unfortunately, all existing options for index renaming are pretty invasive/heavy. See the "alternatives you've considered" section below.

Describe the solution you'd like

I would like an API that lets me change the name of an index without moving a single byte of the underlying Lucene indices.

One possible solution might build on the index alias APIs. Maybe whenever an index gets created, OpenSearch could treat the index name as an alias and generate its own internal name that the user doesn't need to worry about. Then a "rename" would really just mean adding a new alias and removing the old one.
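A minimal sketch of that idea, as a toy in-memory model rather than anything in OpenSearch itself. The `Catalog` class, its methods, and the use of a random hex ID as the internal name are all hypothetical; the point is only that a rename touches the name table and nothing else.

```python
import uuid


class Catalog:
    """Toy model: each user-facing index name is an alias for an internal ID."""

    def __init__(self):
        self.aliases = {}  # user-facing name -> internal ID
        self.storage = {}  # internal ID -> documents (stand-in for Lucene data)

    def create_index(self, name):
        if name in self.aliases:
            raise ValueError(f"index {name!r} already exists")
        internal_id = uuid.uuid4().hex  # hypothetical internal name
        self.aliases[name] = internal_id
        self.storage[internal_id] = []
        return internal_id

    def rename(self, old, new):
        # Check the target first so a failed rename leaves the table unchanged.
        if new in self.aliases:
            raise ValueError(f"index {new!r} already exists")
        # Add the new alias and drop the old one; storage is never touched.
        self.aliases[new] = self.aliases.pop(old)
```

With this model, `catalog.rename("logs-tpyo", "logs-typo-fixed")` changes one dictionary entry, and the documents stored under the internal ID stay exactly where they were.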

Describe alternatives you've considered

Helpful folks over at Stack Overflow offered various suggestions: https://stackoverflow.com/questions/28626803/how-to-rename-an-index-in-a-cluster

To run through why I don't like those options:

Additional context

N/A

dblock commented 1 year ago

This is a good ask! I like the idea of detaching the index name from the index and only updating that. What are all the places where the index name manifests itself (file names, for example)? Maybe we start by replacing those with UUIDs until there's only one reference?

dbwiddis commented 1 year ago

Musing about this, there's another potential type of implementation to consider. "Promote" an alias to the name (and the name then becomes an alias). Since both name and alias can be used simultaneously, this could all happen over time. It'd be similar to clone, I think, possibly without the need for locking.
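Roughly, the "promote" operation could look like the following sketch. The `Index` class and its `promote` and `resolves` methods are hypothetical names for illustration, not an existing OpenSearch API.

```python
from dataclasses import dataclass, field


@dataclass
class Index:
    """Toy model of an index with one primary name and a set of aliases."""

    name: str
    aliases: set = field(default_factory=set)

    def resolves(self, candidate):
        # Both the primary name and any alias address the same index.
        return candidate == self.name or candidate in self.aliases

    def promote(self, alias):
        # Swap roles: the alias becomes the name, the old name becomes an alias.
        if alias not in self.aliases:
            raise KeyError(f"{alias!r} is not an alias of {self.name!r}")
        self.aliases.remove(alias)
        self.aliases.add(self.name)
        self.name = alias
```

Because the old name survives as an alias, clients can migrate to the new name at their own pace, and the old alias can be dropped later.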

msfroh commented 1 year ago

@dbwiddis -- I think your solution is much better than clone (for renaming), since the name and alias would continue to point to the same thing.

As I understand it, a clone is more like a fork in the update history (or a fork in the timeline if you're into multiverses). You pause updates (by closing the index) and create a clone with a new name. If you reopen the original index the two copies are allowed to diverge. (The hard-linking implementation of clone is just taking advantage of the write-once nature of Lucene segments, but updates would produce new and different segments in the two directories.)
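To illustrate the fork, here is a small filesystem sketch. It is not the OpenSearch clone implementation: the helper names and the `seg_N` file names are made up, and plain files stand in for Lucene segments. It assumes a filesystem that supports hard links.

```python
import os
import tempfile


def write_segment(index_dir, name, data):
    # Mode "xb" refuses to overwrite, mimicking write-once segments.
    with open(os.path.join(index_dir, name), "xb") as f:
        f.write(data)


def read_segment(index_dir, name):
    with open(os.path.join(index_dir, name), "rb") as f:
        return f.read()


def clone_index(src_dir, dst_dir):
    # Hard-link every existing segment instead of copying its bytes.
    os.makedirs(dst_dir)
    for segment in os.listdir(src_dir):
        os.link(os.path.join(src_dir, segment), os.path.join(dst_dir, segment))


def demo():
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "original")
        clone = os.path.join(root, "clone")
        os.makedirs(original)
        write_segment(original, "seg_0", b"shared history")
        clone_index(original, clone)
        # The pre-clone segment is the same file under two directories.
        shared = os.path.samefile(
            os.path.join(original, "seg_0"), os.path.join(clone, "seg_0")
        )
        # Updates after the clone produce different segments on each side.
        write_segment(original, "seg_1", b"update to original")
        write_segment(clone, "seg_1", b"update to clone")
        diverged = read_segment(original, "seg_1") != read_segment(clone, "seg_1")
        return shared, diverged
```

The two directories share everything written before the clone and nothing written after it, which is why a clone behaves like a fork rather than a second name for the same index.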

Back to your suggestion, one downside (maybe small?) is that there are almost certainly places where the old name had extra significance. One case that I can think of is that your next snapshot is likely to be a full (versus incremental, new-segments-only) backup, since (AFAIK) the snapshot API passes the index name as an identifier to the repository. There could be workarounds for that, though (make snapshots alias-aware? modify just the snapshot logic to use the UUID strategy described by @dblock?).
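A toy sketch of that snapshot concern, assuming the repository really does key its contents by whatever identifier it is handed (which is the "AFAIK" part above). The `snapshot` function and the dictionary-backed repository are hypothetical stand-ins, not the real snapshot API.

```python
def snapshot(repo, key, segments):
    """Record segments under `key` and return the ones that had to be uploaded."""
    already_stored = repo.setdefault(key, set())
    to_upload = set(segments) - already_stored
    already_stored |= to_upload
    return to_upload


# Keyed by index name: after a rename, the repository sees a brand-new key,
# so every segment is uploaded again.
by_name = {}
snapshot(by_name, "old-name", {"seg_0", "seg_1"})
after_rename_by_name = snapshot(by_name, "new-name", {"seg_0", "seg_1", "seg_2"})

# Keyed by a stable internal ID: the rename is invisible to the repository,
# so only the new segment is uploaded.
by_uuid = {}
snapshot(by_uuid, "internal-id-1", {"seg_0", "seg_1"})
after_rename_by_uuid = snapshot(by_uuid, "internal-id-1", {"seg_0", "seg_1", "seg_2"})
```

Under these assumptions, `after_rename_by_name` contains all three segments while `after_rename_by_uuid` contains only `seg_2`, which is the difference between a full and an incremental backup.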