[Feature Request] Rename index API

msfroh commented 1 year ago

Is your feature request related to a problem? Please describe.

There are many reasons to want to rename an index. Maybe you had a typo. Maybe you decided to change naming conventions and want to update old indices to follow the new convention. Maybe you want to reuse the index name for some new data, but want to keep the old data under a different name.

Unfortunately, all existing options for index renaming are pretty invasive/heavy. See the "alternatives you've considered" section below.

Describe the solution you'd like

I would like an API that lets me change the name of an index without moving a single byte of the underlying Lucene indices.

One possible solution might build on the index alias APIs. Maybe whenever an index gets created, OpenSearch could treat the index name as an alias and generate its own internal name that the user doesn't need to worry about. Then a "rename" would really just mean adding a new alias and removing the old one.
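For illustration, the existing `_aliases` endpoint already accepts a list of add/remove actions and applies them atomically, so the "rename" step could look roughly like the request bodies built below. This is only a sketch under assumptions: the internal index name `idx-7f3a` and the helper function names are hypothetical, and nothing here is an existing rename API.

```python
import json


def add_alias_body(internal_index, new_name):
    """Body for POST /_aliases that makes new_name point at the index."""
    return {"actions": [{"add": {"index": internal_index, "alias": new_name}}]}


def remove_alias_body(internal_index, old_name):
    """Body for POST /_aliases that retires old_name once nobody uses it."""
    return {"actions": [{"remove": {"index": internal_index, "alias": old_name}}]}


def rename_body(internal_index, old_name, new_name):
    """Add the new alias and remove the old one in a single request."""
    return {
        "actions": [
            {"add": {"index": internal_index, "alias": new_name}},
            {"remove": {"index": internal_index, "alias": old_name}},
        ]
    }


if __name__ == "__main__":
    # "idx-7f3a" stands in for a hypothetical internally generated name.
    print(json.dumps(rename_body("idx-7f3a", "prodcuts", "products"), indent=2))
```

Sending `add_alias_body` first and `remove_alias_body` later gives the gradual variant, where both names work until clients have migrated.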

Describe alternatives you've considered

Helpful folks over at Stack Overflow offered various suggestions: https://stackoverflow.com/questions/28626803/how-to-rename-an-index-in-a-cluster

To run through why I don't like those options:

Reindex means rebuilding my whole index from source. Seriously? Just to change a name that points to a collection of shards, I need to rerun all my document sources through their analyzers and rebuild all those fancy Lucene files? That's like if a legal name change involved cloning yourself, giving the clone a new name, and raising the clone from childhood to your present age so that they could replace you.
Clone index is not so bad, but it means putting the index in read-only mode and it's still creating a copy (with hard links, so it's not actually writing new bytes). By contrast, the alias approach described above can be applied to a writable index, where you add the new alias and only remove the old alias once everyone stops using it.
Restore snapshot is conceptually pretty similar to clone index, but it's copying a read-only snapshot from the past. That's potentially a lot of bytes getting copied from your repository.
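
To make the cost of the clone route concrete, here is a rough sketch of the request sequence it implies, written as plain data rather than live HTTP calls. The index names are placeholders, the helper name is hypothetical, and the step that clears the write block on the target assumes the clone inherits that setting from the source.

```python
def clone_rename_steps(old_name, new_name):
    """Return the (method, path, body) requests for a rename via clone."""
    return [
        # 1. Block writes on the source; clone requires a read-only index.
        ("PUT", f"/{old_name}/_settings", {"index.blocks.write": True}),
        # 2. Clone the source into an index with the new name (hard links).
        ("PUT", f"/{old_name}/_clone/{new_name}", None),
        # 3. Assumption: the clone copied the write block, so clear it.
        ("PUT", f"/{new_name}/_settings", {"index.blocks.write": False}),
        # 4. Drop the old index once the clone is healthy.
        ("DELETE", f"/{old_name}", None),
    ]


if __name__ == "__main__":
    for method, path, body in clone_rename_steps("logs-old", "logs-new"):
        print(method, path, body)
```

Writes are rejected between steps 1 and 3, which is the window the alias approach avoids.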

Additional context

N/A

dblock commented 1 year ago

This is a good ask! I like the idea of detaching the index name from the index and only updating that. What are all the places where the index name manifests itself (e.g. file names)? Maybe we start by replacing those with UUIDs until there's only one reference?

dbwiddis commented 1 year ago

Musing about this, there's another potential type of implementation to consider: "promote" an alias to be the name (and the old name then becomes an alias). Since both name and alias can be used simultaneously, the switch could happen gradually. It'd be similar to clone, I think, possibly without the need for locking.
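
As a toy illustration of the "promote" idea, the in-memory model below swaps an alias with the primary name while both keep resolving to the same entry. Everything here (the `IndexEntry` class, its methods, the use of a UUID as the stable identity) is hypothetical and does not reflect how OpenSearch stores index metadata.

```python
import uuid


class IndexEntry:
    """Toy model: one stable identity, one primary name, many aliases."""

    def __init__(self, name):
        self.uuid = uuid.uuid4().hex  # stable identity, never changes
        self.name = name
        self.aliases = set()

    def add_alias(self, alias):
        self.aliases.add(alias)

    def resolves(self, requested):
        """True if the requested name reaches this index by name or alias."""
        return requested == self.name or requested in self.aliases

    def promote(self, alias):
        """Make an existing alias the primary name; demote the old name."""
        if alias not in self.aliases:
            raise KeyError(f"{alias!r} is not an alias of {self.name!r}")
        self.aliases.remove(alias)
        self.aliases.add(self.name)
        self.name = alias


if __name__ == "__main__":
    entry = IndexEntry("prodcuts")
    entry.add_alias("products")
    entry.promote("products")
    print(entry.name, sorted(entry.aliases))
```

No data moves and no write block is needed in this model, because only the name bookkeeping changes.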

msfroh commented 1 year ago

@dbwiddis -- I think your solution is much better than clone (for renaming), since the name and alias would continue to point to the same thing.

As I understand it, a clone is more like a fork in the update history (or a fork in the timeline if you're into multiverses). You pause updates (by blocking writes to the index) and create a clone with a new name. If you re-enable writes on the original index, the two copies are allowed to diverge. (The hard-linking implementation of clone is just taking advantage of the write-once nature of Lucene segments, but updates would produce new and different segments in the two directories.)

Back to your suggestion, one downside (maybe small?) is that there are almost certainly places where the old name had extra significance. One case that I can think of is that your next snapshot is likely to be a full (versus incremental, new-segments-only) backup, since (AFAIK) the snapshot API passes the index name as an identifier to the repository. There could be workarounds for that, though (make snapshots alias-aware? modify just the snapshot logic to use the UUID strategy described by @dblock?).

opensearch-project / OpenSearch

[Feature Request] Rename index API #5949