opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[Writable Warm] initialize a writeable warm index from a snapshot #13675

Open mch2 opened 2 weeks ago

mch2 commented 2 weeks ago

Is your feature request related to a problem? Please describe

This is an optimization related to the Writeable Warm feature.

Once the writeable warm feature is introduced, we will be able to create indices and then migrate them to a warm tier. One use case this does not cover is creating a warm index directly from a snapshot, without going through the expensive segment download and re-upload required to restore it as a remote-backed index.

Describe the solution you'd like

I think we can make this happen with a new RemoteDirectory implementation that conditionally fetches either from a blob store wired to an existing snapshot or from another wired to the remote store directory. This new dir could be injected into RemoteSegmentStoreDirectory as its data directory. All metadata continues to be pushed to the remote store as normal; it is only when fetching a file that we would interface with the original snapshot, if necessary. In a way it's similar to a searchable snapshot dir; however, those code paths would not be reusable with the incoming writeable warm CompositeDirectory implementation. That dir handles block fetch above RemoteSegmentStoreDirectory.

This would look something like the diagram below, with new writes pushed to the remote store as normal and reads flowing through FilteredRemoteDirectory.

image
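To make the read/write split concrete, here is a minimal sketch of the routing idea, assuming a simplified in-memory blob store. `BlobSource`, `InMemoryBlobSource`, and `FilteredRemoteDirectorySketch` are hypothetical names made up for illustration; none of them are OpenSearch or Lucene APIs, and a real implementation would extend RemoteDirectory and work with streams rather than byte arrays.

```java
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.NoSuchFileException;
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a blob store; NOT the OpenSearch BlobContainer API. */
interface BlobSource {
    byte[] read(String name);
}

/** In-memory blob store, used only to keep the sketch runnable. */
final class InMemoryBlobSource implements BlobSource {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    void put(String name, String content) {
        blobs.put(name, content.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public byte[] read(String name) {
        byte[] data = blobs.get(name);
        if (data == null) {
            throw new UncheckedIOException(new NoSuchFileException(name));
        }
        return data;
    }
}

/** Hypothetical sketch: reads are routed per file, writes always go to the remote store. */
final class FilteredRemoteDirectorySketch {
    private final BlobSource snapshotStore;
    private final InMemoryBlobSource remoteStore;
    // Files that still exist only in the original snapshot.
    private final Set<String> snapshotOnlyFiles = ConcurrentHashMap.newKeySet();

    FilteredRemoteDirectorySketch(BlobSource snapshotStore,
                                  InMemoryBlobSource remoteStore,
                                  Collection<String> snapshotFiles) {
        this.snapshotStore = snapshotStore;
        this.remoteStore = remoteStore;
        this.snapshotOnlyFiles.addAll(snapshotFiles);
    }

    /** New segment files are pushed to the remote store as normal. */
    void write(String name, String content) {
        remoteStore.put(name, content);
    }

    /** Fetch from the snapshot only for files that came from it. */
    byte[] read(String name) {
        BlobSource source = snapshotOnlyFiles.contains(name) ? snapshotStore : remoteStore;
        return source.read(name);
    }

    /** Called when a snapshot-origin file is merged away and no longer referenced. */
    void onFileDeleted(String name) {
        snapshotOnlyFiles.remove(name);
    }

    /** True while at least one live file is still served from the snapshot. */
    boolean dependsOnSnapshot() {
        return !snapshotOnlyFiles.isEmpty();
    }
}
```

In this sketch, `dependsOnSnapshot()` turns false once every snapshot-origin file has been merged away, which is the point at which the original snapshot would no longer need protecting.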

A requirement here would be to enforce some level of deletion protection on the original snapshot for the lifetime of the index, or at least until all segments from the original snapshot are merged away. We could do this with some new index-level settings that are validated at snapshot deletion time to ensure the snapshot is not backing any existing index, similar to searchable snapshots.
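The deletion-time check described above could be sketched roughly as follows. This is an illustration only: the setting key `index.warm.backing_snapshot`, the `SnapshotDeletionGuard` class, and the plain-map representation of index settings are all hypothetical and do not correspond to existing OpenSearch settings or classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical guard that blocks deleting a snapshot still backing an index. */
final class SnapshotDeletionGuard {
    // Hypothetical index-level setting naming the snapshot an index was initialized from.
    static final String BACKING_SNAPSHOT_SETTING = "index.warm.backing_snapshot";

    private SnapshotDeletionGuard() {}

    /** Returns the sorted names of indices whose settings reference the given snapshot. */
    static List<String> indicesBackedBy(String snapshotId,
                                        Map<String, Map<String, String>> settingsByIndex) {
        List<String> backed = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : settingsByIndex.entrySet()) {
            if (snapshotId.equals(entry.getValue().get(BACKING_SNAPSHOT_SETTING))) {
                backed.add(entry.getKey());
            }
        }
        Collections.sort(backed);
        return backed;
    }

    /** Throws if any existing index is still backed by the snapshot. */
    static void validateDeletable(String snapshotId,
                                  Map<String, Map<String, String>> settingsByIndex) {
        List<String> backed = indicesBackedBy(snapshotId, settingsByIndex);
        if (!backed.isEmpty()) {
            throw new IllegalStateException(
                "snapshot [" + snapshotId + "] is still backing indices " + backed);
        }
    }
}
```

Under this assumption, the setting would be cleared once all segments from the original snapshot are merged away, after which deletion of the snapshot would pass validation.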

Related component

Storage:Remote

Describe alternatives you've considered

Do nothing: restore from the snapshot as a hot index, then migrate. Alternatively, migrate the data off-cluster and somehow wire it up when the dir initializes. This is risky because remote store paths are determined at index creation.

Additional context

No response

peternied commented 2 weeks ago

[Triage - attendees 1 2 3 4 5 6 7 8] @mch2 Thanks for creating this issue. Looking forward to seeing a pull request that adds this functionality.