opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
9.52k stars 1.75k forks source link

[Feature Request] Stronger directory abstraction for segment storage #13075

Open sachinpkale opened 5 months ago

sachinpkale commented 5 months ago

Is your feature request related to a problem? Please describe

RemoteDirectory abstraction was created to upload and download segments to and from the configured remote store. This abstraction was created to make it consistent with existing Directory interface that Lucene uses for segment operations (create, read, delete). But the RemoteDirectory abstraction is incomplete as it has no knowledge of FsDirectory implementation that handles segment operations for local store. IndexShard contains two instances of Store: store which contains FsDirectory instance for local disk and remoteStore which contains RemoteDirectory instance for remote store. Except the common IndexShard parent, store and remoteStore do not know anything about each other. Sync between these two stores is scattered across various code flows. This makes the entire abstraction leaky and error-prone. As we plan to add more features on top of remote backed storage (1. Searchable Remote Index 2. Writeable Warm), we need to come up with stronger directory abstraction to avoid non-maintainable code.

Describe the solution you'd like

Ideally, Store should encapsulate all the segment storage related constructs and corresponding syncs between these constructs. For the other operations like indexing or search, storage should be seen as a black box and can be accessed with provided interface. This also aligns with broader modularity vision with the next step of abstracting out storage as a separate module. For this RFC, we limit the discussion on segment storage abstraction only.

We propose to provide all the segment storage abstractions in the form of Directory. We call it OpenSearchDirectory. OpenSearchDirectory will have two components:

On top of existing Directory interface, the OpenSearchDirectory implementation will provide further abstractions like stats around remote store interaction, whether a segment file present in cache or storage or both etc. The actual sync between cache and storage would be hidden from the core. The OpenSearch core should not be aware whether a given segment is getting served from a cache or from storage as long as the right directory implementations for cache and storage are used.

Based on the use case, the role of cache and storage can be changed. For some known use cases, we can define cache and storage components as given below:

Related component

Storage:Remote

Describe alternatives you've considered

There were few alternative approaches proposed mostly to tackle the same problem:

  1. https://github.com/opensearch-project/OpenSearch/issues/7764
  2. https://github.com/opensearch-project/OpenSearch/pull/3703#discussion_r907584127

Additional context

ToDo:

  1. Add more details on existing segment upload/download flow and how it is tightly coupled with core
  2. Add details around the end goal state
mch2 commented 5 months ago

Thanks @sachinpkale I think there is definitely some much needed refactoring here.

How does a new Directory implementation here solve the issue of multiple Stores in IndexShard. Are you planning to move the remote store specific functionality currently in IndexShard to this new directory? ex. syncSegmentsFromRemoteSegmentStore? Where would these fit with directory API?

From what I can tell the only use of the remoteStore from within IndexShard is to fetch its remoteDirectory instance and do things with it. We could maybe extend the existing (somewhat already bloated) public API in store?

How does the OpenSearchDirectory fit with the plans for a CompositeDirectory? The injection of a cache & other dir implementations sounds very similar to the intent there.

sachinpkale commented 5 months ago

How does a new Directory implementation here solve the issue of multiple Stores in IndexShard

With new directory, we don't need multiple stores in IndexShard. One store instance that contains OpenSearchDirectory instance.

Are you planning to move the remote store specific functionality currently in IndexShard to this new directory? ex. syncSegmentsFromRemoteSegmentStore? Where would these fit with directory API?

Yes, syncSegmentsFromRemoteSegmentStore as well as logic in RemoteStoreRefreshListener will move to new directory. (answering second question at the last)

From what I can tell the only use of the remoteStore from within IndexShard is to fetch its remoteDirectory instance and do things with it. We could maybe extend the existing (somewhat already bloated) public API in store?

We can do this but the tight coupling of core and storage remains. For example, in replica promotion or recovery flows, we have if conditions added at multiple places to check if the index is remote store enabled or not. With the new directory, we want to remove these checks as well. So, core's interaction with segment storage does not change with or without remote store.

How does the OpenSearchDirectory fit with the plans for a CompositeDirectory? The injection of a cache & other dir implementations sounds very similar to the intent there.

Yes, it is similar on pattern where it contains 2 directories but it doesn't handle sync between cache and storage. Also, it does not provide any extensions to cache so only one type of cache can be present. But CompositeDirectory itself can be evolved into OpenSearchDirectory by adding right set of abstractions.

Where would remote store specific functionality fit with directory API?

We don't have clear answers to it but this is what I have thought about:

Cache to Storage Sync options:

Storage to Cache Sync:

sachinpkale commented 5 months ago

With OpenSearchDirectory, we are exploring the feasibility of OpenSearch core interacting with only one interface: OpenSearchDirectory and all the existing and future use cases around segment storage would be encapsulated within OpenSearchDirectory interface. Even though this issue talks about remote store based use cases, OpenSearchDirectory can be used where remote store is not used.

ankitkala commented 5 months ago

Thanks Sachin for the writeup. I definitely agree on keeping a single composite store object which a single directory.

How does the OpenSearchDirectory fit with the plans for a CompositeDirectory? The injection of a cache & other dir implementations sounds very similar to the intent there.

Yes, it is similar on pattern where it contains 2 directories but it doesn't handle sync between cache and storage. Also, it does not provide any extensions to cache so only one type of cache can be present. But CompositeDirectory itself can be evolved into OpenSearchDirectory by adding right set of abstractions.

Yes, That was the intent behind the current design for CompositeDirectory. Though it doesn't handle cache & storage sync it can definitely be evolved in this direction.

Storage to Cache Sync:

  • There could be a new method introduced in OpenSearchDirectory, say init() which will be called whenever we need to pull data from source of truth. Examples are replication flow, failover flow, recovery flow etc.

Just to add, I think OpenSearchDirectory should also be able to operate in Read only v/s write mode(or master/slave) for primary & replicas. Directionality of sync can be depending on how directory is configured. And we should be able to flip the behavior at runtime.

sachinpkale commented 5 months ago

@Thanks @ankitkala for reviewing.

I think OpenSearchDirectory should also be able to operate in Read only v/s write mode(or master/slave) for primary & replicas. Directionality of sync can be depending on how directory is configured. And we should be able to flip the behavior at runtime.

I am not sure if I understand it completely. Directory should not be knowing if it is a part of primary or replica, right?

ankitkala commented 5 months ago

@thanks @ankitkala for reviewing.

I think OpenSearchDirectory should also be able to operate in Read only v/s write mode(or master/slave) for primary & replicas. Directionality of sync can be depending on how directory is configured. And we should be able to flip the behavior at runtime.

I am not sure if I understand it completely. Directory should not be knowing if it is a part of primary or replica, right?

Correct. I just meant that directory should still be able to distinguish whether the remote is writable or not (i.e. replica's directory shouldn't be able to write).