opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
8.89k stars 1.63k forks source link

Use pull based model for Segment Replication #4577

Open ankitkala opened 1 year ago

ankitkala commented 1 year ago

Pull based model for Segment Replication We're working on using the existing implementation of Segment replication for cross cluster replication. We've identified 2 major areas in the existing Segment Replication design which doesn't work well for cross cluster replication. Technically we can get around these by overriding those part of implementation just for CCR, but wanted to check if it is possible to maintain a single solution that works for both cases.

Main crux of the issue is that current approach relies on a 2 way communication where primary can talk to replica and the vice versa. This doesn't work well for cross cluster use cases as we establish uni-directional connections where follower polls the leader and get all the data. We did explore whether we can create bi-directional cross cluster connections but decided to go against it.


Why not bi-directional connection for CCR? Pros:

Cons:


These are the 2 changes that would align the existing Segment Replication implementation with CCR. Kudos to the folks working on the original design who ensured such implementation swap can be done easily without modifying other other sub-components.

1. Use a pull mechanism for downloading the segments (instead of push) Current implementation for Segment Replication relies on the peer recovery constructs where it pushes the segments from primary's node to replica's node using the MultiChunkTransfer by invoking write chunk on the replica's node. We can change the direction here where replica requests each chunk from leader and stores locally. This might require some additional book-keeping on the replica side but should be do-able. We're already doing something similar for CCR where we fetch the segment from leader cluster by treating it as a snapshot repository here Also, this align with the remote store integration where replica would need to pull the segments from primary's remote store.

2. Use a polling mechanism on replica instead of listeners on primary refresh Instead of listening to refreshes on Primary, we can model it as a periodic polling where each replica polls for the latest checkpoint and trigger the replication event. The change in itself should be simple to do but we might want to do a benchmark comparison to ensure that there is no regression.

mch2 commented 1 year ago

@ankitkala Thanks for raising this. Imo the right idea long-term is to let users configure the behavior based on need & add support for a pull model. However, with segrep having two sources of copy in the near future (primary & remote store), is it worth adding a pull model with the primary only implementation of CCR now or should we start with CCR+segrep support only with a remote store?

Bukhtawar commented 1 year ago

Thanks @mch2 I think we should also think about how the interactions would look like if we just had remote store integration with primary(even outside CCR use case). Wouldn't that inherently be a pull based mechanism, where replicas poll for checkpoints and download segments diffs. So I think pull based mechanism will inherently be used even in local cluster with remote store. Now the only place we would be doing a push is in-cluster segrep with a local store. That's something we need to analyse if there are any known implications of changing that to pull. Let me know if we've done a similar analysis to conclude one way or the other

gbbafna commented 1 year ago

How are we envisioning OpenSearch users to use the SegRep feature :

  1. in-cluster segrep with a local store.
  2. in-cluster segrep with remote store

If the ~first~second one is the way forward for most of them , we might not need to solve for problems like https://github.com/opensearch-project/OpenSearch/issues/4245

ankitkala commented 1 year ago

is it worth adding a pull model with the primary only implementation of CCR now or should we start with CCR+segrep support only with a remote store

@mch2 For CCR, SegRep with remote store is definitely the first choice for us. Whether we want to support only one combination or both needs to be decided(will bring this up during feature brief). But if we want to keep the option to support segrep without remote store open, it might be worthwhile to do these refactors early on.

For local SegRep, are there any pros of not relying on remote store? The only reason i can think of is the additional effort required to setup the repository which might affect the adoption.

ankitkala commented 1 year ago

How are we envisioning OpenSearch users to use the SegRep feature :

  1. in-cluster segrep with a local store.
  2. in-cluster segrep with remote store

If the first one is the way forward for most of them , we might not need to solve for problems like #4245

Isn't it other way round? With remote store, we won't have to solve for the network bandwidth.

gbbafna commented 1 year ago

@ankitkala : Thanks , corrected my comment.

For local SegRep, are there any pros of not relying on remote store? The only reason i can think of is the additional effort required to setup the repository which might affect the adoption.

Most of the users already use snapshot for durability. So setting up repository shouldn't be a hindrance IMO. Cost might be a reason not to rely on remote store . @mch2 @Bukhtawar would know better on this .

I prefer that there are few modes for OpenSearch to operate for ease of operations and maintenance.

ankitkala commented 1 year ago

Reviving this discussion again since now we've been thinking towards segments and remote store integration. To disambiguate the discussion, the directionality(push or pull) can be thought either in terms of data flow or control flow.


Control flow Control flow would determine the trigger point for replicas to fetch the changes.

Pull based A pull based approach would rely on a polling mechanism where replica keeps polling(leader node or remote store) for new changes. Pros:

Cons:

Push based A push based approach would imply using event driven mechanism where primary shard would notify the replica after the new segments are available(i.e. refresh incase of local segrep and segemnt uploaded to remote store for segrep with remote store).

Pros:

Cons:


Data flow: For data flow,


Compatibility with CCR: Talking about the primary use case which is using segment replication with remote store, CCR can keep parity with the approach for local segrep with remote store, where primary sends a notification to the replica/follower and then replica/follower downloads the new segments from remote store.

One major blocking issue can the the bi-directional nature of the communication between leader and follower. As of now, we've not been thinking about CCR without remote store, so the data flow of local segrep shouldn't be an issue.

Just to conclude, we should still keep the control flow a push based mechanism. We'll need additional handling for cases like wait_until though. Data flow for segrep with remote store will be a pull based mechanism. For local segrep cases, we can explore whether we want to align it with segrep + remote implementation but making it pull, but i don't see enough ROI and is definitely not urgent to picked up.

anasalkouz commented 1 year ago

Closing this, we don't see a need for this at the moment.

Bukhtawar commented 2 weeks ago

Re-opening this as we want to evaluate and streamline mechanisms across local and remote replicas