opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
9.63k stars 1.77k forks

[Remote Store] Primary term validation with replicas - New approach POC #5033

Closed ashking94 closed 1 year ago

ashking94 commented 1 year ago

Is your feature request related to a problem? Please describe.

Primary term validation with replicas

In reference to https://github.com/opensearch-project/OpenSearch/issues/3706: with segment replication and a remote store for the translog, storing the translog on replicas becomes obsolete. So does the in-sync replication call to the replicas that happens during a write. The replication call serves two use cases: 1) replicating the data for durability and 2) primary term validation. The first use case is taken care of by using the remote store for the translog, but the second still needs to be handled.

Challenges

Until now, we were planning to achieve the no-op by modifying the TransportBulk/WriteAction call and making it a no-op. Although we would not store any data, one concern remained: these replication calls modify the replication tracker state on the replica shard. With the older approach, we would have needed a no-op replication tracker proxy and would have had to cut off every call that updates the replication tracker on the replicas. This is to make sure that no state is updated on the replica, be it the data (segments/translog) or the (global/local) checkpoints, during either the replication call or the async calls. This is cumbersome to implement: it leaves the code heavily intertwined, with a lot of logic deciding when to update checkpoints (replication tracker) and when not to. The following things hinder or add complexity with the older approach -

Proposal

However, this can be handled with a separate call for primary term validation (let's call it the primary validation call), alongside keeping tracked and inSync as false in the CheckpointState in the ReplicationTracker. Whenever the cluster manager publishes an updated cluster state, the primary term gets updated on the replicas. When primary term validation happens, the primary term supplied in the request is validated against the replica's primary term. On an incorrect primary term, the acknowledgement fails and the primary, which can now assume it is isolated, can fail itself. This also makes the approach and the code change cleaner.
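To make the flow above concrete, here is a minimal, language-agnostic sketch in Python. It is not OpenSearch code: `ReplicaShard`, `PrimaryShard`, `StalePrimaryTermError` and all method names are hypothetical, and the remote store that provides durability is not modeled. The sketch also assumes that a request carrying a term equal to or higher than the replica's term is accepted.

```python
from dataclasses import dataclass


class StalePrimaryTermError(Exception):
    """Raised by a replica when a request carries an older primary term."""


@dataclass
class ReplicaShard:
    # Hypothetical stand-in for a replica shard; not an OpenSearch class.
    primary_term: int = 1
    # Mirrors the proposal: the replica is neither tracked nor in-sync.
    tracked: bool = False
    in_sync: bool = False

    def apply_cluster_state(self, published_term: int) -> None:
        # The cluster manager publishes a new state; terms only move forward.
        self.primary_term = max(self.primary_term, published_term)

    def validate_primary_term(self, request_term: int) -> None:
        # No data or checkpoint is touched here; only the term is compared.
        if request_term < self.primary_term:
            raise StalePrimaryTermError(
                f"request term {request_term} < replica term {self.primary_term}"
            )


@dataclass
class PrimaryShard:
    # Hypothetical stand-in for a primary shard.
    primary_term: int
    failed: bool = False

    def write(self, replicas: list) -> bool:
        # Durability is assumed to come from the remote store (not modeled).
        try:
            for replica in replicas:
                replica.validate_primary_term(self.primary_term)
        except StalePrimaryTermError:
            # The acknowledgement failed: the primary is stale and fails itself.
            self.failed = True
            return False
        return True


replica = ReplicaShard(primary_term=1)
old_primary = PrimaryShard(primary_term=1)
first_ack = old_primary.write([replica])   # terms match, write is acknowledged
replica.apply_cluster_state(2)             # a new primary was elected elsewhere
second_ack = old_primary.write([replica])  # stale term, old primary fails itself
```

In this sketch the validation call never mutates data or checkpoints on the replica, which is the property the separate call is meant to provide.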

With the new approach, we can do the following -

The above approach requires further deep dive and a POC -

Open Questions -

Describe the solution you'd like

Mentioned above.


ashking94 commented 1 year ago

Draft PR showing the prospective changes - https://github.com/opensearch-project/OpenSearch/pull/5061/commits/dc278705527c3c332c6d426ae3a7efe5ac5d0465. In addition, we still need to add the separate primary term validation call.

ashking94 commented 1 year ago

A couple of points that can be updated in the proposal -

With this approach,

ashking94 commented 1 year ago

This has been implemented.