opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[Feature Request] Reduce refresh lag for remote store #11898

Open Bukhtawar opened 8 months ago

Bukhtawar commented 8 months ago

Is your feature request related to a problem? Please describe

With remote store, each refresh requires the primary to upload its local segments to the remote store, and the replica then needs to download them on its end. This adds significant lag on the replica, leading to stale data when the replica is queried.

Describe the solution you'd like

We can optimise search queries with an optimistic protocol: a request is sent to the primary to check whether it has the data blocks needed to serve real-time query results. If it does, those blocks can be sent to the replica over the wire, avoiding the S3 upload/download path, and real-time queries can be served from the replica. There are caveats around data sync and the amount of data that needs to be copied, which depends on ingestion volume and refresh rate. The approach needs to be benchmarked and tested at scale before it can be fully realised.
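
To make the proposed flow concrete, here is a minimal Python sketch of the optimistic path with a fallback to the remote store. It is an illustration only and not OpenSearch code: `Primary`, `RemoteStore`, `Replica`, `try_fetch`, `download` and `sync` are all hypothetical names, and blocks are modelled as plain byte strings keyed by name.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Primary:
    """Hypothetical primary shard that still holds refreshed blocks locally."""
    local_blocks: Dict[str, bytes] = field(default_factory=dict)

    def try_fetch(self, wanted: Set[str]) -> Optional[Dict[str, bytes]]:
        # Optimistic check: answer only if every requested block is present.
        if all(name in self.local_blocks for name in wanted):
            return {name: self.local_blocks[name] for name in wanted}
        return None


@dataclass
class RemoteStore:
    """Hypothetical stand-in for the remote store (for example S3)."""
    blocks: Dict[str, bytes] = field(default_factory=dict)

    def download(self, wanted: Set[str]) -> Dict[str, bytes]:
        return {name: self.blocks[name] for name in wanted}


@dataclass
class Replica:
    """Hypothetical replica shard that needs blocks to serve fresh results."""
    local_blocks: Dict[str, bytes] = field(default_factory=dict)

    def sync(self, wanted: Set[str], primary: Primary, remote: RemoteStore) -> str:
        """Fetch missing blocks and report which path supplied them."""
        missing = {name for name in wanted if name not in self.local_blocks}
        if not missing:
            return "local"
        # Ask the primary first; this skips the upload/download round trip.
        fetched = primary.try_fetch(missing)
        source = "primary"
        if fetched is None:
            # Fall back to the existing remote store path.
            fetched = remote.download(missing)
            source = "remote"
        self.local_blocks.update(fetched)
        return source
```

The sketch leaves out the caveats mentioned above: it does not bound how much data is copied per request, and it assumes the remote store already holds any block the primary cannot supply.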

Related component

Storage:Remote

Describe alternatives you've considered

No response

Additional context

No response

andrross commented 8 months ago

Do you have some idea of what the user experience would be here? Would this be something for a user to opt in to? For example, use this feature if you want replication delays equal to or better than document replication, but do not use this feature if you want to maximize ingest and search throughput.

linuxpi commented 4 months ago

[Storage Triage]

@Bukhtawar thanks for opening this issue. This looks promising for certain use cases.

We would require deeper investigation and some benchmarks to make a decision here.