opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[Feature Request] Design the unified download interface #11687

Open kotwanikunal opened 10 months ago

kotwanikunal commented 10 months ago

Is your feature request related to a problem? Please describe

The segment download mechanisms for Snapshot-Restore, SearchableSnapshot, and RemoteStore are currently fragmented across the core code base.

Describe the solution you'd like

Design a unified handler for segment downloads across the core code base.

Related component

Search:Remote Search

Describe alternatives you've considered

No response

Additional context

No response

kotwanikunal commented 9 months ago

Unified Download Manager

Overview

Parallel/multi-stream downloads need a unified DownloadManager that performs downloads from the repository for remote indices (remote store/remote index/searchable snapshot).
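To make "parallel/multi-stream download" concrete, here is a minimal sketch of the general technique: split a blob into fixed-size parts, fetch the parts concurrently, and assemble them in place. The `RangeReader` interface, the class name, and the `download` signature are all hypothetical and are not taken from the OpenSearch code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

// Hypothetical sketch; none of these names are OpenSearch APIs.
class MultiStreamDownloadSketch {

    // Assumed abstraction over a repository blob: read `length` bytes starting at `offset`.
    interface RangeReader {
        byte[] read(long offset, int length);
    }

    // Downloads a blob of `fileLength` bytes in parts of at most `partSize` bytes,
    // running the part reads on `pool` and copying each part to its own offset.
    static byte[] download(RangeReader reader, int fileLength, int partSize, ExecutorService pool) {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        byte[] result = new byte[fileLength];
        List<CompletableFuture<Void>> parts = new ArrayList<>();
        for (int offset = 0; offset < fileLength; offset += partSize) {
            final int start = offset;
            final int length = Math.min(partSize, fileLength - start);
            parts.add(CompletableFuture.runAsync(() -> {
                byte[] chunk = reader.read(start, length);
                // Parts cover disjoint ranges, so concurrent copies do not overlap.
                System.arraycopy(chunk, 0, result, start, length);
            }, pool));
        }
        // Wait for every part; join() rethrows the first failure, if any.
        CompletableFuture.allOf(parts.toArray(new CompletableFuture[0])).join();
        return result;
    }
}
```

A real implementation would stream parts to disk rather than hold the whole file in memory; the in-memory array only keeps the sketch short.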

Currently, these download managers are fragmented: each feature has its own duplicated code and a separate job queue to perform the required operations.

The following sections propose a design for a unified download manager in which all jobs are submitted to a single queue and prioritized by the nature of the download, while maximizing concurrency utilization within the bounds of the download thread pool(s).
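As a rough illustration of the single prioritized queue with bounded concurrency described above, the sketch below puts a `PriorityBlockingQueue` behind a fixed-size `ThreadPoolExecutor`. The class name, the `DownloadPriority` values, and their ordering are assumptions made for this example only; the proposal does not specify them.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; none of these names are OpenSearch APIs.
class UnifiedDownloadQueueSketch {

    // Assumed priority classes; a lower ordinal is dequeued first.
    enum DownloadPriority { SEARCH_BLOCK_FETCH, REMOTE_STORE_RECOVERY, SNAPSHOT_RESTORE }

    // A queued job ordered by priority, then by submission order within a priority.
    static final class DownloadJob implements Runnable, Comparable<DownloadJob> {
        private static final AtomicLong SEQUENCE = new AtomicLong();
        private final DownloadPriority priority;
        private final long sequence = SEQUENCE.getAndIncrement();
        private final Runnable work;

        DownloadJob(DownloadPriority priority, Runnable work) {
            this.priority = priority;
            this.work = work;
        }

        @Override
        public void run() {
            work.run();
        }

        @Override
        public int compareTo(DownloadJob other) {
            int byPriority = priority.compareTo(other.priority);
            return byPriority != 0 ? byPriority : Long.compare(sequence, other.sequence);
        }
    }

    private final ThreadPoolExecutor executor;

    // maxConcurrentDownloads bounds how many jobs run at once; the rest wait in the queue.
    UnifiedDownloadQueueSketch(int maxConcurrentDownloads) {
        this.executor = new ThreadPoolExecutor(
            maxConcurrentDownloads, maxConcurrentDownloads,
            0L, TimeUnit.MILLISECONDS,
            new PriorityBlockingQueue<Runnable>());
    }

    void submit(DownloadPriority priority, Runnable work) {
        // execute() is required here: submit() would wrap the job in a FutureTask,
        // which is not Comparable and would fail inside the priority queue.
        executor.execute(new DownloadJob(priority, work));
    }

    // Stops accepting jobs and waits up to 10 seconds for queued jobs to finish.
    boolean shutdownAndWait() {
        executor.shutdown();
        try {
            return executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Priority only affects jobs that are waiting in the queue. A job that is already running is not preempted, so long low-priority downloads can still delay urgent ones unless they are split into smaller parts.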

Current State of Downloads

This section outlines the current flow for file download use cases in a TL;DR format.

How does the remote translog download mechanism work?

How does the snapshot restore segment copy work?

How does the remote store segment copy work?

How does the searchable snapshot block fetching work?

Proposal

A new, unified DownloadManager will help serve download requests for remote indices (remote store/remote index/searchable snapshot).

Design

image

APIs

APIs for DownloadManager will evolve over a few phases:

Phase 1: Introduction of a new, unified DownloadManager

image

Phase 2: Introduction of standard, re-usable APIs

image

Phase 3: Introduction of a unified executor queue for download orchestration