opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[Proposal] Writable Warm Status API Model #14640

Open e-emoto opened 3 months ago

e-emoto commented 3 months ago

Is your feature request related to a problem? Please describe

The Status API in Writable Warm will list in-progress and failed index tierings. Since Writable Warm will eventually support other types of tiering for clusters both with and without dedicated warm nodes, the API needs to be extensible to cover those cases. The design below covers only the API model for the Status API; the design for the rest of the API will follow in a separate task once some details about tiering metadata are settled.

Describe the solution you'd like

API Model:

The API will take a source and a target as input to filter which tierings are shown. It will validate that each provided input is a valid tier, and then return the tierings whose source and target match. The API should also work if only one of source or target is given, in which case it returns every tiering with that source (or target), allowing more flexible queries. By default, if neither source nor target is given, the Status API returns all in-progress or failed tierings for the specified indices, regardless of the direction of the tiering.
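A minimal sketch of the filtering semantics described above, assuming a hypothetical in-memory `Tiering` record and a hypothetical `filter_tierings` helper (neither is part of OpenSearch); the only valid tiers assumed are `hot` and `warm`:

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Assumption: the tiers named in this proposal; the real set may grow later.
VALID_TIERS = {"hot", "warm"}


@dataclass(frozen=True)
class Tiering:
    """Hypothetical record for one in-progress or failed index tiering."""
    index: str
    source: str
    target: str
    status: str  # "ongoing" or "failed"


def filter_tierings(
    tierings: Iterable[Tiering],
    source: Optional[str] = None,
    target: Optional[str] = None,
) -> List[Tiering]:
    """Keep tierings matching whichever of source/target was given.

    A filter that is None matches everything, so passing neither
    returns every tiering (the default case).
    """
    # Validate only the inputs that were actually provided.
    for name, tier in (("source", source), ("target", target)):
        if tier is not None and tier not in VALID_TIERS:
            raise ValueError(f"invalid {name} tier: {tier!r}")
    return [
        t for t in tierings
        if (source is None or t.source == source)
        and (target is None or t.target == target)
    ]
```

An omitted filter matches everything, so `filter_tierings(tierings)` with no source or target returns the full list.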

API Request:

GET /<indexNameOrPattern>/_tier?source=hot&target=warm
GET /<indexNameOrPattern>/_tier?status=ongoing
GET /<indexNameOrPattern>/_tier?verbose=true/false

The API is a GET request with a few optional parameters. The index name in the path is required, but '_all' or '*' can be used to get tierings from all indices that match the parameters. The API also supports comma-separated index names.
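To illustrate how the index path might be resolved, here is a sketch using Python's `fnmatch`. `resolve_indices` is a hypothetical helper; OpenSearch's own index-expression resolution (including how it treats names that do not exist) may differ:

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def resolve_indices(expression: str, existing: Iterable[str]) -> List[str]:
    """Resolve "_all", "*", a wildcard pattern, or "a,b" against known indices.

    Hypothetical helper: entries that match nothing are silently dropped here,
    which is an assumption, not the proposed behavior.
    """
    existing = list(existing)
    if expression in ("_all", "*"):
        return sorted(existing)
    matched = set()
    for part in expression.split(","):
        part = part.strip()
        if not part:
            continue
        # Each comma-separated entry may itself be a wildcard pattern.
        matched.update(name for name in existing if fnmatchcase(name, part))
    return sorted(matched)
```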

API Parameters:

source = hot / warm (optional, no default)
target = hot / warm (optional, no default)

The values for the source and target parameters are the tiers, with source being the tier the index started in and target being the tier it is moving to.

status = failed / ongoing (optional, no default)

The values of the status parameter represent the state of the tiering. failed indicates that the tiering has failed and ongoing means the tiering process is in progress.

verbose = true / false (default false)

The verbose parameter determines whether the API response should include details like the shard relocation status, failure reason, and tiering start time.
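A sketch of parsing and validating these parameters, assuming a hypothetical `parse_tier_request` function. It rejects values outside the listed choices and applies the defaults above (no default for source, target, and status; `false` for verbose):

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlsplit

TIERS = {"hot", "warm"}
STATUSES = {"failed", "ongoing"}


def parse_tier_request(url: str) -> Dict[str, object]:
    """Split a GET /<index>/_tier?... URL into its index expression and parameters.

    Hypothetical parser: source, target and status default to None,
    verbose defaults to False.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 2 or segments[1] != "_tier":
        raise ValueError(f"expected /<index>/_tier, got {parts.path!r}")
    # parse_qs returns lists; keep the last value if a parameter repeats.
    params = {k: v[-1] for k, v in parse_qs(parts.query).items()}

    def choice(name: str, allowed: set) -> Optional[str]:
        value = params.get(name)
        if value is not None and value not in allowed:
            raise ValueError(f"invalid {name}: {value!r}")
        return value

    verbose = params.get("verbose", "false")
    if verbose not in ("true", "false"):
        raise ValueError(f"invalid verbose: {verbose!r}")
    return {
        "index": segments[0],
        "source": choice("source", TIERS),
        "target": choice("target", TIERS),
        "status": choice("status", STATUSES),
        "verbose": verbose == "true",
    }
```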

API Response:

Success:

{
    "tiering_status" : [{
        "index" : "test1",
        "source": "hot",
        "target": "warm",
        "status" : "failed/ongoing"
    }]
}

With Verbose Flag:

{
    "tiering_status" : [{
        "index" : "test1",
        "source": "hot",
        "target": "warm",
        "status" : "ongoing",
        "start_time" : "2024-06-27T00:00:00Z",
        "failure_time" : "2024-06-27T10:00:00Z",
        "duration" : "10:00:00",
        "shards" : {
                "total" : 10,
                "successful" : 3,
                "failed" : 2,
                "ongoing" : 5
            },
        "ongoing" : [{
                "shard_id": 3,
                "node_id": "",
                "reason": ""
            },
            ...
        ],
        "failed" : [{
                "shard_id": 1,
                "node_id": "",
                "reason": ""
            },
            ...
        ]
    }]
}
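The verbose fields can be derived from per-shard state. The sketch below assumes, based on the example values, that `duration` is `failure_time - start_time` formatted as `HH:MM:SS` (empty while there is no end time) and that `total` is the sum of the other three shard counts; `shard_summary` and `format_duration` are hypothetical helpers:

```python
from datetime import datetime
from typing import Dict, List, Optional

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # matches the example timestamps


def shard_summary(shard_states: List[str]) -> Dict[str, int]:
    """Count shards per state ("successful", "failed", "ongoing") plus the total.

    An unknown state raises KeyError, which keeps the counts consistent.
    """
    summary = {"total": len(shard_states), "successful": 0, "failed": 0, "ongoing": 0}
    for state in shard_states:
        summary[state] += 1
    return summary


def format_duration(start: str, end: Optional[str]) -> str:
    """Return end - start as HH:MM:SS, or "" when there is no end time yet."""
    if not end:
        return ""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    hours, rem = divmod(int(delta.total_seconds()), 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```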

Failure:

{
    "error": {
        "root_cause": [
            {
                "type": "",
                "reason": ""
            }
        ]
    },
    "status": xxx
}
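A sketch of producing this error body, assuming a hypothetical `error_response` helper. For an invalid tier, OpenSearch would typically report an `illegal_argument_exception` with HTTP status 400, but the exact type and status for this API are not specified in the proposal:

```python
import json


def error_response(error_type: str, reason: str, status: int) -> str:
    """Serialize an error in the shape proposed above (hypothetical helper)."""
    body = {
        "error": {"root_cause": [{"type": error_type, "reason": reason}]},
        "status": status,
    }
    return json.dumps(body, indent=4)


# Example: rejecting an unknown source tier.
print(error_response("illegal_argument_exception", "invalid source tier [cold]", 400))
```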

Example API Use Cases:

Get All Ongoing Tierings:

GET /_all/_tier?status=ongoing

{
    "tiering_status" : [{
        "index" : "test1",
        "source": "hot",
        "target": "warm",
        "status" : "ongoing"
    },{
        "index" : "test2",
        "source": "warm",
        "target": "hot",
        "status" : "ongoing"
    },

    ...

    ]
}

Get All Failed Hot To Warm Tierings:

GET /_all/_tier?source=hot&target=warm&status=failed

{
    "tiering_status" : [{
        "index" : "test3",
        "source": "hot",
        "target": "warm",
        "status" : "failed"
    },{
        "index" : "test4",
        "source": "hot",
        "target": "warm",
        "status" : "failed"
    },

    ...

    ]
}

Get Shard Details for a Specific Index Tiering:

GET /target_index/_tier?verbose=true

{
    "tiering_status" : [{
        "index" : "target_index",
        "source": "hot",
        "target": "warm",
        "status" : "ongoing",
        "start_time" : "2024-06-27T00:00:00Z",
        "failure_time" : "",
        "duration" : "",
        "shards" : {
                "total" : 10,
                "successful" : 4,
                "failed" : 0,
                "ongoing" : 6
            },
        "ongoing" : [{
                "shard_id": 3,
                "node_id": "",
                "reason": ""
            },
            ...
        ],
        "failed" : []
    }]
}
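The three use cases above differ only in their paths and query strings. A client-side sketch for building them, assuming a hypothetical `tier_status_path` helper that leaves out unset filters:

```python
from typing import Optional
from urllib.parse import urlencode


def tier_status_path(indices: str = "_all", source: Optional[str] = None,
                     target: Optional[str] = None, status: Optional[str] = None,
                     verbose: bool = False) -> str:
    """Build a request path for the proposed status API (hypothetical helper)."""
    params = {"source": source, "target": target, "status": status}
    # Omitted filters are not sent, so the server applies its defaults.
    query = {k: v for k, v in params.items() if v is not None}
    if verbose:
        query["verbose"] = "true"
    path = f"/{indices}/_tier"
    return f"{path}?{urlencode(query)}" if query else path
```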

Related component

Search:Remote Search

Describe alternatives you've considered

No response

Additional context

This issue is for getting feedback on the API structure, and will be followed up with a PR for the API spec and a low level design description.

Related issues: https://github.com/opensearch-project/OpenSearch/issues/14679 https://github.com/opensearch-project/OpenSearch/issues/13294

e-emoto commented 3 months ago

@andrross @dblock @mch2 @ankitkala @rayshrey Any feedback on this would be greatly appreciated, thanks!

jed326 commented 3 months ago

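Thanks @e-emoto! 2 quick questions from me:

  • Do we need a pagination mechanism or do we typically expect the number of tiering_status items to be relatively low?
  • Do we need a corresponding tabular API? Like GET _cat/snapshots vs. GET _snapshot?

Looking forward to any PRs!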

e-emoto commented 3 months ago

> Thanks @e-emoto! 2 quick questions from me:
>
>   • Do we need a pagination mechanism or do we typically expect the number of tiering_status items to be relatively low?
>   • Do we need a corresponding tabular API? Like GET _cat/snapshots vs. GET _snapshot?
>
> Looking forward to any PRs!

Hi @jed326, thanks for your response.
Regarding your first question, we don't think we need a pagination mechanism because the number of items returned is bounded and should usually be small. As for the second question, we are considering a flag that makes the response tabular.

dblock commented 3 months ago

Overall the API as {index}/_tier looks consistent with other APIs to me.

sohami commented 3 months ago

> Notice that it's _cat/snapshots and _snapshot; do we need something similar, like tier vs. tiers?

@e-emoto @dblock I think we can align this status API with the recovery API, which shows the recovery status of each index shard. It has 2 variants: a) /_cat/recovery, /_cat/recovery/{index} and b) GET /_recovery, GET /{index}/_recovery. So for index tiering status we can have:

1) /_cat/tier or even /_cat/tiering. I think we can leave off the trailing "s" since it shows the status of an action, not a resource like the set of all snapshots or indices.
2) GET /_tiering, GET /{index}/_tiering
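A rough sketch of what a tabular `_cat/tiering` rendering could look like, assuming hypothetical column names taken from the JSON fields and a hypothetical `cat_tiering` helper; real `_cat` APIs have their own formatting options, so this only illustrates the idea:

```python
from typing import Dict, List

# Assumption: columns mirror the fields of the JSON status response.
COLUMNS = ["index", "source", "target", "status"]


def cat_tiering(rows: List[Dict[str, str]]) -> str:
    """Render tiering rows as a left-aligned text table with a header line."""
    table = [COLUMNS] + [[row[c] for c in COLUMNS] for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(r[i]) for r in table) for i in range(len(COLUMNS))]
    lines = [" ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip() for r in table]
    return "\n".join(lines)
```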

> Should the response contain tier(s) and not tiering_status?

I think we can remove tiering_status from the response and use the format below (effectively a map), where the index name is the key and the status is the value.

{
    "test1" : {
        "source": "hot",
        "target": "warm",
        "status" : "ongoing"
    },
    "test2": {
        "source": "warm",
        "target": "hot",
        "status" : "ongoing"
    }
}
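Converting the list-style tiering_status response into this map format is straightforward. This sketch assumes an index has at most one in-progress or failed tiering at a time (otherwise map keys would collide); `to_index_map` is a hypothetical helper:

```python
from typing import Dict, List


def to_index_map(tiering_status: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Re-key list-style status entries by index name (hypothetical helper)."""
    result: Dict[str, Dict[str, str]] = {}
    for entry in tiering_status:
        name = entry["index"]
        # Assumption: one tiering per index; a duplicate would overwrite data.
        if name in result:
            raise ValueError(f"duplicate index {name!r}")
        result[name] = {k: v for k, v in entry.items() if k != "index"}
    return result
```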

> Is it possible that in the future we'd want to tier something other than an index? If so, would we prefer storagetier or storage_tier? Or is storage_tier clearer regardless?

That is a good question. An index is a logical entity; we are tiering the data, but in units of indices, hence we are keeping the name generic as tier. There can also be compute nodes configured only for the hot or warm tier, so I think tier fits well from that perspective too. Whether tier refers to an index or a node will be determined by an index-level or node-level setting/attribute.

neetikasinghal commented 3 months ago

thanks @e-emoto. Instead of overloading the status API to display one variant (with the verbose flag) in JSON and the other (without verbose) in tabular format, we can have separate APIs as suggested by @sohami. It would be good to add details about the _cat and GET tiering APIs (this can be in a separate issue). We could also consider making verbose default to true for the status API if the high-level status can be provided by the _cat/GET tiering APIs.

A couple of other suggestions for the status API: