Open kotwanikunal opened 9 months ago
thanks @kotwanikunal, also posting some of the alternative options that can be considered:
Utilize a new dynamic property on index settings for tiering which can be set at index creation or be updated to trigger migration.
The new index dynamic setting key can be index.access.type with values HOT / WARM.
PUT /index/_settings
{
  "index" : {
    "store": {
      "access" : {
        "tier" : "HOT/WARM"
      }
    }
  }
}
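As a rough illustration of the request above, here is a minimal Python sketch that builds the method, path, and JSON body for the proposed settings update. It assumes the key is index.store.access.tier as in the JSON above, which is only a proposal in this thread and not an existing OpenSearch setting. The helper name tier_settings_request and the index name in the usage line are hypothetical.

```python
import json

# Tiers in scope for this proposal; other tiers (e.g. cold) are out of scope.
VALID_TIERS = {"HOT", "WARM"}


def tier_settings_request(index, tier):
    """Return (method, path, body) for the proposed dynamic-setting update.

    Assumption: the setting lives at index.store.access.tier, as in the
    JSON example in this thread. This is a proposal, not a shipped API.
    """
    tier = tier.upper()
    if tier not in VALID_TIERS:
        raise ValueError(f"unsupported tier: {tier}")
    body = {"index": {"store": {"access": {"tier": tier}}}}
    return "PUT", f"/{index}/_settings", json.dumps(body)


method, path, body = tier_settings_request("my-index", "warm")
print(method, path, body)
```

Validating the tier on the client side is a design choice for the sketch only; the proposal does not say where validation would happen.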
Tracking the migration
An alternative to tracking is specifying the current migration tier of an index as follows -
Request:
GET /_tiering/_status?type=hot_to_warm
GET /_tiering/{index}/_status?verbose=<true/false>
This would require exposing transitionary statuses like hot_to_warm to the end user, which can be prohibitive when considering extensibility and the addition of new tiers.
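To make the extensibility concern concrete, the following sketch enumerates the transitional statuses that would exist if every ordered pair of distinct tiers needed its own source_to_target name. The function name and the assumption that every pair is a valid migration are illustrative only.

```python
from itertools import permutations


def transition_statuses(tiers):
    """List one source_to_target status per ordered pair of distinct tiers.

    Assumption: every tier can migrate to every other tier.
    """
    return [f"{src}_to_{dst}" for src, dst in permutations(tiers, 2)]


print(transition_statuses(["hot", "warm"]))
print(len(transition_statuses(["hot", "warm", "cold"])))
```

With two tiers there are two statuses; with three there are six, since the count grows as n(n-1) for n tiers.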
An alternative cancellation API can be added for convenience which will be useful mainly for admin operations so that a user can cancel/remove the tiering requests which are pending completion.
Request:
API: POST /<index_name>/_tiering/<tiering_id>/_cancel (in case we decide to expose a new id for tiering to the customer)
API: POST /<index_name>/_tiering/_cancel
We would love to get feedback from the community for this: @andrross @sohami @reta @mch2 @shwetathareja @Bukhtawar @rohin
Makes sense, the intent should always be to take an extensible approach.
With _tiering you have multiple options, like listing the cluster-wide migration status without passing an index pattern. When the API starts with an index pattern, you cannot get a top-level view (unless you use * for the cluster-wide view), which looks hacky. The thing to note here is the security considerations of these APIs. The index-pattern-based API structure offers more fine-grained security semantics: a user can view the status only if they have permission on the given indices. So we need to decide whether tiering should be an administrative action or whether individual users should have that control. IMO it should be an administrative action.
I would go with the _tiering, or better _tier/<optional-index-pattern>/_<action>, API if the common pattern is executing the API cluster-wide, and index-pattern/_tier if the common use case is operating tiers at the index level.
Thanks for the proposal @kotwanikunal @neetikasinghal
- Should we evaluate pros & cons for using index.store.type v/s index.store.access.tier? Reusing index.store.type does make sense semantically, but we'll be overloading the setting beyond its intended purpose. It'd be good to have more feedback around this.
- Regarding max_cache_usage, we haven't accounted for index/shard level usage. The FileCache as of now is only expected to have a node-level view.
@Bukhtawar thanks for your feedback.
while when you start the API with index pattern you won't be able to get a top-level view(unless you want * for the cluster wide view), which looks hacky
_cat/indices could also serve the use case of showing the top-level view of the indices. Along with the tier of the current index, _cat/indices can also show a MIGRATING status, similar to how we see the RELOCATING status in the _cat/shards API.
@Bukhtawar Good points around the security consideration. Actions like tiering an index from hot to warm or vice versa are performed at the index level, and IMO it is a more natural choice to keep this an index-level action rather than a cluster-level one. For a cluster admin, one can define a role that provides access to this action on all the indices as needed.
Coming to the status API, this will again be at the index level, as different users should be able to view the status of the indices they are managing. If they are managing multiple indices, a pattern-based input can be provided, and it will be allowed or rejected based on how the pattern resolves against the security role/permission configuration. In this world as well, a cluster admin will have access to this API for all the indices, so they can see the cluster view. The * pattern is not necessarily hacky, as it provides a way to control different personas: a cluster_admin has access to all the indices, whereas selective users are limited to a specific index pattern.
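The pattern-resolution idea can be sketched as follows. This is not how the OpenSearch security plugin is implemented; it is a simplified model that assumes a request is allowed only when every index the requested pattern resolves to is covered by one of the role's allowed patterns. The function name, role patterns, and index names are all hypothetical.

```python
from fnmatch import fnmatchcase


def can_view_status(requested_pattern, role_patterns, cluster_indices):
    """Allow the request only if every resolved index is covered by the role.

    Assumption: all-or-nothing semantics, using shell-style wildcards.
    A pattern that resolves to no indices is trivially allowed here.
    """
    resolved = [i for i in cluster_indices if fnmatchcase(i, requested_pattern)]
    return all(
        any(fnmatchcase(index, allowed) for allowed in role_patterns)
        for index in resolved
    )


indices = ["logs-1", "logs-2", "metrics-1"]
print(can_view_status("*", ["*"], indices))            # cluster admin role
print(can_view_status("logs-*", ["logs-*"], indices))  # scoped user, own indices
print(can_view_status("*", ["logs-*"], indices))       # scoped user, cluster view
```

Under this model the same * request gives a cluster view to an admin role and is rejected for a user scoped to logs-*.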
@ankitkala
Should we evaluate pros & cons for using index.store.type v/s index.store.access.tier? Reusing index.store.type does make sense semantically, but we'll be overloading the setting beyond its intended purpose. It'd be good to have more feedback around this.
Regarding index.store.access.tier, the main motivation is to avoid associating a separate index property to treat an index as hot/warm, and instead deduce that from different index properties. That way the definition of hot/warm can evolve over time (if needed) depending upon which properties are used to categorize the indices.
Regarding max_cache_usage, we haven't accounted for index/shard level usage. The FileCache as of now is only expected to have node level view.
I think we can build index/shard level limits later as well. But one thing that will be useful is to see if we can configure an optional max limit on the space used by warm indices on a shared node setup.
Thanks for the great write-up around this feature; I like seeing the API calls and alternatives considered. I've got some very naïve questions:
- If a sysadmin was to execute POST */_tier?type=cold did they just block all indexing traffic?
Thanks @peternied, please find the answers inline:
- If a sysadmin was to execute POST */_tier?type=cold did they just block all indexing traffic?
Read/write availability is ensured at all times when the tier for an index is changed from hot to warm or from warm to hot. As for changing the tier to cold specifically, it could potentially mean that indexing traffic is blocked, as the cold tier would hold only archival data that receives no writes. However, this feature restricts the scope of tier for an index to be hot/warm.
- Are resources allocated differently between hot/warm in such a way that sysadmins might want to know if a tiering change would reach a threshold?
Resource allocation will differ in terms of disk usage for hot/warm. The GET /_nodes/stats API can help with monitoring the disk usage for hot/warm.
- [Access control] Should the tiering type be considered sensitive - or is that safe for universal read (if you can see the index)?
It would be safe for universal read. It would be the choice of the customers to keep an index in hot/warm tier depending on their use-case.
Thanks for the proposal @kotwanikunal @neetikasinghal
Thinking more on the access pattern for tiering APIs or tier information: index-pattern/_tier. Also, if a customer is triggering a tiering action, they will also do it like this.
If a sysadmin was to execute POST */_tier?type=cold
However, this feature restricts the scope of tier for an index to be hot/warm.
@neetikasinghal Following up on this - this feature won't support tiers other than hot/warm, or isn't in scope for the current plan? Or is this more of an OpenSearch core vs ISM plugin boundary?
@peternied This is just not in the scope of current plan, however the API options presented above are extensible to support other tiers like cold in future.
Is your feature request related to a problem? Please describe
The feature described below is related to #11703
Describe the solution you'd like
Background
Coming in from #11703 - A new mechanism is needed to configure the storage properties for an index, which can be used to modify the type of underlying node storage used by an index.
This mechanism should trigger a change on the index properties and should also provide the user with a mechanism to track the progress or cancel the operation.
Goals
Proposed Solution
How can the user create an index with the new properties?
The process to create an index with the new properties would utilize the existing store attribute on index settings.
Simplified approach: This approach will auto-configure the properties of the store with a predefined configuration for the corresponding tier.
API: **PUT** /<index_name>?tier=WARM
Body:
Expert user approach: This approach can be used by expert users to define individual values for the corresponding tier properties on the store.
API: **PUT** /<index_name>/
Body:
How can the user know the current tier attributes of an index?
API: **GET** /<index_name>/_settings
On similar lines, we can also list indices with their current state as follows -
Request:
Sample Response: includes an extra tier parameter to show the tier of the index
How can the user migrate an existing index to a different tier?
The client will use a new, custom API to perform the migration as follows -
API: POST /<indexNameOrPattern>/_tier
OR
(Preferred) API: POST /<indexNameOrPattern>/_tier/_warm
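For illustration, a client-side helper for the preferred form might look like the sketch below. It only builds the method and path shown above; the endpoint is a proposal in this issue, and the helper name tier_request and the restriction to hot/warm are assumptions made for the example.

```python
# Tiers in scope for this proposal.
SUPPORTED_TIERS = ("hot", "warm")


def tier_request(index_pattern, target):
    """Return (method, path) for the preferred migration API form.

    Assumption: the path shape is /<indexNameOrPattern>/_tier/_<target>,
    as proposed above. This endpoint is not a shipped OpenSearch API.
    """
    target = target.lower()
    if target not in SUPPORTED_TIERS:
        raise ValueError(f"unsupported target tier: {target}")
    return "POST", f"/{index_pattern}/_tier/_{target}"


print(tier_request("logs-*", "warm"))
```

Encoding the target tier in the path, rather than in a parameter, keeps the request shape the same if more tiers are added later.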
How can the user track migrations for indexes?
This API would show the on-going or failed migrations across different tiers.
Request:
GET /_tiering/_status?source=hot&target=warm
GET /_tiering/{index}/_status?verbose=<true/false>
OR Preferred:
GET /<indexNameOrPattern>/_tier?source=hot&target=warm
GET /<indexNameOrPattern>/_tier?verbose=<true/false>
Sample Response:
How can the user cancel a migration?
The user can utilize the tiering APIs to cancel the migration by providing the original state of the index. If there is a current hot to warm migration going on, in order to cancel the migration, the user can trigger a warm to hot migration and return to the original state.
API: POST /<indexNameOrPattern>/_tier
OR
(Preferred) API: POST /<indexNameOrPattern>/_tier/_hot
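The cancel-by-reversal idea can be sketched as follows: given the source and target of an in-flight migration, the cancel request is simply a migration back to the source tier. The helper name cancel_request is hypothetical, and the path shape is the preferred form proposed above rather than an existing API.

```python
def cancel_request(index_pattern, source, target):
    """Return (method, path) that cancels an in-flight source -> target move.

    Assumption: cancelling means requesting a migration back to the
    original (source) tier via /<indexNameOrPattern>/_tier/_<source>.
    """
    source, target = source.lower(), target.lower()
    if source == target:
        raise ValueError("source and target tiers must differ")
    return "POST", f"/{index_pattern}/_tier/_{source}"


# An ongoing hot -> warm migration is cancelled by requesting the hot tier.
print(cancel_request("logs-*", "hot", "warm"))
```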
Contributors: @kotwanikunal, @neetikasinghal
Related component
Search:Remote Search