opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[ABC Templates] Proposal on Implementation Details #14649

Open mgodwan opened 2 months ago

mgodwan commented 2 months ago

Is your feature request related to a problem? Please describe

Github RFC: https://github.com/opensearch-project/OpenSearch/issues/12683

Overview

The templates proposed in RFC https://github.com/opensearch-project/OpenSearch/issues/12683 aim to provide a pre-defined set of system templates (made available as part of the distribution), which allow users to create indices/index-templates for specific use cases without worrying about the granular tuning that is required for those use cases. This document covers the decisions being proposed in order to arrive at a final proposal for how these templates can be implemented.

Requirements

  1. Application Based Configuration (ABC) templates should provide a sensible set of defaults for use-cases to tune their performance (compute, storage) and usability (index state management)
  2. Suggest Schema based on standards so that different types of integrations can leverage the default settings.
  3. Changes to core settings should be taken care of in the ABC templates as part of new version releases.
  4. New features applicable to the use cases catered to by these templates should be reflected in the templates, and should apply to the index/index-template as well.
  5. ABC templates should be validated and made available for use as part of the bootstrap process. Any plugin dependent settings should be inferred and validated along with fallback options.

Proposed Implementation Details

Reusing Composable Index/Component Templates Resource

Background: Component Templates allow settings and mappings to be declared, which can then be used to create composable index templates. Composable Index Templates are defined using a schema which allows a template definition, along with multiple component templates which it can be composed of. These are applied to an index which matches the pattern defined in the index template definition. In case of multiple composable index templates matching the index name pattern, the one with the highest priority is used. Some of the properties of index templates today are:

  1. Component Templates today don’t allow parameterization at all. The settings/mappings can be declared, but there is no way to ensure that users provide certain settings during index creation. Since these are used by index templates today through a composed_of mechanism, we cannot parameterize them without breaking compatibility.
  2. Index Templates are used only prior to index creation, and any changes to the index templates are not used once the index has been created.
  3. Only the latest version of these templates is stored in the cluster state.
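
The priority rule described in the background above can be illustrated with a small sketch. This is a simplified model, not OpenSearch code: the `select_template` helper, the dictionary layout, and the use of Python's `fnmatch` for wildcard matching are all assumptions made for illustration.

```python
from fnmatch import fnmatch


def select_template(index_name, templates):
    """Return the matching template with the highest priority, or None.

    Hypothetical helper: `templates` is a list of dicts with the keys
    "name", "index_patterns" and "priority" (assumed layout).
    """
    matching = [
        t for t in templates
        if any(fnmatch(index_name, pattern) for pattern in t["index_patterns"])
    ]
    if not matching:
        return None
    # Among all matching templates, only the highest-priority one is used.
    return max(matching, key=lambda t: t.get("priority", 0))


templates = [
    {"name": "generic-logs", "index_patterns": ["*-logs-*"], "priority": 10},
    {"name": "my-logs", "index_patterns": ["my-logs-*"], "priority": 100},
]

selected = select_template("my-logs-000001", templates)
print(selected["name"])
```

Both templates match `my-logs-000001` here, and the sketch picks `my-logs` because it declares the higher priority.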

Proposal: Use the component template as the building block (storage model), and filter these templates out so that they cannot be used in the composed_of declaration of composable index templates.

If we go ahead with this modeling, we can reuse the existing component template storage model, but we will need to add support within the application logic for parameterizing setting values in the template definition.

We can’t add parameterization support to composable index templates directly as of today, because doing so would break the auto-create index flow during rollovers; this is why the new internal template layer is needed. Hence, any index template tied to an ABC template will need to declare concrete values for the parameters provided in the ABC template.

e.g.

logs-basic.json // ABC Template

{
  "template": {
    "settings": {
      "codec": "best_compression",
      "merge.policy": "log_byte_size",
      "refresh_interval": "60s"
    }
  },
  "_meta": {
    "__abc_template": true,
    "_version": "1.0.0"
  },
  "version": 1
}

# Users can do the following. Order of application is component template followed by declared template followed by context definition (Context definition supersedes all declarations)
PUT _index_template/my-logs
{
    "pattern": "my-logs-*",
    "template": {
        "settings": {
            "refresh_interval": "60s"
        }
    },
    "context": {
        "name": "logs-basic",
        "version": "_latest", # Version is optional; default is latest with upgrades enabled
        "params": {} # Required if params are present in ABC template
    }
}
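
The order of application noted in the example above (component template, then declared template, then context definition, with the context superseding everything) can be sketched as a layered merge. This is only an illustration under stated assumptions: `resolve_settings` is a hypothetical helper, and the `number_of_replicas` value and the component/declared settings are made up to show the precedence.

```python
def resolve_settings(component_settings, declared_settings, context_settings):
    """Merge settings layer by layer; later layers override earlier ones.

    Hypothetical helper, not part of OpenSearch.
    """
    resolved = {}
    for layer in (component_settings, declared_settings, context_settings):
        resolved.update(layer)
    return resolved


# Illustrative inputs (assumed values, except the ABC template settings,
# which mirror logs-basic.json above).
component = {"number_of_replicas": "1", "refresh_interval": "1s"}
declared = {"refresh_interval": "30s"}
context = {
    "codec": "best_compression",
    "merge.policy": "log_byte_size",
    "refresh_interval": "60s",
}

resolved = resolve_settings(component, declared, context)
print(resolved)
```

In this sketch `refresh_interval` ends up as the context's value, while `number_of_replicas` survives from the component template because no later layer overrides it.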

Storage Model

Applying Template Upgrades

Depending on the entity that references the ABC template through its context, we will allow that entity's context to be updated when templates are upgraded to a new version.

Today, any updates to component templates are already applied to the index templates that refer to them. With ABC templates, we will extend this to apply to indices as well. By default, template upgrades will apply to the entity that uses them in its context.

Template hosting

Overrides through other Templates/Index Settings

Handling Cluster upgrades

Initial Template Repository

Proposed Defaults

Template Repository will be focused on 2 things:

  1. Applying performance optimizations
  2. Adding default applications

Performance Optimization Settings [In Progress]

Request Logs

refresh_interval: "60s"
index.codec: "zstd_no_dict || best_compression"
merge_policy: "LOG_BYTE_SIZE"

Metrics

refresh_interval: "60s"
index.codec: "zstd_no_dict || best_compression"
merge_policy: "LOG_BYTE_SIZE"

Events

index.optimize_doc_id_lookup.fuzzy_set.enabled: true
index.optimize_doc_id_lookup.fuzzy_set.false_positive_probability: 0.10
index.merge.policy.deletes_pct_allowed: 5.0
merge_policy: "LOG_BYTE_SIZE"
index.codec: "zstd_no_dict || best_compression"
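
The `a || b` notation in the codec values above is not defined in this proposal. One possible reading, in line with the requirement that plugin-dependent settings have fallback options, is "use the first value that the cluster supports". The sketch below assumes that reading; `resolve_with_fallback` and the set of available codecs are hypothetical.

```python
def resolve_with_fallback(expression, available):
    """Pick the first candidate in an 'a || b' expression that is available.

    Hypothetical helper: `available` stands in for whatever the cluster
    reports as supported (for example, codecs provided by installed plugins).
    """
    candidates = [c.strip() for c in expression.split("||")]
    for candidate in candidates:
        if candidate in available:
            return candidate
    raise ValueError(f"none of {candidates} is available")


# Assumed scenario: the plugin providing zstd_no_dict is not installed.
available_codecs = {"default", "best_compression"}
codec = resolve_with_fallback("zstd_no_dict || best_compression", available_codecs)
print(codec)
```

With the assumed codec set, the sketch falls back to `best_compression`; if `zstd_no_dict` were in the set, it would be chosen first.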

Related component

Indexing:Performance

getsaurabh02 commented 1 month ago

@mgodwan Do we need to update the label as 2.17?