opensearch-project / opensearch-catalog

The OpenSearch Catalog is designed to make it easier for developers and the community to contribute, search for, and install artifacts such as plugins, visualization dashboards, and ingestion-to-visualization content packs (data pipeline configurations, normalization, ingestion, dashboards).
Apache License 2.0

[FEATURE] Add Schema Generate API #28

Open YANG-DB opened 1 year ago

YANG-DB commented 1 year ago

Is your feature request related to a problem?

A user would like to generate the physical representation of the schema as it is manifested in the database.

What solution would you like?

As part of the user flow, domain users would like to generate the physical mapping for their schema, that is, the templates from which the indices (schema instances) are created and used.

API

GET _plugins/catalog/schema/{id}/generate?version=1.0&prefix=ss4o&suffix=*

The generate API takes all the schema components that belong to the given schema id in the catalog repository and generates the corresponding physical mappings for the schema.
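As a rough illustration of how a client might assemble such a call, here is a minimal Python sketch. The helper name build_generate_url is hypothetical, and the parameter names (version, prefix, suffix, type, components, namespace) are taken from the calls quoted in this issue. Rendering a single value bare and several values as a bracketed list is an assumption based on those same calls, not a confirmed part of the API.

```python
from urllib.parse import quote, urlencode


def _as_list_param(values):
    """Render one value bare and several values as a bracketed list (assumed format)."""
    values = list(values)
    if len(values) == 1:
        return values[0]
    return "[" + ",".join(values) + "]"


def build_generate_url(schema_id, version, prefix=None, suffix=None,
                       types=None, components=None, namespace=None):
    """Build the relative URL for the proposed generate call (hypothetical helper)."""
    params = [("version", version)]
    if prefix:
        params.append(("prefix", prefix))
    if suffix:
        params.append(("suffix", suffix))
    if types:
        params.append(("type", _as_list_param(types)))
    if components:
        params.append(("components", _as_list_param(components)))
    if namespace:
        params.append(("namespace", namespace))
    # Keep brackets, commas and '*' unescaped so the URL reads like the calls in this issue.
    query = urlencode(params, quote_via=quote, safe="[],*")
    return f"_plugins/catalog/schema/{quote(schema_id, safe='')}/generate?{query}"


print(build_generate_url("observability", "1.0", prefix="ss4o", suffix="*"))
```

A real client may need to percent-encode the brackets and the asterisk, depending on the HTTP library and on how the endpoint ends up parsing its query string.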

A naming convention is used to automatically create the names of the index mapping templates according to their type.

Possible parameters

Example 1:

The call GET _plugins/catalog/schema/observability/generate?version=1.0&prefix=ss4o will generate the following templates:

Component Templates

Index Templates


Example 2:

The call GET _plugins/catalog/schema/observability/generate?version=1.0&prefix=ss4o&type=[traces,services]&namespace=us will generate the following templates:

Component Templates (all the components needed to generate the types described in the type list):

Index Templates


Example 3:

The call GET _plugins/catalog/schema/observability/generate?version=1.0&prefix=ss4o&type=logs&components=[http,communication] will generate the following templates:

Component Templates (all the components needed to generate the types described in the type list):

Index Templates


Embedded metadata

When a mapping template (index or component) is created, the schema metadata that specifies the logical schema origin and version is embedded into the mapping template structure under the _meta field:

  "template": {
    "mappings": {
      "_meta": {
        "version": "1.0.0",
        "catalog": "observability",
        "type": "logs",
        "component": "http"
      },
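To make the embedding step concrete, the following Python sketch adds a _meta block with the same four keys (version, catalog, type, component) to a template body. The function name embed_meta and the sample component template are hypothetical; how the generate API would actually build its templates is not specified in this issue.

```python
import copy
import json


def embed_meta(template, version, catalog, type_, component=None):
    """Return a copy of `template` with schema metadata under template.mappings._meta."""
    result = copy.deepcopy(template)
    mappings = result.setdefault("template", {}).setdefault("mappings", {})
    meta = {"version": version, "catalog": catalog, "type": type_}
    if component is not None:
        # Only component templates carry a component name.
        meta["component"] = component
    mappings["_meta"] = meta
    return result


# Hypothetical component template body; the field mapping is only a placeholder.
http_component = {"template": {"mappings": {"properties": {"http": {"type": "object"}}}}}
print(json.dumps(embed_meta(http_component, "1.0.0", "observability", "logs", "http"), indent=2))
```

Working on a deep copy keeps the catalog's source definition untouched, so the same component can be stamped with different versions or types.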

See examples:

version - mandatory semantic version of the schema (for example 1.0 or 1.2).
type - the high-level observability type: "logs", "metrics", or "traces" (taken from the category within the catalog).
dataset - any value that classifies the source of the data, such as nginx. If none is specified, * is used.
namespace - a user-defined namespace, useful for grouping data, for example by production grade or geography.

prefix - optional prefix that identifies the context of the entire name.
suffix - optional suffix that identifies the context of the specific index pattern (may be a date or a sequence).

The sso_1.0_{type}-{dataset}-{namespace} pattern makes it possible to route similarly structured data to different indices according to the customer's strategy.

For example, a customer may want to route the nginx logs from two geographical areas into two different indices:

This distinction also allows crosscutting queries, either with an index query pattern such as sso_1.0_logs-nginx-* or with a geography-based pattern such as sso_1.0_logs-*-eu.
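The naming pattern and the two crosscutting queries can be sketched in Python as follows. The helper index_name, the "us" namespace and the "apache" dataset are hypothetical and used only for illustration. Shell-style matching with fnmatchcase stands in for index wildcard resolution here, which is an approximation.

```python
from fnmatch import fnmatchcase


def index_name(type_, dataset="*", namespace="*", prefix="sso", version="1.0"):
    """Compose a name or pattern following {prefix}_{version}_{type}-{dataset}-{namespace}."""
    return f"{prefix}_{version}_{type_}-{dataset}-{namespace}"


# Hypothetical indices: logs split by dataset and region.
indices = [
    index_name("logs", "nginx", "us"),
    index_name("logs", "nginx", "eu"),
    index_name("logs", "apache", "eu"),
]

all_nginx = index_name("logs", dataset="nginx")  # sso_1.0_logs-nginx-*
all_eu = index_name("logs", namespace="eu")      # sso_1.0_logs-*-eu

# Select the indices each crosscutting pattern would cover.
print([name for name in indices if fnmatchcase(name, all_nginx)])
print([name for name in indices if fnmatchcase(name, all_eu)])
```

Leaving dataset or namespace at its default of * turns the same helper into a query pattern, which mirrors the rule that * is used when no dataset is specified.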

Do you have any additional context?

Swiddis commented 1 year ago

Some initial questions:

YANG-DB commented 1 year ago

Once an index or component template has been created, we can identify its source and version from the _meta details embedded within the mapping file itself:
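As an illustration of reading that metadata back, the Python sketch below pulls the origin details out of a template body that has already been fetched as a dictionary. The function name describe_origin is hypothetical, and the key layout (template.mappings._meta with version, catalog, type, component) is assumed from the fragment shown earlier in this issue.

```python
def describe_origin(template_body):
    """Return the schema origin recorded in template.mappings._meta, or None if absent."""
    meta = template_body.get("template", {}).get("mappings", {}).get("_meta")
    if not meta:
        return None
    return {
        "catalog": meta.get("catalog"),
        "type": meta.get("type"),
        "component": meta.get("component"),  # None for templates without a component
        "version": meta.get("version"),
    }


# Hypothetical template body shaped like the fragment in this issue.
sample = {
    "template": {
        "mappings": {
            "_meta": {
                "version": "1.0.0",
                "catalog": "observability",
                "type": "logs",
                "component": "http",
            }
        }
    }
}
print(describe_origin(sample))
```

Templates that were not produced by the catalog simply return None, which gives a cheap way to tell generated templates from hand-written ones.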