opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
9.77k stars 1.82k forks

[Feature Request] Mapping fashion configuration for pipeline processors #13160

Open zane-neo opened 7 months ago

zane-neo commented 7 months ago

Is your feature request related to a problem? Please describe

Background

OpenSearch Core currently supports field value configuration in several processors, e.g. AppendProcessor and SetProcessor. For example:

{
  "append": {
    "field": "your_target_field",
    "value": "{{{tenure}}}"
  }
}

With this configuration, a user can append a new field or transform an existing one. Sometimes, however, a user needs another pattern: creating a new field based on the value of an existing field. In the example below, several new fields are created from existing fields:

  1. From the existing field a, generate a new field a_wc that records the word count of a.
  2. From the existing field b, generate a new field b_wc; both b and b_wc are lists.
  3. From the existing nested field c -> d, generate a new field d_wc; c is a map that holds both d and d_wc.
{
  "a": "hello world",
  "a_wc": 2,
  "b": ["hello", "world"],
  "b_wc": [1, 1],
  "c": {
    "d": "foo bar",
    "d_wc": 2
  }
}
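To make the target document above concrete, here is a minimal Python sketch of how the three `*_wc` fields could be derived by hand. The `word_count` helper is hypothetical (it is not an OpenSearch API), and splitting on whitespace is an assumption about what "word count" means here.

```python
def word_count(value):
    """Count whitespace-separated words (hypothetical helper, not an OpenSearch API).

    Strings yield a single count; lists are handled element-wise.
    """
    if isinstance(value, str):
        return len(value.split())
    if isinstance(value, list):
        return [word_count(item) for item in value]
    raise TypeError(f"unsupported field type: {type(value).__name__}")


# Source document containing only the existing fields from the example.
doc = {
    "a": "hello world",
    "b": ["hello", "world"],
    "c": {"d": "foo bar"},
}

# Each processor currently has to hard-code this source -> target wiring itself.
doc["a_wc"] = word_count(doc["a"])
doc["b_wc"] = word_count(doc["b"])
doc["c"]["d_wc"] = word_count(doc["c"]["d"])
```

The three assignments at the end are exactly the per-field wiring that a shared mapping configuration would replace.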

Problem statement

Currently, processor configuration does not support this multi-field mapping style, so every processor has to implement similar logic on its own.

Describe the solution you'd like

We can support multi-field mapping configuration in OpenSearch core so that it can be reused by processors across different plugins. Two configuration styles could be supported, a nested style and a dotted-path style, e.g.:

{
  "field_map": {
    "a": "a_wc",
    "b": "b_wc",
    "c": {
      "d": "d_wc"
    }
  }
}

{
  "field_map": {
    "a": "a_wc",
    "b": "b_wc",
    "c.d": "c.d_wc"
  }
}
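As a rough illustration of how shared support might treat the two styles above as equivalent, the Python sketch below normalizes a nested `field_map` into dotted-path form and then applies it to a document. All function names (`flatten_field_map`, `get_path`, `set_path`, `apply_field_map`) are hypothetical and not part of OpenSearch. The sketch also assumes that field names never contain a literal dot, and that in the nested style a target name is relative to its parent object.

```python
def flatten_field_map(field_map, prefix=""):
    """Normalize a field_map to {source_path: target_path} with dotted paths.

    Nested-style entries such as {"c": {"d": "d_wc"}} become {"c.d": "c.d_wc"};
    dotted-style entries pass through unchanged.
    """
    flat = {}
    for key, target in field_map.items():
        source = f"{prefix}{key}"
        if isinstance(target, dict):
            flat.update(flatten_field_map(target, prefix=f"{source}."))
        else:
            flat[source] = f"{prefix}{target}"
    return flat


def get_path(doc, path):
    """Read the value at a dotted path from a nested dict."""
    current = doc
    for part in path.split("."):
        current = current[part]
    return current


def set_path(doc, path, value):
    """Write a value at a dotted path, creating intermediate dicts as needed."""
    *parents, leaf = path.split(".")
    current = doc
    for part in parents:
        current = current.setdefault(part, {})
    current[leaf] = value


def apply_field_map(doc, field_map, transform):
    """For each source -> target pair, store transform(source value) at target."""
    for source, target in flatten_field_map(field_map).items():
        set_path(doc, target, transform(get_path(doc, source)))
    return doc


nested_style = {"a": "a_wc", "b": "b_wc", "c": {"d": "d_wc"}}
dotted_style = {"a": "a_wc", "b": "b_wc", "c.d": "c.d_wc"}

# `len` is only a stand-in transform; a real processor would plug in its own logic.
doc = {"a": "hello world", "b": ["hello", "world"], "c": {"d": "foo bar"}}
apply_field_map(doc, nested_style, len)
```

With a normalization step like this, a processor would only need to supply the transform, and either configuration style would produce the same source-to-target pairs.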

Related component

Other

Describe alternatives you've considered

No response

Additional context

No response

shwetathareja commented 6 months ago

@zane-neo can you give an example of existing processors and how they support this multi-field mapping configuration without common support?

shwetathareja commented 6 months ago

Also @zane-neo, you mentioned

create a new field based on an existing field value

are you planning to create a new processor to support this?

zane-neo commented 6 months ago

@zane-neo can you give an example of existing processors and how they support this multi-field mapping configuration without common support?

@shwetathareja Currently the text_embedding processor does this: https://opensearch.org/docs/latest/ingest-pipelines/processors/text-embedding/

zane-neo commented 6 months ago

Also @zane-neo, you mentioned

create a new field based on an existing field value

are you planning to create a new processor to support this?

In fact, the text_embedding processor already does this, and a new processor, text chunking (https://opensearch.org/docs/latest/search-plugins/text-chunking/), does it as well.

peternied commented 6 months ago

[Triage] @zane-neo Thanks for creating this issue.