opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[Feature Request] Provide way of defining configuration for the pipeline #15921

Open martin-gaievski opened 1 week ago

martin-gaievski commented 1 week ago

Is your feature request related to a problem? Please describe

If a feature or custom workflow requires a certain pipeline configuration, it has to be set up manually. Those additional steps may lead to issues: some steps can be missed or the configuration may contain errors. Another issue may come from lack of awareness or knowledge: the customer may not know that additional pipeline configuration is needed.

Describe the solution you'd like

A template of the default pipeline configuration that is created by the system itself, based on information provided by the developer/engineer together with the code of the new feature.

Related component

Search

Describe alternatives you've considered

A simplified version of the solution could be a dependency between processors. If one processor depends on another processor, then the system adds that foundational/child processor to the pipeline configuration (or just executes it). Such a "depends on" relation can be part of the processor registration.
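A minimal sketch of how such a "depends on" relation could be resolved. Everything here is hypothetical: ProcessorDependencyResolver and its dependsOn registry are illustration names, not existing OpenSearch classes, and the mapping from normalization-processor to processor_explain_publisher is only assumed from the pipeline example in this issue.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of OpenSearch: expands a list of configured
// processor types with the processors they declare a dependency on.
class ProcessorDependencyResolver {
    // Assumed registry filled at processor registration: type -> required types.
    private final Map<String, List<String>> dependsOn;

    ProcessorDependencyResolver(Map<String, List<String>> dependsOn) {
        this.dependsOn = dependsOn;
    }

    // Returns the configured types plus all transitive dependencies,
    // in first-seen order and without duplicates.
    List<String> resolve(List<String> configured) {
        Set<String> resolved = new LinkedHashSet<>();
        for (String type : configured) {
            visit(type, resolved, new HashSet<>());
        }
        return new ArrayList<>(resolved);
    }

    private void visit(String type, Set<String> resolved, Set<String> path) {
        // A type that is already on the current path means a dependency cycle.
        if (!path.add(type)) {
            throw new IllegalStateException("Circular processor dependency at " + type);
        }
        resolved.add(type);
        for (String dependency : dependsOn.getOrDefault(type, List.of())) {
            visit(dependency, resolved, path);
        }
        path.remove(type);
    }

    public static void main(String[] args) {
        // Illustrative registration only; the real dependency is an assumption.
        ProcessorDependencyResolver resolver = new ProcessorDependencyResolver(
            Map.of("normalization-processor", List.of("processor_explain_publisher")));
        System.out.println(resolver.resolve(List.of("normalization-processor")));
        // prints [normalization-processor, processor_explain_publisher]
    }
}
```

With a resolver like this, a user who configures only the first processor would get the dependent one added automatically, and a processor that is already configured would not be added twice.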

Example: this could be an extension of the Factory class.

The factory already accepts the map of processor factories, final Map<String, Processor.Factory<SearchPhaseResultsProcessor>> processorFactories, but it can only return one instance of the Processor class. The factory could instead return a collection of processors, depending on what it needs.
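A rough sketch of what a factory returning a collection could look like. The types below (SketchProcessor, MultiProcessorFactory, createAll) are simplified, hypothetical stand-ins written for this illustration; they are not the real Processor.Factory API, whose actual signature has more parameters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the real Processor interface.
interface SketchProcessor {
    String getType();
}

// Hypothetical factory variant: returns every processor a feature needs,
// instead of the single instance returned today.
interface MultiProcessorFactory {
    List<SketchProcessor> createAll(Map<String, MultiProcessorFactory> processorFactories,
                                    Map<String, Object> config);
}

// Trivial processor that only carries its type name.
class NamedProcessor implements SketchProcessor {
    private final String type;

    NamedProcessor(String type) {
        this.type = type;
    }

    @Override
    public String getType() {
        return type;
    }
}

// Creates only itself.
class ExplainPublisherFactory implements MultiProcessorFactory {
    @Override
    public List<SketchProcessor> createAll(Map<String, MultiProcessorFactory> processorFactories,
                                           Map<String, Object> config) {
        return List.of(new NamedProcessor("processor_explain_publisher"));
    }
}

// Creates itself plus the companion processor, looked up in the factory map.
class NormalizationFactory implements MultiProcessorFactory {
    @Override
    public List<SketchProcessor> createAll(Map<String, MultiProcessorFactory> processorFactories,
                                           Map<String, Object> config) {
        List<SketchProcessor> processors = new ArrayList<>();
        processors.add(new NamedProcessor("normalization-processor"));
        MultiProcessorFactory companion = processorFactories.get("processor_explain_publisher");
        if (companion != null) {
            // Assumption: the companion needs no configuration of its own.
            processors.addAll(companion.createAll(processorFactories, Map.of()));
        }
        return processors;
    }
}
```

One open design question with this approach: the two processors in the pipeline example in this issue belong to different sections (phase_results_processors and response_processors), so the caller would still need a way to route each returned processor to the right phase.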

Additional context

An example of such a use case would be a search flow with a search pipeline, where an existing SearchPhaseResultsProcessor requires another Response processor to finalize its results.

Today we have to tell the user to configure a pipeline in a certain way, something like the following example:

PUT /_search/pipeline/nlp-search-pipeline

{
    "description": "My search pipeline",
    "phase_results_processors": [
        {
            "normalization-processor": {}
        }
    ],
    "response_processors": [
        {
            "processor_explain_publisher": {}
        }
    ]
}

It can be even more problematic if the user already has a pipeline with one processor.

This is applicable to ingest pipelines as well.

msfroh commented 5 days ago

[Search community meeting triage]:

@martin-gaievski -- Ingest pipelines have the pipeline processor that embeds a pipeline as a processor, allowing reuse. If that embedded pipeline contains a single processor, it's a convenient way of embedding a single pre-configured processor. We don't have that for search pipelines yet, but it was part of the original proposal.

Would that level of reuse address your needs? Or do we need more of a semi-configured template?