Collate processor

ashoktelukuntla commented 1 year ago

Is your feature request related to a problem? Please describe.

Pipeline users need an option to collate or compare data by specifying the criteria they need. One scenario is comparing query performance between two clusters.

Describe the solution you'd like

Create a processor that takes inputs describing what needs to be collated. For instance, the pipeline receives streaming logs containing details of live queries running on the source cluster and executes the same queries on the destination cluster.

The processor will be able to read queries from the received logs, or derive them, and queue them on the destination cluster. While the pipeline is running against the destination cluster, I envision that the processor should be able to run user-defined comparisons, generate latency figures, and compare metrics or logs.

processor:
    - collate:
        destination_cluster_node_id: "50855at-856-896545"
        query: ""
        api: ""
        schedule: " ***** "
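
To make the proposed flow concrete, here is a minimal Python sketch of the replay-and-compare step described above. It is illustrative only and is not Data Prepper code: the `collate` function, the `Comparison` type, the `run_on_destination` callable, and the `query` / `latency_ms` log fields are all assumptions made for this example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Comparison:
    query: str
    source_latency_ms: float
    destination_latency_ms: float

    @property
    def delta_ms(self) -> float:
        # Positive means the destination was slower than the source.
        return self.destination_latency_ms - self.source_latency_ms


def collate(
    log_records: Iterable[dict],
    run_on_destination: Callable[[str], object],
    clock: Callable[[], float] = time.perf_counter,
) -> List[Comparison]:
    """Replay each logged query on the destination and compare latencies.

    Assumes (hypothetically) that each log record is a dict carrying the
    query text under "query" and the source latency under "latency_ms".
    """
    results = []
    for record in log_records:
        query = record.get("query")
        source_ms = record.get("latency_ms")
        if not query or source_ms is None:
            continue  # skip records that carry no query or no latency
        start = clock()
        run_on_destination(query)  # hypothetical client call
        elapsed_ms = (clock() - start) * 1000.0
        results.append(Comparison(query, float(source_ms), elapsed_ms))
    return results
```

A caller would pass whatever function executes a query against the destination cluster as `run_on_destination`, then inspect `delta_ms` on each result. Injecting the clock keeps the timing logic testable without a live cluster.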

Additional context

sharraj commented 1 year ago

This processor should handle large-scale ingestion and should be able to queue all these queries and run them asynchronously on destination hosts while ingesting data. It should also be able to run in a multi-node cluster and coordinate any load distribution needs across nodes.
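
As a rough single-node illustration of the queue-and-run-asynchronously requirement, the following Python sketch uses a bounded `asyncio.Queue` and a fixed pool of workers. The names `replay_queries`, `run_on_destination`, `workers`, and `max_queued` are hypothetical, and cross-node coordination is deliberately left out.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Tuple


async def replay_queries(
    queries: Iterable[str],
    run_on_destination: Callable[[str], Awaitable[object]],
    workers: int = 4,
    max_queued: int = 1000,
) -> List[Tuple[str, object]]:
    """Queue queries and run them concurrently with a fixed worker pool."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_queued)
    results: List[Tuple[str, object]] = []

    async def worker() -> None:
        while True:
            query = await queue.get()
            try:
                outcome = await run_on_destination(query)
            except Exception as exc:
                outcome = exc  # record the failure instead of killing the worker
            results.append((query, outcome))
            queue.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    try:
        for query in queries:
            # put() waits when the queue is full, which applies backpressure
            # to the producer instead of buffering without bound.
            await queue.put(query)
        await queue.join()  # wait until every queued query has been handled
    finally:
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
    return results
```

The bounded queue is the main design choice here: under heavy ingestion it slows the producer rather than letting pending queries grow unchecked. Results arrive in completion order, not submission order.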

dlvenable commented 1 year ago

@ashoktelukuntla, can you provide some broad context on what you are trying to accomplish? What is your end goal here? And how can Data Prepper help?

opensearch-project / data-prepper

Collate processor #2307