opensearch-project / data-prepper

OpenSearch Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0

Is there an example to set up data-prepper within Kubernetes? #1541

Open patrick-ding-domanirx opened 2 years ago

patrick-ding-domanirx commented 2 years ago

Is your feature request related to a problem? Please describe.
I'm looking for instructions to set up data-prepper within K8s.

Describe the solution you'd like
A Helm install would be preferred.

Describe alternatives you've considered (Optional)
Alternatively, a good example of a YAML file with all K8s resources defined would work too.

Additional context
The example at https://github.com/opensearch-project/data-prepper/blob/main/examples/dev/k8s/data-prepper.yaml uses an example-k8s/data-prepper image, which I cannot find with a quick Google search. After changing the Docker image to opensearchproject/data-prepper, I got the error below during container startup. The example already disables SSL in the pipeline YAML file, but I'm not sure whether I also have to disable SSL in the Data Prepper configuration YAML file.

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2022-06-24T16:39:27,052 [main] INFO com.amazon.dataprepper.parser.config.DataPrepperAppConfiguration - Command line args: /appconfig/trace_analytics_no_ssl.yml
2022-06-24T16:39:27,054 [main] INFO com.amazon.dataprepper.parser.config.DataPrepperArgs - Using /appconfig/trace_analytics_no_ssl.yml configuration file
2022-06-24T16:39:32,921 [main] WARN com.amazon.dataprepper.parser.model.PipelineConfiguration - Prepper configurations are deprecated, processor configurations will be required in Data Prepper 2.0
2022-06-24T16:39:32,925 [main] WARN com.amazon.dataprepper.parser.model.PipelineConfiguration - Prepper configurations are deprecated, processor configurations will be required in Data Prepper 2.0
2022-06-24T16:39:32,926 [main] WARN com.amazon.dataprepper.parser.model.PipelineConfiguration - Prepper configurations are deprecated, processor configurations will be required in Data Prepper 2.0
2022-06-24T16:39:32,928 [main] INFO com.amazon.dataprepper.parser.PipelineParser - Building pipeline [entry-pipeline] from provided configuration
2022-06-24T16:39:32,928 [main] INFO com.amazon.dataprepper.parser.PipelineParser - Building [otel_trace_source] as source component for the pipeline [entry-pipeline]
2022-06-24T16:39:33,076 [main] WARN com.amazon.dataprepper.plugins.source.oteltrace.OTelTraceSource - Creating otel-trace-source without authentication. This is not secure.
2022-06-24T16:39:33,076 [main] WARN com.amazon.dataprepper.plugins.source.oteltrace.OTelTraceSource - In order to set up Http Basic authentication for the otel-trace-source, go here: https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/otel-trace-source#authentication-configurations
2022-06-24T16:39:33,077 [main] INFO com.amazon.dataprepper.parser.PipelineParser - Building buffer for the pipeline [entry-pipeline]
2022-06-24T16:39:33,089 [main] INFO com.amazon.dataprepper.parser.PipelineParser - Building processors for the pipeline [entry-pipeline]
2022-06-24T16:39:33,437 [main] INFO com.amazon.dataprepper.plugins.prepper.peerforwarder.discovery.DnsPeerListProvider - Found endpoints: [Endpoint{data-prepper-headless.opensearch, ipAddr=192.168.4.45, weight=1000}]
2022-06-24T16:39:33,438 [main] INFO com.amazon.dataprepper.plugins.prepper.peerforwarder.HashRing - Building hash ring with endpoints: [192.168.4.45]
2022-06-24T16:39:33,441 [main] INFO com.amazon.dataprepper.parser.PipelineParser - Building sinks for the pipeline [entry-pipeline]
2022-06-24T16:39:33,441 [main] INFO com.amazon.dataprepper.parser.PipelineParser - Building [pipeline] as sink component
2022-06-24T16:39:33,442 [main] INFO com.amazon.dataprepper.parser.PipelineParser - Building [pipeline] as sink component
2022-06-24T16:39:33,443 [main] INFO com.amazon.dataprepper.parser.PipelineParser - Building pipeline [service-map-pipeline] from provided configuration
2022-06-24T16:39:33,443 [main] INFO com.amazon.dataprepper.parser.PipelineParser - Building [pipeline] as source component for the pipeline [service-map-pipeline]
2022-06-24T16:39:33,444 [main] INFO com.amazon.dataprepper.parser.PipelineParser - Building buffer for the pipeline [service-map-pipeline]
2022-06-24T16:39:33,444 [main] INFO com.amazon.dataprepper.parser.PipelineParser - Building processors for the pipeline [service-map-pipeline]
2022-06-24T16:39:33,581 [main] INFO com.amazon.dataprepper.parser.PipelineParser - Building sinks for the pipeline [service-map-pipeline]
2022-06-24T16:39:33,581 [main] INFO com.amazon.dataprepper.parser.PipelineParser - Building [opensearch] as sink component
2022-06-24T16:39:33,588 [main] WARN com.amazon.dataprepper.plugins.sink.opensearch.index.IndexConfiguration - The parameters, trace_analytics_raw and trace_analytics_service_map, are deprecated. Please use index_type parameter instead.
2022-06-24T16:39:33,594 [main] INFO com.amazon.dataprepper.plugins.sink.opensearch.OpenSearchSink - Initializing OpenSearch sink
2022-06-24T16:39:33,600 [main] INFO com.amazon.dataprepper.plugins.sink.opensearch.ConnectionConfiguration - Using the username provided in the config.
2022-06-24T16:39:33,725 [main] INFO com.amazon.dataprepper.plugins.sink.opensearch.ConnectionConfiguration - Using the trust all strategy
2022-06-24T16:39:34,100 [main] INFO com.amazon.dataprepper.plugins.sink.opensearch.index.IndexManager - Found version 0 for existing index template otel-v1-apm-service-map-index-template
2022-06-24T16:39:34,100 [main] INFO com.amazon.dataprepper.plugins.sink.opensearch.index.IndexManager - Index template otel-v1-apm-service-map-index-template should not be updated, current version 0 >= existing version 0
2022-06-24T16:39:34,118 [main] INFO com.amazon.dataprepper.plugins.sink.opensearch.OpenSearchSink - Initialized OpenSearch sink
2022-06-24T16:39:34,119 [main] INFO com.amazon.dataprepper.parser.PipelineParser - Building pipeline [raw-pipeline] from provided configuration
2022-06-24T16:39:34,119 [main] INFO com.amazon.dataprepper.parser.PipelineParser - Building [pipeline] as source component for the pipeline [raw-pipeline]
2022-06-24T16:39:34,119 [main] INFO com.amazon.dataprepper.parser.PipelineParser - Building buffer for the pipeline [raw-pipeline]
2022-06-24T16:39:34,120 [main] INFO com.amazon.dataprepper.parser.PipelineParser - Building processors for the pipeline [raw-pipeline]
2022-06-24T16:39:34,120 [main] WARN com.amazon.dataprepper.parser.PipelineParser - No plugin of type Processor found for plugin setting: otel_trace_raw_prepper, attempting to find comparable Prepper plugin.
2022-06-24T16:39:34,122 [main] INFO com.amazon.dataprepper.parser.PipelineParser - Building sinks for the pipeline [raw-pipeline]
2022-06-24T16:39:34,122 [main] INFO com.amazon.dataprepper.parser.PipelineParser - Building [opensearch] as sink component
2022-06-24T16:39:34,123 [main] WARN com.amazon.dataprepper.plugins.sink.opensearch.index.IndexConfiguration - The parameters, trace_analytics_raw and trace_analytics_service_map, are deprecated. Please use index_type parameter instead.
2022-06-24T16:39:34,124 [main] INFO com.amazon.dataprepper.plugins.sink.opensearch.OpenSearchSink - Initializing OpenSearch sink
2022-06-24T16:39:34,124 [main] INFO com.amazon.dataprepper.plugins.sink.opensearch.ConnectionConfiguration - Using the username provided in the config.
2022-06-24T16:39:34,124 [main] INFO com.amazon.dataprepper.plugins.sink.opensearch.ConnectionConfiguration - Using the trust all strategy
2022-06-24T16:39:34,190 [main] INFO com.amazon.dataprepper.plugins.sink.opensearch.index.IndexManager - Found version 1 for existing index template otel-v1-apm-span-index-template
2022-06-24T16:39:34,190 [main] INFO com.amazon.dataprepper.plugins.sink.opensearch.index.IndexManager - Index template otel-v1-apm-span-index-template should not be updated, current version 1 >= existing version 1
2022-06-24T16:39:34,196 [main] INFO com.amazon.dataprepper.plugins.sink.opensearch.OpenSearchSink - Initialized OpenSearch sink
2022-06-24T16:39:34,333 [main] WARN com.amazon.dataprepper.pipeline.server.config.DataPrepperServerConfiguration - Creating data prepper server without authentication. This is not secure.
2022-06-24T16:39:34,333 [main] WARN com.amazon.dataprepper.pipeline.server.config.DataPrepperServerConfiguration - In order to set up Http Basic authentication for the data prepper server, go here: https://github.com/opensearch-project/data-prepper/blob/main/docs/core_apis.md#authentication
2022-06-24T16:39:34,334 [main] INFO com.amazon.dataprepper.pipeline.server.HttpServerProvider - Creating Data Prepper server with TLS
2022-06-24T16:39:34,336 [main] WARN org.springframework.context.support.AbstractApplicationContext - Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'dataPrepperServer' defined in URL [jar:file:/usr/share/data-prepper/data-prepper.jar!/com/amazon/dataprepper/pipeline/server/DataPrepperServer.class]: Unsatisfied dependency expressed through constructor parameter 0; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'httpServer' defined in class path resource [com/amazon/dataprepper/pipeline/server/config/DataPrepperServerConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.sun.net.httpserver.HttpServer]: Factory method 'httpServer' threw exception; nested exception is java.lang.IllegalStateException: Problem loading keystore to create SSLContext
Exception in thread "main" org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'dataPrepperServer' defined in URL [jar:file:/usr/share/data-prepper/data-prepper.jar!/com/amazon/dataprepper/pipeline/server/DataPrepperServer.class]: Unsatisfied dependency expressed through constructor parameter 0; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'httpServer' defined in class path resource [com/amazon/dataprepper/pipeline/server/config/DataPrepperServerConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.sun.net.httpserver.HttpServer]: Factory method 'httpServer' threw exception; nested exception is java.lang.IllegalStateException: Problem loading keystore to create SSLContext
    at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:800)
    at org.springframework.beans.factory.support.ConstructorResolver.autowireConstructor(ConstructorResolver.java:229)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.autowireConstructor(AbstractAutowireCapableBeanFactory.java:1372)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1222)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:582)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
    at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:955)
    at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:918)
    at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:583)
    at com.amazon.dataprepper.ContextManager.<init>(ContextManager.java:48)
    at com.amazon.dataprepper.DataPrepperExecute.main(DataPrepperExecute.java:22)
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'httpServer' defined in class path resource [com/amazon/dataprepper/pipeline/server/config/DataPrepperServerConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.sun.net.httpserver.HttpServer]: Factory method 'httpServer' threw exception; nested exception is java.lang.IllegalStateException: Problem loading keystore to create SSLContext
    at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:658)
    at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:638)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1352)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1195)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:582)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
    at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
    at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:276)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1391)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1311)
    at org.springframework.beans.factory.support.ConstructorResolver.resolveAutowiredArgument(ConstructorResolver.java:887)
    at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:791)
    ... 14 more
Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.sun.net.httpserver.HttpServer]: Factory method 'httpServer' threw exception; nested exception is java.lang.IllegalStateException: Problem loading keystore to create SSLContext
    at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:185)
    at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:653)
    ... 28 more
Caused by: java.lang.IllegalStateException: Problem loading keystore to create SSLContext
    at com.amazon.dataprepper.pipeline.server.SslUtil.createSslContext(SslUtil.java:35)
    at com.amazon.dataprepper.pipeline.server.HttpServerProvider.get(HttpServerProvider.java:41)
    at com.amazon.dataprepper.pipeline.server.config.DataPrepperServerConfiguration.httpServer(DataPrepperServerConfiguration.java:59)
    at com.amazon.dataprepper.pipeline.server.config.DataPrepperServerConfiguration$$EnhancerBySpringCGLIB$$6fac9e.CGLIB$httpServer$1(<generated>)
    at com.amazon.dataprepper.pipeline.server.config.DataPrepperServerConfiguration$$EnhancerBySpringCGLIB$$6fac9e$$FastClassBySpringCGLIB$$b3de3ee2.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:244)
    at org.springframework.context.annotation.ConfigurationClassEnhancer$BeanMethodInterceptor.intercept(ConfigurationClassEnhancer.java:331)
    at com.amazon.dataprepper.pipeline.server.config.DataPrepperServerConfiguration$$EnhancerBySpringCGLIB$$6fac9e.httpServer(<generated>)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:154)
    ... 29 more
Caused by: java.io.IOException: Is a directory
    at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at java.base/sun.nio.ch.FileDispatcherImpl.read(FileDispatcherImpl.java:48)
    at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:276)
    at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:245)
    at java.base/sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:229)
    at java.base/sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
    at java.base/sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:107)
    at java.base/sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:101)
    at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:244)
    at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:263)
    at java.base/sun.security.util.DerValue.init(DerValue.java:383)
    at java.base/sun.security.util.DerValue.<init>(DerValue.java:327)
    at java.base/sun.security.util.DerValue.<init>(DerValue.java:340)
    at java.base/sun.security.pkcs12.PKCS12KeyStore.engineLoad(PKCS12KeyStore.java:1960)
    at java.base/sun.security.util.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:220)
    at java.base/java.security.KeyStore.load(KeyStore.java:1472)
    at com.amazon.dataprepper.pipeline.server.SslUtil.createSslContext(SslUtil.java:24)
    ... 41 more

d02540315 commented 2 years ago

One update: I had to customize the Data Prepper configuration YAML to disable SSL there as well, and then the data-prepper container started successfully. But I'm still looking for official documentation, an example, or a Helm chart.
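For anyone hitting the same startup error, a minimal sketch of that change might look like the following. The file name is illustrative; it is whichever file is passed to Data Prepper as its server configuration. The `ssl` key is the one used in the manifest later in this thread. The commented-out keystore settings are an assumption about how TLS could be kept on instead, with placeholder paths and passwords that do not come from this issue.

```yaml
# data-prepper-config.yaml (illustrative file name)
# Disable TLS on the Data Prepper core API server.
ssl: false

# Alternative sketch if TLS should stay enabled: point at a keystore *file*.
# The "Is a directory" cause at the bottom of the stack trace suggests the
# keystore path in use resolved to a directory rather than a file.
# All values below are placeholders.
# ssl: true
# keyStoreFilePath: "/usr/share/data-prepper/keystore.p12"
# keyStorePassword: "changeit"
# privateKeyPassword: "changeit"
```

Disabling SSL in the pipeline file only affects the pipeline's source; the core API server is configured separately, which is why both files needed the setting here.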

cmanning09 commented 2 years ago

We have an example for Kubernetes logging with Fluent Bit and Data Prepper. We do not have a published Helm chart.

Does this fit your use case? If not, what is your use case for Data Prepper and Kubernetes?

d02540315 commented 2 years ago

All our applications run in a K8s cluster. We'd like to use OpenTelemetry to send metrics and tracing data into OpenSearch. We may eventually use OpenTelemetry to send application logs as well, but right now we use Fluent Bit to send application logs into OpenSearch directly.
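To make the tracing part concrete, the collector side of such a setup could look roughly like the sketch below. It assumes an OpenTelemetry Collector running in the same cluster, a Service named `data-prepper-headless` (the name that shows up in the startup log above), and the default `otel_trace_source` port 21890 with SSL disabled. The exporter name `otlp/data-prepper` is just a label chosen for this example.

```yaml
# OpenTelemetry Collector config sketch (assumed names, not from this repo)
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  otlp/data-prepper:
    # Service name and port are assumptions; adjust to your cluster.
    endpoint: data-prepper-headless:21890
    tls:
      # Matches a source configured with ssl: false; do not use this
      # across untrusted networks.
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/data-prepper]
```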

asifsmohammed commented 2 years ago

Thanks for creating the issue. We'll update the example configurations.

JustinasKO commented 2 years ago

A Helm chart would be helpful. Once you have a bigger infrastructure, it's nice to have as few different deployment strategies as possible. Everything we deploy on K8s (EKS) now goes through Helm, so it would be best to have a chart for data-prepper as well.

JKrehling commented 1 year ago

It wouldn't take much to write one, but there isn't that much YAML to this anyway.
Most of it is just the ConfigMap. The rest is a single Deployment unless you want some sidecars or something.

A chart might make mapping in certs or something a little bit easier.
I modified this from https://github.com/opendistro-for-elasticsearch/data-prepper/blob/main/deployment-template/k8s/data-prepper-k8s.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: data-prepper
  name: data-prepper-config
data:
  pipelines.yaml: |
    log-pipeline:
      source:
        http:
          # Explicitly disable SSL
          ssl: false
          # Explicitly disable authentication
          authentication:
            unauthenticated:
          # The default port that will listen for incoming logs
          port: 2021
      sink:
        - opensearch:
            hosts: [ "http://elasticsearch-es-http:9200" ]
            # Change to your credentials
            username: "elastic"
            password: "password"
            # Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate  
            #cert: /path/to/cert
            # If you are connecting to an Amazon OpenSearch Service domain without
            # Fine-Grained Access Control, enable these settings. Comment out the
            # username and password above.
            #aws_sigv4: true
            #aws_region: us-east-1
            # Target index for the ingested logs.
            # You should change this to correspond with how your OpenSearch indices are set up.
            index: elastictest

  data-prepper-config.yaml: |
    ssl: false
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: data-prepper
  name: data-prepper-headless
spec:
  clusterIP: None
  ports:
    - name: "2021"
      port: 2021
      targetPort: 2021
  selector:
    app: data-prepper
status:
  loadBalancer: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: data-prepper
  name: data-prepper-metrics
spec:
  type: NodePort
  ports:
    - name: "4900"
      port: 4900
      targetPort: 4900
  selector:
    app: data-prepper
status:
  loadBalancer: {}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: data-prepper
  name: data-prepper
spec:
  replicas: 1
  selector:
    matchLabels:
      app: data-prepper
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: data-prepper
    spec:
      imagePullSecrets:
      - name: regcred
      containers:
        - args:
            - bin/data-prepper
            - /etc/data-prepper/pipelines.yaml
            - /etc/data-prepper/data-prepper-config.yaml 
          image: opensearchproject/data-prepper:latest
          name: data-prepper
          ports:
            - containerPort: 2021
          resources: {}
          volumeMounts:
            - mountPath: /etc/data-prepper
              name: prepper-configmap-claim0
      restartPolicy: Always
      serviceAccountName: ""
      volumes:
        - name: prepper-configmap-claim0
          configMap:
            name: data-prepper-config
status: {}
---
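On the point about mapping in certs: one way to do it without a chart is to mount a Kubernetes Secret next to the ConfigMap. The fragment below is a sketch of only the relevant part of the Deployment's pod spec. The Secret name `data-prepper-certs`, the volume name `prepper-certs`, and the mount path are hypothetical and would need to exist in your cluster.

```yaml
# Fragment of the Deployment's pod template spec (not a complete manifest).
spec:
  containers:
    - name: data-prepper
      image: opensearchproject/data-prepper:latest
      volumeMounts:
        - mountPath: /etc/data-prepper
          name: prepper-configmap-claim0
        # Hypothetical Secret mount, e.g. holding a CA cert for the sink
        - mountPath: /etc/data-prepper-certs
          name: prepper-certs
          readOnly: true
  volumes:
    - name: prepper-configmap-claim0
      configMap:
        name: data-prepper-config
    - name: prepper-certs
      secret:
        # Hypothetical Secret name
        secretName: data-prepper-certs
```

With something like this in place, the commented-out `cert` option in the opensearch sink above could point at a file under the mounted path.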
besha100 commented 1 year ago

@JKrehling @asifsmohammed I see an example here. But what if we want to run Data Prepper with multiple pods and the pipeline reads from S3 buckets? How can the Data Prepper replicas coordinate so that they read from the same S3 bucket but different files? Basically, we want to make sure no two replicas read the same file, so that we don't ingest duplicate data.

ishangupta01 commented 1 year ago

@besha100 did you find a solution for that?

arichtman commented 12 months ago

I have a simple-ish Helm chart I'm happy to contribute, but it probably needs some feedback to get the quality up and to match the project. Do I just open a draft PR or ...?

https://github.com/arichtman/opensearch-data-prepper/tree/helm-chart/examples/helm-chart