opensearch-project / opensearch-devops


Kafka Connector Compatibility #128

Open jaehyeon-kim opened 12 months ago

jaehyeon-kim commented 12 months ago

Is your feature request related to a problem? Please describe.

Support for Kafka Connect.

Describe the solution you'd like

I'd like to build a data pipeline to and from Kafka topics, and I want to check which of these connectors are compatible with OpenSearch:

  1. Confluent Elasticsearch Sink Connector - https://www.confluent.io/hub/confluentinc/kafka-connect-elasticsearch
  2. Camel Kafka connectors
  3. Aiven's OpenSearch Connector - https://github.com/Aiven-Open/opensearch-connector-for-apache-kafka

Describe alternatives you've considered

It would be possible to use Kinesis Firehose, but it requires additional steps that I'd like to avoid.

reta commented 12 months ago

@jaehyeon-kim Aiven's OpenSearch Connector (https://github.com/Aiven-Open/opensearch-connector-for-apache-kafka) is developed specifically to support OpenSearch and should be fully compatible.
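
For reference, a connector like this is normally registered through the Kafka Connect REST API (`POST /connectors`). The following is only a minimal sketch of that step, not taken from the Aiven documentation: the connector class name and the `connection.url`, `key.ignore`, and `schema.ignore` settings are assumptions that should be checked against the connector's README, and the connector name, topic, and both URLs are hypothetical placeholders.

```python
import json
import urllib.request


def build_sink_payload(name, topics, opensearch_url):
    """Build the JSON body for Kafka Connect's POST /connectors endpoint."""
    return {
        "name": name,
        "config": {
            # Assumed class name for the Aiven connector; verify for your version.
            "connector.class": "io.aiven.kafka.connect.opensearch.OpensearchSinkConnector",
            # Kafka Connect expects a comma-separated list of topics.
            "topics": ",".join(topics),
            "tasks.max": "1",
            # Assumed connector settings; names may differ between releases.
            "connection.url": opensearch_url,
            "key.ignore": "true",
            "schema.ignore": "true",
        },
    }


def build_create_request(connect_url, payload):
    """Prepare (but do not send) the HTTP request that registers the connector."""
    return urllib.request.Request(
        url=connect_url.rstrip("/") + "/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical names and endpoints, for illustration only.
payload = build_sink_payload("opensearch-sink", ["orders"], "https://localhost:9200")
request = build_create_request("http://localhost:8083", payload)
print(request.get_method(), request.full_url)
print(json.dumps(payload, indent=2))
```

Passing `request` to `urllib.request.urlopen` would submit it to a running Kafka Connect worker; the sketch stops short of that so it can be inspected without a cluster.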

dblock commented 11 months ago

Let's move this to the devops repo. I think this is a good topic to be documented on opensearch.org, or maybe a blog post? @jaehyeon-kim want to try and write it up?

jaehyeon-kim commented 11 months ago

@dblock

I haven't checked the Confluent connector, but I created an issue on the Camel Kafka connector repo to check compatibility. They said they haven't tried it with OpenSearch and that I'm welcome to try it myself. If something goes wrong, I guess I wouldn't get much support from them.

Aiven already has a documentation section on OpenSearch (https://docs.aiven.io/docs/products/opensearch). Since their docs use their own product, I could write a blog post that works with MSK instead. Do you think that would be a good topic?

Cheers, Jaehyeon

bbarani commented 11 months ago

@jaehyeon-kim I think a blog on MSK integration would be a good start.

pajuric commented 11 months ago

@jaehyeon-kim

I'm happy to provide direction and help you get your content published on the OpenSearch blog. If you have a moment, please file a new blog Issue on the website repo here: https://github.com/opensearch-project/project-website/issues/new?assignees=&labels=new+blog%2C+untriaged&projects=&template=blog_post.md&title= with all the details about the blog you'd like to write (suggested title, short description, a date you think it will be ready). This helps me schedule and plan for your blog.

Once you have written the blog, you can open a PR in the website repo here: https://github.com/opensearch-project/project-website/pulls and post your draft for review. Our tech writing and editorial team will edit or suggest changes to the content, and we'll help you get it ready to publish. Be sure to include your author bio and image when you open your PR to help the process move faster.

If you have any questions, you can email me at pattijur@amazon.com.

Thanks, PJ

dlvenable commented 11 months ago

@jaehyeon-kim, the Data Prepper project provides an ingestion pipeline for OpenSearch and other services. We are releasing Data Prepper 2.4.0 with support for Kafka as a source (planned for next week). Data Prepper 2.5.0 should add support for Kafka as a sink. This way you can ingest data into Kafka and send it to OpenSearch or to another Kafka topic.

Feel free to comment on https://github.com/opensearch-project/data-prepper/issues/1986 for the Kafka sink, or open a new issue with any requests for the Kafka source.

jaehyeon-kim commented 11 months ago

@bbarani Thanks for your comment.

@pajuric I created an issue as you mentioned - https://github.com/opensearch-project/project-website/issues/1882

@dlvenable Thanks for letting me know about Data Prepper. While it looks interesting, I'm focusing on Kafka-related services for now. I'll keep an eye on it and try it out later.

bbarani commented 11 months ago

@jaehyeon-kim FYI... Data Prepper 2.4.0 is now available for download. This release introduces a number of exciting new features, including a new Apache Kafka source, Amazon S3 batch processing, filtering inside of sinks, new S3 sink codecs, and streaming anomaly detection with high cardinality.

More info: https://opensearch.org/blog/Announcing-Data-Prepper-2.4.0/