streamsets / tutorials

StreamSets Tutorials
Apache License 2.0

Kafka consumer Offset from beginning #146

Closed Loxnol closed 9 months ago

Loxnol commented 9 months ago

Hello,

I'm currently working on a simple pipeline that writes Kafka messages to a log file.

I'm trying to consume all the data from the beginning of a topic, but I'm only getting new data added to the topic. Once messages have been consumed, the earlier messages in the topic are no longer accessible.

I've already tested all the different "Auto Offset Reset" values, with both the single-topic and multi-topic consumers. The official documentation (docs.streamsets.com) lists these properties: auto.commit.interval.ms, bootstrap.servers, enable.auto.commit, group.id, max.poll.records.

If I understand correctly, all of those parameters are locked, so I can't disable offset management and process all the data from the beginning of a topic.
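
For context, in plain Kafka (independent of StreamSets) the `auto.offset.reset` setting only takes effect when the consumer group has no usable committed offset for a partition. Here is a minimal Python sketch of that rule, with no Kafka library involved. The function `starting_offset` and its parameters are hypothetical names that only model the semantics; they are not part of any client API.

```python
from typing import Optional


def starting_offset(committed: Optional[int], log_start: int, log_end: int,
                    auto_offset_reset: str = "latest") -> int:
    """Model where a Kafka consumer begins reading one partition.

    Hypothetical helper: it mirrors the general rule, not a real client API.
    """
    # A committed offset inside the valid range always wins;
    # the reset policy is not consulted at all.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # No usable committed offset: fall back to the reset policy.
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    # Any other policy (for example "none") is modelled as an error.
    raise ValueError(f"no committed offset and policy is {auto_offset_reset!r}")


# A group that already committed offset 120 resumes there, whatever the policy.
print(starting_offset(committed=120, log_start=0, log_end=150,
                      auto_offset_reset="earliest"))
# A group with nothing committed and "earliest" starts at the log start.
print(starting_offset(committed=None, log_start=0, log_end=150,
                      auto_offset_reset="earliest"))
```

Under this model, a group that has already consumed the topic keeps resuming from its committed offset, which would match the behaviour described above. Whether Data Collector 3.14.0 allows working around this is not confirmed here.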

Is there an additional Kafka configuration property I can use, or do I need to configure the topic directly via the Kafka CLI?

StreamSets Data Collector version: 3.14.0
Kafka Consumer version: 2.0.0

Regards.

xverges commented 9 months ago

Hi Loxnol, thanks for your interest in StreamSets.

Loxnol commented 9 months ago

Hi @xverges, I've created this article: https://community.streamsets.com/community-articles-and-got-a-question-7/kafka-consumer-offset-from-beginning-every-batch-2301

Indeed, I'm using the open source version.

Thank you for your help.

You can close the issue.