This seems more like a Kafka Connect / Camel Connector question than a Strimzi question.
1) In general, you can set even higher numbers there. But it depends on the connector whether it can run more than one task, and in most cases more tasks than partitions makes little sense (see the KafkaConnector sketch after this list).
2) Judging from the errors, you may want to make the max.poll.interval.ms interval smaller, not bigger. You should be able to set it in KafkaConnect.spec.config or in KafkaConnector.spec.config.
3) I guess you need to first find out which part is slow - Kafka brokers? Network? Connect? Connector?
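For context on 1): tasksMax is configured on the KafkaConnector resource. A minimal sketch, assuming the connector class supports multiple tasks; the connector name, class, and topic below are placeholders, not values from this issue:

```yaml
apiVersion: kafka.strimzi.io/v1alpha1        # KafkaConnector API version of this era; newer Strimzi uses v1beta2
kind: KafkaConnector
metadata:
  name: example-connector                    # placeholder name
  labels:
    strimzi.io/cluster: emd-kafka-cluster    # must match the KafkaConnect resource name
spec:
  class: <connector-class>                   # placeholder, e.g. the Camel Kafka Connector class in use
  tasksMax: 10                               # rarely useful to exceed the partition count (10 in this case)
  config:
    topics: <topic-name>                     # placeholder topic
```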
@scholzj
apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaConnect
metadata:
  name: emd-kafka-cluster
  annotations:
    strimzi.io/use-connector-resources: "true"
spec:
  image: 202991147671.dkr.ecr.us-east-1.amazonaws.com/strimzicamel-kafka-connector:1.0.1
  logging:
    type: inline
    loggers:
      connect.root.logger.level: "INFO"
  replicas: 1
  bootstrapServers: <broker>
  authentication:
    type: plain
    username: emdconsumer
    passwordSecret:
      secretName: sasl-user-pass
      password: password
  tls:
    trustedCertificates:
      - secretName: hebcert
        certificate: HEBcert.crt
  config:
    group.id: emdconsumer-group-1
    offset.storage.topic: emd-connect-offsets
    config.storage.topic: emd-connect-configs
    status.storage.topic: emd-connect-status
    config.providers: file
    config.providers.file.class: org.apache.kafka.common.config.provider.FileConfigProvider
    key.converter: org.apache.kafka.connect.json.JsonConverter
    value.converter: org.apache.kafka.connect.json.JsonConverter
    key.converter.schemas.enable: false
    value.converter.schemas.enable: false
    max.poll.records: 100. <--- see this value??
1. Attached is the log, which shows that the default value **max.poll.records=500** is set instead of 100.
[container-location1.log](https://github.com/strimzi/strimzi-kafka-operator/files/4825803/container-location1.log)
2. Also, I see more messages at the consumer end than were produced.
How does Strimzi KafkaConnect ensure "**exactly-once processing**"?
3.
> I guess you need to first find out which part is slow - Kafka brokers? Network? Connect? Connector?
Kafka brokers are not slow, because a standalone consumer consumes the 97k messages in 22 seconds.
How do I check whether Connect or the connector is slow? Suggestions please.
Strimzi is an orchestration layer only. Inside is a regular Apache Kafka release - with exactly the same binaries as you can download from kafka.apache.org. So there is no real Strimzi Kafka Connect - we just deploy Kafka Connect for you. The configs from KafkaConnect.spec.config are just passed into the Kafka Connect configuration file. Maybe you need to prefix max.poll.records with consumer.? I know that was needed for some options. You also have 100. there - which I wonder whether it is in the actual YAML or just a copy-paste error.
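For illustration, a minimal sketch of that suggestion in KafkaConnect.spec.config (other fields omitted); consumer.-prefixed options are passed to the consumers used by sink connectors, and the reporter confirms below that this form took effect:

```yaml
spec:
  config:
    # prefixed form of the option suggested above
    consumer.max.poll.records: 100
```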
@scholzj
The 100. is just a copy-paste error.
I see a lot of Attempt to heartbeat failed since group is rebalancing messages; I feel that's causing the delay (maybe?).
max.poll.records is a Kafka property (https://kafka.apache.org/documentation/#max.poll.records), so it can be used as is.
I checked /tmp/strimzi-connect.properties in the pod and the value is set to max.poll.records=100.
Unsure where the log is pulling the value max.poll.records=500 from.
So, have you tried to prefix it with consumer. as I suggested?
@scholzj
Setting consumer.max.poll.records: 100 worked, but it didn't speed up the process :(
Also, I see so many messages duplicated; how can I guarantee exactly-once processing?
Can one KafkaConnect have multiple KafkaConnectors (one for each topic)?
Re 1) Well, setting that to a lower value should IMHO in general improve the latency but worsen the throughput. So I don't think that would be expected to help.
Re 2) I have no idea how that works in Connect. But among other things it would depend on how you store the data in S3, whether S3 supports something like that or not, etc. If you just store the message in some bucket with a file name based on some header etc., it should simply be idempotent.
Re 3) I do not think there is any limit on the number of connectors that Strimzi would impose.
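For 3), multiple KafkaConnector resources can target the same Connect cluster by carrying the same strimzi.io/cluster label. A rough sketch with hypothetical names and topics:

```yaml
apiVersion: kafka.strimzi.io/v1alpha1      # or v1beta2 on newer Strimzi
kind: KafkaConnector
metadata:
  name: connector-topic-a                  # hypothetical name
  labels:
    strimzi.io/cluster: emd-kafka-cluster
spec:
  class: <connector-class>                 # placeholder
  tasksMax: 2
  config:
    topics: topic-a                        # hypothetical topic
---
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
metadata:
  name: connector-topic-b                  # hypothetical name
  labels:
    strimzi.io/cluster: emd-kafka-cluster
spec:
  class: <connector-class>                 # placeholder
  tasksMax: 2
  config:
    topics: topic-b                        # hypothetical topic
```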
2) I thought the topics below in KafkaConnect are mainly used for maintaining the status of the actual topic(s) and act as the idempotent/process-once mechanism. Is that not true? config.storage.topic, status.storage.topic
As I said, I have no idea how Kafka Connect implements exactly once. I do not expect these topics to have anything to do with it, but I don't know. The consumer in Connect is basically the regular Consumer API - so that will do whatever the regular consumer does in this respect. And as I said, the other side will depend on the connector and where it connects.
Is there still something we can help with here? Or can we close the issue? Thanks
Can we close this?
I have a Kafka topic which currently has 97k messages across 10 partitions. The connector runs very slowly, processing only 150 to 200 messages per minute.
K8s cluster details: 3-node m5.large EKS cluster
The KafkaConnector.yaml is currently running on only one m5.large instance.
It takes a very long time to consume messages and occasionally encounters the following errors, but it keeps proceeding.
A. Error 1 (occurred multiple times but proceeded)
B. Error 2 (occurred 2 times but proceeded)
Questions: