odpi / egeria

Egeria core
https://egeria-project.org
Apache License 2.0

[BUG] Kafka topic connector having issues with kafka partition rebalance #7149

Open planetf1 opened 1 year ago

planetf1 commented 1 year ago

Is there an existing issue for this?

Current Behavior

Noticed this when debugging a db2 integration setup:

2022-11-29 16:17:51.081  INFO 39183 --- [ool-56-thread-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-6c2845ff-e8f6-415a-84a8-537ed0d
f1102-55, groupId=6c2845ff-e8f6-415a-84a8-537ed0df1102] Setting offset for partition egeria.omag.openmetadata.repositoryservices.cohort.cocoCohort.OMRSTopic.reg
istration-0 to the committed offset FetchPosition{offset=792, offsetEpoch=Optional[0], currentLeader=LeaderAndEpoch{leader=Optional[192.168.178.134:9092 (id: 0
rack: null)], epoch=0}}
2022-11-29 16:17:51.082  INFO 39183 --- [ool-24-thread-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-acce90cb-18c0-4a75-8db6-917f483
tadata{offset=0, leaderEpoch=null, metadata=''}} failed: Offset commit cannot be completed since the consumer is not part of an active group for auto partition
assignment; it is likely that the consumer was kicked out of the group.
2022-11-29 16:17:50.492  WARN 39183 --- [ool-82-thread-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-40605fed-50e5-4a98-a924-af9dbbf
69530-81, groupId=40605fed-50e5-4a98-a924-af9dbbf69530] Asynchronous auto-commit of offsets {egeria.omag.server.cocoMDS6.omas.assetmanager.outTopic-0=OffsetAndM
etadata{offset=18983, leaderEpoch=0, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to anot
her member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the po
ll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches
returned in poll() with max.poll.records.
2022-11-29 16:17:50.494  INFO 39183 --- [ool-56-thread-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-6c2845ff-e8f6-415a-84a8-537ed0d
f1102-55, groupId=6c2845ff-e8f6-415a-84a8-537ed0df1102] Successfully joined group with generation Generation{generationId=7, memberId='consumer-6c2845ff-e8f6-41
5a-84a8-537ed0df1102-55-976183a4-3c2b-4f81-a28d-32f9737ed5b5', protocol='range'}
2022-11-29 16:17:50.493  INFO 39183 --- [ool-38-thread-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-67937cc0-f8e7-4bb8-8031-77a0b15
5b524-37, groupId=67937cc0-f8e7-4bb8-8031-77a0b155b524] Successfully joined group with generation Generation{generationId=9, memberId='consumer-67937cc0-f8e7-4b
b8-8031-77a0b155b524-37-9d45891d-6421-4726-bce9-484a24f02bcf', protocol='range'}
2022-11-29 16:17:50.495  INFO 39183 --- [ool-82-thread-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-40605fed-50e5-4a98-a924-af9dbbf
69530-81, groupId=40605fed-50e5-4a98-a924-af9dbbf69530] Failing OffsetCommit request since the consumer is not part of an active group
2022-11-29 16:17:50.495  INFO 39183 --- [ool-66-thread-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-405bf426-3888-402a-a28c-3cc8796
bf5a2-65, groupId=405bf426-3888-402a-a28c-3cc8796bf5a2] Request joining group due to: need to re-join with the given member-id: consumer-405bf426-3888-402a-a28c
-3cc8796bf5a2-65-7ac52688-23f5-4744-b054-9e62a3b143c1
2022-11-29 16:17:50.495  INFO 39183 --- [ool-38-thread-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-67937cc0-f8e7-4bb8-8031-77a0b15
5b524-37, groupId=67937cc0-f8e7-4bb8-8031-77a0b155b524] Finished assignment for group at generation 9: {consumer-67937cc0-f8e7-4bb8-8031-77a0b155b524-37-9d45891
d-6421-4726-bce9-484a24f02bcf=Assignment(partitions=[egeria.omag.openmetadata.repositoryservices.cohort.cocoCohort.OMRSTopic.registration-0])}
2022-11-29 16:17:50.495  WARN 39183 --- [ool-80-thread-1] o.a.k.c.c.internals.ConsumerCoordinator  : [Consumer clientId=consumer-40605fed-50e5-4a98-a924-af9dbbf
69530-79, groupId=40605fed-50e5-4a98-a924-af9dbbf69530] Asynchronous auto-commit of offsets {egeria.omag.server.cocoMDS6.omas.subjectarea.outTopic-0=OffsetAndMe
tadata{offset=0, leaderEpoch=null, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to anothe
r member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll
 loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches re
turned in poll() with max.poll.records.
Tue Nov 29 16:17:50 GMT 2022 cocoMDS6 Information OCF-KAFKA-TOPIC-CONNECTOR-0018 The Egeria client was rebalanced by Kafka and failed to commit already consumed
 events

This needs investigation because (a) we may be failing to commit offsets for messages that have already been read, which would lead to those messages being processed twice, and (b) there may be performance issues in the topic connector that could cause other processing problems.
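
To illustrate concern (a): if an offset commit fails after events were already handled, Kafka can redeliver those events after the rebalance. A minimal sketch of one way to guard against reprocessing is shown below. `OffsetDeduplicator` and `markIfNew` are hypothetical names for illustration only and are not part of Egeria or the Kafka client. The sketch assumes the same process receives the partition again and keeps state in memory only, so it would not survive a restart.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Egeria or the Kafka client.
class OffsetDeduplicator {
    // Highest offset already seen, keyed by "topic-partition".
    private final Map<String, Long> lastSeen = new HashMap<>();

    /**
     * Records the offset and returns true if it is newer than anything seen
     * so far for this topic partition. Returns false for a redelivered event
     * (an offset at or below the last one recorded).
     */
    synchronized boolean markIfNew(String topic, int partition, long offset) {
        String key = topic + "-" + partition;
        Long previous = lastSeen.get(key);
        if (previous != null && offset <= previous) {
            return false; // already handled, so skip it
        }
        lastSeen.put(key, offset);
        return true;
    }
}
```

A caller would check `markIfNew(topic, partition, offset)` before handing each event to its listeners and drop the event when the result is false.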

Expected Behavior

The consumers should not be rebalanced out of their groups, and offset commits should not fail.
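
The Kafka warning in the log above names two consumer settings that influence this: `max.poll.interval.ms` and `max.poll.records`. The sketch below only builds a `java.util.Properties` object with those two keys, which are standard Kafka consumer configuration names (defaults 300000 and 500). The class name, method name, and values are hypothetical, and how such properties would be passed to the Egeria Kafka topic connector is not covered here and would need to be checked against the connector's configuration.

```java
import java.util.Properties;

// Hypothetical sketch; class, method, and values are illustrative only.
class ConsumerTuningSketch {

    /** Builds consumer properties for the two settings named in the Kafka warning. */
    static Properties tunedConsumerProperties(int maxPollIntervalMs, int maxPollRecords) {
        Properties props = new Properties();
        // Allow more time between poll() calls before the group rebalances.
        props.setProperty("max.poll.interval.ms", Integer.toString(maxPollIntervalMs));
        // Return fewer records per poll() so each batch is processed sooner.
        props.setProperty("max.poll.records", Integer.toString(maxPollRecords));
        return props;
    }

    public static void main(String[] args) {
        // Example values only: 10 minutes between polls, 100 records per batch.
        Properties props = tunedConsumerProperties(600_000, 100);
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Raising the interval or shrinking the batch only hides slow event processing, so the performance question raised above would still need investigation.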

Steps To Reproduce

No response

Environment

- Egeria:
- OS:
- Java:
- Browser (for UI issues):
- Additional connectors and integration:

Any Further Information?

No response

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.
