snowflakedb / snowflake-kafka-connector

Snowflake Kafka Connector (Sink Connector)
Apache License 2.0
138 stars 97 forks source link

Streaming Channel Offset Migration (transient exception) #770

Closed lsimac closed 3 months ago

lsimac commented 9 months ago

Hi, a few weeks back we reported that in version v2.1.2 we are receiving error com.snowflake.kafka.connector.internal.SnowflakeKafkaConnectorException: [SF_KAFKA_CONNECTOR] Exception: Failure in Streaming Channel Offset Migration Response Error Code: 5023.

Link to GitHub Issue Comment

Full Error Log ``` com.snowflake.kafka.connector.internal.SnowflakeKafkaConnectorException: [SF_KAFKA_CONNECTOR] Exception: Failure in Streaming Channel Offset Migration Response Error Code: 5023 Detail: Streaming Channel Offset Migration from Source to Destination Channel has no/invalid response, please contact Snowflake Support Message: Snowflake experienced a transient exception, please retry the migration request. \tat com.snowflake.kafka.connector.internal.SnowflakeErrors.getException(SnowflakeErrors.java:367) \tat com.snowflake.kafka.connector.internal.SnowflakeConnectionServiceV1.migrateStreamingChannelOffsetToken(SnowflakeConnectionServiceV1.java:1062) \tat com.snowflake.kafka.connector.internal.streaming.TopicPartitionChannel.(TopicPartitionChannel.java:287) \tat com.snowflake.kafka.connector.internal.streaming.SnowflakeSinkServiceV2.createStreamingChannelForTopicPartition(SnowflakeSinkServiceV2.java:254) \tat com.snowflake.kafka.connector.internal.streaming.SnowflakeSinkServiceV2.lambda$startPartitions$1(SnowflakeSinkServiceV2.java:222) \tat java.base/java.lang.Iterable.forEach(Iterable.java:75) \tat com.snowflake.kafka.connector.internal.streaming.SnowflakeSinkServiceV2.startPartitions(SnowflakeSinkServiceV2.java:217) \tat com.snowflake.kafka.connector.SnowflakeSinkTask.open(SnowflakeSinkTask.java:259) \tat org.apache.kafka.connect.runtime.WorkerSinkTask.openPartitions(WorkerSinkTask.java:644) \tat org.apache.kafka.connect.runtime.WorkerSinkTask.access$1200(WorkerSinkTask.java:73) \tat org.apache.kafka.connect.runtime.WorkerSinkTask$HandleRebalance.onPartitionsAssigned(WorkerSinkTask.java:741) \tat org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsAssigned(ConsumerCoordinator.java:322) \tat org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:471) \tat org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:474) \tat org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:385) \tat org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:557) \tat org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1272) \tat org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1236) \tat org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216) \tat org.apache.kafka.connect.runtime.WorkerSinkTask.pollConsumer(WorkerSinkTask.java:479) \tat org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:331) \tat org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:237) \tat org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:206) \tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:202) \tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:257) \tat org.apache.kafka.connect.runtime.isolation.Plugins.lambda$withClassLoader$1(Plugins.java:181) \tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) \tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) \tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) \tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) \tat java.base/java.lang.Thread.run(Thread.java:829) ```

I am opening a new issue because, even though the error is transient (and not connected to the JDK version), and it is fixed once the task is restarted, we would prefer the issue to be resolved (by introducing a retry mechanism or something similar). Since we are running a large number of connectors, and this issue occurs on average 2-3 times a week, we believe addressing it would be beneficial. We have observed that the issue happens during connector startup and it can occur for the same connector multiple times. If you need more information, we will gladly provide it.

sfc-gh-japatel commented 9 months ago

Hi there, I am happy to look into it. please provide timestamps of when the issue occurred. deployment/account if possible. Stay tuned for one issue I found on server side which should be released shortly. But I want to make sure this is not a different issue.

Also, this issue is only going to happen during restart, but happy to implement a retry mechanism as well in upcoming release of KC.

Thanks for your help here!

lsimac commented 9 months ago

Thank you @sfc-gh-japatel for the quick replay, I created a Snowflake Support case because account information needs to be shared. Number of Support Case: 00668995 Title of the support case: Streaming Channel Offset Migration (transient exception)

sfc-gh-achyzy commented 3 months ago

Hi @lsimac. Can you confirm that your issue has been resolved? I would like to clean up the issue, as it seems it has.

sfc-gh-gjachimko commented 3 months ago

closing due to inactivity.