Open nikakaltura opened 3 years ago
I was able to temporarily fix the issue by increasing maxRetryTime to a higher number.
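For anyone else trying this workaround, a minimal sketch of where that option lives, assuming the standard kafkajs retry config (the numbers below are illustrative assumptions, not recommendations):

```javascript
// Sketch: kafkajs retry options, passed as the `retry` key of the client config.
// kafkajs's documented default for maxRetryTime is 30000 ms.
const retryConfig = {
  maxRetryTime: 60000,   // cap on the per-retry wait time, in ms (raised from the default)
  initialRetryTime: 300, // delay before the first retry, in ms
  retries: 8,            // how many retries before the consumer crashes
};

// Used like: new Kafka({ clientId: 'my-app', brokers: ['broker:9092'], retry: retryConfig })
```

Note this only delays the crash; it doesn't address why the retries are exhausted in the first place.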
The second error you mentioned is interesting. Could you set up a listener for the CRASH event and log out not just the error message, but also error.stack? The error is being thrown from here, probably because it's being called with null or undefined, but this method is called from a lot of places, so it would be interesting to understand from where:
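A sketch of such a listener, assuming `consumer` is your kafkajs consumer instance and using the CRASH instrumentation event's documented `{ error, groupId }` payload fields:

```javascript
// Sketch: log the full stack, not just the message, when the consumer crashes.
function logCrash(event) {
  const { error, groupId } = event.payload;
  console.error(`Consumer in group ${groupId} crashed:\n${error.stack}`);
}

// Wiring it up on a kafkajs consumer instance:
// consumer.on(consumer.events.CRASH, logCrash);
```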
If you have the ability to reproduce this locally, it would also be super helpful to have debug output, including protocol buffers, from when this happens. Running with KAFKAJS_LOG_LEVEL=debug KAFKAJS_DEBUG_PROTOCOL_BUFFERS=1 would output this, but it also increases logging volume by a tonne, so you probably don't want to do this in production.
Note that this would contain any data sent between your client and the broker, so if you can only reproduce this with real customer data, this might not be an option. If you can trigger it with test data, that would be 💯
Hi @Nevon, we are also facing the same issue. I have pasted the error stack below. Can you please help us with this?
Crash: Error: value should be an instance of Encoder

Error: value should be an instance of Encoder
    at Encoder.writeEncoder (/usr/src/app/node_modules/kafkajs/src/protocol/encoder.js:211:13)
    at /usr/src/app/node_modules/kafkajs/src/protocol/encoder.js:281:18
    at Array.forEach (<anonymous>)
    at Encoder.writeArray (/usr/src/app/node_modules/kafkajs/src/protocol/encoder.js:272:13)
    at /usr/src/app/node_modules/kafkajs/src/consumer/assignerProtocol.js:51:44
    at Array.map (<anonymous>)
    at Object.encode (/usr/src/app/node_modules/kafkajs/src/consumer/assignerProtocol.js:50:33)
    at /usr/src/app/node_modules/@nestjs/microservices/helpers/kafka-reply-partition-assigner.js:108:78
    at Array.map (<anonymous>)
    at KafkaReplyPartitionAssigner.assign (/usr/src/app/node_modules/@nestjs/microservices/helpers/kafka-reply-partition-assigner.js:106:40)
    at ConsumerGroup.[private:ConsumerGroup:sync] (/usr/src/app/node_modules/kafkajs/src/consumer/consumerGroup.js:176:35)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async /usr/src/app/node_modules/kafkajs/src/consumer/consumerGroup.js:291:9
    at async Runner.join (/usr/src/app/node_modules/kafkajs/src/consumer/runner.js:77:5)
    at async Runner.start (/usr/src/app/node_modules/kafkajs/src/consumer/runner.js:100:7)
    at async start (/usr/src/app/node_modules/kafkajs/src/consumer/index.js:236:7)
I'm also seeing this error in versions 1.16.0-beta.18 and 1.16.0-beta.22.
@Kannndev: I don't think your issue is the same. Since you're using a custom partition assigner, my bet would be that it's returning something that's not a valid assignment, and you're simply running into validation that's telling you that your assignment isn't valid.
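For anyone debugging a custom assigner against this: the assignment map that eventually reaches kafkajs's AssignerProtocol.MemberAssignment.encode is expected to map topic name strings to arrays of partition numbers. A minimal sketch of a structurally valid shape (topic names and partitions below are made up):

```javascript
// Sketch: a structurally valid assignment map for kafkajs's
// AssignerProtocol.MemberAssignment.encode. Topic keys must be strings and
// the values must be arrays of partition numbers; feeding it objects instead
// is one plausible way to end up at "value should be an instance of Encoder".
const assignment = {
  'topic-a': [0, 1, 2], // topic name -> partitions assigned to this member
  'topic-b': [3],
};

// In real code this map would be encoded with something like:
// AssignerProtocol.MemberAssignment.encode({ version: 1, assignment, userData: Buffer.alloc(0) })
```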
My mistake: I accidentally sent an object as the "topic" instead of a string.
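For future readers, a quick illustration of that mistake (topic and payload values below are made up); whether the message goes through kafkajs's producer.send or a Nest.js client wrapper, the topic has to be a string:

```javascript
// The bug described above: an object passed where a topic name string belongs.
const wrong = { topic: { name: 'replies' }, messages: [{ value: 'hello' }] }; // topic is an object
const right = { topic: 'replies', messages: [{ value: 'hello' }] };           // topic is a string

// Usage would look like: await producer.send(right)
```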
Describe the bug
We are using Nest.js with Kafka as the message broker. KafkaJS worked great until now. We have multiple instances of an API gateway listening to the reply topic. The API gateway has 6 Kafka clients, each listening to its own topic: module A's client listens to topic A only, module B's client listens to topic B only, and so on.
When we deploy the api-gw, all 6 instances at the same time, the old instances are naturally killed. So 6 consumers leave and another 6 join at the same time. This causes a lot of rebalances, which wasn't an issue until now. Now, after all these rebalances, KafkaJS throws two errors.
First: ERROR [Consumer] Crash: KafkaJSNumberOfRetriesExceeded: Specified group generation id is not valid
And after that: ERROR [Consumer] Crash: Error: value should be an instance of Encoder
After the second error, all Kafka clients die, all connections are dropped, no retries are triggered, and the API gateway just hangs there, returning empty responses to the client.
To Reproduce
If none of the above are possible to provide, please write down the exact steps to reproduce the behavior:
Expected behavior
To connect to Kafka after a rebalance.
Environment: