I'm trying to sink some fairly large topics from Kafka (5 topics with about 250 million events each) into BigQuery via a separate, rather large Kafka Connect cluster (3 nodes, each with 8 CPUs and 32 GB RAM). It starts up fine, but after about 2 minutes the Connect instances' CPUs are pegged at 100% and the nodes start disconnecting; ultimately the whole process restarts with little progress on getting any data into BigQuery.
I tried the same configuration in a replica of our environment with far fewer events (500,000) and it works fine.
Are there any configurations that can throttle the processing of events to keep the CPU in check? I tried tuning queueSize and threadPoolSize, as well as max.queue.size and max.batch.size, to no avail.
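For concreteness, the knobs I was adjusting looked roughly like this. This is a sketch only: the connector class assumes the WePay BigQuery sink connector (where queueSize and threadPoolSize are connector-level options), and the topic names and values are placeholders, not what's actually deployed:

name=bigquery-sink
connector.class=com.wepay.kafka.connect.bigquery.BigQuerySinkConnector
topics=topic-1,topic-2,topic-3,topic-4,topic-5
tasks.max=8
# BigQuery sink write-thread/queue settings I tuned
queueSize=10000
threadPoolSize=10
# Batch/queue settings I also tried, without effect
max.queue.size=8192
max.batch.size=2048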
Any hint/help would be very much appreciated!
Here's our config for reference: