Closed · josepowera closed this issue 2 years ago
OK, I think I might have found a solution. Could these two variables be exposed in the config?
From kafka_franz.go:

```go
kgo.FetchMaxBytes(1 << 27),      // 134 MB
kgo.BrokerMaxReadBytes(1 << 27), // 134 MB
```
From the franz-go config docs:

```go
// FetchMaxBytes sets the maximum amount of bytes a broker will try to send
// during a fetch, overriding the default 50MiB. Note that brokers may not
// obey this limit if it has records larger than this limit. Also note that
// this client sends a fetch to each broker concurrently, meaning the client
// will buffer up to <brokers * max bytes> worth of memory.
//
// This corresponds to the Java fetch.max.bytes setting.

// BrokerMaxReadBytes sets the maximum response size that can be read from
// Kafka, overriding the default 100MiB.
//
// This is a safety measure to avoid OOMing on invalid responses. This is
// slightly double FetchMaxBytes; if bumping that, consider bumping this.
// No other response should run the risk of hitting this limit.
```
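Per the docs above, a fetch of up to FetchMaxBytes can be in flight to each broker concurrently, so the worst-case fetch buffering is roughly `brokers * FetchMaxBytes` per client. A quick back-of-envelope with the hardcoded value (the broker count of 3 is only an assumption for illustration):

```go
package main

import "fmt"

func main() {
	const fetchMaxBytes = 1 << 27 // 134217728 bytes, hardcoded in kafka_franz.go
	brokers := 3                  // assumption for illustration

	// Worst case: one full fetch buffered per broker, concurrently.
	worstCase := brokers * fetchMaxBytes

	fmt.Printf("FetchMaxBytes = %d bytes (~%.0f MB)\n",
		fetchMaxBytes, float64(fetchMaxBytes)/1e6)
	fmt.Printf("worst-case fetch buffering per client = %d bytes (~%.0f MB)\n",
		worstCase, float64(worstCase)/1e6)
}
```

So even a single client can buffer several hundred MB of raw fetch data before records are decoded, which compounds across 23 standalone tasks.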
Using clickhouse_sinker 2.2 / Kafka 2.8 / the new Kafka driver, with sample records coming from Kafka in CSV format (approx. 63 bytes/record):
```
2,0,0,0,11,0,0,1638280052738,1,,0,1462500355397980160,0,0,,0,,,
```
We have millions of records in Kafka to process. However, when running this in clickhouse_sinker, it needs 2.3 GB of memory to process them.
The process isn't leaking memory; we just want to cut memory usage, because we have a cold-start problem whenever we must restart a server running 23 standalone tasks similar to this one.
Is there any special setting that could help in this situation, such as lowering some queue size? I believe the record above is small, and our settings `"bufferSize": 10000` and `"flushInterval": 5` are not excessive. We only have a problem when there are millions of records in the Kafka backlog; it looks like clickhouse_sinker is fetching too much, too fast from Kafka.
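If the two kgo options were made configurable, one way to tame this would be to derive FetchMaxBytes from a per-task memory budget instead of hardcoding `1 << 27`. A minimal sketch; `budgetFetchMaxBytes` is a hypothetical helper name, not an existing clickhouse_sinker or franz-go API, and the budget and broker count below are assumptions:

```go
package main

import "fmt"

// budgetFetchMaxBytes is a hypothetical helper: given a per-task memory
// budget for fetch buffering and the broker count, it returns a
// FetchMaxBytes value so that brokers * FetchMaxBytes stays within budget.
func budgetFetchMaxBytes(budgetBytes, brokers int) int32 {
	const floor = 1 << 20 // 1 MiB floor so a typical record batch still fits
	v := budgetBytes / brokers
	if v < floor {
		v = floor
	}
	return int32(v)
}

func main() {
	// Assumptions for illustration: 256 MiB fetch budget per task, 3 brokers.
	fmt.Println(budgetFetchMaxBytes(256<<20, 3))
}
```

The resulting value would then be passed to `kgo.FetchMaxBytes(...)` (with `kgo.BrokerMaxReadBytes` set somewhat higher, per the docs quoted above) when building the client.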