nikepan / clickhouse-bulk

Collects many small inserts to ClickHouse and sends them as big inserts
Apache License 2.0
474 stars, 86 forks

App stops sending data to ClickHouse after receiving an error from it #65

Closed. denisov-vlad closed this issue 1 year ago.

denisov-vlad commented 1 year ago

The last lines in the log file:

2023/04/21 08:29:37.756987 ERROR: server down (502): Post "CH_URL": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023/04/21 08:29:37.756995 ERROR: Send (503) No working clickhouse servers; response

After that, no data is sent to the CH server, and the pod's RAM usage keeps growing over time.

I've checked the app status, and it looks OK:

/app # curl -s http://127.0.0.1:8124/metrics | grep "^ch_"
ch_bad_servers 0
ch_dump_count 14763
ch_good_servers 1
ch_queued_dumps 14743
ch_received_count 5.4660103e+07
ch_sent_count 2.154301e+06
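The gauges above say the single server is healthy (ch_good_servers 1, ch_bad_servers 0), while the counters tell a different story. The snippet below is a minimal sketch, assuming the plain unlabeled `name value` lines shown in that output, that parses the metrics and derives two ratios worth alerting on. The helper names `parse_metrics` and `summarize` are hypothetical, and treating these ratios as stall indicators is an illustrative heuristic, not something clickhouse-bulk does itself.

```python
import math
from typing import Dict

# Metrics text copied from the `curl .../metrics | grep "^ch_"` output above.
SAMPLE = """\
ch_bad_servers 0
ch_dump_count 14763
ch_good_servers 1
ch_queued_dumps 14743
ch_received_count 5.4660103e+07
ch_sent_count 2.154301e+06
"""


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse unlabeled `name value [timestamp]` lines; skip blanks and # comments."""
    metrics: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.split()[:2]
        metrics[name] = float(value)
    return metrics


def summarize(m: Dict[str, float]) -> Dict[str, float]:
    """Derive stall indicators (hypothetical heuristics, not clickhouse-bulk logic)."""
    received = m.get("ch_received_count", 0.0)
    sent = m.get("ch_sent_count", 0.0)
    dumps = m.get("ch_dump_count", 0.0)
    queued = m.get("ch_queued_dumps", 0.0)
    return {
        # Share of the received counter that has been sent on to ClickHouse.
        "sent_ratio": sent / received if received else math.nan,
        # Share of all dumps that are still waiting to be re-sent.
        "queued_dump_ratio": queued / dumps if dumps else math.nan,
        "good_servers": m.get("ch_good_servers", 0.0),
    }


if __name__ == "__main__":
    s = summarize(parse_metrics(SAMPLE))
    print(f"sent/received: {s['sent_ratio']:.1%}")
    print(f"queued/total dumps: {s['queued_dump_ratio']:.1%}")
```

For the numbers reported here it prints a sent/received ratio of 3.9% (assuming both counters count the same unit) and 99.9% of dumps still queued, even though no server is marked bad. An alert on that combination would surface the stall that the server-health gauges alone hide.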

JohnDoeDC commented 1 year ago

Hey, we had the same issue with clickhouse-bulk, along with generally unstable behavior, but after we increased some config parameters the problems went away:

{
  "listen": ":8124",
  "flush_count": 10000,
  "flush_interval": 3000,
  "clean_interval": 0,
  "remove_query_id": true,
  "dump_check_interval": 300,
  "debug": false,
  "dump_dir": "dumps",
  "clickhouse": {
    "down_timeout": 120,
    "connect_timeout": 30,
    "tls_server_name": "",
    "insecure_tls_skip_verify": false,
    "servers": [
      "http://*******:8123"
    ]
  },
  "use_tls": false,
  "tls_cert_file": "",
  "tls_key_file": ""
}

So my advice is to increase flush_interval, down_timeout and connect_timeout, and then observe.
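clickhouse-bulk exists to collect many small inserts and send them as big ones, and flush_count and flush_interval decide when a batch goes out. The class below is a simplified, single-threaded model, assuming a batch is flushed when it reaches flush_count rows or when flush_interval milliseconds have passed since the last flush, whichever comes first. It is not the project's Go implementation; `Batcher` and `count_requests` are hypothetical names, and time is passed in explicitly so the model is deterministic. It only illustrates why larger values mean fewer INSERT requests reaching ClickHouse.

```python
from typing import Callable, List


class Batcher:
    """Simplified model of count/interval batching (hypothetical, not clickhouse-bulk's code)."""

    def __init__(self, flush_count: int, flush_interval_ms: int,
                 send: Callable[[List[str]], None]) -> None:
        self.flush_count = flush_count
        self.flush_interval_ms = flush_interval_ms
        self.send = send  # stands in for one INSERT request to ClickHouse
        self.rows: List[str] = []
        self.last_flush_ms = 0

    def add(self, row: str, now_ms: int) -> None:
        """Buffer a row; flush immediately once flush_count rows are buffered."""
        self.rows.append(row)
        if len(self.rows) >= self.flush_count:
            self.flush(now_ms)

    def tick(self, now_ms: int) -> None:
        """Called periodically; flushes if flush_interval has elapsed."""
        if now_ms - self.last_flush_ms >= self.flush_interval_ms:
            self.flush(now_ms)

    def flush(self, now_ms: int) -> None:
        """Send the buffered rows (if any) as one request and reset the timer."""
        if self.rows:
            self.send(self.rows)
            self.rows = []
        self.last_flush_ms = now_ms


def count_requests(flush_count: int, flush_interval_ms: int,
                   rows_per_sec: int, seconds: int) -> int:
    """Simulate a steady stream of rows and count the INSERT requests sent."""
    requests: List[int] = []
    b = Batcher(flush_count, flush_interval_ms,
                lambda rows: requests.append(len(rows)))
    step_ms = 1000 // rows_per_sec  # rows arrive evenly spaced
    for i in range(rows_per_sec * seconds):
        now = i * step_ms
        b.add(f"row{i}", now)
        b.tick(now)
    b.flush(seconds * 1000)  # drain whatever is left at the end
    return len(requests)


if __name__ == "__main__":
    for count, interval in [(100, 1000), (10000, 1000), (10000, 3000)]:
        print(count, interval, count_requests(count, interval, rows_per_sec=1000, seconds=60))
```

At an assumed 1,000 rows/s for one minute, this model sends 600 requests with flush_count 100, 60 with flush_count 10000 and flush_interval 1000, and 20 with the config above (10000 and 3000). The comparison values are arbitrary, not necessarily the project's defaults.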

denisov-vlad commented 1 year ago

@JohnDoeDC Hi! I've thought about that solution, but couldn't such a large timeout just hide the real problem?

JohnDoeDC commented 1 year ago

@denisov-vlad Sorry for the late reply. I believe connect_timeout and down_timeout don't make any difference, but flush_count and flush_interval do. With the default settings, clickhouse-server suffered from thread starvation, with about 16k threads. Now, with queries arriving only every 1-2 seconds, there are only a few threads (maybe 1-2):

ps -T -p 3734613 | grep HTTPHandler | wc -l
3

Here 3734613 is the clickhouse-server PID.
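The `ps -T -p ... | grep HTTPHandler | wc -l` pipeline counts clickhouse-server threads whose name contains HTTPHandler. On Linux the same count can be taken without ps by reading each thread's name from `/proc/<pid>/task/<tid>/comm`. The sketch below assumes a Linux /proc filesystem; `thread_names` and `count_threads` are hypothetical helpers, and the name substring is a parameter so it can be pointed at other thread pools.

```python
import os
import sys
from typing import Iterator


def thread_names(pid: int) -> Iterator[str]:
    """Yield the name of every thread of `pid` (Linux /proc only)."""
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "comm")) as f:
                yield f.read().strip()
        except OSError:
            continue  # the thread may have exited between listdir() and open()


def count_threads(pid: int, substring: str) -> int:
    """Count threads whose name contains `substring` (like ps -T | grep | wc -l)."""
    return sum(1 for name in thread_names(pid) if substring in name)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: count_threads.py PID [NAME_SUBSTRING]")
    else:
        # e.g. python count_threads.py 3734613 HTTPHandler
        target = int(sys.argv[1])
        pattern = sys.argv[2] if len(sys.argv) > 2 else "HTTPHandler"
        print(count_threads(target, pattern))
```

Running it periodically (or exporting the number as a metric) makes it easy to see whether the HTTP handler thread count stays small after changing flush_count and flush_interval.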

denisov-vlad commented 1 year ago

Increasing flush_count / flush_interval helped. Thanks!