Today I found that port 4090 only starts after all the DB data replication has finished:
time="2020-09-03 23:53:25" level=info msg="Processed DB data from influxdb01[prometheus|autogen] to influxdb02[prometheus|autogen] has done #Points (1413407956) Took [16h16m57.381408237s] ERRORS [0]!\n"
time="2020-09-03 23:53:25" level=info msg="Beginning Monitoring process process for influxdb influxdb02 | http://influxdb-1.influxdb-svc:8086/"
time="2020-09-03 23:53:25" level=info msg="Beginning Monitoring process process for influxdb influxdb01 | http://influxdb-0.influxdb-svc:8086/"
time="2020-09-03 23:53:25" level=info msg="InfluxMonitor: InfluxDB : influxdb02 OK (Version 1.8.0 : Duration 3.997799ms )"
time="2020-09-03 23:53:25" level=info msg="InfluxMonitor: InfluxDB : influxdb01 OK (Version 1.8.0 : Duration 5.704578ms )"
time="2020-09-03 23:53:35" level=info msg="Beginning Supervision process process each 20s "
time="2020-09-03 23:53:35" level=info msg="HACluster check...."
time="2020-09-03 23:53:35" level=info msg="Server is running on :0.0.0.0:4090..."
Hello @like-inspur, I cannot understand where the issue is. Can you please give us more context?
Can you tell us how you are working with srelay/syncflux/influxdb? (How many srelay/syncflux instances are you running with your influxdb cluster(s)?)
Was the data in your influxdb cluster correctly synced before the configuration change?
Did you set the initial-replication parameter in hamonitor mode to something other than none (https://github.com/toni-moreno/syncflux/blob/master/conf/sample.syncflux.toml#L69-L81)? If yes, why?
With this extended info I hope to be able to help you.
1. I build the influxdb cluster with two pods; each pod contains influxdb, srelay, and syncflux containers.
2. Yes, I use api/health to check that the influxdb cluster syncs data OK. I just want to see how the cluster behaves once one node restarts.
3. I configured initial-replication with both, because I think if one influxdb loses some data, this can help recover the data.
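Roughly, each pod's template looks like this (a simplified sketch; image names and the exact StatefulSet shape are illustrative, not taken from the actual manifests):

# all three services packed into each StatefulSet pod (the layout under discussion)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: influxdb
spec:
  serviceName: influxdb-svc
  replicas: 2
  selector:
    matchLabels: { app: influxdb }
  template:
    metadata:
      labels: { app: influxdb }
    spec:
      containers:
        - name: influxdb          # the database itself
          image: influxdb:1.8.0
        - name: influxdb-srelay   # HA write relay
          image: tonimoreno/influxdb-srelay:latest
        - name: syncflux          # HA monitor / resync tool (serves port 4090)
          image: tonimoreno/syncflux:latest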
Hello @like-inspur
As described in https://github.com/toni-moreno/syncflux/issues/38, in hamonitor mode the best way to work is to assume an already-synced initial state; if in doubt, you can sync once with an external syncflux process.
I recommend initial-replication = none when working in hamonitor mode.
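In the linked sample config that is this setting (an excerpt only; see sample.syncflux.toml above for the full set of accepted values):

[General]
 # ... other settings as in the sample file ...
 # "none" skips any replication at startup; the setup described in this
 # issue used "both" instead
 initial-replication = "none"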
Suppose one influxdb was down for a while. When it starts again after this period, how can the other influxdb sync the data from that period back to it, and how can we ensure that external calls to the influxdb cluster are not scheduled to the influxdb that is still recovering data?
I suggest checking the cluster state prior to any restart, and waiting if the cluster is not fully OK.
You can do this by checking both nodes: http://node1:4090/api/health and http://node2:4090/api/health.
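For example, a minimal pre-restart check could look like this (a sketch; node1/node2 are placeholders for your two hosts):

# abort the restart unless both syncflux endpoints report healthy
for node in node1 node2; do
  curl -fsS "http://$node:4090/api/health" \
    || { echo "$node is not healthy, postponing restart"; exit 1; }
done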
If I do that, then I can't ensure that when one influxdb goes down, the other can still provide service. And when the dead one comes back up, it should recover data from the healthy one; only once it has fully recovered should it be added back to the influxdb cluster to provide service.
@like-inspur syncflux/influxdb-srelay has been designed to be always running, while influxdb could (accidentally or not) be restarted. So I think the mistake is running 3 different services all together in the same pod. Do you agree?
But syncflux and influxdb-srelay can also die, so I put the 3 containers into the same pod for the convenience of the cluster implementation, and configured each node's own influxdb as master and the other as slave.
Hello @like-inspur, I suggest running influxdb-srelay and influxdb in the same pod, and syncflux as an independent pod. In this case the proposed tools will do their work well, and you will be able to change the db and relay config whenever you need.
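A sketch of that split (again with illustrative names and images):

# influxdb and its relay restart together per node...
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: influxdb
spec:
  serviceName: influxdb-svc
  replicas: 2
  selector:
    matchLabels: { app: influxdb }
  template:
    metadata:
      labels: { app: influxdb }
    spec:
      containers:
        - name: influxdb
          image: influxdb:1.8.0
        - name: influxdb-srelay
          image: tonimoreno/influxdb-srelay:latest
---
# ...while syncflux lives in its own pod, so a db/relay restart never kills it
apiVersion: apps/v1
kind: Deployment
metadata:
  name: syncflux
spec:
  replicas: 1
  selector:
    matchLabels: { app: syncflux }
  template:
    metadata:
      labels: { app: syncflux }
    spec:
      containers:
        - name: syncflux
          image: tonimoreno/syncflux:latest

With this layout you can also roll out db or relay config changes without restarting syncflux itself.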
I installed influxdb-srelay and syncflux together with influxdb on two hosts via a StatefulSet. When I change the config and the pod restarts, port 4090 of syncflux can't be opened, and the syncflux log looks like below:
root@mgt01:~# kubectl log influxdb-0 -n monitoring syncflux