Open kartikeya-pharasi opened 3 years ago
I believe this is expected behavior. Please see the detailed documentation about `adjustment` and `gravity` in Serf here: https://www.serf.io/docs/internals/coordinates.html#additional-enhancements
- Another non-Euclidean "adjustment" term was added to help the system perform better with hosts that are near each other in terms of network round trip time.
- A "gravity" effect was added to gently pull the cluster's coordinates back into a system that's roughly centered around the origin. Without this, over long periods of time, the nodes might all drift which is undesirable for accuracy. For example, the components of the vectors could take on large values, and the default position of new nodes at the origin would be far outside the rest of the space.
Does it make sense that the “gravity effect” is what is causing the periodic “re-calibration” that you see in your telemetry graphs?
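For intuition, here is a rough sketch of how a gravity term like the one described above could behave. This is not Serf's actual code: the function name `apply_gravity`, the `rho` and `step` parameters, and the quadratic force are assumptions chosen for illustration. The point is only that the pull is negligible near the origin and grows as coordinates drift outward.

```python
import math


def apply_gravity(coord, rho=150.0, step=1.0):
    """Pull a coordinate vector toward the origin.

    Hypothetical model: the pull grows as (distance / rho) ** 2, so nodes
    close to the origin barely move while far-drifted nodes are pulled
    back more strongly. `rho` and `step` are illustrative parameters.
    """
    dist = math.sqrt(sum(c * c for c in coord))
    if dist == 0.0:
        # Already at the origin, nothing to pull.
        return list(coord)

    force = (dist / rho) ** 2
    # Never pull past the origin.
    pull = min(force * step, dist)
    scale = (dist - pull) / dist
    return [c * scale for c in coord]


if __name__ == "__main__":
    for start in ([15.0, 0.0], [300.0, 0.0]):
        moved = apply_gravity(start)
        print(start, "->", moved)
```

Under these assumptions, a node at distance 15 moves by about 0.01 per update while a node at distance 300 moves by about 4, which is the kind of gentle, distance-dependent correction the documentation describes.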
@mocofound Thanks for the context! I think that is what would cause this as well.
@kartikeya-pharasi I'm curious if you're seeing any operational impact or delays from these oscillations?
Overview of the Issue
We recently discovered a strange behavior where a particular metric, `consul_serf_coordinate_adjustment_ms`, for our Consul servers remains high for a number of days, goes back down, and then repeats. This metric represents how much Consul is adjusting each time it updates: GitHub link.
Snapshots of the graphs for different Consul servers
Is this the normal behavior? Any ideas why the update operation is spread out across multiple days?