Open rtyley opened 2 years ago
How about we add another step to the end of the process, which checks whether there are further nodes to rotate (that meet the criteria) and, if so, kicks off another instance of itself (the step function)? That way each rotation has its own step function invocation, which makes debugging easier.
This proposed behaviour could be configurable via the input event, perhaps with an optional `maxRotations` (defaulting to one if not present). The schedule could then be set to, say, 10 in your case (given 5 weekdays).
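To make the suggestion concrete, here is a minimal sketch of the decision that final step might make. Only the `maxRotations` field comes from the comment above; the function name, the `eligible_node_count` argument and the shape of the event are assumptions for illustration, and actually starting the follow-up execution is not shown.

```python
from typing import Optional


def next_invocation_input(event: dict, eligible_node_count: int) -> Optional[dict]:
    """Return the input for a follow-up execution, or None to stop.

    `event` is the input of the execution that has just finished one rotation.
    `maxRotations` is the optional field proposed above (defaults to 1).
    `eligible_node_count` (hypothetical) is how many nodes still meet the
    rotation criteria.
    """
    remaining = int(event.get("maxRotations", 1)) - 1  # this run used one rotation
    if remaining <= 0 or eligible_node_count <= 0:
        return None
    # Pass the reduced budget on, so the chain of executions always terminates.
    return {**event, "maxRotations": remaining}
```

With `{"maxRotations": 10}` and nodes still eligible, this hands `{"maxRotations": 9}` to the next execution; with no `maxRotations` in the event it stops after a single rotation, which matches today's behaviour.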
Also, this seems to be a dupe of https://github.com/guardian/elasticsearch-node-rotation/issues/34
Our team currently has two big-ish Elasticsearch clusters - Elasticsearch 6 & Elasticsearch 7, and that ends up being a lot of ES nodes - about 50 nodes or so. Our current node rotation schedule is failing to keep up:
This is because:
- With `"indices.recovery.max_bytes_per_sec" : "256mb"` set, migrating all the data off a node can take nearly an hour.
- `RotationCronExpression` is currently `cron(10 4-10 ? * MON-FRI *)` (try hourly from 4am to 10am on weekdays), but this offers only 35 rotations per week, which is not enough to keep up with 50 nodes.

We can widen that rotation period, but precisely scaling cron schedules is quite fiddly (e.g. currently, we have to be very careful to make sure they are never more frequent than the slowest possible migration). It would be nice to have a better way to scale this...
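The arithmetic behind the 35 figure can be checked with a few lines. This is only a sketch: it assumes at most one rotation per cron firing and that each of the roughly 50 nodes should be rotated about once a week; the function name is made up for illustration.

```python
import math


def weekly_slots(first_hour: int, last_hour: int, days_per_week: int) -> int:
    """Number of times an hourly cron fires per week (hour range is inclusive)."""
    return (last_hour - first_hour + 1) * days_per_week


slots = weekly_slots(4, 10, 5)            # cron(10 4-10 ? * MON-FRI *)
nodes = 50                                # approximate node count
shortfall = nodes - slots                 # nodes left unrotated each week
hours_needed = math.ceil(nodes / 5)       # hourly slots needed per weekday
print(slots, shortfall, hours_needed)     # prints: 35 15 10
```

So the window would need to grow from 7 to about 10 hourly slots per weekday just to break even, and every slot still has to be at least as long as the slowest migration.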
Let's respect `ageThresholdInDays` - don't stop until all nodes are younger
Thanks to https://github.com/guardian/elasticsearch-node-rotation/pull/68, we now have the `ageThresholdInDays` parameter (for Ophan, it is 7 days). At the moment it just means:
How about if instead it meant:
...then, rather than scheduling the Step Function to run multiple times a day, we could just cron it to run once per day.
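As a rough illustration of "don't stop until all nodes are younger", the sketch below works out which nodes a single daily run would still have to rotate. It is not taken from the ENR code: the function name, the `launch_times` mapping and the node IDs are hypothetical, and only `ageThresholdInDays` (7 days for Ophan) comes from the text.

```python
from datetime import datetime, timedelta, timezone


def nodes_over_threshold(launch_times: dict, age_threshold_in_days: int, now: datetime) -> list:
    """Return the IDs of nodes older than the threshold, oldest first."""
    cutoff = now - timedelta(days=age_threshold_in_days)
    too_old = [
        (launched, node_id)
        for node_id, launched in launch_times.items()
        if launched < cutoff
    ]
    return [node_id for _, node_id in sorted(too_old)]


now = datetime(2024, 1, 15, tzinfo=timezone.utc)
launch_times = {  # made-up example data
    "node-a": now - timedelta(days=9),
    "node-b": now - timedelta(days=2),
    "node-c": now - timedelta(days=8),
}
queue = nodes_over_threshold(launch_times, 7, now)
print(queue)  # ['node-a', 'node-c']
```

A daily run would rotate the first node in this list, recompute it, and stop once it is empty, so the number of rotations follows the number of old nodes rather than the cron schedule.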
How can the ENR Step Function achieve that?
A few options:
`ageThresholdInDays` condition has been achieved.
`ageThresholdInDays`!