philips-labs / terraform-aws-github-runner

Terraform module for scalable GitHub action runners on AWS
https://philips-labs.github.io/terraform-aws-github-runner/
MIT License
2.58k stars 619 forks source link

Runners are terminated due to incorrect time logic based on idle config cron expression. #1889

Closed shashidhar087 closed 2 years ago

shashidhar087 commented 2 years ago

We have defined Idle config to maintain runner count on weekday and weekend. We have below Idle config,

idle_config = [{
  cron      = "* * * * * 1-5" // Runs monday to friday cron parser format - https://www.npmjs.com/package/cron-parser     timezone reference - https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
  timeZone  = "GMT"
  idleCount = 2
  },
  {
    cron      = "* * * * * 6-7" // runs Saturday and Sunday as per GMT timezone
    timeZone  = "GMT"
    idleCount = 1
}]

The function "https://github.com/philips-labs/terraform-aws-github-runner/blob/develop/modules/runners/lambdas/runners/src/scale-runners/scale-down-config.ts#L22" is returing the value '0' and causing the runners to terminate. This is leading to continuous launch and termination of runners if Pooling feature is enabled.

We dont have a solution to this issue right now. Sharing the issue which we faced over the weekend.

We tried to debug the code and below is the information.

`var parser = require("cron-parser");
var moment = require("moment");

var scalingDownConfigs = [
  { cron: "* * * * * 1-5", idleCount: 2, timeZone: "GMT" },
  { cron: "* * * * * 6-7", idleCount: 1, timeZone: "GMT" },
];

function inPeriod(period) {
  const now = moment(new Date());
  console.log("now-->", now);
  const expr = parser.parseExpression(period.cron, {
    tz: period.timeZone,
  });
  //console.log("expr-->", expr);
  const next = moment(expr.next().toDate());
  console.log("next-->", next);
  console.log("next.diff-->", next.diff(now, "seconds"));
  console.log(Math.abs(next.diff(now, "seconds")));
  return Math.abs(next.diff(now, "seconds")) < 5; // we keep a range of 5 seconds
}

for (const scalingDownConfig of scalingDownConfigs) {
  console.log("Input Config-->", scalingDownConfig);
  if (inPeriod(scalingDownConfig)) {
    console.log("scalingDownConfig.idleCount--->", scalingDownConfig.idleCount);
  }
}
console.log("Outside For loop-->output-->", 0);`
Output:

`Input Config--> { cron: '* * * * * 1-5', idleCount: 2, timeZone: 'GMT' }
now--> Moment<2022-03-26T20:59:40-04:00>
next--> Moment<2022-03-27T20:00:00-04:00>
next.diff--> 82819
82819

Input Config--> { cron: '* * * * * 6-7', idleCount: 1, timeZone: 'GMT' }
now--> Moment<2022-03-26T20:59:40-04:00>
next--> Moment<2022-04-01T20:00:00-04:00>
next.diff--> 514819
514819
Outside For loop-->output--> 0`
shashidhar087 commented 2 years ago

Scale down lambda logs.

We can observe in screenshot that the runner was kept idle (1) until 00:00 UTC hour and there after the runner was terminated and went into infinite loop of creation and termination of runners due to an issue with time calculation.

Screen Shot 2022-03-26 at 9 12 46 PM
shashidhar087 commented 2 years ago

More details to add, on friday and sunday, the const next = moment(expr.next().toDate()); would result the date to be next monday and next saturday respectively. The math condition is always false since the time difference in seconds will not be less than 5.

I have found a solution to this and will create a PR next week.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed if no further activity occurs. Thank you for your contributions.