kuzzleio / kuzzle

Open-source Back-end, self-hostable & ready to use - Real-time, storage, advanced search - Web, Apps, Mobile, IoT -
https://kuzzle.io
Apache License 2.0

Scheduler module #1833

Aschen closed this 2 years ago

Aschen commented 3 years ago

Scheduler

The scheduler module is intended to execute tasks at fixed intervals. It will act like a crontab.

Tasks will be standard API requests.

Execution intervals will be defined with the cron syntax, using node-cron. Other syntaxes may be added in the future.

New API actions

This module will be configurable through the new schedule controller with the following actions

New internal collections

Two new internal collections will be created: tasks and task-statuses.

tasks

{
  name: 'checkCounters', // will be used as document ID
  description: 'Updates the counters in ES from Redis',
  schedule: {
    syntax: 'cron',
    humanized: 'every 5 mins', // not stored, calculated for schedule:validate, schedule:add and schedule:list
    value: '*/5 * * * *',
  },
  request: {
    controller: 'foobar/cron',
    action: 'checkCounters',
  },
  lastStatus: { // last execution status (copy of the last one in `task-statuses`, users cannot modify this)
    name: 'checkCounters',
    executedAt: 1603197425240,
    node: 'evasive-einstein-62763',
    error: { /* Kuzzle API error or null */ },
    result: { /* Kuzzle API result*/ },
  },
  nextExecution: '2020-10-20T12:42:00', // not stored, calculated for schedule:validate, schedule:add and schedule:list
}

An alternative to the cron syntax is the timestamp syntax, which runs a task only once at a specific date.
This syntax is mainly intended for use in functional tests.

// "timestamp" syntax
{
  syntax: 'timestamp',
  value: 1603197425240,
}

// "cron" syntax
{
  syntax: 'cron',
  value: '*/5 * * * *',
}
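To make the two syntaxes concrete, here is a minimal sketch of a structural check on a `schedule` object. `validateSchedule` is a hypothetical helper, not part of the proposal. It only counts cron fields (5, or 6 on the assumption that an optional seconds field is accepted), and actual cron parsing would be left to the cron library.

```javascript
// Hypothetical helper: structural validation of a `schedule` object.
// It does not parse cron expressions; a real implementation would
// delegate that to the cron library.
function validateSchedule(schedule) {
  if (schedule === null || typeof schedule !== 'object') {
    return { valid: false, reason: 'schedule must be an object' };
  }

  switch (schedule.syntax) {
    case 'timestamp':
      // Epoch milliseconds, as in the "timestamp" example
      if (!Number.isInteger(schedule.value) || schedule.value < 0) {
        return { valid: false, reason: 'timestamp value must be a non-negative integer' };
      }
      return { valid: true };

    case 'cron': {
      if (typeof schedule.value !== 'string') {
        return { valid: false, reason: 'cron value must be a string' };
      }
      const fields = schedule.value.trim().split(/\s+/);
      // Assumption: 5 fields, or 6 when an optional seconds field is present
      if (fields.length < 5 || fields.length > 6) {
        return { valid: false, reason: 'cron value must have 5 or 6 fields' };
      }
      return { valid: true };
    }

    default:
      return { valid: false, reason: `unknown syntax: ${schedule.syntax}` };
  }
}
```

Returning a reason alongside the boolean would let an action such as schedule:validate report why a schedule was rejected, though the actual response format is not specified here.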

task-statuses

{
  name: 'checkCounters',
  executedAt: 1603197425240,
  node: 'evasive-einstein-62763',
  error: {},
  result: {},
}
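As an illustration of how a task-statuses document could relate to the `lastStatus` field of a task, here is a minimal sketch. `buildTaskStatus` and `recordExecution` are hypothetical names, and the in-memory array stands in for the internal collection.

```javascript
// Hypothetical helper: build a task-statuses document from an execution outcome
function buildTaskStatus(taskName, nodeId, outcome, executedAt = Date.now()) {
  return {
    name: taskName,
    executedAt,
    node: nodeId,
    error: outcome.error ?? null,   // Kuzzle API error, or null on success
    result: outcome.result ?? null, // Kuzzle API result, or null on failure
  };
}

// Hypothetical helper: append the status to the history and return the task
// with `lastStatus` set to a copy of it. `statuses` is an in-memory stand-in
// for the task-statuses collection.
function recordExecution(task, statuses, status) {
  statuses.push(status);
  return { ...task, lastStatus: { ...status } };
}
```

Storing a copy rather than a reference keeps the history entry and the task document independent of each other.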

Task execution

Instead of using a simple timer, we will use a system similar to the TokenManager's, which sets up a timer only for the next task execution.
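A rough sketch of the "one timer for the next execution" idea follows. It is not the TokenManager code. `NextTaskTimer` and its injectable clock and timer functions are assumptions made for illustration and to keep it testable.

```javascript
// Hypothetical scheduler: keeps a single timer armed for the earliest task.
class NextTaskTimer {
  constructor(onDue, {
    now = () => Date.now(),
    setTimer = (fn, ms) => setTimeout(fn, ms),
    clearTimer = (id) => clearTimeout(id),
  } = {}) {
    this.onDue = onDue;
    this.now = now;
    this.setTimer = setTimer;
    this.clearTimer = clearTimer;
    this.tasks = new Map(); // task name -> next execution (epoch ms)
    this.timer = null;
  }

  // Register or update a task, then re-arm the timer
  set(name, nextExecution) {
    this.tasks.set(name, nextExecution);
    this.arm();
  }

  // Returns [name, nextExecution] for the earliest task, or null if none
  earliest() {
    let best = null;
    for (const entry of this.tasks) {
      if (best === null || entry[1] < best[1]) best = entry;
    }
    return best;
  }

  // Cancel the current timer and arm a new one for the earliest task
  arm() {
    if (this.timer !== null) {
      this.clearTimer(this.timer);
      this.timer = null;
    }
    const next = this.earliest();
    if (next === null) return;
    const delay = Math.max(0, next[1] - this.now());
    this.timer = this.setTimer(() => this.fire(next[0]), delay);
  }

  // Timer callback: hand the task to the caller, then arm for the next one
  fire(name) {
    this.timer = null;
    this.tasks.delete(name); // the caller re-registers it with its next date
    this.onDue(name);
    this.arm();
  }
}
```

With this shape, adding or updating a task costs one scan of the task list, and at most one timer is pending at any time regardless of how many tasks are registered.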

Task execution will use the Mutex class to avoid race conditions in a cluster environment.

Nodes can have different clock times, which can be an issue when it comes to executing the next task.
To avoid this issue, we will store a counter in Redis for each task.
When a node starts and loads the tasks from ES, it will also load the counter from Redis and keep a copy in memory.
When the time comes to execute a task, the node tries to acquire the task resource with a mutex. If it succeeds, it checks that the counter value in Redis matches its local one. If it does, the task is executed and the node increments the Redis counter. Otherwise, another node has already executed the task and the execution is discarded.
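The counter check can be sketched as follows. `FakeRedis` and `SchedulerNode` are hypothetical in-memory stand-ins, not the Kuzzle Mutex class or a Redis client. Real calls would be asynchronous, and resynchronizing the local counter after a mismatch is an assumption that the description above does not spell out.

```javascript
// In-memory stand-in for Redis: one counter and one lock per task.
class FakeRedis {
  constructor() {
    this.counters = new Map();
    this.locks = new Set();
  }
  getCounter(task) { return this.counters.get(task) ?? 0; }
  incrCounter(task) {
    const value = this.getCounter(task) + 1;
    this.counters.set(task, value);
    return value;
  }
  tryLock(task) {
    if (this.locks.has(task)) return false;
    this.locks.add(task);
    return true;
  }
  unlock(task) { this.locks.delete(task); }
}

class SchedulerNode {
  constructor(name, redis) {
    this.name = name;
    this.redis = redis;
    this.localCounters = new Map();
  }

  // On startup: keep a local copy of the shared counter
  loadTask(task) {
    this.localCounters.set(task, this.redis.getCounter(task));
  }

  // Called when this node's own clock says the task is due.
  // Returns true if this node executed the task.
  tryExecute(task, run) {
    if (!this.redis.tryLock(task)) return false; // another node holds the mutex
    try {
      const shared = this.redis.getCounter(task);
      if (shared !== this.localCounters.get(task)) {
        // Another node already ran this occurrence: discard it.
        // Assumption: resync the local copy for the next occurrence.
        this.localCounters.set(task, shared);
        return false;
      }
      run();
      this.localCounters.set(task, this.redis.incrCounter(task));
      return true;
    } finally {
      this.redis.unlock(task);
    }
  }
}
```

In this sketch, if two nodes reach the same occurrence one after the other, the first one runs the task and bumps the counter, and the second sees a mismatch and skips it, even if its clock is slightly behind.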

Aschen commented 2 years ago

This is part of the Enterprise product