Periodically clean settled jobs and expired sessions (scheduled tasks)

christopheblin commented 2 years ago

Is your feature request related to a problem? Please describe.

The database is "unexpectedly" big due to the size of job_record and session tables

These 2 tables contain mostly stale data (old jobs and expired sessions) and nothing is cleaning them periodically

Describe the solution you'd like

For job_records, it seems that periodically calling removeSettledJobs internally would do the trick

For sessions, something similar should be done

See https://vendure-ecommerce.slack.com/archives/CKYMF0ZTJ/p1644844460660589?thread_ts=1644830594.467479&cid=CKYMF0ZTJ

Describe alternatives you've considered

For jobs, calling removeSettledJobs externally and config.jobQueueOptions.enableWorkerHealthCheck = false to reduce the number of jobs

For sessions, a manual db query

Additional context

For sessions, it seems there is an additional bug for anonymous sessions because they are always created for a duration of 1y no matter the setting config.authOptions.sessionDuration

see https://github.com/vendure-ecommerce/vendure/blob/6bc01aae230646118fac3c103793896b326d68c6/packages/core/src/service/services/session.service.ts#L108-L110

michaelbromley commented 2 years ago

I've been thinking about what the best way would be to implement this. Here's my initial thoughts:

Let's generalize this into the concept of "scheduled tasks". This means there should be a generic mechanism in Vendure for defining some arbitrary code to run on a given schedule - most likely repeating at a given interval or at a certain time of day like a cron job.
We will then ship a set of default scheduled tasks which include cleaning old jobs, removing stale sessions, and cleaning up abandoned orders.
The developer is then free to swap these out or use only some of them etc, as well as adding her own.
We'd probably implement this as a new property like scheduledTasks: ScheduledTask[] so that plugins can add their own tasks as part of the configure metadata function.

OK, so that's the easy part. Now the question is how to actually implement this.

Key challenge: We need to keep in mind the fact that Vendure might be running multiple servers & workers, e.g. a Kubernetes setup. So we can't just naively install node-cron and start a job on the server or worker process. That will lead to the job being run multiple times, once for each running instance.

This suggests that we need some external store to coordinate the running of tasks. For example, a new table in the database like:

task	lastRun	lastRunProcessId
clean-job-queue	2022-02-24T00:00:00Z	1234

Then when the cron mechanism triggers on a given process, we first look up this table to see whether this particular task has run within some time delta (e.g. 5 seconds), and if not then we trigger the task.

martijnvdbrug commented 2 years ago

I like this alot! Right now, every scheduled job I have needs to be configured outside my codebase. I want to stay in my IDE env! :wink: (or store it in DB)

An aspect to keep in mind is that alot of cloud services don't allow any background processes outside a request context. Everything needs to happen between incoming request and res.send()

Any action needs to be invoked by an HTTP request from outside the application.

This goes for

Google Cloud Run
Google AppEngine
Google Cloud Functions
Probably most other serverless environments

Maybe we can call an endpoint on Vendure every X minutes, and Vendure handles any scheduled job internally? Preferably with X being configurable, as these serverless env's bill per usage.

xrev87 commented 2 years ago

Haha I actually want the opposite of @martijnvdbrug , I'm working on implementing a multi-vendor marketplace and I have various vendors who will be interacting with plugins that have jobs in them and those jobs need to be run at a vendor defined schedule. So for my use case it would be ideal if not only were we storing a log of running tasks, but we were also storing the schedule itself. Currently having to externalize this to an external service that invokes these jobs and is controlled via a separate plugin.

martijnvdbrug commented 2 years ago

@xrev87 Schedules/crons in DB are also fine! I just meant not in any external platforms like I do now with Google Cloud Scheduler :smile:

michaelbromley commented 2 years ago

@martijnvdbrug this is a great point. So the triggering mechanism must also be configurable somehow. So we can trigger either by an internal cron job, or trigger via a HTTP call to an endpoint. How would you control the HTTP call? Does Google Cloud have some built-in way of doing that? Or would you need to use some other system to make the calls?

@xrev87 Can you go into more detail on this part?

So for my use case it would be ideal if not only were we storing a log of running tasks, but we were also storing the schedule itself. Currently having to externalize this to an external service that invokes these jobs and is controlled via a separate plugin.

What does your current setup look like? And what would your ideal solution look like?

skid commented 2 years ago

Like @martijnvdbrug, we are also running a setup on Google cloud and i have a few comments.

First, tasks have to be invoked from an external source since Cloud Run will kill any process that doesn't directly serve a request. This is why, for example, we don't use a worker process - we only have the main app which handles worker tasks and let cloud run handle the scaling when needed. This has proven to work quite well so far.

We use @martijnvdbrug's google cloud tasks plugin. A side note here is that google cloud tasks are great since I can pause a queue, retry individual tasks or even purge a queue, which has been invaluable when importing large data and dealing with 3rd party integrations (dearsystems).

Finally, we also use google cloud scheduler for cron jobs. It just pings a REST endpoint to trigger a task with an optional payload. For now we define scheduled tasks in google cloud directly, but there is also a REST API for managing them: https://cloud.google.com/scheduler/docs/reference/rest

When designing a scheduler, I would suggest that we make it support cloud native services. This can be done by passing some basic async functions for creating/deleting/listing scheduled tasks, as well as a configurable REST route that will trigger a task execution. Using this interface we can easily develop a plugin for google cloud scheduler and other cloud services.

Finally, this is a great addition to vendure - indeed needed.

martijnvdbrug commented 2 years ago

@martijnvdbrug this is a great point. So the triggering mechanism must also be configurable somehow. So we can trigger either by an internal cron job, or trigger via a HTTP call to an endpoint. How would you control the HTTP call? Does Google Cloud have some built-in way of doing that? Or would you need to use some other system to make the calls?

@michaelbromley as @skid said, Google has Google Scheduler, which is basically an external CRON service with a UI over it that triggers HTTP endpoints.

2 possible solutions I can think of

Scheduler calls <vendure-server>/some-ticker-endpoint and Vendure decides which job has to run based on some schedule config.
Make Vendure-scheduling 'plugin-able', so that we can write a plugin that creates jobs in Google Scheduler.

I would prefer option 1, as it's more flexible: you are not dependent on how jobs can be configured in Google's platform.

These are just initial ideas though!

xrev87 commented 2 years ago

@martijnvdbrug this is a great point. So the triggering mechanism must also be configurable somehow. So we can trigger either by an internal cron job, or trigger via a HTTP call to an endpoint. How would you control the HTTP call? Does Google Cloud have some built-in way of doing that? Or would you need to use some other system to make the calls?

@xrev87 Can you go into more detail on this part?

So for my use case it would be ideal if not only were we storing a log of running tasks, but we were also storing the schedule itself. Currently having to externalize this to an external service that invokes these jobs and is controlled via a separate plugin.

What does your current setup look like? And what would your ideal solution look like?

I'm running a multi-vendor marketplace where we have various EDI solutions in place that are facilitated by vendor specific plugins. Due to the nature of these plugins there is a requirement for data ingress/egress at different cadences.

At the moment these plugins do this through jobs, but with a lack of scheduling inside Vendure, we have a scheduler plugin which is simply a thin wrapper around our orchestrator which is essentially kicking these jobs off via mutation.

Due to the fact we need vendors to be able to set the schedule for a particular job, it's not sufficient to be able to globally set the schedule for a queue or particular type of job.

michaelbromley commented 2 years ago

Thank you all for the feedback on this. I think that, to do this right, it is a bit of a bigger task than I first imagined. From all of your feedback it looks like there are different mechanisms that could be used to actually trigger the scheduled task:

A plain old cron job that runs a program or command
A platform-specific task scheduler like Google Scheduler that hits an HTTP endpoint
Some other kind of external orchestrator

Additionally, the solution needs to work correctly in multi-instance deployments.

Because of these different use-cases, the design needs to elegantly handle all of these. Let's reach for our friend the Strategy pattern once again.

Design

So here's a very rough, high-level sketch of a possible implementation:

First some types:

// This represents what a task actually does, e.g. hit a HTTP endpoint or run a command
// depending on the strategy
type TaskAction = string;

interface Task {
  name: string;
  action: TaskAction;
}

interface ScheduledTask {
  id: string;
  description: string;
  schedule: Schedule; // ?? not sure what this would look like
  task: Task;
}

Then the strategy:

interface CreateTaskInput {
  description: string;
  task: Task;
  schedule: Schedule;
}

export interface TaskSchedulerStrategy extends InjectableStrategy {
  getTasks(): Promise<ScheduledTask[]>;
  createTask(input: CreateTaskInput): Promise<ScheduledTask| undefined>;
  updateTask(input: UpdateTaskInput): Promise<ScheduledTask>;
  deleteTask(id: ID): Promise<DeletionResult>;
}

Then in the VendureConfig we would add a new config object:

const config = {
  // ...
  taskSchedulerOptions: {
    taskSchedulerStrategy: TaskSchedulerStrategy;
    tasks: Task[];
  }
}

Admin UI

In the Admin UI we'd add a new page in the "system" section which allows the admin to perform CRUD on tasks.

Feedback welcome!

skid commented 2 years ago

The schedule can look like a normal crontab string. I reckon there are robust libraries to parse it already and everyone's familiar with it. I would also add an optional params string to be used by the strategy implementation. For example some HTTP task schedulers can POST data to their endpoint.

marcus-friedrich commented 2 years ago

Did you have a look at the nest-schedule module if this could do all the scheduling? https://www.npmjs.com/package/nest-schedule

michaelbromley commented 2 years ago

@marcus-friedrich Nest's Schedule module is OK for situations where you are certain you're not going to run multiple instances. But as soon as you run multiple instances (e.g. k8s or serverless scenarios) you will run into double-scheduling of tasks.

That's why we're look at a more high-level abstraction of scheduling, of which a simple cron-like in-process implementation could be one version.

michaelbromley commented 1 year ago

From a Slack thread:

How do you make sure to only run the cronjob once when using a multiple server instances?

We do that with locks via redis

We have this in a service

    public runWithLock(lockKey: string, runFunction: () => void, lockTimeMinutes = 15) {
        return new Promise<boolean>(async (resolve, reject) => {
            const client = await this.configService.getRedisClient()

            if (!client) {
                reject("Couldn't get redis client")
                return
            }

            try {
                const res = await client.setNX(lockKey, 'a')
                if (res) {
                    await client.expire(lockKey, 60 * lockTimeMinutes) // lock for 15 minutes by default

                    try {
                        await runFunction()
                    } catch (e) {
                        this.logger.error(`Error running function.`, undefined, e)
                        this.logger.debug(`Delete lock key ${lockKey}`)
                        client.del(lockKey).then(() => {
                            reject(e)
                        })
                    }
                    this.logger.debug(`Delete lock key ${lockKey}`)
                    // keep the lock a little to prevent another start by a late worker
                    await sleep(2000)
                    client.del(lockKey).then(() => {
                        resolve(true)
                    })
                } else {
                    this.logger.debug(`Locked key ${lockKey}. May run somewhere else.`)
                    resolve(true)
                }
            } catch (err) {
                if (err) {
                    this.logger.error(`Error getting lockKey ${lockKey}.`, undefined, err)
                    reject(err)
                }
            }
        }).catch(err => {
            this.logger.error(`Something went wrong.`, undefined, err)
            this.logger.error(err)
        })
    }

vendure-ecommerce / vendure

Periodically clean settled jobs and expired sessions (scheduled tasks) #1425

Design

Admin UI