kelektiv / node-cron

Cron for NodeJS.
MIT License
8.3k stars 617 forks source link

Cron job stops after random hours/days #805

Open ansnoussi opened 7 months ago

ansnoussi commented 7 months ago

Description

I have a node app running on a machine with 512mb of memory and 256vCPU on AWS, and I have a cron job (actually using Nestjs which in turn uses this package) that runs every second using node-cron. After few days (doesn't always happen), the cron just stops working with no error or log. In fact there was an identical issue reported before (https://github.com/kelektiv/node-cron/issues/232) and was closed despite people still reporting the same behaviour.

Expected Behavior

cron job should not stop working, or at the very least throw an error ?

Actual Behavior

Possible Fix

No response

Steps to Reproduce

Context

-

Your Environment

intcreator commented 7 months ago

is the Node process still running when the cron job stops? is it only the cron part that fails?

ansnoussi commented 7 months ago

yes the Node process keeps running. In fact I have other crons defined in the same process, and they keep on working.

sheerlox commented 7 months ago

I have a NestJS codebase running in production for more than 8 months (restarted once in a while for deployments) that runs a lot of cronjobs for handling financial transactions, one of them running every second, and I've never encountered this issue... However, I do have more memory on my server, 2GB I believe, so maybe that's the issue. Have you ever encountered a situation when your container exceeds its maximum memory usage?

Could you try updating cron to 2.4.4 to see if the issue still happens? You can force NestJS to use it in this way:

"overrides": {
  "@nestjs/schedule": {
    "cron": "2.4.4"
  }
},

This kind of issue is typically difficult to pinpoint without detailed information from those who have experienced it. Unfortunately, reports of this nature are quite rare and often lack critical details. If you can provide any detail or hint that would help us reproduce the issue, we would appreciate it and do our best to find a solution.

ansnoussi commented 7 months ago

I understand how difficult it is to reproduce these kind of bugs, I can't even reproduce it myself and it happened twice so far. I just noticed that the previous issue (https://github.com/kelektiv/node-cron/issues/232) was closed while some people are still reporting it, so I opened this issue just to keep track.

For details, the container with the cron jobs is running usually at 70% memory and <5% cpu, but there was no peak in usage when the bug occurred.

Thank you @sheerlox for the suggestion, but I think that blindly upgrading cron without identifying the root cause will give a false feeling that bug was fixed (exactly what happened in the previously closed Github issue).

Again, I'm not even sure how to debug this bug, so I'm available to give more details if possible.

sheerlox commented 7 months ago

I gave a good look to #232, and it actually contains a lot of information! From what I understand it looks like the issue is with line 191 from job.js. I've created a patch you can deploy easily to confirm that's indeed the issue and provide more information.

Please add this file to your project, then follow the set-up instructions from patch-package:

./patches/cron+2.4.1.patch

diff --git a/node_modules/cron/lib/job.js b/node_modules/cron/lib/job.js
index a3b12b5..9d89479 100644
--- a/node_modules/cron/lib/job.js
+++ b/node_modules/cron/lib/job.js
@@ -135,6 +135,8 @@ function CronJob(CronTime, spawn) {
        // The callback wrapper checks if it needs to sleep another period or not
        // and does the real callback logic when it's time.
        function callbackWrapper() {
+           console.debug(`[CRON DEBUG] BEG CALLBACK: ${performance.now()}`);
+
            const diff = startTime + timeout - Date.now();

            if (diff > 0) {
@@ -188,7 +190,13 @@ function CronJob(CronTime, spawn) {

            _setTimeout(timeout);
        } else {
+           console.debug(`[CRON DEBUG] END CALLBACK: ${performance.now()}`);
            this.stop();
+           console.debug(
+               `[CRON DEBUG] JOB ABORTED: ${
+                   self.cronTime.source
+               } - ${luxon.DateTime.local().toISO()}`
+           );
        }
    };

Once the issue manifests itself again, you should be able to extract the interesting output with the following command: cat {YOUR_LOGFILE} | grep "[CRON DEBUG]" | grep -B 15 "JOB ABORTED"

15 lines above should give us a pretty good picture of the previous runs of the callbackWrapper function.

If you can get us this debug information I'll make sure to free up some time to dig deeper into the issue and find a fix.

Thank you for re-opening an issue about this to let us know that still happens.

ansnoussi commented 7 months ago

Thanks for the update, will make sure to add this and get back to you if the issue manifests itself again πŸ™.

ansnoussi commented 7 months ago

A little update :

sheerlox commented 7 months ago

That makes sense, I didn't consider that when writing the patch!

Knowing the exact time the callback was started would be really useful, so instead of my first version can I suggest saving performance.now()'s result in a variable and logging it when the job gets killed in the else block?

ansnoussi commented 7 months ago

Sounds good to me, will add it πŸ‘Œ.

Update patch :

diff --git a/node_modules/cron/lib/job.js b/node_modules/cron/lib/job.js
index a3b12b5..833640e 100644
--- a/node_modules/cron/lib/job.js
+++ b/node_modules/cron/lib/job.js
@@ -135,6 +135,8 @@ function CronJob(CronTime, spawn) {
        // The callback wrapper checks if it needs to sleep another period or not
        // and does the real callback logic when it's time.
        function callbackWrapper() {
+           let callbackStartedAt = performance.now()
+
            const diff = startTime + timeout - Date.now();

            if (diff > 0) {
@@ -188,8 +190,14 @@ function CronJob(CronTime, spawn) {

            _setTimeout(timeout);
        } else {
+           console.debug(`[CRON DEBUG] END CALLBACK: ${performance.now()} WHICH STARTED: ${callbackStartedAt}`);
            this.stop();
-       }
+           console.debug(
+               `[CRON DEBUG] JOB ABORTED: ${
+               self.cronTime.source
+               } - ${luxon.DateTime.local().toISO()}`
+               );
+                        }
    };

    CJ.prototype.start = start;
jrichardsz commented 6 months ago

Same issue in the 3.0.3 version.

I have something like this:

1. const cron = require('node-cron');
2. 
3. function PollingStrategy(){
4.     this.start = async(..) => {
5.         cron.schedule("*/30 * * * * *", async () => {
6.         -----> Doing whatever I need
7.         } )
8.     }
9. }
10. 
11. module.exports = PollingStrategy;

And my test is

it("Polling should work", async function () {
    var pollingStrategy = new PollingStrategy();
    pullingStrategy.start(...);
});

The issue is that when I run all my unit tests, randomly the line #6 is not executed :( . Approximately fails once every 3 executions.

I tried stopping the task after each uni test but not works

I will continue the research.

Similar issues:

sheerlox commented 6 months ago

hey @jrichardsz, this is the GitHub repository for cron, but it looks like you're using node-cron. The repository you're looking for is node-cron/node-cron :wink:

MiikeVi commented 1 month ago

Just stopping by to say that it’s still happening in version 3.1.7, using node 14.19.3 (node:14-alpine). I had to switch to node-cron

sheerlox commented 1 month ago

Just stopping by to say that it’s still happening in version 3.1.7, using node 14.19.3 (node:14-alpine). I had to switch to node-cron

As I previously said, that issue is hard to reproduce and we're still looking for help from someone encountering it.

I have critical production code running this library for more than a year and I've never been able to observe that issue, so we're welcoming anyone who would like to help us debug the issue with more information. Please find the patch to apply for the necessary debug information here: https://github.com/kelektiv/node-cron/issues/805#issuecomment-1833783287.

Regarding the transition to node-cron, it's known for having lots of bugs and being unmaintained (look at the issues list on their GitHub). If you really have to switch libraries, I'd recommend using croner.

P.S. moreover that issue also seems to exist in node-cron.