jhuckaby / Cronicle

A simple, distributed task scheduler and runner with a web based UI.
http://cronicle.net
Other
3.92k stars 391 forks source link

Scheduled tasks are on the "Home" page as upcoming but never run. #613

Open Altersoundwork opened 1 year ago

Altersoundwork commented 1 year ago

Summary

I feel like I'm being daft and missing something but, tasks appear on the "Home" page as upcoming. When the time on them reaches 0, they dissapear as if they'd been done but they have not been done. No entry in "Completed" either. Manual execution works fine (and manual executions do appear in "Completed").

Steps to reproduce the problem

Create any task, set it to run every minute and wait (I say every minute for the sake of testing, setting any time/date has the same result when said time/date arrives).

Your Setup

EC2 instance with nothing else on it.

Operating system and version?

Ubuntu Server 22.04

Node.js version?

v12.22.9

Cronicle software version?

0.9.22

Are you using a multi-server setup, or just a single server?

Single server

Are you using the filesystem as back-end storage, or S3/Couchbase?

No

Can you reproduce the crash consistently?

Yes, no jobs run automatically.

Log Excerpts

Cronicle.log

[1685551541.278][2023-05-31 18:45:41][server-cronicle][744][Cronicle][debug][4][Scheduler catching events up to: 2023/05/31 18:45:00][] [1685551560.181][2023-05-31 18:46:00][server-cronicle][744][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2023/05/31 18:46:00][] [1685551560.182][2023-05-31 18:46:00][server-cronicle][744][Cronicle][debug][9][Checking event queues][["elia0o1eu0r","eli9zryby0q","eli91nu3i10","eli91goee0y","eli915fgf0x","eli913a8h0w","eli910tm50v","eli90u8ca0t","eli90ptt20s","eli90mmo30q","eli90i2qa0p","eli907ymr0k","eli9035d40j","eli8zmuqn0i","eli8zj8z90h","eli8ze0n40g","eli8z2nce0e","eli8z2bjr0d","eli8z1zg00c","eli8z0yp70b","eli8yybc80a","eli8yj8cw08","eli8yico007","eli8yhabe06","eli8yfqxv05","eli8ye3v104","eli8yd2pl03","eli8ybbx902","eli8qpcss0m","eli8qor430l","eli8qo6n30k","eli8qnucb0j","eli8qatw20i","eli3aci8a0h","eli3a1h4a0f","eli39m0nf0e","eli39jtvk0c","eli39ev2m0b","eli39cwrx0a","eli390m5707","eli38w1tn06","eli38rh8505","eli38jdho04","eli38brla03","eli334wrp01"]] [1685551620.192][2023-05-31 18:47:00][server-cronicle][744][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2023/05/31 18:47:00][] [1685551620.193][2023-05-31 18:47:00][server-cronicle][744][Cronicle][debug][9][Checking event queues][["elia0o1eu0r","eli9zryby0q","eli91nu3i10","eli91goee0y","eli915fgf0x","eli913a8h0w","eli910tm50v","eli90u8ca0t","eli90ptt20s","eli90mmo30q","eli90i2qa0p","eli907ymr0k","eli9035d40j","eli8zmuqn0i","eli8zj8z90h","eli8ze0n40g","eli8z2nce0e","eli8z2bjr0d","eli8z1zg00c","eli8z0yp70b","eli8yybc80a","eli8yj8cw08","eli8yico007","eli8yhabe06","eli8yfqxv05","eli8ye3v104","eli8yd2pl03","eli8ybbx902","eli8qpcss0m","eli8qor430l","eli8qo6n30k","eli8qnucb0j","eli8qatw20i","eli3aci8a0h","eli3a1h4a0f","eli39m0nf0e","eli39jtvk0c","eli39ev2m0b","eli39cwrx0a","eli390m5707","eli38w1tn06","eli38rh8505","eli38jdho04","eli38brla03","eli334wrp01"]] [1685551680.206][2023-05-31 18:48:00][server-cronicle][744][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2023/05/31 18:48:00][] [1685551680.207][2023-05-31 18:48:00][server-cronicle][744][Cronicle][debug][9][Checking event queues][["elia0o1eu0r","eli9zryby0q","eli91nu3i10","eli91goee0y","eli915fgf0x","eli913a8h0w","eli910tm50v","eli90u8ca0t","eli90ptt20s","eli90mmo30q","eli90i2qa0p","eli907ymr0k","eli9035d40j","eli8zmuqn0i","eli8zj8z90h","eli8ze0n40g","eli8z2nce0e","eli8z2bjr0d","eli8z1zg00c","eli8z0yp70b","eli8yybc80a","eli8yj8cw08","eli8yico007","eli8yhabe06","eli8yfqxv05","eli8ye3v104","eli8yd2pl03","eli8ybbx902","eli8qpcss0m","eli8qor430l","eli8qo6n30k","eli8qnucb0j","eli8qatw20i","eli3aci8a0h","eli3a1h4a0f","eli39m0nf0e","eli39jtvk0c","eli39ev2m0b","eli39cwrx0a","eli390m5707","eli38w1tn06","eli38rh8505","eli38jdho04","eli38brla03","eli334wrp01"]] [1685551740.216][2023-05-31 18:49:00][server-cronicle][744][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2023/05/31 18:49:00][] [1685551740.217][2023-05-31 18:49:00][server-cronicle][744][Cronicle][debug][9][Checking event queues][["elia0o1eu0r","eli9zryby0q","eli91nu3i10","eli91goee0y","eli915fgf0x","eli913a8h0w","eli910tm50v","eli90u8ca0t","eli90ptt20s","eli90mmo30q","eli90i2qa0p","eli907ymr0k","eli9035d40j","eli8zmuqn0i","eli8zj8z90h","eli8ze0n40g","eli8z2nce0e","eli8z2bjr0d","eli8z1zg00c","eli8z0yp70b","eli8yybc80a","eli8yj8cw08","eli8yico007","eli8yhabe06","eli8yfqxv05","eli8ye3v104","eli8yd2pl03","eli8ybbx902","eli8qpcss0m","eli8qor430l","eli8qo6n30k","eli8qnucb0j","eli8qatw20i","eli3aci8a0h","eli3a1h4a0f","eli39m0nf0e","eli39jtvk0c","eli39ev2m0b","eli39cwrx0a","eli390m5707","eli38w1tn06","eli38rh8505","eli38jdho04","eli38brla03","eli334wrp01"]] [1685551800.222][2023-05-31 18:50:00][server-cronicle][744][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2023/05/31 18:50:00][] [1685551800.223][2023-05-31 18:50:00][server-cronicle][744][Cronicle][debug][9][Checking event queues][["elia0o1eu0r","eli9zryby0q","eli91nu3i10","eli91goee0y","eli915fgf0x","eli913a8h0w","eli910tm50v","eli90u8ca0t","eli90ptt20s","eli90mmo30q","eli90i2qa0p","eli907ymr0k","eli9035d40j","eli8zmuqn0i","eli8zj8z90h","eli8ze0n40g","eli8z2nce0e","eli8z2bjr0d","eli8z1zg00c","eli8z0yp70b","eli8yybc80a","eli8yj8cw08","eli8yico007","eli8yhabe06","eli8yfqxv05","eli8ye3v104","eli8yd2pl03","eli8ybbx902","eli8qpcss0m","eli8qor430l","eli8qo6n30k","eli8qnucb0j","eli8qatw20i","eli3aci8a0h","eli3a1h4a0f","eli39m0nf0e","eli39jtvk0c","eli39ev2m0b","eli39cwrx0a","eli390m5707","eli38w1tn06","eli38rh8505","eli38jdho04","eli38brla03","eli334wrp01"]] [1685551860.23][2023-05-31 18:51:00][server-cronicle][744][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2023/05/31 18:51:00][] [1685551860.231][2023-05-31 18:51:00][server-cronicle][744][Cronicle][debug][9][Checking event queues][["elia0o1eu0r","eli9zryby0q","eli91nu3i10","eli91goee0y","eli915fgf0x","eli913a8h0w","eli910tm50v","eli90u8ca0t","eli90ptt20s","eli90mmo30q","eli90i2qa0p","eli907ymr0k","eli9035d40j","eli8zmuqn0i","eli8zj8z90h","eli8ze0n40g","eli8z2nce0e","eli8z2bjr0d","eli8z1zg00c","eli8z0yp70b","eli8yybc80a","eli8yj8cw08","eli8yico007","eli8yhabe06","eli8yfqxv05","eli8ye3v104","eli8yd2pl03","eli8ybbx902","eli8qpcss0m","eli8qor430l","eli8qo6n30k","eli8qnucb0j","eli8qatw20i","eli3aci8a0h","eli3a1h4a0f","eli39m0nf0e","eli39jtvk0c","eli39ev2m0b","eli39cwrx0a","eli390m5707","eli38w1tn06","eli38rh8505","eli38jdho04","eli38brla03","eli334wrp01"]] [1685551920.236][2023-05-31 18:52:00][server-cronicle][744][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2023/05/31 18:52:00][] [1685551920.238][2023-05-31 18:52:00][server-cronicle][744][Cronicle][debug][9][Checking event queues][["elia0o1eu0r","eli9zryby0q","eli91nu3i10","eli91goee0y","eli915fgf0x","eli913a8h0w","eli910tm50v","eli90u8ca0t","eli90ptt20s","eli90mmo30q","eli90i2qa0p","eli907ymr0k","eli9035d40j","eli8zmuqn0i","eli8zj8z90h","eli8ze0n40g","eli8z2nce0e","eli8z2bjr0d","eli8z1zg00c","eli8z0yp70b","eli8yybc80a","eli8yj8cw08","eli8yico007","eli8yhabe06","eli8yfqxv05","eli8ye3v104","eli8yd2pl03","eli8ybbx902","eli8qpcss0m","eli8qor430l","eli8qo6n30k","eli8qnucb0j","eli8qatw20i","eli3aci8a0h","eli3a1h4a0f","eli39m0nf0e","eli39jtvk0c","eli39ev2m0b","eli39cwrx0a","eli390m5707","eli38w1tn06","eli38rh8505","eli38jdho04","eli38brla03","eli334wrp01"]] [1685551980.247][2023-05-31 18:53:00][server-cronicle][744][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2023/05/31 18:53:00][] [1685551980.248][2023-05-31 18:53:00][server-cronicle][744][Cronicle][debug][9][Checking event queues][["elia0o1eu0r","eli9zryby0q","eli91nu3i10","eli91goee0y","eli915fgf0x","eli913a8h0w","eli910tm50v","eli90u8ca0t","eli90ptt20s","eli90mmo30q","eli90i2qa0p","eli907ymr0k","eli9035d40j","eli8zmuqn0i","eli8zj8z90h","eli8ze0n40g","eli8z2nce0e","eli8z2bjr0d","eli8z1zg00c","eli8z0yp70b","eli8yybc80a","eli8yj8cw08","eli8yico007","eli8yhabe06","eli8yfqxv05","eli8ye3v104","eli8yd2pl03","eli8ybbx902","eli8qpcss0m","eli8qor430l","eli8qo6n30k","eli8qnucb0j","eli8qatw20i","eli3aci8a0h","eli3a1h4a0f","eli39m0nf0e","eli39jtvk0c","eli39ev2m0b","eli39cwrx0a","eli390m5707","eli38w1tn06","eli38rh8505","eli38jdho04","eli38brla03","eli334wrp01"]] [1685552040.256][2023-05-31 18:54:00][server-cronicle][744][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2023/05/31 18:54:00][] [1685552040.257][2023-05-31 18:54:00][server-cronicle][744][Cronicle][debug][9][Checking event queues][["elia0o1eu0r","eli9zryby0q","eli91nu3i10","eli91goee0y","eli915fgf0x","eli913a8h0w","eli910tm50v","eli90u8ca0t","eli90ptt20s","eli90mmo30q","eli90i2qa0p","eli907ymr0k","eli9035d40j","eli8zmuqn0i","eli8zj8z90h","eli8ze0n40g","eli8z2nce0e","eli8z2bjr0d","eli8z1zg00c","eli8z0yp70b","eli8yybc80a","eli8yj8cw08","eli8yico007","eli8yhabe06","eli8yfqxv05","eli8ye3v104","eli8yd2pl03","eli8ybbx902","eli8qpcss0m","eli8qor430l","eli8qo6n30k","eli8qnucb0j","eli8qatw20i","eli3aci8a0h","eli3a1h4a0f","eli39m0nf0e","eli39jtvk0c","eli39ev2m0b","eli39cwrx0a","eli390m5707","eli38w1tn06","eli38rh8505","eli38jdho04","eli38brla03","eli334wrp01"]] [1685552100.273][2023-05-31 18:55:00][server-cronicle][744][Cronicle][debug][4][Scheduler Minute Tick: Advancing time up to: 2023/05/31 18:55:00][] [1685552100.273][2023-05-31 18:55:00][server-cronicle][744][Cronicle][debug][9][Checking event queues][["elia0o1eu0r","eli9zryby0q","eli91nu3i10","eli91goee0y","eli915fgf0x","eli913a8h0w","eli910tm50v","eli90u8ca0t","eli90ptt20s","eli90mmo30q","eli90i2qa0p","eli907ymr0k","eli9035d40j","eli8zmuqn0i","eli8zj8z90h","eli8ze0n40g","eli8z2nce0e","eli8z2bjr0d","eli8z1zg00c","eli8z0yp70b","eli8yybc80a","eli8yj8cw08","eli8yico007","eli8yhabe06","eli8yfqxv05","eli8ye3v104","eli8yd2pl03","eli8ybbx902","eli8qpcss0m","eli8qor430l","eli8qo6n30k","eli8qnucb0j","eli8qatw20i","eli3aci8a0h","eli3a1h4a0f","eli39m0nf0e","eli39jtvk0c","eli39ev2m0b","eli39cwrx0a","eli390m5707","eli38w1tn06","eli38rh8505","eli38jdho04","eli38brla03","eli334wrp01"]]

mikeTWC1984 commented 1 year ago

Check "activity log" on admin tab. Do you see "Failed to launch" warning related to this event there?

Altersoundwork commented 1 year ago

No entries under activity log, failed or otherwise, for scheduled tasks.

Altersoundwork commented 1 year ago

So I created another Ubuntu 22.04 Server instance, installed npm, nodejs and then ran the curl command to install Cronicle. Ran the setup command, then start.

Logged in, all ok. Created a test job, scheduled it and it worked.

I then exported the data from the instance with the issue, imported it, changed the hostname on the new instance (otherwise it doesn't find itself as the master server).

And issue is back. So it's definitely not a config.json file issue or any other files as they're all unmolested.

Log still shows nothing new that would aid in discovering the issue and the activity log shows no failures either.

With all this in mind, things that are different from a brand new install with no data on it:

ybitso commented 1 year ago

I got the same issue with also Ubuntu 22.04.2

I have weekly jobs scheduled and was working fine for a year until Beginning of May when you see the upcoming job, but no status, no error, no history on it.
I did not create a new server, but did create a test job that is just echo "hi", with no luck

The only updates I see done during that timeframe was an update to git

jhuckaby commented 1 year ago

Hey @Altersoundwork would be possible for you to send me that data export which reproduces the problem? Note: You may need to scrub any sensitive data inside. I really need to be able to see this issue first-hand to debug it further. My private email is jhuckaby-at-gmail-dot-com.

ybitso commented 1 year ago

@jhuckaby Not sure if you received any files on this; but as I have the same issue, would it be helpful if I send you any files?

jhuckaby commented 1 year ago

@ybitso Sure, can you do a data export and send me the file it generates?

WARNING: You may need to scrub any sensitive data inside, so open it up in a text editor and take a look first. It contains a full dump of your Cronicle data including all your users (emails, hashed passwords, etc.), as well as all your Cronicle events and plugins, which may contain secrets, if you set anything in environment variables, etc.. All I am interested in is the schedule data, so I can try to reproduce this issue locally. Thanks!

care2DavidDPD commented 1 year ago

I'm having a similar issue ... I thought I had messed up with S3 (via MinIO) and clobbered something in replication. This is also a cluster of 6 machines ... cross two - San Jose (SJC1) and Reston,VA (IAD1). For example, I have this job, that creates a static dump/backup/snapshot of my IPAM/DCIM/OpsDB into a file on the NFS server:

Screenshot 2023-07-07 at 4 04 56 PM

When I "up-to-history" ... it's blank ...

Screenshot 2023-07-07 at 4 07 38 PM

However, JobID, which I manually ran - is there.

Screenshot 2023-07-07 at 4 08 40 PM

The log file is on both MinIO instances:

[ pts/4 iad1 nas1:/sr/minio/cronicle/jobs ]
[ dpd ] > md5 jljt65wn90g.json
MD5 (jljt65wn90g.json) = 161e70ac76d0cfc27c5abf4f95357f08

[ pts/0 sjc1 nas1:/sr/minio/cronicle/jobs ]
[ dpd ] > md5 jljt65wn90g.json
MD5 (jljt65wn90g.json) = 161e70ac76d0cfc27c5abf4f95357f08

... and the history is for this event ID is actually on disk ... and look to be mostly in sync ...

[ pts/4 iad1 nas1:/sr/minio/cronicle/jobs ]
[ dpd ] > find . -name "*.json" | xargs -I % grep -L elfw2dwqq53 % | wc
   52747   52747 1002193

[ pts/0 sjc1 nas1:/sr/minio/cronicle/jobs ]
[ dpd ] > find . -name "*.json" | xargs -I % grep -L elfw2dwqq53 % | wc
   52747   52747 1002193

It seems to me, some sort of index file or something is missing and not getting updating ... I just don't know where in the code to look. The "All Completed Jobs" in https://cronicle/#History?sub=history is also empty .... however the jobs that run less frequently ... still seem to display histories.

jhuckaby commented 1 year ago

@care2DavidDPD Hey David, you've got a different situation there. Looks like classic corruption of the "database" files on disk, which can happen if you use a non-AWS S3 provider. The S3 provider needs to provide "immediate consistency" which AWS S3 only added recently (in 2019 I think). Not sure if your MinIO implementation has it, but that's a classic cause of issues like this.

The other cause could be a crash or sudden power loss of the master node during writes. Cronicle was designed years before I added full transaction support to my storage system, so it is still susceptible to this. It will be fixed in Cronicle v2.

care2DavidDPD commented 1 year ago

@care2DavidDPD Hey David, you've got a different situation there. Looks like classic corruption of the "database" files on disk, which can happen if you use a non-AWS S3 provider.

Anyway to manually rebuild these database files ?

jhuckaby commented 1 year ago

@care2DavidDPD I have moved this to #618, as it is unrelated to the current issue at hand.

jhuckaby commented 1 year ago

@ybitso I received your data export (thank you), and installed it on a VM. I see that almost all of your events are either disabled, or run on demand only:

Screenshot 2023-07-11 at 7 53 25 PM

In this case nothing will show on the "Upcoming Events" list, because nothing is upcoming. This is as designed:

Screenshot 2023-07-11 at 8 02 19 PM

You do have one single scheduled recurring event enabled ("Reseller Deal Status") but it only runs on Sundays at 3:05 AM -- that's once per week only:

Screenshot 2023-07-11 at 7 59 42 PM

The "Upcoming Events" list only predicts 24 hours in the future, so it would only see this event within 24 hours of it running.

I tried re-enabling your "Test" event (Hourly on the :05), and this showed up on the "Upcoming Events" list and ran successfully on time:

Screenshot 2023-07-11 at 8 05 05 PM

I was unable to reproduce any issues using this data export, but again, thank you for providing it.

jhuckaby commented 1 year ago

@Altersoundwork just checking with you one more time. Would it be possible for you to send me that data export which reproduces the problem described in this issue? Note: You may need to scrub any sensitive data inside. I really need to be able to see this issue first-hand to debug it further. My private email is jhuckaby-at-gmail-dot-com.

Thanks.

jhuckaby commented 1 year ago

@Altersoundwork I just noticed that you're using Node v12.22.9. This is really old at this point. Can you please try upgrading to v16 or v18? That might solve the issue right there.

curl -sL https://deb.nodesource.com/setup_16.x | sudo -E bash -
ybitso commented 1 year ago

@ybitso I received your data export (thank you), and installed it on a VM. I see that almost all of your events are either disabled, or run on demand only:

Screenshot 2023-07-11 at 7 53 25 PM

In this case nothing will show on the "Upcoming Events" list, because nothing is upcoming. This is as designed:

Screenshot 2023-07-11 at 8 02 19 PM

You do have one single scheduled recurring event enabled ("Reseller Deal Status") but it only runs on Sundays at 3:05 AM -- that's once per week only:

Screenshot 2023-07-11 at 7 59 42 PM

The "Upcoming Events" list only predicts 24 hours in the future, so it would only see this event within 24 hours of it running.

I tried re-enabling your "Test" event (Hourly on the :05), and this showed up on the "Upcoming Events" list and ran successfully on time:

Screenshot 2023-07-11 at 8 05 05 PM

I was unable to reproduce any issues using this data export, but again, thank you for providing it.

So for me, the "test" that runs every :05 does show up in upcoming events but does not run, and nothing appears in the event history for that run

jhuckaby commented 1 year ago

@ybitso Very odd. I cannot seem to reproduce that. Note that your "Test" event does nothing except echo "Hi" to STDOUT, so it's difficult to tell if it really ran. Can you change the script to write something to a file? Like:

#!/bin/bash
date >> /tmp/TEST_JOB.txt

Then see if that file appears after the next scheduled run and has the date/time in it.

If so, then your problem is probably more like what @care2DavidDPD is experiencing, i.e. some level of corruption in the storage system which is preventing the job history list from populating.

If not, then you truly have the mystery bug, which I cannot for the life of me track down.

jhuckaby commented 1 year ago

Cronicle v0.9.24 was just released, which ships with transactions enabled by default (however you'll need to manually enable them for existing installs), as well as a new storage repair script.

Check out the updated Troubleshooting Wiki for details and instructions.

mikeTWC1984 commented 1 year ago

@ybitso did you try to upgrade your node version? I think the recent versions of cronicle don't work correctly with v12.