Duplicate completed history

jhuckaby / Cronicle

A simple, distributed task scheduler and runner with a web based UI.

http://cronicle.net

Duplicate completed history #155

Open ckt114 opened 5 years ago

ckt114 commented 5 years ago

Summary

Cronicle shows duplicate entries for jobs that run every minute in Event History.

Steps to reproduce the problem

Create a sample Shell script job to echo Hello world and make it run "Every minute".

Your Setup

One master server, 2 backup servers, and 4 slaves, all running on k8s. The masters run as StatefulSets and share one data directory.

Operating system and version?

Node.js version?

10.11

Cronicle software version?

0.8.28

Are you using a multi-server setup, or just a single server?

Yes. Multi. 1 primary and 2 backups.

Are you using the filesystem as back-end storage, or S3/Couchbase?

Filesystem

Screenshot

[image: duplicate]

jhuckaby commented 5 years ago

Wow, I really cannot fathom how this is possible. An event job completion only inserts one single item into the list. I am struggling to see how they could ever be duplicated.

My first thought was somehow you had two servers that became master at the same time, but the Job IDs should be different in that case (well, it is a timestamp-based ID generation, so I guess it is possible this could happen, but very rare).

I will add this as a bug and keep it open, but I've never seen this behavior before, and am very confused about how it could happen.

If you can zip up the Cronicle logs directory and post it (or send it to me privately, jhuckaby at gmail), that may be helpful.

Thank you.

peterbuga commented 1 year ago

Jumping in on this one, with a slightly worse case 😅. It's not the end of the world and I can live with it, but it is a bit annoying (the worst part is that if I attach an email confirmation to a task, it's sent X times over and over). It also happens randomly: originally I saw it only on tasks that were API-triggered, but now it happens on pretty much any task, whether it runs on a slave server or the primary (my setup is 1 primary + 1 slave with an S3 backend).

[image: Screenshot 2023-09-11 at 11 32 18]

@jhuckaby I know Cronicle is in maintenance mode. If you're willing to take a look at the logs, I'd be more than glad to send them over; just give the go-ahead, I don't want to spam.

mikeTWC1984 commented 1 year ago

Check your log to see whether such jobs repeat multiple times in there as well. Typically each job ID should appear just 4 times (launch/spawn/complete/log). Run something like:

grep -c "jlmfb21ee26" logs/Cronicle.log
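The same check can be run for several job IDs at once. The following is a minimal sketch, not part of Cronicle: the function names are hypothetical, it makes no assumption about the log line format beyond "the job ID appears somewhere on the line" (the same thing grep tests), and the threshold of 4 is taken from the launch/spawn/complete/log estimate above.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumption from the comment above: a healthy job shows up on about 4 log lines
# (launch/spawn/complete/log).
EXPECTED_LINES_PER_JOB = 4


def count_job_lines(lines: Iterable[str], job_ids: Iterable[str]) -> Counter:
    """Count how many lines mention each job ID (plain substring match, like grep)."""
    ids = list(job_ids)
    counts = Counter({job_id: 0 for job_id in ids})
    for line in lines:
        for job_id in ids:
            if job_id in line:
                counts[job_id] += 1
    return counts


def suspicious_jobs(counts: Dict[str, int],
                    expected: int = EXPECTED_LINES_PER_JOB) -> Dict[str, int]:
    """Return only the job IDs that appear on more lines than expected."""
    return {job_id: n for job_id, n in counts.items() if n > expected}


# Example usage (path and ID as in the command above):
#   with open("logs/Cronicle.log", encoding="utf-8", errors="replace") as fh:
#       print(suspicious_jobs(count_job_lines(fh, ["jlmfb21ee26"])))
```

A count well above the expected value would suggest the completion is being processed more than once, rather than the UI merely rendering a single entry several times.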

jhuckaby commented 1 year ago

This is such a bizarre issue, because each of those jobs has an identical ID in the table. Every job is assigned a unique ID when it is first created. I'm having trouble fathoming how this could even occur.

@peterbuga Are you on the latest Cronicle (v0.9.30) with storage transactions enabled?
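To confirm the symptom described here (several history rows sharing one job ID), the entries can be grouped by ID. This is only a sketch: it assumes the history has already been loaded as a list of dicts with an "id" key, and that shape and the function name are illustrative assumptions, not Cronicle's API.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def find_duplicate_ids(entries: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Return {job_id: occurrences} for every ID that appears more than once.

    `entries` is a hypothetical list of history rows, each a dict with an "id" key;
    rows without an "id" are ignored.
    """
    counts = Counter(entry["id"] for entry in entries if "id" in entry)
    return {job_id: n for job_id, n in counts.items() if n > 1}
```

An empty result means every completed job appears once; any non-empty result matches the behavior reported in this issue.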

peterbuga commented 1 year ago

@jhuckaby @mikeTWC1984 thanks for the replies, and yes, I'm on the latest Cronicle release with storage transactions enabled.

After some digging around I found the culprit: it turns out there are multiple ways to set up storage with S3 (in my case I'm using Cloudflare's R2 buckets). Before, I had it set up like so:

"AWS":
        {
            ....
            "endpoint": "https://xxxx.r2.cloudflarestorage.com",
            "credentials": {
                "secretAccessKey": "xxxx",
                "accessKeyId": "xxxx",
                "region": "auto"
            },
            ....

and now I simply moved the credentials 1 level up

"AWS":
        {
            ....
            "endpoint": "https://xxxx.r2.cloudflarestorage.com",
            "secretAccessKey": "xxxx",
            "accessKeyId": "xxxx",
            "region": "auto",
            ....

I don't know how or why I had it set up like this in the first place (I clearly copied it from somewhere), but I'm glad the change does the trick and there are no more 50x duplicates.
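The change described above (moving the keys under "credentials" one level up) can be sketched as a small transformation on the "AWS" block. This is an illustration only: the function name is hypothetical, the input is assumed to be the already-parsed "AWS" object, and whether the flat layout is the right one depends on the SDK version the storage engine uses.

```python
from typing import Any, Dict


def flatten_aws_credentials(aws: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the AWS block with nested "credentials" keys moved up one level.

    The input dict is not modified. If a key already exists at the top level,
    the top-level value is kept rather than silently overwritten.
    """
    flat = {key: value for key, value in aws.items() if key != "credentials"}
    nested = aws.get("credentials")
    if isinstance(nested, dict):
        for key, value in nested.items():
            flat.setdefault(key, value)
    return flat


# Example with placeholder values:
#   flatten_aws_credentials({
#       "endpoint": "https://xxxx.r2.cloudflarestorage.com",
#       "credentials": {"secretAccessKey": "xxxx", "accessKeyId": "xxxx", "region": "auto"},
#   })
```

Returning a copy keeps the original config available for comparison if the flattened layout turns out not to authenticate.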

mikeTWC1984 commented 1 year ago

OK, I guess that's some glitch on Cloudflare's side then. The config with the "credentials" property is for AWS SDK v3, and the other one is for v2. I'd expect one of them to simply fail to authenticate; if they both work, something is not right on their end.