jhuckaby / Cronicle

A simple, distributed task scheduler and runner with a web based UI.
http://cronicle.net

No tasks listed in completed tab #143

Open geofffox opened 5 years ago

geofffox commented 5 years ago

Summary

No tasks listed in Completed tab

Steps to reproduce the problem

I had a PC freeze earlier tonight, and this is what I found after the reboot. I have tried restarting Cronicle, with no change. Errors continue to be logged in the Admin tab (I saw them when I restarted Cronicle).

Your Setup

HP i5 with 8 GB RAM

Operating system and version?

Centos 7

Node.js version?

6.14.3

Cronicle software version?

0.8.28

Are you using a multi-server setup, or just a single server?

Single

Are you using the filesystem as back-end storage, or S3/Couchbase?

filesystem

Can you reproduce the crash consistently?

I can't unproduce it! :)

Log Excerpts

Cronicle has made my life a whole lot easier. I'm running around 4,500 crons a day to produce ~40,000 weather maps.

geofffox commented 5 years ago

After further checking, I can see completed tasks using "Filter by Event" for all except one of my tasks.

Bad data somewhere I can clean up?

jhuckaby commented 5 years ago

Hey there,

Very sorry about the crash. It sounds like there is indeed data corruption. Unfortunately Cronicle does not use ACID transactions (yet), so it is susceptible to things like sudden power loss and server crashes.

So, there is no easy way to "repair" the data and get back your event history, but you can wipe it and start fresh. Please see Issue #107 for a temporary solution.

The TL;DR is to run these commands on your master server:

/opt/cronicle/bin/control.sh stop
/opt/cronicle/bin/storage-cli.js delete logs/completed
/opt/cronicle/bin/storage-cli.js list_create logs/completed
/opt/cronicle/bin/control.sh start

Good luck, and sorry!

Oh, I just read your follow-up comment, and I see that you may have corruption in at least two places. My instructions from issue #107 deal with corruption in the "main" global completed job list, but it does not address corruption inside individual event histories.

To fix that, please edit the event which has the corrupted history, and grab its unique Event ID from the UI. It should be an 11-character alphanumeric string beginning with the letter e, similar to this one: eiqbfkb9j4z.

Once you have that ID, issue these commands on your master server:

/opt/cronicle/bin/control.sh stop
/opt/cronicle/bin/storage-cli.js delete logs/events/EVENT_ID
/opt/cronicle/bin/storage-cli.js list_create logs/events/EVENT_ID
/opt/cronicle/bin/control.sh start

Replace EVENT_ID with your unique alphanumeric Event ID that has the corrupted history.
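The two sets of commands above differ only in the storage key they reset. Below is a minimal sketch of a helper that wraps both, assuming the default /opt/cronicle install path used in this thread. The function names reset_history and run, the DRY_RUN and CRONICLE_BIN variables, and the ID check (a lowercase "e" followed by 10 lowercase letters or digits, inferred from the example ID above) are all hypothetical and not part of Cronicle.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Cronicle) wrapping the manual
# history-reset steps from this thread. Run it on the master server.
#   reset_history            resets the global logs/completed list
#   reset_history EVENT_ID   resets logs/events/EVENT_ID
# With DRY_RUN=1 the commands are printed instead of executed.

# Assumed default install path; override by exporting CRONICLE_BIN.
CRONICLE_BIN="${CRONICLE_BIN:-/opt/cronicle/bin}"

# Print or execute a command depending on DRY_RUN.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

reset_history() {
    if [ $# -eq 0 ]; then
        key="logs/completed"
    else
        # Assumed ID format, inferred from the example eiqbfkb9j4z:
        # 'e' plus 10 lowercase letters or digits, 11 characters total.
        if [ "${#1}" -ne 11 ] ||
           ! printf '%s\n' "$1" | LC_ALL=C grep -Eq '^e[a-z0-9]{10}$'; then
            echo "error: '$1' does not look like an Event ID" >&2
            return 1
        fi
        key="logs/events/$1"
    fi

    # Stop first so Cronicle is not writing while the list is replaced.
    # Assumption: control.sh exits non-zero if the stop fails.
    run "$CRONICLE_BIN/control.sh" stop || return 1
    run "$CRONICLE_BIN/storage-cli.js" delete "$key"
    run "$CRONICLE_BIN/storage-cli.js" list_create "$key"
    # Start Cronicle again even if a storage command above failed.
    run "$CRONICLE_BIN/control.sh" start
}
```

For example, `DRY_RUN=1 reset_history eiqbfkb9j4z` prints the four commands without running them; calling it without DRY_RUN executes them. The format check only catches pasting the wrong string; it does not confirm that the event actually exists.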

Good luck!

Also, I should add a disclaimer here. Cronicle is still in a prerelease testing state, and data loss can absolutely occur. There is really no guarantee of data retention and resilience. So please use this product at your own risk, and definitely not in production with mission critical data!

- Joe

geofffox commented 5 years ago

Hi Joe -

No apologies necessary. We all know what we're getting into on GitHub. Beyond that, I think I speak for everyone here when I say we appreciate you contributing this back to the community. Plus, no more cron!

Because long-term logs aren't necessary for me, I did the 'full neuter' and all is good with the world.

Again, thanks Joe.

geofffox commented 5 years ago

Joe --

As it turns out, deleting the overall history wasn't enough. I did go back and delete the individual event histories, and your instructions were 100% right.

As long as I have you here... I can't figure out why the web interface reports times in PST. I live there, but the server is set to UTC, and the web interface still shows PST. All the jobs run on UTC. I can't find where to change it.

Also, I don't know how difficult it would be in practice, but two of my jobs run every minute. I wish there were a way to graph them over a longer period of time.

jhuckaby commented 5 years ago

Hello, and sorry for the delay on this reply. I was away on vacation.

> I can't figure out why the web interface reports times in PST. I live there, but the server is set to UTC, and the web interface still shows PST. All the jobs run on UTC. I can't find where to change it.

Yup, so currently the UI makes a best guess at your local timezone, and adjusts the date and time display for that zone. There is a feature request for custom timezones per user, which I will be implementing in the next major version.
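A stored job time is a single instant, and only its display depends on the zone, which is why the same run can read as UTC on the server and PST in a browser in California. A minimal shell sketch of that idea, assuming GNU date (as shipped with CentOS 7) and using the POSIX zone name PST8PDT for Pacific time. The function name show_in_zones is hypothetical, and this only illustrates the concept; it is not how the Cronicle UI detects your zone.

```shell
#!/bin/sh
# Hypothetical illustration: format one Unix timestamp (which carries no
# timezone) in UTC and in US Pacific time. Requires GNU date for "-d @N".

show_in_zones() {
    ts="$1"
    # -u forces UTC regardless of the machine's configured zone.
    printf 'UTC:     %s\n' "$(date -u -d "@$ts" '+%Y-%m-%d %H:%M %Z')"
    # Same instant, rendered with a Pacific-time TZ setting.
    printf 'Pacific: %s\n' "$(TZ=PST8PDT date -d "@$ts" '+%Y-%m-%d %H:%M %Z')"
}
```

For example, `show_in_zones 1546300800` shows 2019-01-01 00:00 UTC on the first line and 2018-12-31 16:00 PST on the second: same moment, two displays.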

> Also, I don't know how difficult it would be in practice, but two of my jobs run every minute. I wish there were a way to graph them over a longer period of time.

I assume you are talking about the Event History page, which shows a graph of the performance metrics of the last 50 job runs? Sure thing, I will add an option to increase this number.

Thanks for the suggestions!