lholden / job_scheduler

A simple cron-like job scheduling library for Rust.
Apache License 2.0

Working around `tick` #1

Open zslayton opened 7 years ago

zslayton commented 7 years ago

Hi! I noticed a crate had taken a dependency on cron so I came to check it out. I have a suggestion that's a bit involved, so forgive me in advance.

Currently the JobScheduler relies on the tick method to detect jobs that need to be run. This is functional but causes the program to claim a small amount of CPU time even when none of its jobs are active.

Each Schedule provides an upcoming() iterator. The DateTime values produced by this iterator are inherently sorted from soonest to most distant. Knowing this, you can:

  1. Make a new struct called, say, UpcomingJob that holds a DateTime and an Rc<Job>.
  2. Make an iterator type UpcomingJobIterator that provides next() by getting the next upcoming() DateTime from the Schedule and an Rc<Job> reference to the Job and combining them into an UpcomingJob.
  3. Implement Ord for UpcomingJob so they can be sorted in order of DateTime.
  4. Get an UpcomingJobIterator from each Job.
  5. Create a Heap.
  6. Pull the next() UpcomingJob from each iterator and put it in the heap (as ordered by DateTime)
  7. Peek at the top of the heap. That UpcomingJob is the next one to run across all of your Jobs.
  8. std::thread::sleep(...) until that fire time.
  9. Run the job.
  10. Remove that value from the top of the heap. Use the Rc<Job> reference in the UpcomingJob you just removed to get the next UpcomingJob for that Job and put it in the heap.
  11. Repeat steps 7-10.

The itertools crate offers a function called kmerge that should be able to take care of steps 5-11 for you.

Using this approach, your program is guaranteed to be asleep until the next fire time, no matter which job's next.
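Steps 5-11 above can be sketched with nothing but the standard library's `BinaryHeap`. This is only an illustration under some loud assumptions: `Periodic` is a hypothetical stand-in for a job's `upcoming()` iterator, fire times are plain `u64` seconds rather than `DateTime` values, and the sleep-and-run step is replaced by recording the `(fire_time, job_index)` pair so that the ordering is easy to inspect.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical stand-in for a job's schedule: yields `upcoming`, then
/// `upcoming + period`, and so on. A real scheduler would wrap the
/// `upcoming()` iterator of a cron `Schedule` instead.
struct Periodic {
    upcoming: u64,
    period: u64,
}

impl Iterator for Periodic {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        let fire_time = self.upcoming;
        self.upcoming += self.period;
        Some(fire_time)
    }
}

/// Returns the first `limit` `(fire_time, job_index)` pairs across all
/// schedules, soonest first.
fn merged_fire_times(mut schedules: Vec<Periodic>, limit: usize) -> Vec<(u64, usize)> {
    // `Reverse` turns the max-heap into a min-heap keyed on fire time.
    let mut heap = BinaryHeap::new();

    // Steps 5-6: seed the heap with the first fire time of every job.
    for (idx, schedule) in schedules.iter_mut().enumerate() {
        if let Some(t) = schedule.next() {
            heap.push(Reverse((t, idx)));
        }
    }

    let mut fired = Vec::with_capacity(limit);
    while fired.len() < limit {
        // Steps 7 and 10: take the soonest entry off the heap.
        match heap.pop() {
            Some(Reverse((t, idx))) => {
                // Steps 8-9: a real scheduler would sleep until `t` and
                // then run job `idx`; this sketch only records the pair.
                fired.push((t, idx));
                // Step 10: refill the heap from the same job's iterator.
                if let Some(next) = schedules[idx].next() {
                    heap.push(Reverse((next, idx)));
                }
            }
            None => break, // every schedule is exhausted
        }
    }
    fired
}
```

With one job firing every 3 seconds from 0 and another every 2 seconds from 1, the merged order interleaves the two jobs by fire time, and ties are broken by job index.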

PS: Sorry about cron being nightly-only. It's mostly blocked on BTreeSet's range method being unstable. There's an issue tracking this.

lholden commented 7 years ago

Hi there!

No worries about cron being nightly-only. To be honest, I use nightly almost exclusively in my projects for various reasons. Having said that, I certainly wouldn't mind making it work on stable at some point.

Thanks for the great advice!

The amount of CPU time the process uses while idling is very small overall. I'd certainly be interested in having a version of what you describe at some point though.

I do worry about how well behaved sleeping for prolonged periods would be, though. Take a task that triggers once a year, for example. It also would not follow changes to the system time terribly well. (For example, if my system hits NTP and ends up moving forward an hour for whatever reason, the sleep is now off by an hour until it wakes up.) One could have a maximum sleep period, I suppose.
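The maximum sleep period idea could look roughly like the sketch below. The names `capped_sleep`, `sleep_until`, and `max_sleep` are hypothetical and not part of job_scheduler; the point is only that the wall clock is re-read after every nap, so a clock jump is noticed within one `max_sleep` interval instead of after the whole wait.

```rust
use std::time::{Duration, SystemTime};

/// How long to nap before looking at the wall clock again: the time left
/// until `fire_at`, but never more than `max_sleep`.
fn capped_sleep(now: SystemTime, fire_at: SystemTime, max_sleep: Duration) -> Duration {
    match fire_at.duration_since(now) {
        Ok(remaining) => remaining.min(max_sleep),
        // `fire_at` is already in the past, so do not sleep at all.
        Err(_) => Duration::ZERO,
    }
}

/// Blocks until the wall clock reaches `fire_at`, waking at least once per
/// `max_sleep` to re-read the clock.
fn sleep_until(fire_at: SystemTime, max_sleep: Duration) {
    loop {
        let nap = capped_sleep(SystemTime::now(), fire_at, max_sleep);
        if nap.is_zero() {
            return;
        }
        std::thread::sleep(nap);
    }
}
```

A smaller `max_sleep` reacts to clock changes sooner at the cost of more wake-ups, which is the same trade-off as the current `tick` approach, just with a much longer interval.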

I would happily accept a pull request if someone wanted to implement this. For now, though, it's working great for me in my production environment as is. Otherwise, I may look into doing it when I find some free time.

jordanmack commented 5 years ago

The scheduling system I originally implemented in my program did as described, and slept for long periods until it was needed again. It worked fine unless the system went to sleep. Upon waking, the remaining sleep time was completely inaccurate. This can lead to missed and unexpected runs, and it isn't intuitive.

brokenthorn commented 5 years ago

> 8. std::thread::sleep(...) until that fire time.

That's gonna be the main cause of failure on time drifts, I think (computer sleeping, NTP client updating system time for DST, user updating timezone or clock, etc.).

You could sleep for a configurable resolution time in ms, and store the timezone-aware DateTime just before the process goes to sleep. Then, on wake-up, check for drift between the current time and (the stored time of last sleep + resolution ms). If there was any, adjust every job's time and date for that drift, check whether any jobs should already have been executed according to their new times, execute those in order, then, well, go back to sleep. :grin:

The resolution would thus let you trade CPU load against timing accuracy in a configurable way. A job would then run late by at most resolution + time_drift, right?
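The drift check described here might be sketched as follows. `drift_millis` is a hypothetical helper, not an existing API, and it uses `std::time::SystemTime` rather than a timezone-aware `DateTime` to stay dependency-free. It compares where the wall clock is after a nap with where it was expected to be.

```rust
use std::time::{Duration, SystemTime};

/// Signed drift in milliseconds between the wall clock after a nap (`now`)
/// and the time it was expected to show (`before_sleep + resolution`).
/// Positive means the clock is ahead of the expectation, negative behind.
fn drift_millis(before_sleep: SystemTime, resolution: Duration, now: SystemTime) -> i128 {
    let expected = before_sleep + resolution;
    match now.duration_since(expected) {
        Ok(ahead) => ahead.as_millis() as i128,
        // `now` is earlier than expected; the error carries the gap.
        Err(behind) => -(behind.duration().as_millis() as i128),
    }
}
```

Ordinary oversleeping also shows up as a small positive value, so a real scheduler would presumably only treat drift above some tolerance as a clock change; that tolerance is an assumption of this sketch, not something specified in the thread.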

brokenthorn commented 4 years ago

Wow, I forgot I even commented on this repository. Turns out a year later (and still learning Rust) I'm writing a job scheduler but as part of another project.

I got to a basic working scheduler for async fns, but recently I thought about rewriting it in a fashion similar to the one described in this issue by @zslayton, and I might even build some mitigation for things like what @jordanmack mentioned.

I'll have to extract this code separately into a Rust library but in the meantime, here is the project I'm talking about: https://github.com/brokenthorn/mf-sellout-reporter/tree/dev

Right now, I'm looking into async_std::task::* in order to make the async fns execute as async_std tasks, which would allow them to run in parallel as well as concurrently (still learning about this aspect).

If anyone has any suggestions, they're welcome to open issues there or tweet me @brokenthorn.