hummingbird-project / swift-jobs

Offload work your server would be doing to another server
Apache License 2.0

Job Scheduling questions #42

Closed: thoven87 closed this issue 1 day ago

thoven87 commented 3 days ago

I have been debugging the scheduled job "onMinute" feature, and it's possible that jobs can fall behind and never run again.

For example, I ran a job on 2024-11-02 at 14:40; the job is scheduled to run every 5 minutes. Today is 2024-11-29, and after kicking off the jobs performance test again, I see the following in the logs:

2024-11-29T02:11:05-0500 debug io.stevenson.server : [Jobs] Last scheduled date 2024-11-02 14:40:00 +0000.
2024-11-29T02:11:05-0500 debug io.stevenson.server : [Jobs] Last scheduled date 2024-11-02 14:40:00 +0000.
2024-11-29T02:11:05-0500 debug io.stevenson.server : JobName=StatsJob JobTime=2024-11-02 14:45:00 +0000 [Jobs] Next Scheduled Job

After troubleshooting this issue for a while, I am at a loss. Isn't this supposed to return a date in the future, where that future date is not yet known?
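To illustrate the symptom, here is a minimal sketch of the arithmetic, not the swift-jobs implementation. `nextSlot(after:everyMinutes:)` is a hypothetical helper, and the two timestamps are the ones from the log lines above converted to Unix time. Computing the next slot from the stored last-scheduled date alone gives a date that is already in the past; computing it from the later of that date and the current time gives a future date.

```swift
import Foundation

/// Hypothetical helper (not part of swift-jobs): returns the first instant
/// strictly after `date` that falls on a multiple of `minutes` minutes
/// counted from the Unix epoch.
func nextSlot(after date: Date, everyMinutes minutes: Int) -> Date {
    let step = TimeInterval(minutes * 60)
    let slotIndex = (date.timeIntervalSince1970 / step).rounded(.down) + 1
    return Date(timeIntervalSince1970: slotIndex * step)
}

// Timestamps taken from the logs above, expressed in UTC.
let lastScheduled = Date(timeIntervalSince1970: 1_730_558_400) // 2024-11-02 14:40:00 +0000
let now = Date(timeIntervalSince1970: 1_732_864_265)           // 2024-11-29T02:11:05-0500

// Stepping from the stale stored date yields a slot that is already in the past.
let staleNext = nextSlot(after: lastScheduled, everyMinutes: 5)
print("stale next slot is in the past:", staleNext < now)

// Stepping from the later of the two dates skips every missed slot at once.
let caughtUpNext = nextSlot(after: max(lastScheduled, now), everyMinutes: 5)
print("caught-up next slot is in the future:", caughtUpNext > now)
```

Whether missed slots should be skipped like this or replayed one by one is a design choice; the sketch only shows why a next date derived from a weeks-old last date still lies in the past.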

Is it advisable to have one job controller per schedule, or is one controller enough? I know that if there is at least one everyMinute or everyHour schedule, the next onMinute schedule gets run.

adam-fowler commented 3 days ago

Are you saying that scheduled job (in the logs) never runs?

You are better with one JobController (regardless of the number of schedules). Set its number of workers to the number of cores on the system, although you could experiment with running with more workers and see if their work can interleave. If you aren't processing enough jobs in time then add another machine.
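As a small sketch of the sizing advice above, the core count can be read from Foundation at startup. `workerCount` is an illustrative name, and how the value is handed to the job controller is deliberately left out because it depends on the swift-jobs version in use.

```swift
import Foundation

// Number of processor cores currently available to this process.
// The max(1, ...) guard is a defensive assumption so the count is never zero.
let workerCount = max(1, ProcessInfo.processInfo.activeProcessorCount)
print("Suggested number of workers: \(workerCount)")
```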

thoven87 commented 2 days ago

> Are you saying that scheduled job (in the logs) never runs?

The log never updates: the next schedule should have been 2024-11-29 20:00:00, but as you can see, it is still 2024-11-02 14:40:00.

> You are better with one JobController (regardless of the number of schedules). Set its number of workers to the number of cores on the system, although you could experiment with running with more workers and see if their work can interleave. If you aren't processing enough jobs in time then add another machine.

The StatsJob isn't doing much work, actually:

WITH number_of_test_data AS (
    SELECT COUNT(1) AS total  FROM hummingbird.test_data
)
INSERT INTO  hummingbird.test_data_jobs_did_run (count, hash)
SELECT
    total,
    md5(\(hashForTheCurrentJobPartititon)) --- This is to prevent from rerunning - I am working on job workflow for swift-jobs
FROM number_of_test_data

The job runs on my MacBook and the database is hosted on neon.tech.

Originally, I had three jobs ("StatsJob") with different schedules (Monthly, Hourly, onMinutes) in one controller. This is when I started to experience issues where one job would run and the others would not. This PR fixed the issue that prevented the other jobs from running.

thoven87 commented 2 days ago

I don't think the issue is the number of cores, as the laptop has 10 cores and the job is supposed to use at least 4 of them. I put log statements in the "StatsJob", which confirm that it never runs. If I delete the current entry for _jobScheduleLastDate in the database, the job runs as expected.