pkp / pkp-lib

The library used by PKP's applications OJS, OMP and OPS, open source software for scholarly publishing.
https://pkp.sfu.ca
GNU General Public License v3.0
297 stars 442 forks source link

Improve job runner configuration/execution #9823

Closed jonasraoni closed 1 month ago

jonasraoni commented 5 months ago

Describe the bug The documentation states that jobs can be executed by:

But in fact, once the job_runner is enabled, both the scheduled task and the request based execution will execute the jobs.

Solution Adding a setting to control whether jobs should be executed by a scheduled task/request should be enough.

What application are you using? OJS 3.4

PRs (main branch) pkp-lib --> https://github.com/pkp/pkp-lib/pull/10196 ojs --> https://github.com/pkp/ojs/pull/4364 omp --> https://github.com/pkp/omp/pull/1662 ops --> https://github.com/pkp/ops/pull/743

Additional Information This need to be merged to main (upcoming 3.5) after the merge of https://github.com/pkp/pkp-lib/issues/9678 For implementation details see release note and https://github.com/pkp/pkp-lib/issues/9823#issuecomment-2222417819

bozana commented 5 months ago

@jonasraoni, what documentation you mean, this one https://docs.pkp.sfu.ca/admin-guide/en/deploy-jobs? Is CLI tool the same as scheduled task? -- I thought not... For CLI tool (run via Cron job) and Worker solution one needs to disable the build-in option job_runner. You maybe mean when a user uses Cron job for scheduled tasks, then the jobs are also run there? But isn't it in that case that the user needs to deactivate AcronPlugin and then the jobs would also not run two times, or is it here the problem that even if the AcronPlugin is not used the jobs would run?

jonasraoni commented 5 months ago

There was this problem before, but I've already fixed 😁

I've mixed the CLI tool with the scheduled task execution, so the documentation is ok, I've updated the description :)

bozana commented 5 months ago

So just to be sure I understand: What happens if job_runner = off and the user is running scheduled tasks by a Cron job? Will jobs be run however by the scheduled task?

jonasraoni commented 5 months ago

At this moment: job_runner=On = OJS will execute jobs at the end of requests and also when the scheduled tasks are running (either via Acron or a CRON task). job_runner=Off = OJS will not execute jobs by itself.

bozana commented 5 months ago

Hmm... So if scheduled tasks are run using a Cron job, the Acron plugin needs to be disabled. If the jobs are run via Cron + CLI or Workder (as in documentation) then the job_runner has to be Off. Then there are no problems, correct?

touhidurabir commented 5 months ago

@jonasraoni I also think we need to have another config option to handle it . However along with this I also like to have some sort of limiter to introduce when jobs are running via cron and that can also be configured (perhaps from the config file).

Right now if a job runs via cron, it will run or try to run all the pending jobs in the list at a time . This is not bad when there is only few jobs but not an ideal situation when there is a lot of pending jobs (which has been accumulated for quite some time for some unforeseen reason) or few jobs with long run time . This may cause memory run out issue and also not very efficient .

Solution 1 :

we can :-

  1. introduce another config option along with config option to control the job execution via cron (probably not a good idea as we are starting to have too many config options)
  2. we can use the existing config option job_runner_max_jobs which also used by internal job runner (seems a reasonable one to me).

Only issue using the existing job_runner_max_jobs is when an installation want to use both the job runner and corn with different config setting but then will be forced to use to same one for both option (probably such use case will be too low or non existence).

Solution 2:

Another way we can go about this is to use the job runner mechanism for the cron also . and it's not much different as both use pretty much same approach but job runner have some additional constrains (such as max jobs, time, memory and possible estimation of next job to run ), underneath after applying the constrains, it runs jobs one by one .

touhidurabir commented 1 month ago

@asmecher @jonasraoni I like to propose the following changes

we do need a new config setting for job section . Maybe named as schedule_job_runner which will control the job processing via scheduler . But we also need to consider if job is getting processed via other means e.g. worker/job_runner . so how about :-

  1. if only the process_jobs_at_task_scheduler in job section set to On jobs will get to processed via schedule tasks .
  2. When the scheduler run in CLI, the schedule task will process jobs but not more than a specific number . we can use the value of job_runner_max_jobs for this purpose. The reason for this is not to allow schedule tasks process a huge number of jobs at a single time as scheduler also need to process other schedule tasks . Basically we need to consider the performance impact for scheduler .
  3. if the scheduler running in web mode, it will utilize Job Runner feature but only if job_runner set to Off. The reason for this is as job runner process jobs in web request mode, we don't want scheduler to process jobs in same way when it's also running in web request mode .

One important decision to make , should we prioritize it for 3.4.0-x for push it for upcoming 3.5.0 as we will have some breaking changes with the merge of https://github.com/pkp/pkp-lib/issues/9678 ?

asmecher commented 1 month ago

^ @touhidurabir and I discussed this and I'd prefer leaving this kind of change for 3.5.0. We're not making heavy use of jobs/tasks that have much overhead or turnaround time, and aren't worried about concurrent e.g. execution of the same task, as we have decent enough locking. We should be fine for 3.4.0 as-is IMO.

touhidurabir commented 1 month ago

@jonasraoni can you review the linked PRs at https://github.com/pkp/pkp-lib/issues/9823#issue-2202216886 ? However this need to wait before we finalise and merge the https://github.com/pkp/pkp-lib/issues/9678 .

jonasraoni commented 1 month ago

FYI I'll wait the other issue to get merged before reviewing this one.

touhidurabir commented 1 month ago

@jonasraoni I have updated and rebased the PR. As the https://github.com/pkp/pkp-lib/issues/9678 got merged, it's ready for review .

jonasraoni commented 1 month ago

I'll take a look now, I just read the comments, and I think it's aligned with what I wrote in the description (to have a more granular control).

touhidurabir commented 1 month ago

All merged, PR and implementation details https://github.com/pkp/pkp-lib/issues/9823#issue-2202216886