symbiote / silverstripe-queuedjobs

A module that provides interfaces for scheduling jobs for certain times.
BSD 3-Clause "New" or "Revised" License

Jobs causing fatal errors affect the queue processing #179

Closed raissanorth closed 5 years ago

raissanorth commented 6 years ago

When a job causes a fatal error, e.g. by calling a non-existent method, the QueueHandler exits with code 255 instead of 0, and the queued jobs that follow are not executed.
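To make the exit-code symptom concrete, here is a minimal sketch in Python, not the module's PHP code. It assumes a hypothetical runner that starts each job in its own child process, records the exit code, and carries on with the remaining jobs. The job names and the `run_job` / `run_queue` helpers are invented for illustration; the middle job simply exits with 255, standing in for a PHP fatal error.

```python
import subprocess
import sys

# Hypothetical job bodies, each executed as a separate Python child process.
# The "fatal" job exits with 255 as a stand-in for a PHP fatal error.
JOBS = {
    "ok_first": "print('first job done')",
    "fatal": "import sys; sys.exit(255)",
    "ok_last": "print('last job done')",
}


def run_job(source: str) -> int:
    """Run one job in its own interpreter and return its exit code."""
    result = subprocess.run([sys.executable, "-c", source], capture_output=True)
    return result.returncode


def run_queue(jobs: dict) -> dict:
    """Run every job, recording exit codes instead of stopping at the first failure."""
    codes = {}
    for name, source in jobs.items():
        codes[name] = run_job(source)
    return codes


if __name__ == "__main__":
    for name, code in run_queue(JOBS).items():
        status = "ok" if code == 0 else f"failed (exit {code})"
        print(f"{name}: {status}")
```

In this sketch the non-zero code from the failing job is only recorded, so the last job still runs. The report above describes the opposite outcome: the handler itself exits with 255 and the later jobs are skipped.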

In addition, a 'stalled warning' message is sent to the admin:

A job named job name appears to have stalled. It will be stopped and restarted, please login to make sure it has continued

The job's status remains Running, but the job is not in fact restarted automatically.
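For readers unfamiliar with stall detection, the sketch below shows one common way such a check can work, written in Python rather than the module's PHP. It assumes a job counts as stalled when it is still marked Running but its progress counter has not moved since the previous health check. The `JobRecord` fields and the `find_stalled` helper are hypothetical and are not the module's actual schema or API.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical job row; field names are illustrative only."""
    name: str
    status: str            # e.g. "Running", "Complete", "Broken"
    steps_processed: int   # progress counter the job updates as it works
    last_seen_steps: int   # value of steps_processed at the previous check


def find_stalled(jobs):
    """Return names of jobs that are Running but made no progress since the last check."""
    stalled = []
    for job in jobs:
        if job.status == "Running" and job.steps_processed == job.last_seen_steps:
            stalled.append(job.name)
        # Remember the current progress for the next health check.
        job.last_seen_steps = job.steps_processed
    return stalled
```

Under these assumptions, a job killed by a fatal error never updates its counter again, so it is flagged on the second check after the crash. What the runner does next (restart the job, mark it broken, notify an admin) is outside the scope of the sketch.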

This appears to be a regression from CWP 1.8.1 to CWP 2.0-rc1.

Might it be related to the use of Monolog for logging?

To recreate:

symbiote/silverstripe-queuedjobs          dev-master d690e58
silverstripe/framework                    4.1.x-dev 9ed3cd4
silverstripe/admin                        1.1.x-dev 4166209
asyncphp/doorman                          3.0.0

robbieaverill commented 6 years ago

I think this is quite important to resolve, so I'm adding an impact label.

robbieaverill commented 6 years ago

@mateusz do you see there being an impact on the platform or SilverStripe Ops from this?

robbieaverill commented 6 years ago

"In addition, a 'stalled warning' message is sent to the admin:"

Noting that this is expected behaviour.

robbieaverill commented 5 years ago

OK, picking this up again after some time away from it. Here are some notes:

Testing environments

Steps to reproduce

CWP 1.9 behaviour

Job exits silently and is left in Running state. No error messages are saved in the job logs.

Subsequent runs continue to try to re-run the job, but do not report any messages indicating this is happening.

CWP 2.2 behaviour

Job exits silently and is left in Running state. The error message is saved in the job logs.

Subsequent runs report on a broken job with a large stack trace.


It looks to me like this now works better than it did in CWP 1.9. I can't reproduce the 255 exit code when a job stalls or causes a fatal error.

As an aside, there is much more verbose output in the logs now that we're using Monolog tied into the core error logger - i.e. you get a full backtrace of middlewares. https://github.com/silverstripe/silverstripe-framework/pull/8241 would help with this, since it wouldn't get sent to the core error handler by default.

I'm going to close this for now as "cannot reproduce" - feel free to reopen if anyone would like to provide more information.