Open karptonite opened 5 months ago
If this is a change in the behavior, was it intentional?
Hi, Daniel. Rule of thumb: If a behavior change is not listed in a project's ChangeLog, then I think it's probably not intentional. :smile:
There are a lot of changed variables here (PHP, libgearman, Gearman extension). It's hard to tell where the issue might be. I recommend that you try to isolate those variables. Can you revert just libgearman, for example? Did you also change gearmand server versions?
Are you submitting your jobs synchronously or asynchronously?
Can you post a simple failing test case?
Hi @esabol ! Unfortunately, I do not have the ability to revert just libgearman at the moment. Maybe later, if the test case I've provided doesn't shed any light on this. The server is unchanged, and is running Gearman 1.1.12. I'm submitting jobs asynchronously.
You can find a simple failing test case here: https://github.com/karptonite/gearman-test
To run it, you must be running a gearman server--I don't know if the version matters, but I was running 1.1.12. You can use the same server for testing multiple PHP environments.
Edit both files to include the location of your gearman server.
First, run the file `test_gearman1.php`. Second, run the file `test_gearman2.php`. The output will tell you whether the test passed, and will also reset the function name for the next test.
It works as follows: The first script adds a job to the queue, then adds a worker that will take the job, but exit with a non-zero exit code before returning. The second script adds a worker and takes any jobs added by the first script. In theory, because the first script exited before its worker returned, the job should still be on the queue when you run the second script. This test passes with Pecl-Gearman 2.0.6, PHP 7.4, libgearman 1.1.12. It fails with Pecl-Gearman 2.1.2, PHP 8.0.30, libgearman 1.1.19.1.
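For anyone who wants the expected semantics spelled out, here is a rough in-memory sketch of the two-step test. It is a toy model in Python, not the code from the linked repository and not how gearmand is implemented; `ToyJobServer`, `run_two_step_test`, and the `requeue_on_disconnect` flag are hypothetical names made up for illustration. The flag only models the two observed outcomes (job comes back vs. job is lost); it says nothing about where in PHP, libgearman, or the extension the difference actually arises.

```python
from collections import deque


class ToyJobServer:
    """In-memory stand-in for a job server (NOT Gearman's implementation)."""

    def __init__(self):
        self.queue = deque()   # jobs waiting for a worker
        self.in_flight = {}    # worker id -> job currently assigned to it

    def submit_background(self, job):
        # Asynchronous submit: the client does not wait for a result.
        self.queue.append(job)

    def grab_job(self, worker_id):
        # Hand the next queued job to a worker, or None if the queue is empty.
        if not self.queue:
            return None
        job = self.queue.popleft()
        self.in_flight[worker_id] = job
        return job

    def worker_disconnected(self, worker_id, requeue=True):
        # The worker went away without reporting completion.
        job = self.in_flight.pop(worker_id, None)
        if job is not None and requeue:
            self.queue.appendleft(job)


def run_two_step_test(requeue_on_disconnect=True):
    server = ToyJobServer()

    # Step 1 (first script): queue a job, take it, exit before returning.
    server.submit_background("job-1")
    server.grab_job("worker-1")
    server.worker_disconnected("worker-1", requeue=requeue_on_disconnect)

    # Step 2 (second script): a fresh worker checks whether the job survived.
    job = server.grab_job("worker-2")
    return "PASS" if job == "job-1" else "FAIL"


if __name__ == "__main__":
    print("requeue on disconnect:", run_two_step_test(True))
    print("job dropped on disconnect:", run_two_step_test(False))
```

The first call corresponds to what the test reports on the older stack, the second to what it reports on the newer one.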
Help! Can anyone test this with different versions of the PHP extension and/or different versions of libgearman?
One further comment: I've noticed on further examination that, in the previous behavior, it doesn't matter what the exit status code was; even exiting with a normal 0 exit code would cause the job to go back on the queue if the script exited before the callback function returned. I mention this mainly to keep anyone from looking for code that checks exit status codes.
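To illustrate why the exit status would be irrelevant, here is a small Python sketch. It assumes (this is my assumption, not something confirmed in this thread) that the decision to requeue depends only on whether a completion message ever came back from the worker, not on how the worker process ended. The worker source, `job_completed`, and `should_requeue` are hypothetical and have nothing to do with the Gearman API.

```python
import subprocess
import sys

# Hypothetical worker: reads one job from stdin, then exits with the status
# given on the command line WITHOUT ever writing a completion line to stdout.
WORKER_SOURCE = """
import sys
job = sys.stdin.readline()
sys.exit(int(sys.argv[1]))
"""


def job_completed(exit_status):
    """Hand one job to a child worker and report whether it acknowledged it."""
    proc = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE, str(exit_status)],
        input="job-1\n",
        capture_output=True,
        text=True,
    )
    # The "server" side looks only at what came back over the pipe.
    # proc.returncode is deliberately never inspected.
    return "COMPLETE" in proc.stdout


def should_requeue(exit_status):
    # No completion message means the job goes back on the queue.
    return not job_completed(exit_status)


if __name__ == "__main__":
    for status in (0, 1, 255):
        print(f"exit status {status}: requeue = {should_requeue(status)}")
```

Under that assumption, every exit status leads to a requeue, which matches the old behavior described above.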
Has anyone had a chance to try my repro of this issue?
As recently as version 2.0.6 (with PHP 7.4, libgearman 1.1.12), if PHP exited with a non-zero value while a worker was working on a job, the job would be returned to the queue and would be retried as soon as the worker started up again. We recently upgraded to PHP 8, libgearman 1.1.19.1, and version 2.1.2 of the Gearman extension, and it seems as if this is no longer the case; now if a PHP process crashes in the middle of a job, it seems that the job is essentially lost--it does not go back on the queue. If this is a change in the behavior, was it intentional?
The (old) behavior is discussed here: https://stackoverflow.com/questions/8870132/error-conditions-and-retries-in-gearman/10348461#10348461 and here: https://stackoverflow.com/questions/15790531/delay-a-gearman-job-in-php