woocommerce / action-scheduler

A scalable, traceable job queue for background processing large queues of tasks in WordPress. Specifically designed for distribution in WordPress plugins (and themes) - no server access required.
https://actionscheduler.org
GNU General Public License v3.0
626 stars 113 forks source link

Review and revise exception handling within Action Scheduler #1059

Open alewolf opened 3 months ago

alewolf commented 3 months ago

There are quite a few occasions when Action Scheduler experiences an unexpected condition, and (mostly for safety) it stops. It communicates about this using exceptions, but many of these exceptions:

In most cases, this is 'fine' in the sense that the queue is processed in requests that are independent of user-facing requests. However, they could impact other scheduled events when running in the context of WP Cron.

Additionally, there is the problem noted below which is that only one version of Action Scheduler can be active at any one time. That is, only one parent plugin 'wins' this particular race, and if things go wrong—irrespective of the underlying reason—the backtraces often lead users back to that plugin's support team.

Let's review and revise when we throw exceptions, and when we should add handling for them.

Original report follows...


I bundle the ActionScheduler version 3.7.4 in the Pixel Manager for WooCommerce.

As of recently we started to receive critical error reports like this:

2024-05-13T08:24:09+00:00 Critical Uncaught InvalidArgumentException: Unidentified action 19061390 in /nas/content/live/nck/wp-content/plugins/pixel-manager-pro-for-woocommerce/vendor/woocommerce/action-scheduler/classes/data-stores/ActionScheduler_DBStore.php:1086  
Additional context
{
    "error": {
        "type": 1,
        "file": "\/nas\/content\/live\/nck\/wp-content\/plugins\/pixel-manager-pro-for-woocommerce\/vendor\/woocommerce\/action-scheduler\/classes\/data-stores\/ActionScheduler_DBStore.php",
        "line": 1086
    },
    "backtrace": [
        "",
        "#0 \/nas\/content\/live\/nck\/wp-content\/plugins\/pixel-manager-pro-for-woocommerce\/vendor\/woocommerce\/action-scheduler\/classes\/abstracts\/ActionScheduler_Abstract_QueueRunner.php(120): ActionScheduler_DBStore->mark_failure(19061390)",
        "#1 \/nas\/content\/live\/nck\/wp-content\/plugins\/pixel-manager-pro-for-woocommerce\/vendor\/woocommerce\/action-scheduler\/classes\/abstracts\/ActionScheduler_Abstract_QueueRunner.php(98): ActionScheduler_Abstract_QueueRunner->handle_action_error(19061390, Object(Exception), 'Async Request', true)",
        "#2 \/nas\/content\/live\/nck\/wp-content\/plugins\/pixel-manager-pro-for-woocommerce\/vendor\/woocommerce\/action-scheduler\/classes\/ActionScheduler_QueueRunner.php(169): ActionScheduler_Abstract_QueueRunner->process_action(19061390, 'Async Request')",
        "#3 \/nas\/content\/live\/nck\/wp-content\/plugins\/pixel-manager-pro-for-woocommerce\/vendor\/woocommerce\/action-scheduler\/classes\/ActionScheduler_QueueRunner.php(139): ActionScheduler_QueueRunner->do_batch(25, 'Async Request')",
        "#4 \/nas\/content\/live\/nck\/wp-includes\/class-wp-hook.php(324): ActionScheduler_QueueRunner->run('Async Request')",
        "#5 \/nas\/content\/live\/nck\/wp-includes\/class-wp-hook.php(348): WP_Hook->apply_filters('', Array)",
        "#6 \/nas\/content\/live\/nck\/wp-includes\/plugin.php(517): WP_Hook->do_action(Array)",
        "#7 \/nas\/content\/live\/nck\/wp-content\/plugins\/pixel-manager-pro-for-woocommerce\/vendor\/woocommerce\/action-scheduler\/classes\/ActionScheduler_AsyncRequest_QueueRunner.php(52): do_action('action_schedule...', 'Async Request')",
        "#8 \/nas\/content\/live\/nck\/wp-content\/plugins\/woocommerce\/includes\/libraries\/wp-async-request.php(147): ActionScheduler_AsyncRequest_QueueRunner->handle()",
        "#9 \/nas\/content\/live\/nck\/wp-includes\/class-wp-hook.php(324): WP_Async_Request->maybe_handle('')",
        "#10 \/nas\/content\/live\/nck\/wp-includes\/class-wp-hook.php(348): WP_Hook->apply_filters('', Array)",
        "#11 \/nas\/content\/live\/nck\/wp-includes\/plugin.php(517): WP_Hook->do_action(Array)",
        "#12 \/nas\/content\/live\/nck\/wp-admin\/admin-ajax.php(207): do_action('wp_ajax_nopriv_...')",
        "#13 {main}",
        "thrown"
    ]
}
  1. In one particular case that I analysed the scheduled action didn't even come from the Pixel Manager. But, WooCommerce and the Pixel Manager both bundle version 3.7.4 of the ActionScheduler. It seems that the version shipped by the Pixel Manager is loaded first (probably because of the alphabetical order). The error path shows the Pixel Manager and we get all support requests.
  2. There is no helpful information in that error. In the case I analysed the action 19061390 exists and has just failed to execute.
  3. What's even more annoying is that one of our customers reached out to WooCommerce support and they reverted him to us, because the error was thrown from the ActionScheduler that's shipped in the Pixel Manager. Even though the action probably was not even scheduled by the Pixel Manager. Since WooCommerce is the maintainer of the ActionScheduler you should also support users who have issues with it.
  4. I will now revert back to a lower version of the ActionScheduler, just to avoid further support requests. But, you should really do something about this problem. All other plugins that ship version 3.7.4 will have the same issue. Unwarranted support requests because the ActionScheduler doesn't handle failed actions gracefully and logs them in the fatal-errors log.
barryhughes commented 3 months ago

Hi @alewolf,

Are you quite sure the problem is specifically tied to 3.7.4? This was a fairly minimal set of changes. What version did you switch to, and have you found that that has resolved the problems (at least in cases where WooCommerce, or other plugins shipping 3.7.4, are not present)?

There is no helpful information in that error.

We could probably finesse some of these, thank you for flagging. For what it's worth, errors of this type may point to a race condition between two queue runners, or between Action Scheduler's own queue runners and some custom code.

What's even more annoying is that one of our customers reached out to WooCommerce support and they reverted him to us, because the error was thrown from the ActionScheduler that's shipped in the Pixel Manager. Even though the action probably was not even scheduled by the Pixel Manager. Since WooCommerce is the maintainer of the ActionScheduler you should also support users who have issues with it.

I'm sure this can be annoying, but things are not always quite as clear cut as you are suggesting. We do support Action Scheduler, here and to a lesser extent in the WordPress community forums, but Action Scheduler is a conduit for processing work: errors surface through it, but not always because of it.

With your experience, you may understand when and how to make the distinction, but I'm not sure it's realistic to expect large support teams to make the right call 100% of the time. Even in this case, it's not completely clear that the problem has its origin in Action Scheduler.

I will now revert back to a lower version of the ActionScheduler, just to avoid further support requests. But, you should really do something about this problem. All other plugins that ship version 3.7.4 will have the same issue.

These exceptions are not new, and as above I'm not convinced they are tied to 3.7.4 specifically. Definitely let me know if I'm missing something obvious, here.

Unwarranted support requests because the ActionScheduler doesn't handle failed actions gracefully and logs them in the fatal-errors log.

Though I can take some guesses as to what you'd like to see here, I'd much rather get your informed opinion. When an unexpected condition is met, should it: fail silently, log the errors somewhere else, carry on regardless? When should we let site operators know of problems, and when should we suppress them?

alewolf commented 3 months ago

Hi @barryhughes

Are you quite sure the problem is specifically tied to 3.7.4? This was a fairly minimal set of changes. What version did you switch to, and have you found that that has resolved the problems (at least in cases where WooCommerce, or other plugins shipping 3.7.4, are not present)?

I upgraded from 3.7.2 to 3.7.4, then the support requests started.

I can't say if it is bound to 3.7.4 or which version exactly. I only can say when upgrading to the same version as what's bundled in WooCommerce, then my bundled version will take precedence. And whatever code triggers those errors, users will contact our support because the ActionScheduler loads from our bundle and the Pixel Manager is visible in the path.

These exceptions are not new, and as above I'm not convinced they are tied to 3.7.4 specifically. Definitely let me know if I'm missing something obvious, here.

As explained in my earlier post, I've downgraded to one version below WooCommerce. As expected it will now hit another install. Now Rank Math will get those support requests:

Uncaught InvalidArgumentException: Unidentified action 19146968 in /nas/content/live/nck/wp-content/plugins/seo-by-rank-math/vendor/woocommerce/action-scheduler/classes/data-stores/ActionScheduler_DBStore.php:1086

Though I can take some guesses as to what you'd like to see here, I'd much rather get your informed opinion. When an unexpected condition is met, should it: fail silently, log the errors somewhere else, carry on regardless? When should we let site operators know of problems, and when should we suppress them?

barryhughes commented 3 months ago
  • When more than one plugin is loading the same version of the ActionScheduler as the one bundled in WooCommerce, then the one bundled with WooCommerce should take precedence.

This doesn't feel like the right direction, to me.

I'm curious, though: in the case of your plugin it seems like WooCommerce is a requirement. What prevents you from using the WC Queue, or from using WooCommerce's copy of Action Scheduler directly (vs bundling your own)?

alewolf commented 3 months ago

What prevents you from using the WC Queue, or from using WooCommerce's copy of Action Scheduler directly (vs bundling your own)?

Good question.

Technically the Pixel Manager can also be used on installs where WooCommerce is not running (because some of our users don't want to use more than one tagging solution).

Until now I haven't thought about loading the Action Scheduler conditionally, depending if WooCommerce is active or not.

However, I fear that it might not be possible to implement conditional loading.

The Action Scheduler needs to be required very early in the loading process. If I understand correctly this is to make sure that the Action Scheduler can check if only one, and only the most recent version is loaded, in case more than one plugin tries to load it. However, it is not possible to check if WooCommerce is active so early.

Let me know if you think this is possible. If so, I will change the logic and only load it conditionally and let WooCommerce take over.

barryhughes commented 3 months ago

Technically the Pixel Manager can also be used on installs where WooCommerce is not running.

Gotcha, I misunderstood in that case.

Let me know if you think this is possible.

It probably is possible, but given the context I suspect the potential risk and complexity may not be worth the gain.

barryhughes commented 2 months ago

Occasionally seeing a failed action with this message:

action failed via WP Cron: Unable to update the status of action ###### to in-progress.

Did some Googling etc but not able to find any real explanation or solution.

This was reported on the community forums. Noting it here because, similar to the work we did in https://github.com/woocommerce/action-scheduler/pull/1060, we can probably add additional context to make things clearer.

barryhughes commented 2 weeks ago

Updated title and description. We're hoping to look at options here soon.

crstauf commented 2 weeks ago

@barryhughes Would this error be one that you would expect an author of a plugin (bundling AS) to handle?

Uncaught InvalidArgumentException: Unidentified action 19146968

That is to say, if a customer came to WooCommerce support about the above error, would your internal process direct to investigate further, or simply send away to whichever plugin bundled Action Scheduler?

Certainly there are errors that are because of Action Scheduler, and not because of the bundling plugin. Perhaps including in the error message direction on who is the best first contact would be helpful. Off the top of my head, this could potentially be determined by checking $wp_filter[ $hook ] for the callbacks on the scheduled action.

I think @alewolf has a very valid point, that plugin support teams can potentially get flooded with support requests that are not related to their plugin, and everyone involved should be do all they can to keep that from happening, especially Action Scheduler, being the common technology. If it's not handled well, then plugin authors will do exactly as @alewolf has done, and use older versions in an attempt to avoid the support burden. Not exactly a winning approach to encourage developers to use the latest version.

crstauf commented 2 weeks ago

@alewolf In terms of conditionally loading, the introduction of Plugin Dependencies in WP 6.5 may give you a more viable approach:

/**
 * Plugin name: Pixel Manager for WooCommerce
 * Requires Plugins: action-scheduler
 */

...

if ( version_compare( $wp_version, '6.5' ) < 0) {
    // require bundled AS
}

This does add an additional step for site admins, but is an option to use the latest version while also alleviating support tickets. As a site admin, I always prefer installing AS explicitly anyways; maybe there are more like me.

barryhughes commented 2 weeks ago

Certainly there are errors that are because of Action Scheduler

Definitely. Others are not, and yet others occupy a grey area somewhere in-between (and relate to the volume of work being processed, alongside the available resources).

I think @alewolf has a very valid point.

Yep, so do we, hence the latest update.

crstauf commented 2 weeks ago

Yep, so do we, hence the latest update.

Oh sorry, I thought this ticket was generalized to changing exception handling and messages, not specific to alleviating false support tickets.

barryhughes commented 2 weeks ago

Yes—that's the primary technical focus. The support aspect is one of the motivations for doing this, and something we'll look to alleviate through those changes.

barryhughes commented 1 week ago
ActionScheduler_ActionFactory::repeat()
    classes/ActionScheduler_ActionFactory.php:233

↑ This is already caught, by ActionScheduler_Abstract_QueueRunner::schedule_next_instance(), which is the only location where we call it. Third parties invoking this directly are responsible for their own exception handling 🟢

barryhughes commented 1 week ago
ActionScheduler_Store::validate_args()
    classes/abstracts/ActionScheduler_Store.php:324

↑ This is already caught, by ActionScheduler_DBStore::fetch_action(), which is the only location where we call it. Third parties invoking this directly are responsible for their own exception handling 🟢

(Edit: 2 exceptions are thrown from this method, but both are covered.)

barryhughes commented 1 week ago
ActionScheduler_Store::validate_action()
    classes/abstracts/ActionScheduler_Store.php:359

↑ This is already caught, by ActionScheduler_DBStore::save_action_to_db() Third parties invoking this directly are responsible for their own exception handling 🟢

barryhughes commented 1 week ago
ActionScheduler_Abstract_QueueRunner::process_action()
    classes/abstracts/ActionScheduler_Abstract_QueueRunner.php:64
ActionScheduler_Abstract_QueueRunner::process_action()
    classes/abstracts/ActionScheduler_Abstract_QueueRunner.php:95

↑ Both of the above are part of the current nested try/catch used for error handling. The exceptions are caught in the very same structure, in the same method (but we can probably simplify this now that we don't need to support PHP 5.6) 🟠

Let's review the nested try/catch. Low priority.

barryhughes commented 1 week ago
ActionScheduler_Abstract_ListTable::get_bulk_actions()
    classes/abstracts/ActionScheduler_Abstract_ListTable.php:162

↑ Unclear who/what we're guarding against here. I'd think if a bulk action lacks a valid callback, we should log that but simply omit it from the returned array of possible actions.

Needs attention, low priority (I don't recall seeing reports about this specific exception) 🟠

barryhughes commented 1 week ago
ActionScheduler_WPCLI_QueueRunner::__construct()
    classes/WP_CLI/ActionScheduler_WPCLI_QueueRunner.php:33

↑ This code should only run in the context of WP CLI. The exception seems justified, no follow-up needed at present 🟢

barryhughes commented 1 week ago
ProgressBar::__construct()
    classes/WP_CLI/ProgressBar.php:47

↑ This code should only run in the context of WP CLI. The exception seems justified, no follow-up needed at present 🟢

barryhughes commented 1 week ago
ActionScheduler_DBStore::save_action_to_db()
    classes/data-stores/ActionScheduler_DBStore.php:113
ActionScheduler_DBStore::save_action_to_db()
    classes/data-stores/ActionScheduler_DBStore.php:118
ActionScheduler_DBStore::save_action_to_db()
    classes/data-stores/ActionScheduler_DBStore.php:126

↑ For all of the above, the exception is thrown from within a private method but we trigger this through various paths. In some cases, we catch the exception then log an error. In others, we do not and execution probably halts. Needs review 🔴

barryhughes commented 1 week ago
ActionScheduler_DBStore::get_query_actions_sql()
    classes/data-stores/ActionScheduler_DBStore.php:397
ActionScheduler_DBStore::get_query_actions_sql()
    classes/data-stores/ActionScheduler_DBStore.php:455
ActionScheduler_DBStore::get_query_actions_sql()
    classes/data-stores/ActionScheduler_DBStore.php:470
ActionScheduler_DBStore::get_query_actions_sql()
    classes/data-stores/ActionScheduler_DBStore.php:493

↑ These are all within ActionScheduler_DBStore::get_query_actions_sql(). It's called from various public API functions like as_get_scheduled_actions() which lack protection against the exceptions. These particular exceptions seem somewhat reasonable, they guard against misuse that should generally be caught during development, but that can't be guaranteed. Needs review 🟠