humanmade / altis-cloud

Cloud Module for Altis
https://docs.altis-dxp.com/cloud/
9 stars 2 forks source link

Avoid Cavalcade lookups on front-end requests #702

Open roborourke opened 1 year ago

roborourke commented 1 year ago

Per this issue on the Cavalcade repo when db latency is a factor the common pattern of checking for and scheduling cron jobs is often done on init. WP core does this as well many 3rd party plugins etc...

In theory we could avoid this altogether using the preflight filter to limit these checks to the admin only, provided they are added during the init hook:

add_filter( 'pre_get_scheduled_event', function ( $pre ) {
    if ( ! doing_action( 'init' ) || is_admin() ) {
       return $pre;
    }
    return true;
}, 1 );

Events will still be scheduled but on admin requests only.

This would cause issues if plugins are adding events on the frontend using the init hook based on user interactions and $_GET or $_POST parameters, although that would be far more likely dealt with not on the init hook.

This approach seems safe to me, especially if we also check that $_POST is empty. Would be great to get some additional input and feedback though.

roborourke commented 1 year ago

Suggestion from @kovshenin

It is a good compromise, though I'd be careful returning true for all events. Instead maybe have an explicit list of events that you expect to be scheduled at all times, and let the checks for those only run within wp-admin.

kovshenin commented 1 year ago

This is also a good opportunity to improve our scheduled tasks docs, specifically suggest ways how to add single and recurring tasks, how and when to check wp_next_scheduled and why it's better done during altis.migrate instead of init.

By default in altis/cloud we can short-circuit the core https check event, the privacy events, etc. that we know are most likely to already be scheduled. We can also try and move them to altis.migrate if that makes more sense.

kadamwhite commented 1 year ago

Implementing this filter from application code as described here doesn't seem to have any effect on whether Cavalcade queries are executed on the frontend, for me. In fact, returning either true or false seems instead to guarantee that there will also be a call to Job->save() which triggers additional queries.

It seems like possibly Cavalcade runs so early, that I am not able to set filters from an mu-plugin which actually run during the cavalcade job lookup process?

kadamwhite commented 1 year ago

My assumption was incorrect, but returning true causes an error because cron.php checks $returned_event->timestampif it is not null orfalse`, so the only way I've reliably disabled an event looks like this:

/**
 * Skip cavalcade scheduling on high-traffic frontend pages.
 *
 * @param null|false|object $pre  Value to return instead. Default null to continue retrieving the event.
 * @param string            $hook Action hook of the event.
 * @param array             $args Array containing each separate argument to pass to the hook's callback function.
 * @return null|false|object Returning a non-null value will short-circuit the normal process.
 */
function mytheme_skip_cavalcade_scheduled_event_on_high_traffic_pages( $pre, $hook, $args ) {
    if ( is_admin() ) {
        // Do not alter processing of events within the admin.
        return $pre;
    }

    // Most of the events we want to skip run before any globals are set, so
    // it is not straightforward to skip these only on is_single() or is_home().
    // Skip relevant jobs on all frontend pages, for now.

    $is_skippable_hook = in_array(
        $hook,
        [
            'wp_https_detection',
            'extendable-aggregator-sync-cron',
            'wp_site_health_scheduled_check',
            'wp_privacy_delete_old_export_files',
        ],
        true
    );

    if ( $is_skippable_hook ) {
        $fake_scheduled_event = new stdClass();
        // Perpetually one hour in the future to defer any queries.
        $fake_scheduled_event->timestamp = time() + HOUR_IN_SECONDS;
        return $fake_scheduled_event;
    }

    return $pre;
}

add_filter( 'pre_get_scheduled_event', 'mytheme_skip_cavalcade_scheduled_event_on_high_traffic_pages', 10, 3 );

init cannot be used as the sole restriction because the health check initiator runs before init.

roborourke commented 1 year ago

Can we just return false?

I this is the only way would you mind updating the PR #713 ?

kadamwhite commented 1 year ago

@roborourke if we return false, Cavalcade treats the job as unscheduled, and does us the "favor" of querying to set it. That pumps additional jobs into the table and causes an INSERT on every page view.

image
kovshenin commented 1 year ago

If $pre is not null then something else has already modified it, likely at an earlier priority and I think we should respect that and return early, I was thinking if ( is_admin() || ! is_null( $pre ) ) {

Can we just return false?

Returning false will also short-circuit the filter because there is an explicit check for null:

https://github.com/WordPress/wordpress-develop/blob/9690aa7f7b0b7acd5acad54278952622585dec16/src/wp-includes/cron.php#L746-L749

kadamwhite commented 1 year ago

@kovshenin See above—it short-circuits the filter, but then the plugin treats that as an opportunity to "fix" the situation by setting the missing job. That results in more queries, which is not the goal.

roborourke commented 1 year ago

Shortcircuiting is the point though? Cavalcade should be respecting any previous modifications to the value on that filter.

kovshenin commented 1 year ago

Sorry I should have been a bit more clear. Short-circuit with a false at the end tells Cavalcade that the none of the jobs exist, which is what we don't want, as demonstrated by @kadamwhite's screenshot. We only want to short-circuit for the jobs that we know, i.e. $is_skippable_hook, and allow Cavalcade to proceed with the checks to fetch any remaining jobs from the DB, and add them to the schedule if absent.

kadamwhite commented 1 year ago

@kovshenin The change I made which resulted in additional queries was to return false where I am otherwise returning the fake scheduled notice, above.

    if ( $is_skippable_hook ) {
        return false;
    }

I wasn't short-circuiting at the end, only for the jobs that we know. However, returning false causes WP to treat the job as not scheduled — which, as I've attempted to illustrate, means that the health check then schedules the event, unnecessarily.

        if ( ! wp_next_scheduled( 'wp_site_health_scheduled_check' ) && ! wp_installing() ) {
            wp_schedule_event( time() + DAY_IN_SECONDS, 'weekly', 'wp_site_health_scheduled_check' );
        }

Returning false doesn't skip processing of that hook, it instead notifies that the event must be scheduled. That results in more queries overall. The only way to skip both looking up, and scheduling an event, is to return an object with a non-falsy timestamp property so that each of the functions which check for a scheduled event will decide no further action is needed.

kovshenin commented 1 year ago

Ah, apologies, I thought @roborourke's comment was referring to that last return $pre;. Returning an object with a timestamp sounds correct.