humanmade / Cavalcade

A better wp-cron. Horizontally scalable, works perfectly with multisite.
https://engineering.hmn.md/projects/cavalcade/
Other
512 stars 46 forks source link

Performance issues if checking for schedules during every page load #117

Closed roborourke closed 1 year ago

roborourke commented 1 year ago

It's a common pattern in WP to check for a scheduled job and add it if it doesn't exist e.g.

add_action( 'init', function () {
    if ( ! wp_next_scheduled( 'my_hook' ) ) {
        wp_schedule_event( time(), 'hourly', 'my_hook' );
    }
} );

Typically the list of cron tasks is stored in the options table which is autoloaded and the impact is pretty minimal in terms of performance to do this check on every invocation of WP.

When using Cavalcade we're not able to rely on any persistent object caching for the database queries so there are additional uncached reads on every page load for every event being checked - this is also on a significantly larger db table than just the value from wp_options.

ideally we can solve this performance issue to avoid making multiple db queries for this common pattern.

roborourke commented 1 year ago

Could we leave the option in the wp_options table intact, and use that data for checking whether an event is scheduled?

Potentially flaky and not referencing the real source of truth...

roborourke commented 1 year ago

Maybe set an autoloaded option for each scheduled event and check for that before hitting the cavalcade jobs table. Could be keyed on hook and args - issue is the timestamp argument. Maybe bypass option check if timestamp isn't null?

svandragt commented 1 year ago

Was looking into this for a little bit (distractions). So without Cavalcade, WordPress gets the crons from an option within _get_cron_array() (docs). Cavalcade uses a db query inside the same pre_reschedule_event hook (source and pre_get_scheduled_event so perhaps it should move to a model where it updates the option AFTER the cron event completes? Then it can also use the option at init time, and it'd just have a db call if the option doesn't exist yet.

roborourke commented 1 year ago

The option is completely bypassed currently, I'm not sure I follow the benefit or process of updating the option after the cron event completes? Can you elaborate a bit more?

johnbillion commented 1 year ago

When wp_next_scheduled() is called, could Cavalcade fetch all events with one query (SELECT * FROM cavalcade_jobs) and cache the result for the current page load? This would mean O(1) query per page load instead of O(n). Probably need to invalidate the cache if wp_schedule_event() is called.

roborourke commented 1 year ago

The jobs table gets absolutely massive as each run is a new row. Could perhaps do so with a where clause for nextrun > now() at least but yeah - calls to wp_schedule_event() will be an issue.

johnbillion commented 1 year ago

Ah yeah in that case it would need a clause to only fetch future events. The cron option used by default in WordPress contains all future events so the data stored for future events in the cavalcade_jobs table should be of similar size.

rmccue commented 1 year ago

Not sure if I addressed elsewhere, but: Cavalcade's design does intentionally query the database for this data rather than using the object cache (valuing freshness over performance), and database queries are not a problem per se, so we need to make sure any solution is worth the cost. I am assuming from this issue that we've directly identified it as an actual performance bottleneck, but I don't recall seeing the discussion anywhere.

Querying for all pending jobs (potentially deferred to the first wp_next_scheduled() check) makes sense to me, as the marginal query time for getting all jobs isn't much higher than getting one; for the majority of cases, the connection overhead is more likely to dominate than the query time I imagine (for n < 10k).

We may only want to do that for recurring jobs rather than all jobs, where wp_next_scheduled() makes sense to call. This will also cull the size of data down, at the likely cost of two DB queries for a missing job vs currently one (since we'd need to fall back to querying individually if not in the pre-fetched data). May improve the common case nonetheless.

The cron option used by default in WordPress contains all future events so the data stored for future events in the cavalcade_jobs table should be of similar size.

That's an assumption that may not hold; Cavalcade is designed to be able to scale beyond WP's limits, so we wouldn't want to assume the array would still be small.

It's not likely to be at the too-big-for-PHP-memory size, but it could be at the too-big-for-one-query size.

rmccue commented 1 year ago

Looks like this was a performance regression ultimately from when we switched to the preflight filters in https://github.com/humanmade/Cavalcade/pull/91/files; previously, we populated the full array once per page load, and the non-persistent cache was then used for follow-ups.

Restoring that functionality would likely fix this issue.

roborourke commented 1 year ago

Ah nice. Not totally sure how to go about that with the get_jobs_by_query() implementation... Filtering the array after the actual db query?

roborourke commented 1 year ago

I don't think this plugin is the right place for it but I've written up an alternative approach to avoid the problem described here on frontend requests that we could add to Altis Cloud, and that others could just implement if they wish:

Closing this out as the current setup is the best on balance between efficiency and scalability.