google / site-kit-wp

Site Kit is a one-stop solution for WordPress users to use everything Google has to offer to make them successful on the web.
https://sitekit.withgoogle.com
Apache License 2.0
1.22k stars 278 forks source link

Enhance reliability of feature synchronization #7531

Closed aaemnnosttv closed 2 weeks ago

aaemnnosttv commented 1 year ago

Feature Description

Features that are controlled by remote activation are currently only refreshed via a WP cron job twice daily after being received as part of the initial setup.

For most sites this works well, although some sites may have WP cron disabled without properly configuring WP cron to run by an alternative method which can leave a site without important features until the flag is removed entirely, which can be an unknown length of time after the feature is made generally available.

Site Kit should have an alternative method to synchronize its enabled features that does not rely on WP cron.

This issue is related to #6015 but independent from a requirements perspective.


Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

Implementation Brief

Some context: Since remote features are updated only after cron sync happens, we can't listen to the on_change to check if update happend or not. So this approach handles the timestamp check on admin_init and fires a non-blocking request that in turn invokes the CRON callback action.

Test Coverage

QA Brief

QA:Eng

Changelog entry

techanvil commented 8 months ago

Hey @jimmymadon, I think it would be worth revising and fleshing out the AC here a bit.

jimmymadon commented 8 months ago

@techanvil All good points. I think the period would have to be amendable per task? This might complicate the issue slightly since some of the potential IBs for the Remote Features sync could simply rely on certain actions/events in the plugin rather than a specific heartbeat period. I feel we will essentially be creating a dismissed-notification style system that relies on running certain tasks if a certain time has already passed. I'm not against this and this would be quite helpful as it will be generic.

UPDATE: Would we need some UI elements to allow the user to disable the running of these tasks? If yes, then we'd need UX's input and this feature would now be something that the user will have to be made aware of which is not ideal for 'background running tasks'.

c.c. @aaemnnosttv

techanvil commented 8 months ago

Thanks @jimmymadon.

@techanvil All good points. I think the period would have to be amendable per task? This might complicate the issue slightly since some of the potential IBs for the Remote Features sync could simply rely on certain actions/events in the plugin rather than a specific heartbeat period. I feel we will essentially be creating a dismissed-notification style system that relies on running certain tasks if a certain time has already passed. I'm not against this and this would be quite helpful as it will be generic.

I think the most straightforward approach to take would be to provide our own API modelled on the WP schedule function calls that we use. So essentially our own version of wp_next_scheduled(), wp_schedule_event(), wp_schedule_single_event() and wp_clear_scheduled_hook(), each of which calls its corresponding function if WP cron is enabled, otherwise calls our own implementation (unless disabled). That should avoid any conceptual ambiguity, and we can flesh out the underlying scheduling mechanism in the IB (unless you prefer to get into it a bit in the AC, which would be fair too).

We may want to start off with wp_next_scheduled() and wp_schedule_event(), to cover the enabled features sync as detailed in this issue, with a view to adding the others later. Or we could do them all now, I think that would be perfectly OK.

Incidentally I noticed another place we're scheduling an event, refreshing the user's profile data in OAuth_Client.

UPDATE: Would we need some UI elements to allow the user to disable the running of these tasks? If yes, then we'd need UX's input and this feature would now be something that the user will have to be made aware of which is not ideal for 'background running tasks'.

When I mean it should be disabled by the user, I meant via a WP filter rather than any UI-side control, I don't think that would be necessary.

jimmymadon commented 8 months ago

@techanvil Your modelled solution seems like a good idea. I think the ACs as they are cover the "what" quite well and I'm a tad bit skeptical of adding the contents of your comment to the AC as it goes too much into the "how". So I've added a comment for an IB author to look at your comment as one good solution to model the IB on.

I have updated the AC to accommodate for a new WP filter which would disable our version of the scheduler.

c.c. @aaemnnosttv

techanvil commented 8 months ago

Thanks @jimmymadon, that SGTM. AC :white_check_mark:

techanvil commented 8 months ago

Hi @zutigrm, thanks for drafting this IB. First off, I would say I think it's a pretty ingenious idea, and I think could be a fitting solution in a different situation. That said, I don't think it's such a good fit for the plugin for the following reasons.

You make a good point about potentially needing to duplicate wp-cron.php, however, on balance I think it would be preferable to take an approach where we do provide our own infrastructure rather than piggybacking on WP cron for the reasons outlined above. Ideally we could provide a lightweight WP cron-a-like that only implements the bits that are needed by Site Kit.

Of course, we could always consider dialling the requirements back to only cover syncing the enabled features, as per the Feature Description. Seeing as you've raised the point about the extent of the infrastructure work that would be required, let's check in with @aaemnnosttv to see what he thinks before carrying on with this one as currently specced.

aaemnnosttv commented 8 months ago

@techanvil First, I think it's important to clarify the behavior of DISABLE_WP_CRON as it's a bit of a misnomer; the constant does not disable WP cron entirely – only the mechanism which spawns requests for invoking/running it. By default, if not disabled, WP will make a request to /wp-cron.php (see spawn_cron). However, even if set cron can still be run by requesting /wp-cron.php by another means, or via wp-cli (wp cron event run --due-now) usually powered by a real cron job. This can be preferable to make cron more reliable for time-sensitive tasks which would otherwise depend on web traffic to push things along.

All that to say, I don't think we should be overly concerned with the idea that this mechanism should be conditional based on this constant. Instead, I think we want to focus more on the schedule which should be complementary to WP cron but only for our jobs. This also gives us an opportunity to think about jobs that we may be using WP cron for that we don't really need to run all the time and could be "lazier" – that is, only running when SK/wp-admin is being used.

So I definitely agree that we want to avoid reimplementing or copying substantial parts from WP cron internals but I don't think we need to be entirely decoupled from it either. We could potentially use WP cron for everything still, but as mentioned before, we could consider introducing the concept of a lazy job too.

For tasks that do leverage wp cron (such as feature syncing), we could check when the last time it was run, and if the schedule is due, we can dispatch it. It's probably worth looking at how this is done via wp-cli for individual jobs (wp cron event run <hook>...). It could be worth looking at other plugins like WP Crontrol for inspiration as well.


I hope that helps. One more thing that comes to mind that we'll want to account for is the current user since there is no user set for normal cron jobs, but if we're invoking this from the client it will have that set by default, so we'll want to make sure our pseudo-cron is as close to the real thing as possible if we're planning on running the same jobs in each context.

techanvil commented 8 months ago

Hey @aaemnnosttv, thanks for the detail here. One thing though, although we can presumably still trigger wp-cron.php when DISABLE_WP_CRON is defined, it does seem that conceptually we're not "supposed" to, as per this comment in the source for wp-cron.php:

  • Defining DISABLE_WP_CRON as true and calling this file directly are
  • mutually exclusive and the latter does not rely on the former to work.

Although, admittedly the clarifying comment "and the latter does not rely on the former to work" makes me wonder if the term "mutually exclusive" is actually being applied correctly here... :thinking:. WDYT?

aaemnnosttv commented 7 months ago

@techanvil I saw that comment as well and my thought is that if they didn't want it to run if DISABLE_WP_CRON was defined as true, they could exit early but that constant is never checked in that file so the value appears to be irrelevant once we get to that point. I agree it's a bit confusing but I think we can proceed based on how we can see it works and ignore that comment for now.

techanvil commented 7 months ago

Thanks @aaemnnosttv, sounds reasonable. I guess this one's back over to you @zutigrm for a fresh look, taking Evan and my feedback above into account. I know there are still a number of avenues this could go down so please feel free to drop me a line if you want to discuss it further :)

zutigrm commented 7 months ago

@aaemnnosttv @techanvil Thank you for the input. I have re-wrote the approach to build our own infrastructure for Scheduling events. I was trying to make it simple and efficient, back to you @techanvil for review

zutigrm commented 7 months ago

@techanvil When this makes to EB, please assign it to me :D Thanks

techanvil commented 7 months ago

Thanks @zutigrm, the service you have outlined looks pretty good.

That said, I have to confess that the further we get into this the more I'm wondering if we are doing the right thing here.

Based on our conversation to date it looks like actually firing off Cron tasks directly would involve too much copying of the WP Cron internals, hence the design for our own scheduling system which is independent of WP Cron.

What the IB is missing is coverage of the last two AC points:

  • Tasks should run only if WP Cron is disabled.
  • There should be a way to disable the running of these tasks via a WP filter.

I realise though that @aaemnnosttv has said the following which does indicate a steer away from the point about only running if WP Cron is disabled:

All that to say, I don't think we should be overly concerned with the idea that this mechanism should be conditional based on this constant. Instead, I think we want to focus more on the schedule which should be complementary to WP cron but only for our jobs.

The thing is, WP Cron is the idiomatic way for scheduling tasks in WordPress. It's a core feature, and there is an ecosystem supporting it e.g. plugins where users can inspect their Cron schedule and so forth. It's something we should be able to rely on in the majority of cases. Also, I would imagine that some users or their site admins will have gone to some lengths to get it running on their system to allow the self-routed request mechanism to work, that might not work off the bat for our own implementation.

All that is to say, I do think we should stick with WP Cron as the main scheduler, with the implication being that we should make our scheduler a fallback that is dependent on DISABLE_WP_CRON. Assuming we are continuing with the direction of implementing our own scheduler.

So, before we get further into the IBR, I would like to make sure that we're fully aligned on the requirements here and the AC is accurate.

As I mentioned at the top of the comment, having got further into it I do wonder if we're taking the right approach here. Do we have an idea of how many users might actually be affected by this? If we're basically on the margins of our user base it might not be worth spending this much development effort on it, and we might be better off revisiting the core requirement of improving the feature synchronisation reliability as a standalone task.

I would be interested to know what Evan thinks on the above. @aaemnnosttv, please can you let us know your thoughts here?

aaemnnosttv commented 1 month ago

That said, I have to confess that the further we get into this the more I'm wondering if we are doing the right thing here.

Based on our conversation to date it looks like actually firing off Cron tasks directly would involve too much copying of the WP Cron internals, hence the design for our own scheduling system which is independent of WP Cron.

Thanks @techanvil – I agree, I think we've somewhat lost sight of the goal here in pursuit of a more ambitious solution.

I think we can simplify things by decoupling this from WP cron entirely. It shouldn't be concerned with whether or not WP cron is disabled. At the end of the day, a cron is essentially a do_action( $hook, ...$args ) and this is about adding a reasonable fallback mechanism rather than mirroring the behavior and timing of WP cron. While it would be nice to enhance this for all crons we use, this has proven to be a bit more complicated and stalled us out here.

Let's keep the focus here on feature synchronization only for now. We essentially just need to trigger the same hook that is registered with cron if a certain amount of time has elapsed since the features were last synchronized (e.g. 24 hours) which can happen by listening for updates to the remote features option, for example. This makes it less different than other synchronization tasks we have and should help to make for a clearer path forward. If wp cron is functioning, this fallback would never run so long as it is set to run later than the usual schedule, and that's no problem.

On a related note, I would like to move #8341 forward first as it makes changes to where features are handled before we proceed with this addition.

With that said, let's update the AC and restart with a new IB (sorry @zutigrm – I do like the abstraction around Events – maybe we can save some of this in #6992?)

cc: @binnieshah

IB backup for safekeeping ### Add `Google\Site_Kit\Core\Schedule\Events.php` class * Add `OPTION` constant that will hold the option name, say `googlesitekit_scheduled_events` * Include `Google\Site_Kit\Core\Storage\Options`, and `Google\Site_Kit\Core\Storage\Transients` instances as dependencies, and assign them as the class properties #### Add `register` method * Hook into `admin_init` and invoke callback method, say `run_scheduled_events` #### Add `add_event` method * It should accept following parameters: * `timestamp` of when event will run * `hook` that will be used for callback * `arguments` array * `recurrent` optional - timestamp, it can be passed as `HOUR_IN_SECONDS` etc, or by default `null` if it is a single event * Format the event item array, so that `timestamp` is the key of multidimensional array holding remaining key => value pairs. * Pass the new scheduled event array as argument to the `set_events` method #### Add `set_events` method * It should accept event array as argument * Retrieve all scheduled events from `googlesitekit_scheduled_events` option * Replicate the `merge` method to add event array to the option https://github.com/google/site-kit-wp/blob/2d1a97ac9d56fbdb547d63e5bbe03a4a800ca679/includes/Core/Modules/Module_Settings.php#L45-L56 #### Add `clear_event` method: * It should accept `timestamp` argument, and optional `hook` * Retrieve all scheduled events from `googlesitekit_scheduled_events` option. * If optional `hook` argument is present, loop through retrieved events, and remove all that match the `hook`, then return early * Check if the event with passed `timestamp` key is present in array. * Return early if not. * Otherwise unset the item matching with the passed `timestamp` key from array, and use `Options::set` method to save a new array. #### Add `has_event_scheduled` method: * It should accept `hook` argument * Retrieve all events from `googlesitekit_scheduled_events` and find the one(s) matching the `hook` * If no event is found for the `hook` return `false` * If found, return the key from that event - which is the `timestamp` #### Add `get_ready_to_run_events` method: * Retrieve all events from option and filter the array * Compare their timestamp key with `microtime(true)`, and return only the ones that are due - value is equal or less the current timestamp * Return the resulting array of schedules events that should be executed #### Add `run_scheduled_events` method: * Use `Google\Site_Kit\Core\Storage\Transients` to retrieve value from `googlesitekit_scheduled_events_running`, and if transient exits return early. * Retrieve the events that are due from `get_ready_to_run_events` * If array is empty, return early * Set transient, eg `googlesitekit_scheduled_events_running`, and set the value to `true` to prevent simultaneously running the same events. Set expiration to 1 minute * Loop through retrieved events and make a non-blocking request using `wp_remote_post` to a schedule REST endpoint, by passing these [arguments](https://github.com/WordPress/wordpress-develop/blob/7714f5deec6cf2216998b6f0e1350b67508ea0b9/src/wp-includes/cron.php#L938-L943), and also pass a `body` argument, which should hold `timestamp` key, and as a value use the current event key * Delete `googlesitekit_scheduled_events_running` transient ### Add `Google\Site_Kit\Core\Schedule\REST_Schedule_Controller.php` class * Pass `Events` instance as dependency in the constructor and assign it as a class property * Include `register`, and `get_rest_routes` methods. You can see example in any other `REST_*_Controller` class * Define a route `core/schedule/run-scheduled-event`, and in the callback function, retrieve the `timestamp` parameter from request * Retrieve the available events using `Events::get_ready_to_run_events` method, and extract the one with passed `timestamp` parameter value. If it is not in array, return early * Otherwise execute it using [`do_action_ref_array`](https://developer.wordpress.org/reference/functions/do_action_ref_array/) function, pass the `hook` and `arguments`. * Check if event has a value for `recurrent` * If it does use `Events::add_event`, but for `timestamp` use new value like `time() + $event['recurrent']` to re-schedule the new event * Use `Events::clear_event` passing the initially retrieved `timestamp` from request to remove current event that has been executed from the list ### Add `Google\Site_Kit\Core\Schedule\Schedule.php` class * Instantiate `Events` and `REST_Schedule_Controller` classes in the constructor and assign them as class properties * Add `register` method and invoke `register` on both instances ### Instantiate the Schedule class * Update `includes/Plugin::register`, in the `init` hook instantiate the `Schedule` class and invoke `register` method on it ### Replace cron usage with new Schedule infrastructure * Replace the usage of `wp_schedule_single_event` and `wp_schedule_event` within the plugin with `Events::add_event`, `wp_clear_scheduled_hook` with `Events::clear_event` and `wp_next_scheduled` with `Events::has_event_scheduled` For some inspiration you can check how WP core handles the cron logic [here](https://github.com/WordPress/wordpress-develop/blob/6338d63ec7f32a511bfe4fe12da002171f59edbe/src/wp-includes/cron.php)
zutigrm commented 1 month ago

Thanks @aaemnnosttv

which can happen by listening for updates to the remote features option, for example.

Since option is updated only if sync is successful, and will not be invoked if options don't update, I proposed alternative approach of running the checks and making non blocking request to invoke cron action.

With that said, let's update the AC and restart with a new IB (sorry @zutigrm – I do like the abstraction around Events – maybe we can save some of this in https://github.com/google/site-kit-wp/issues/6992?)

I will take a look at it once it lands in IB. Thanks

eugene-manuilov commented 1 month ago

Thanks, @zutigrm. IB ✔️

zutigrm commented 1 month ago

Parked until dependency #8341 is merged

cc: @binnieshah

mohitwp commented 2 weeks ago

QA Update ✅

Verified that the fallback ensures features are synchronized if the last sync occurred more than 24 hours ago. https://github.com/user-attachments/assets/4efd1dde-7b2d-4249-ad08-519ca8322b23 Verified that the fallback ensures features are not synchronized if the last sync occurred less than 24 hours ago. https://github.com/user-attachments/assets/6884c7fb-33c7-4995-9660-5dfff7ea62c1