rhubarbgroup / redis-cache

A persistent object cache backend for WordPress powered by Redis. Supports Predis, PhpRedis, Relay, replication, sentinels, clustering and WP-CLI.
https://wordpress.org/plugins/redis-cache/
GNU General Public License v3.0
434 stars 151 forks source link

Transient cache not properly invalidated #523

Open JUVOJustin opened 4 months ago

JUVOJustin commented 4 months ago

Through two async requests, the transient cache is not properly invalidated

Description

I have two async jobs utilizing mutual locking by transient. This means each job trys to achieve a lock by setting a transient. Once finished, the "lock" transient is deleted. When the second job arrives in parallel, it waits till the lock is released. The problem now is that even when job 1 releases the lock by deleting the transient, the get_transient function of job 2 sill returns the old "lock" status.

Expected Behavior

After deleting the "lock" transient in job 1 a get_transient call in job 2 should return false.

Actual Behavior

After deleting the "lock" transient in job 1 a get_transient call in job 2 returns "1" which is the value representing the "lock" status.

Steps to Reproduce

  1. Create a script emitting two jobs using https://actionscheduler.org/
  2. With step debugging enabled, trigger the first job which creates a "lock" transient
  3. Start the second job manually and check the return of the get_transient for the same transient name previously set in job 1
  4. Continue with job 1 until the transient is deleted, meaning the lock is released.
  5. Check the return of get_transeint in job 2. Should return "false" will return "1"

Additional context

I am importing a lot of data using action-scheduler jobs. Each processed post_id is stored in a transient. To avoid race conditions, i integrated mutual locking using a transient. If a job is locked the running job will wait 1 second and try again.

Environment

tillkruss commented 4 months ago

Transients are handled entirely by WordPress core.

If you can post exact code samples to reproduce this, we might be able to help.

JUVOJustin commented 4 months ago

Hi, the problem only occurs with persisted object cache. Not saying it is caused by your plugin :) Appreciate any help.

The importer is open source. The main function can be found here: https://github.com/JUVOJustin/as-processor/blob/main/src/Sync_Data.php#L158

There is a clear_caches function that is required for non-persistent object caches: https://github.com/JUVOJustin/as-processor/blob/main/src/Sync_Data.php#L346 . This method basically handles the case where the first get_transient call for the lock returns "1" and is then stored inside the object cache. When another job releases the lock, the non-persistent object cache has to be deleted to query the actual database value again.

This is an example integration. The code is run in multiple action-scheduler jobs in parallel:

/**
* @throws Exception
*/
function process_chunk_data(\Generator $chunkData): void
{
  $chunk_post_ids = [];
  foreach ($chunkData as $row) {

      if (empty($row['Händler-SKU']) || empty($row['Artikelbezeichnung'])) {
          continue;
      }

      $sku = $row['Händler-SKU'];
      $asin = $row['ASIN 1'];

      $product_id = $this->upsert_product_by_sku($sku, $asin, $row);
      $chunk_post_ids[] = $product_id;
  }

  $attempts = 0;

  // Update sync data
  do {
      try {
          $this->update_sync_data(['post_ids' => $chunk_post_ids], true, true);
          return;
      } catch (Exception $e) {
          $attempts++;
          sleep(1);
          continue;
      }
  } while($attempts < 5);

  throw new Exception('Sync Data could not be added');

}

Not sure if it does matter, but i did the step debugging using: https://github.com/ddev/ddev-redis

tillkruss commented 4 months ago

If the action-scheduler is a long running process it's cache might go stale, because it's cached in PHP memory as well.

If you switch to wp_cache_get() with the $force parameter instead of get_transient() the cache would always fetch from Redis and not go stale, try using that.

JUVOJustin commented 4 months ago

@tillkruss thank you very much. This fixed it! For reference, I fixed this by creating my own get_transient function: https://github.com/JUVOJustin/as-processor/blob/f898df613b7bc47beff2787a4d57237a434f54ce/src/Sync_Data.php#L361

Personally, I would love to see this fixed in core. I opened a ticket and would love to hear your opinion since you seem to be very invested in these APIs: https://core.trac.wordpress.org/ticket/61296#ticket