markjaquith / feedback

Ask @markjaquith anything!

Alternative to transients #45

Closed: LoreleiAurora closed this issue 9 years ago

LoreleiAurora commented 9 years ago

Hi Mark,

I'm working on a plugin that fetches data from an external API. Until now I have been storing this data with set_transient(), or set_site_transient() on multisite. That was fine while there was only a small amount of data, but now, in the worst case, there are 100,000,000 possible API calls whose results would need to be stored, and as you know, that isn't good for the options table.

Do you know of a better way to store the data?

If you need any more info, just ask. I didn't want to give too much away to the world while the plugin is still in development.

bueltge commented 9 years ago

Transients give you a guarantee that your data will be cached. Alternatively, you can use the WordPress object cache API, but that comes with no such guarantee: for this technique to work, the install must have a persistent object cache, either on the server or via a plugin, backed by something like Memcache or APC.

By default, the object cache is non-persistent. This means that data stored in the cache resides in memory only and only for the duration of the request. Cached data will not be stored persistently across page loads unless you install a persistent caching plugin.

I think it is important to keep the goal of the plugin in mind. Is it public, for any user, or for a specific customer? Only in the latter case can you know which caching technique is available on the server. Best,

JDGrimes commented 9 years ago

Are all of these data going to be needed again? I'm doubting that anyone really needs 100,000,000 different pieces of data that regularly. Are you using an expiry time for your transients? How often is the user actually going to need that bit of data again within that time frame?

I think you really need to consider whether it is worth caching these as transients. You may not want to cache the data at all, or only certain requests that you know will be very regular. For some, you may want to use the object cache Frank mentioned instead, if you'll request the data multiple times in a single PHP run but don't necessarily need it afterwards.

markjaquith commented 9 years ago

By default, transients are stored in the WordPress options table. They require two rows: one for the data, and one for the expiry. But these transients aren't regularly garbage collected. They're garbage collected on access and on db update (so, WP updates). Allowing even a few thousand items to stagnate here would be a bad idea.
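To illustrate the two-row layout and the on-access garbage collection described above, here is a simplified Python model with a dict in place of the options table. The `_transient_` and `_transient_timeout_` prefixes match how WordPress names these rows, but the functions are simplified, hypothetical stand-ins for `set_transient()` and `get_transient()`, not their real implementation.

```python
import time

# Hypothetical in-memory stand-in for the options table:
# maps option_name -> option_value.
options = {}


def set_transient(name, value, expiration, now=None):
    """Simplified model: store the value and, if it expires, its expiry time."""
    now = time.time() if now is None else now
    options[f"_transient_{name}"] = value  # row 1: the data
    if expiration > 0:
        options[f"_transient_timeout_{name}"] = now + expiration  # row 2: expiry


def get_transient(name, now=None):
    """Simplified model: expired transients are deleted only when read."""
    now = time.time() if now is None else now
    data_key = f"_transient_{name}"
    timeout_key = f"_transient_timeout_{name}"
    expires_at = options.get(timeout_key)
    if expires_at is not None and expires_at < now:
        # Garbage collection on access: drop both rows.
        del options[timeout_key]
        options.pop(data_key, None)
        return None
    return options.get(data_key)


set_transient("api_a", "A", expiration=60, now=0)
set_transient("api_b", "B", expiration=60, now=0)
print(len(options))                     # 4 rows for 2 transients
print(get_transient("api_a", now=120))  # None: expired, both rows deleted
print(len(options))                     # 2: api_b expired but was never read
```

The last line shows the problem: `api_b` has expired too, but because nothing reads it, both of its rows stay in the table.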

One idea would be to reduce the number of buckets, either by collapsing several small API results into bigger "type X of API call" caches, or by simply not caching things you expect to be requested rarely (though some APIs' terms of service may require caching).
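One way to read the "fewer, bigger buckets" idea, sketched in Python under the assumption that `store` is any dict-like persistent cache: keep one entry per type of API call and store the individual results inside it. The function names and the `api_bucket_` key prefix are hypothetical.

```python
# Hypothetical sketch: one cache entry per *type* of API call, holding many
# individual results, instead of one entry per call. `store` stands in for
# any persistent key/value cache (a transient, a custom table row, ...).


def bucket_key(call_type):
    return f"api_bucket_{call_type}"


def cache_result(store, call_type, params, result):
    """Add one API result to its call type's bucket (params must be hashable)."""
    key = bucket_key(call_type)
    bucket = dict(store.get(key, {}))  # copy, as if read back from storage
    bucket[params] = result
    store[key] = bucket                # the whole bucket is written back


def cached_result(store, call_type, params):
    """Return a cached result, or None if it is not in the bucket."""
    return store.get(bucket_key(call_type), {}).get(params)


store = {}
for item_id in range(3):
    cache_result(store, "type_x", (item_id,), f"result {item_id}")
cache_result(store, "type_y", ("a",), "result a")

print(len(store))                            # 2 entries instead of 4
print(cached_result(store, "type_x", (1,)))  # result 1
```

The trade-off is that a whole bucket is read and rewritten on every access, so buckets should stay small enough to load cheaply. Also, two concurrent requests updating the same bucket can overwrite each other's additions.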

Or, if a persistent object cache will be available, use that. But this will generally only work on hosts you control.

Or, spin up a custom cache table. If you truly expect hundreds of thousands of cache entries, this is the only way to scale it on every WordPress install. You could even write a wrapper layer that checks wp_using_ext_object_cache() and uses the available external object cache if present, or falls back to using a custom table if not.
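A rough sketch of such a wrapper layer, written in Python with `sqlite3` standing in for a custom MySQL table. The `ext_cache` argument plays the role of what `wp_using_ext_object_cache()` would detect; pass `None` when no persistent object cache is present. The `ApiCache` class, the `api_cache` table and its schema are assumptions for illustration.

```python
import sqlite3
import time


class ApiCache:
    """Hypothetical wrapper: prefer an external object cache, else a custom table.

    `ext_cache` stands in for a persistent object cache (the case in which
    wp_using_ext_object_cache() would return true); pass None when there is
    none. Table name and schema are illustrative assumptions.
    """

    def __init__(self, db, ext_cache=None):
        self.db = db
        self.ext_cache = ext_cache
        if ext_cache is None:
            # One row per entry, expiry stored alongside the value.
            db.execute(
                "CREATE TABLE IF NOT EXISTS api_cache ("
                " cache_key TEXT PRIMARY KEY,"
                " value TEXT NOT NULL,"
                " expires_at INTEGER NOT NULL)"
            )
            # Lets a cleanup job find expired rows without a full scan.
            db.execute(
                "CREATE INDEX IF NOT EXISTS api_cache_expires_at"
                " ON api_cache (expires_at)"
            )

    def set(self, key, value, ttl, now=None):
        now = int(time.time()) if now is None else now
        if self.ext_cache is not None:
            # Real object caches handle TTLs themselves; emulated here.
            self.ext_cache[key] = (value, now + ttl)
            return
        self.db.execute(
            "INSERT OR REPLACE INTO api_cache (cache_key, value, expires_at)"
            " VALUES (?, ?, ?)",
            (key, value, now + ttl),
        )
        self.db.commit()

    def get(self, key, now=None):
        """Return the cached value, or None if missing or expired."""
        now = int(time.time()) if now is None else now
        if self.ext_cache is not None:
            entry = self.ext_cache.get(key)
            if entry is None or entry[1] <= now:
                return None
            return entry[0]
        row = self.db.execute(
            "SELECT value FROM api_cache WHERE cache_key = ? AND expires_at > ?",
            (key, now),
        ).fetchone()
        return row[0] if row else None


db = sqlite3.connect(":memory:")
table_backed = ApiCache(db)  # no external cache: uses the custom table
table_backed.set("item:42", "payload", ttl=3600, now=0)
print(table_backed.get("item:42", now=10))    # payload
print(table_backed.get("item:42", now=4000))  # None (expired)
```

Unlike transients in the options table, each entry here is a single row with its expiry alongside. The index on `expires_at` lets a scheduled job delete expired rows with one indexed query.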

logoscreative commented 9 years ago

I've been running an options table with 1-2MM rows of transients with pretty decent performance for years. A few thoughts that might be helpful in addition to what @markjaquith pointed out:

  1. There are a few ways to garbage collect expired transients. We use a cron job that deletes the rows and optimizes the table.
  2. Add an index to your table for the autoload key. This made a huge difference in performance, because I'd guess most of your transients do not need to be autoloaded. See the details in this Trac ticket, where it's been decided that WP won't do that by default because it doesn't help at smaller table sizes.
  3. Depending on how many of those transients you load on a given page, you may need to disable object caching there entirely. It's not hard to exhaust memory if you bring in a lot of data at once, and that can bring page loads to a total halt.
  4. For performance: lazily load anything not in the critical path, and store the data in each transient at the highest level possible given the calls it has to make. Keep relevant transients around until they absolutely must be purged, by setting a long expiration (obviously, don't forget to set one, or the transient will be autoloaded) and by attaching the delete function to relevant hooks. Also, build in safeguards against race conditions if the site might see a massive traffic spike on a server that would struggle to keep up. I outline many of these approaches in my WordCamp presentation.
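Tips 1 and 2 can be sketched against an in-memory SQLite copy of the `wp_options` layout. Real installs run MySQL, so treat this as an illustration only. The cleanup deletes both rows of every expired transient, and the index covers the `autoload` column. `add_transient`, `purge_expired_transients` and the index name `idx_autoload` are hypothetical.

```python
import sqlite3

TIMEOUT_PREFIX = "_transient_timeout_"
DATA_PREFIX = "_transient_"

# In-memory stand-in for wp_options (column names follow the WordPress schema).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE wp_options ("
    " option_id INTEGER PRIMARY KEY,"
    " option_name TEXT NOT NULL UNIQUE,"
    " option_value TEXT NOT NULL,"
    " autoload TEXT NOT NULL DEFAULT 'yes')"
)
# Tip 2: index the autoload column, so loading autoloaded options does not
# have to scan past millions of non-autoloaded transient rows.
db.execute("CREATE INDEX idx_autoload ON wp_options (autoload)")


def add_transient(name, value, expires_at):
    """Hypothetical helper: insert a transient's data and timeout rows.

    Transients with an expiry are stored with autoload = 'no'.
    """
    db.executemany(
        "INSERT INTO wp_options (option_name, option_value, autoload)"
        " VALUES (?, ?, 'no')",
        [(DATA_PREFIX + name, value), (TIMEOUT_PREFIX + name, str(expires_at))],
    )
    db.commit()


def purge_expired_transients(db, now):
    """Tip 1: cron-style cleanup deleting both rows of each expired transient."""
    n = len(TIMEOUT_PREFIX)
    expired = [
        row[0]
        for row in db.execute(
            "SELECT substr(option_name, ?) FROM wp_options"
            " WHERE substr(option_name, 1, ?) = ?"
            " AND CAST(option_value AS INTEGER) < ?",
            (n + 1, n, TIMEOUT_PREFIX, now),
        )
    ]
    db.executemany(
        "DELETE FROM wp_options WHERE option_name IN (?, ?)",
        [(DATA_PREFIX + name, TIMEOUT_PREFIX + name) for name in expired],
    )
    db.commit()
    return len(expired)


add_transient("old", "stale", expires_at=100)
add_transient("fresh", "current", expires_at=10_000)
print(purge_expired_transients(db, now=500))                        # 1
print(db.execute("SELECT COUNT(*) FROM wp_options").fetchone()[0])  # 2
```

On MySQL, the follow-up "optimize" step from tip 1 would be `OPTIMIZE TABLE wp_options`, run from the same scheduled job.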

If you're able to build the external cache Mark suggested, you'll have more options and it should scale better, but I've been pleased enough with my setup that the extra dev time hasn't been worth it.