Closed lkraav closed 7 years ago
Ah. WP LCache does use the database, so a crashed table may cause it to fail.
Glad you figured it out! If it becomes a repeat problem, let us know and we can look into how we can make it more fail safe.
Ah. WP LCache does use the database, so a crashed table may cause it to fail.
But it wasn't a crashed table. All tables were fine. I simply restarted php-fpm
which I believe clears apcu
.
But of course, to pollute the data collection here... I may have emptied wp_lcache_*
tables BEFORE the restart to see if that helps. It didn't. Then I restarted php-fpm
. Which means that we can't really tell right now who got their state towards corruption, since wp-lcache
didn't have anything to re-populate apcu
with.
Anyway, awareness has been raised, maybe someone else will see something similar, or this will happen again. I have a feeling we'll be back here.
Something is still going wrong here. WP-CLI crashes immediately regardless of restarts etc. mysqlcheck
passes with flying colors for the whole database. As soon as object-cache.php
is renamed, everything works.
Here's mysql query log of where we crash. There's a bunch of similar SELECTs on top, so I'm cutting the list to bottom five, and then there's that suspicious DELETE
:
...
7 Query SELECT "event_id", "pool", "address", "value", "created", "expiration" FROM wp_lcache_events WHERE "address" = '29:cr_1:lcache_alloptions_values:mainwp_db_version' AND ("expiration" >= 1481222716 OR "expiration" IS NULL) ORDER BY "event_id" DESC LIMIT 1
7 Query SELECT "event_id", "pool", "address", "value", "created", "expiration" FROM wp_lcache_events WHERE "address" = '29:cr_1:lcache_alloptions_values:mainwp_extensions' AND ("expiration" >= 1481222716 OR "expiration" IS NULL) ORDER BY "event_id" DESC LIMIT 1
7 Query SELECT "event_id", "pool", "address", "value", "created", "expiration" FROM wp_lcache_events WHERE "address" = '29:cr_1:lcache_alloptions_values:mainwp_manager_extensions' AND ("expiration" >= 1481222716 OR "expiration" IS NULL) ORDER BY "event_id" DESC LIMIT 1
7 Query SELECT "event_id", "pool", "address", "value", "created", "expiration" FROM wp_lcache_events WHERE "address" = '29:cr_1:lcache_alloptions_values:mainwp_plugin_version' AND ("expiration" >= 1481222716 OR "expiration" IS NULL) ORDER BY "event_id" DESC LIMIT 1
7 Query SELECT "event_id", "pool", "address", "value", "created", "expiration" FROM wp_lcache_events WHERE "address" = '29:cr_1:lcache_alloptions_values:mainwp_upgradeVersionInfo' AND ("expiration" >= 1481222716 OR "expiration" IS NULL) ORDER BY "event_id" DESC LIMIT 1
7 Query INSERT INTO wp_lcache_events ("pool", "address", "value", "created", "expiration") VALUES ('5849aa3cec41f5.92269565:1790831999', '12:cr_1:default:is_blog_installed', 's:4:\"b:0;\";', 1481222716, NULL)
6 Query DESCRIBE wp_users
7 Query DELETE FROM wp_lcache_events WHERE "event_id" < '15826' AND "address" IN ('15:cr_site-options:1:ms_files_rewriting','12:cr_1:default:is_blog_installed')
7 Quit
6 Quit
161208 20:45:31 4 Query SET global general_log = 0
EDIT actually, that INSERT
looks malformed? ... OK, copy-pasting it from the log output, have to replace column name quotes with backticks.
Solution may be wp core update-db --network
here.
I was on beta releases for a while. strace
actually points in the right direction by revealing open wp-includes/version.php
syscall, but it took a while to decipher.
...
open("/home/warmpress/wp-includes/version.php", O_RDONLY) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=617, ...}) = 0
fstat(5, {st_mode=S_IFREG|0644, st_size=617, ...}) = 0
fstat(5, {st_mode=S_IFREG|0644, st_size=617, ...}) = 0
mmap(NULL, 617, PROT_READ, MAP_SHARED, 5, 0) = 0x7f2ec45d5000
munmap(0x7f2ec45d5000, 617) = 0
close(5) = 0
lstat("/home/warmpress/wp-includes/l10n.php", {st_mode=S_IFREG|0644, st_size=43130, ...}) = 0
lstat("/home/warmpress/wp-includes/class-wp-locale.php", {st_mode=S_IFREG|0644, st_size=14739, ...}) = 0
lstat("/home/warmpress/wp-includes/class-wp-locale-switcher.php", {st_mode=S_IFREG|0644, st_size=5139, ...}) = 0
access("/home/warmpress/wp-content/db-error.php", F_OK) = -1 ENOENT (No such file or directory)
write(2, "\33[31;1mError:\33[0m One or more database tables are unavailable. The database may need to be repaired.\n", 101Error: One or more database tables are unavailable. The database may need to be repaired.
) = 101
...
It looks like everything sprang back to life. Exact underlying cause is still unknown to me, but maybe some database revision number was synced up somewhere using wp core update-db
.
EDIT not even that simple, loading /wp-admin/
immediately regressed back to some cached invalid value. Had to remove wp-content/object-cache.php
, re-run wp core update-db --network
and NOW everything seems to be alive, since the right values maybe finally reached the db.
EDIT interestingly, several caching related problems reported with 4.7 https://make.wordpress.org/core/2016/12/08/dev-chat-summary-december-7th-4-7-launch-week/ I wonder if this belongs to the set.
Interesting.
Can you configure and test against a stock WordPress install on this machine? I'm curious as to whether there's a confluence of factors from other code, or whether this is an isolated issue with WP LCache.
Other subsites load fine.
Just to confirm, this issue is present across all sites on your network, right?
Renaming
object-cache.php
resolves the issue.Nothing is visible in error logs or
mysql.err
. How would you recommend I debug this?I've had one issue with
wp-lcache
before, where had to fine tuneinnodb_data_file_path = ibdata1:10M:autoextend:max:512M
where max:128M had to be increased to max:512M. Not sure if related.Other subsites load fine.
EDIT restarting php-fpm resolved the deadlock. But why would this happen? Before restart:
EDIT I also now updated
pecl-apcu-5.1.5
->5.1.7
. Maybe some bug involved there. Will monitor for future re-occurrence.