Closed rrennick closed 3 years ago
------------------------
LATEST DETECTED DEADLOCK
------------------------
2020-04-21 14:44:56 0x7f121c442700
* (1) TRANSACTION:
TRANSACTION 269973366, ACTIVE 4 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 1
MySQL thread id 2099893, OS thread handle 139742331373312, query id 218071551 xxx.xxx.xxx.xxx yyyyy Updating
UPDATE wp_actionscheduler_actions SET attempts = attempts+1, status='in-progress', last_attempt_gmt = '2020-04-21 13:44:52', last_attempt_local = '2020-04-21 14:44:52' WHERE action_id = 1890552
* (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 3524 page no 2736 n bits 672 index status of table `yyyyy`.`wp_actionscheduler_actions` trx id 269973366 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
* (2) TRANSACTION:
TRANSACTION 269971034, ACTIVE 202 sec fetching rows
mysql tables in use 3, locked 3
3310 lock struct(s), heap size 385232, 339898 row lock(s)
MySQL thread id 2099912, OS thread handle 139715760367360, query id 218027094 xxx.xxx.xxx.xxx yyyyy Init for update
UPDATE wp_actionscheduler_actions SET claim_id=183806, last_attempt_gmt='2020-04-21 13:41:34', last_attempt_local='2020-04-21 14:41:34' WHERE claim_id = 0 AND scheduled_date_gmt <= '2020-04-21 13:41:34' AND status='pending' ORDER BY attempts ASC, scheduled_date_gmt ASC, action_id ASC LIMIT 100
* (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 3524 page no 2736 n bits 672 index status of table `yyyyy`.`wp_actionscheduler_actions` trx id 269971034 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
* (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 3524 page no 2737 n bits 776 index status of table `yyyyy`.`wp_actionscheduler_actions` trx id 269971034 lock_mode X waiting
Record lock, heap no 665 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 7; hex 70656e64696e67; asc pending;;
1: len 8; hex 00000000001cd8f8; asc ;;
* WE ROLL BACK TRANSACTION (1)
------------
TRANSACTIONS
------------
Trx id counter 269973434
Purge done for trx's n:o < 269973434 undo n:o < 0 state: running but idle
History list length 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421217314710296, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421217314582616, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 269973429, ACTIVE 2 sec starting index read
mysql tables in use 3, locked 3
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 2100342, OS thread handle 139715226101504, query id 218072430 xxx.xxx.xxx.xxx yyyyy Init for update
UPDATE wp_actionscheduler_actions SET claim_id=183868, last_attempt_gmt='2020-04-21 13:44:57', last_attempt_local='2020-04-21 14:44:57' WHERE claim_id = 0 AND scheduled_date_gmt <= '2020-04-21 13:44:57' AND status='pending' ORDER BY attempts ASC, scheduled_date_gmt ASC, action_id ASC LIMIT 100
------- TRX HAS BEEN WAITING 2 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 3524 page no 3281 n bits 824 index claim_id of table `yyyyy`.`wp_actionscheduler_actions` trx id 269973429 lock_mode X waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 8; hex 0000000000000000; asc ;;
1: len 8; hex 00000000001975ed; asc u ;;
As mentioned on the other thread, I think the code is trying to log the execution (log_execution) before it's done claiming all jobs, so one update query bumps into the other.
If that's the case, making the claim query faster (or smaller) or delaying log_execution might help?
According to NewRelic, the following query is quite time-consuming, taking 5-10s to complete:
UPDATE wp_actionscheduler_actions SET claim_id=?, last_attempt_gmt=?, last_attempt_local=? WHERE claim_id = ? AND scheduled_date_gmt <= ? AND status=? ORDER BY attempts ASC, scheduled_date_gmt ASC, action_id ASC LIMIT ?
See also #457 for prior deadlock issues (that may be related).
> the following query is quite time-consuming taking 5-10s to complete
@glagonikas Thanks for the detailed info. That does look like something is blocking the query. Do you have actions that might take that amount of time to execute?
We don't have any actions that would fire on that stack trace, but we do have a large and busy site. We use Redis and everything is optimised, but running loads of update queries consecutively is probably not ideal, as they are not cached.
Can you direct me to the place where the code claims 100 actions and confirm log_execution is where each claimed action runs? I can then try to debug locally and see if there is any room for improvement
> Can you direct me to the place where the code claims 100 actions and confirm log_execution is where each claimed action runs?
The default batch size is 25 for Cron or Async - https://github.com/woocommerce/action-scheduler/blob/master/classes/ActionScheduler_QueueRunner.php#L131
The claim / run process starts at https://github.com/woocommerce/action-scheduler/blob/master/classes/ActionScheduler_QueueRunner.php#L154
@glagonikas You could try lowering the batch size in your cron job to 20 or similar. CLI will still process all the pending actions, just in smaller batches. Make the change at a time when you'll be able to monitor and revert if it increases the number of deadlocks. https://github.com/woocommerce/action-scheduler/blob/master/classes/WP_CLI/ActionScheduler_WPCLI_Scheduler_command.php#L13
A followup thought is that if your cron job is scheduled to run every minute then it could be multiple instances of the cron job competing. Cron in Linux spawns the task on the schedule regardless of the disposition of the previous iteration of the task. Here's a demo script I wrote and scheduled to run every minute. Log shows the first 190 seconds:
$ cat cron-test.sh
#!/bin/bash
#
for I in {1..20}
do
echo $I >> /var/www/cron-test.log
sleep 10
done
$ cat cron-test.log
1
2
3
4
5
6
1
7
2
8
3
9
4
10
5
11
6
12
1
7
13
2
8
14
3
9
15
4
10
16
5
11
17
6
12
18
1
7
13
19
2
8
14
20
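One common guard against exactly this kind of overlap is an exclusive file lock, so a newly spawned cron iteration exits immediately while the previous one still runs. A minimal sketch (the lock path and `run_once` helper are made up for illustration; requires `fcntl`, so Linux/macOS only):

```python
import fcntl
import os
import tempfile

# Arbitrary lock-file path chosen for this sketch.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "as-cron.lock")

def run_once(job):
    """Run `job` only if no other invocation currently holds the lock."""
    with open(LOCK_PATH, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fail fast instead of queueing up.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return "skipped: previous run still active"
        return job()

print(run_once(lambda: "job ran"))  # job ran
```

The same effect is available from plain cron via `flock -n /path/to/lock <command>`; the point is simply that overlapping iterations stop competing for the claim query.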
@glagonikas are you running the Action Scheduler High Volume plugin?
> The default batch size is 25 for Cron or Async - https://github.com/woocommerce/action-scheduler/blob/master/classes/ActionScheduler_QueueRunner.php#L131
That's odd... If you look at my deadlock details, the second query that's claiming actions (UPDATE wp_actionscheduler_actions SET claim_id=183806,...) has a limit of 100, not 25!?!
> @glagonikas are you running the Action Scheduler High Volume plugin?
No, wasn't aware of that plugin. We use the one that ships with WC, which overrides the one used by WCS.
> That's odd... If you look on my deadlock details, the second query that's claiming processes, (UPDATE wp_actionscheduler_actions SET claim_id=183806,...) has a limit of 100, not 25!?!
OK, the increased limit comes from the tweaks at https://actionscheduler.org/perf/ that we'd added previously. We had set the following:
This obviously means that we did use the equivalent of the Action Scheduler High Volume plugin
Removing those filters and reverting to 25 didn't make much of a difference. Tried 10 but still no luck.
> A followup thought is that if your cron job is scheduled to run every minute then it could be multiple instances of the cron job competing. Cron in Linux spawns the task on the schedule regardless of the disposition of the previous iteration of the task.
Correct me if I'm wrong, but even if cron jobs double up, the lock mechanism should prevent jobs bumping into each other. On each spawn of wp-cron.php I'd imagine you'd have a new claim_id, different rows would be claimed and then updated, and jobs would work in parallel on the same InnoDB table.
OK, so I ran an EXPLAIN on the claim query and noticed it states `Using intersect(claim_id,status); Using where; Using filesort`. This is because there are indexes on each individual column, but not on the three columns used by the update query.
Is there a particular reason the table `wp_actionscheduler_actions` doesn't have an index for `claim_id, status, scheduled_date_gmt`? This would result in `Using where; Using filesort` and therefore a faster query.
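MySQL's index-merge intersection is engine-specific, but the shape of the fix is easy to sanity-check against a throwaway SQLite database using Python's stdlib bindings (schema reduced to the relevant columns; the index names here are made up). With the composite index in place, the planner can satisfy all three predicates from a single index instead of combining several:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE wp_actionscheduler_actions (
    action_id INTEGER PRIMARY KEY,
    status TEXT, claim_id INTEGER, scheduled_date_gmt TEXT)""")
# Single-column indexes, mirroring the original schema.
con.execute("CREATE INDEX status_idx ON wp_actionscheduler_actions (status)")
con.execute("CREATE INDEX claim_id_idx ON wp_actionscheduler_actions (claim_id)")
# The composite index discussed in this thread.
con.execute("""CREATE INDEX claim_status_date_idx
    ON wp_actionscheduler_actions (claim_id, status, scheduled_date_gmt)""")

plan = con.execute("""EXPLAIN QUERY PLAN
    SELECT action_id FROM wp_actionscheduler_actions
    WHERE claim_id = 0 AND status = 'pending'
      AND scheduled_date_gmt <= '2020-04-21 13:41:34'""").fetchall()
for row in plan:
    print(row[-1])  # planner chooses claim_status_date_idx for all predicates
```

This is only an analogy for the MySQL behaviour described above, but it shows why one index matching the whole WHERE clause beats per-column indexes for this query shape.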
I do understand that excessive use of keys is not a good idea, but in our case, it seems to have solved the issue and the deadlocks have gone away. I'll keep an eye on logs for the next couple of days to confirm they've stopped permanently.
According to NewRelic, adding the extra key reduced the query time from ~1.5s to ~1.3ms!
The High Volume filters have been removed, but I might give them another go if the keys have solved the deadlocks.
@glagonikas Nice find. If things still look good after a couple days would you submit a PR adding the index & bumping the schema version? Schema is in https://github.com/woocommerce/action-scheduler/blob/master/classes/schema/ActionScheduler_StoreSchema.php
> This obviously means that we did use the equivalent of the Action Scheduler High Volume plugin
@glagonikas that's largely what I was asking - whether you had tweaked the defaults. In particular, the action_scheduler_queue_runner_concurrent_batches increase there is what was leading to the higher concurrency and likely the deadlocks. The default is 1 in AS now, whereas it used to be 5 or 10: back then only WP Cron ran actions, and only on the WP Cron request. Now we have the async runner and chained, loop-back requests, meaning both that after a WP Cron queue starts, it can go on for much longer in separate requests, and that it can process many more actions.
> The High Volume filters have been removed, but I might give them another go if the keys have solved the deadlocks.
If you can reinstate the filters, that would be a great help. As above, simply restoring action_scheduler_queue_runner_concurrent_batches to the default of 1 will almost certainly fix the deadlocks also.
> Is there a particular reason the table wp_actionscheduler_actions doesn't have an index for claim_id, status, scheduled_date_gmt?
Almost certainly just:
I'm all for adding them given what we've found here.
Hi @thenbrent, prior to adding the new index, I was getting deadlocks regardless of the setting of action_scheduler_queue_runner_concurrent_batches.
After adding the index, the deadlocks stopped. I then reinstated the filters and I haven't seen any issues, so the index seems to have fixed the issue for us.
I added some testing code to my dev install that created 100+ actions at a time for testing a PR. I forgot to comment it out. This morning I saw deadlock errors in my debug log. AS has 700K+ actions.
I added quite a bit of logging and it looks like MySQL returns the action IDs before locking the table/rows for WRITE. The next 2 queries are the fetch & status check of the first action. The deadlock error is logged after the status check of the first action in the list of claimed actions.
We are getting those deadlocks quite frequently (almost daily). We have only 7k items in Scheduled Actions admin area table. We do not use high volume plugin or those filters to modify default AS behavior.
We use WooCommerce 4.6.1.
@slaFFik Have you tried adding the index in #535?
No, we haven't tried that. We will apply that fix early next week and I will report back with the results. Thank you.
Hi @slaFFik, were you able to resolve your problem with the fix in PR #535?
We've also been looking into these deadlocks at our end, on a busy WooCommerce Subscriptions site. I've been following this issue and pull request #535, and I'm wondering: was the root cause of the issue ever unambiguously identified?
We've been seeing this issue only when running concurrent batches. We have used Action Scheduler High Volume plugin for this.
Adding the index fixes the issue, but in my opinion it's very important to understand the root cause even if we seem to have found a fix.
To my understanding the UPDATE query in ActionScheduler_DBStore.php used to claim actions locks the entire wp_actionscheduler_actions table. This happens when the WHERE clause does not match any index.
This could explain why updates relating to the running queue collide with the claim_actions of a freshly created concurrent batch.
Adding an index that matches the WHERE clause of the claim_actions method helps, because it essentially prevents locking the entire table and locks only the claimed rows instead.
In my opinion this is a very important fix, and a clear summary of the cause is in order. I might be wrong with the diagnosis, but would much appreciate a second opinion on this.
We've been discussing this with our hosting provider and it seems that there are several other sites experiencing the deadlocks as well. In my opinion this should be tagged as a critical fix.
Finally, I'd like to mention that we've also seen an intermittent extra concurrent batch. In other words, when setting the maximum number of concurrent batches to e.g. 2, we've seen 3 concurrent batches running. This was confirmed by three unique and simultaneous batch_id values in the wp_actionscheduler_actions table. Based on the source code, this should not be possible. We have not looked into this more in-depth, but it seems that adding the index has a positive effect on this one as well. That being said, we don't have enough data to be sure yet.
I'll try to sum up my findings while I continue working on this. Hope it provides a bit of valuable info to the people participating in this thread.
I've generated 1.32M actions and I'm still at roughly 1.2M pending ones; the actions keep running (pretty slowly for now).
I haven't seen any deadlocks, but I'm seeing transaction aborts, very slow processing, and high CPU utilization (100%+) coming from the mysqld.
Lots of the following messages in the mysql error log:
2021-03-24 11:24:56 5330 [Warning] Sort aborted, host: localhost, user: wp, thread: 5330, query: UPDATE wp_actionscheduler_actions SET claim_id=64765, last_attempt_gmt='2021-03-24 11:24:05', last_attempt_local='2021-03-24 11:24:05' WHERE claim_id = 0 AND scheduled_date_gmt <= '2021-03-24 11:24:05' AND status='pending' ORDER BY attempts ASC, scheduled_date_gmt ASC, action_id ASC LIMIT 25
When I increased the number of concurrent batches to 10 and tried to run 5 instances of WP CLI processing (php8.0 wp-cli.phar action-scheduler run --batch-size=100), I was constantly running into database timeouts, such as the one below, and couldn't run the actions in parallel reliably enough; the CLI kept returning errors.
WordPress database error: [Lock wait timeout exceeded; try restarting transaction]
UPDATE wp_actionscheduler_actions SET claim_id=59348, last_attempt_gmt='2021-03-23 11:19:17', last_attempt_local='2021-03-23 11:19:17' WHERE claim_id = 0 AND scheduled_date_gmt <= '2021-03-23 11:19:17' AND status='pending' ORDER BY attempts ASC, scheduled_date_gmt ASC, action_id ASC LIMIT 100
Error: There was an error running the action scheduler: Unable to claim actions. Database error.
Using these concurrent runners, I tested the level of concurrency both with the new index proposed by @glagonikas and without it. If anything, I saw even more congestion and more frequent lock timeouts with the index than without it. The actions table itself already has plenty of indices, so it seemed rather strange that this would cause a substantial difference. Though I think we'd need to be more familiar with the setup folks are using (DB server vendor and version, my.cnf settings related to locking, key buffers, etc.) to see why adding this index had such a positive effect.
When examining the database using SHOW ENGINE INNODB STATUS, I could see that mostly the following two queries were competing for the actions table:
SELECT a.action_id, a.last_attempt_gmt, a.scheduled_date_gmt FROM wp_actionscheduler_actions a WHERE 1=1 AND a.status='complete' AND a.last_attempt_gmt <= '2021-02-20 15:24:33' ORDER BY a.scheduled_date_gmt ASC LIMIT 0, 20;
UPDATE wp_actionscheduler_actions SET claim_id=60232, last_attempt_gmt='2021-03-23 15:24:34', last_attempt_local='2021-03-23 15:24:34' WHERE status='pending' AND claim_id = 0 AND scheduled_date_gmt <= '2021-03-23 15:24:34' ORDER BY attempts ASC, scheduled_date_gmt ASC, action_id ASC LIMIT 25;
Both were super-slow, waiting tens of seconds until they could acquire the lock they needed, as shown by SHOW PROCESSLIST:
MariaDB [(none)]> SHOW PROCESSLIST;
+-----+-------------+-----------+---------------+---------+------+--------------------------+------------------------------------------------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+-----+-------------+-----------+---------------+---------+------+--------------------------+------------------------------------------------------------------------------------------------------+----------+
| 1 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 3 | system user | | NULL | Daemon | NULL | InnoDB purge coordinator | NULL | 0.000 |
| 2 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 4 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 5 | system user | | NULL | Daemon | NULL | InnoDB shutdown handler | NULL | 0.000 |
| 138 | wp | localhost | wordpress-one | Query | 45 | Init for update | UPDATE wp_actionscheduler_actions SET claim_id=60176, last_attempt_gmt='2021-03-23 15:08:03', last_a | 0.000 |
| 141 | root | localhost | NULL | Query | 0 | Init | SHOW PROCESSLIST | 0.000 |
| 142 | wp | localhost | wordpress-one | Query | 19 | Init for update | UPDATE wp_actionscheduler_actions SET claim_id=60177, last_attempt_gmt='2021-03-23 15:08:30', last_a | 0.000 |
| 143 | wp | localhost | wordpress-one | Query | 19 | Init for update | UPDATE wp_actionscheduler_actions SET claim_id=60178, last_attempt_gmt='2021-03-23 15:08:30', last_a | 0.000 |
| 144 | wp | localhost | wordpress-one | Query | 16 | Sending data | SELECT a.action_id FROM wp_actionscheduler_actions a WHERE 1=1 AND a.status='complete' AND a.last_at | 0.000 |
| 145 | wp | localhost | wordpress-one | Query | 11 | Sending data | SELECT a.action_id FROM wp_actionscheduler_actions a WHERE 1=1 AND a.status='complete' AND a.last_at | 0.000 |
+-----+-------------+-----------+---------------+---------+------+--------------------------+------------------------------------------------------------------------------------------------------+----------+
11 rows in set (0.000 sec)
(Notice the SELECTs waiting for 11 and 16 seconds to acquire locks and UPDATE queries waiting for 45 and 19 seconds.)
Looking at the SELECT first, there was nothing really suspicious about the query itself at first sight. However, there are 2 aspects I found suboptimal after digging a bit more:
1. \ActionScheduler_Abstract_QueueRunner::run_cleanup is called in \ActionScheduler_QueueRunner::run, and that seems rather unnecessary. Why not schedule this to run once a day rather than with each queue run, especially if it competes for resources with the action claim, i.e. the actual processing of the queue?
2. The query uses the index on scheduled_date_gmt, i.e. the column that's used only for the final sorting:
MariaDB [wordpress-one]> EXPLAIN SELECT a.action_id FROM wp_actionscheduler_actions a WHERE 1=1 AND a.status='complete' AND a.last_attempt_gmt <= '2021-02-20 15:24:33' ORDER BY a.scheduled_date_gmt ASC LIMIT 0, 20;
+------+-------------+-------+-------+-------------------------+--------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+-------------------------+--------------------+---------+------+------+-------------+
| 1 | SIMPLE | a | index | status,last_attempt_gmt | scheduled_date_gmt | 5 | NULL | 538 | Using where |
+------+-------------+-------+-------+-------------------------+--------------------+---------+------+------+-------------+
1 row in set (0.002 sec)
Changing the ORDER BY from scheduled_date_gmt to action_id cut the processing time for this query considerably (16.5 sec to 0.4 sec), and I don't think it would cause any problems with the cleanup:
MariaDB [wordpress-one]> SELECT a.action_id, a.last_attempt_gmt, a.scheduled_date_gmt FROM wp_actionscheduler_actions a WHERE 1=1 AND a.status='complete' AND a.last_attempt_gmt <= '2021-02-20 15:36:39' ORDER BY a.scheduled_date_gmt ASC LIMIT 0, 20;
Empty set (16.512 sec)
MariaDB [wordpress-one]> SELECT a.action_id, a.last_attempt_gmt, a.scheduled_date_gmt FROM wp_actionscheduler_actions a WHERE 1=1 AND a.status='complete' AND a.last_attempt_gmt <= '2021-02-20 15:36:39' ORDER BY a.action_id ASC LIMIT 0, 20;
Empty set (0.473 sec)
I will prep a PR to add this change, as I think it's much easier to do than change how and when AS does its cleanup.
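Under the assumption that action_id increases monotonically with scheduled_date_gmt (as argued later in this thread), the two orderings select the same cleanup batch. A quick sanity check with Python's stdlib SQLite bindings (synthetic data; an action_id tiebreak is added to the date ordering only to make the comparison deterministic when dates tie):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE actions (
    action_id INTEGER PRIMARY KEY, status TEXT,
    last_attempt_gmt TEXT, scheduled_date_gmt TEXT)""")
# Synthetic data where action_id grows together with scheduled_date_gmt.
rows = [(i, "complete", f"2021-02-{10 + i // 50:02d} 12:00:00",
         f"2021-02-{10 + i // 50:02d} 08:00:00") for i in range(1, 201)]
con.executemany("INSERT INTO actions VALUES (?, ?, ?, ?)", rows)

q = """SELECT action_id FROM actions
       WHERE status = 'complete' AND last_attempt_gmt <= '2021-02-20 15:36:39'
       ORDER BY {} ASC LIMIT 20"""
by_date = con.execute(q.format("scheduled_date_gmt, action_id")).fetchall()
by_id = con.execute(q.format("action_id")).fetchall()
print(by_date == by_id)  # True: same batch either way under monotonic IDs
```

If IDs were ever created out of schedule order, the batches could differ, but for a cleanup of month-old completed actions that distinction is harmless.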
Interestingly, in my MariaDB instance, if I set the date cut-off to something closer to now, so that the query actually returns some results, it runs much faster and returns almost instantly:
MariaDB [wordpress-one]> SELECT a.action_id, a.last_attempt_gmt, a.scheduled_date_gmt FROM wp_actionscheduler_actions a WHERE 1=1 AND a.status='complete' AND a.last_attempt_gmt <= '2021-03-01 15:36:39' ORDER BY a.scheduled_date_gmt ASC LIMIT 0, 20;
+-----------+---------------------+---------------------+
| action_id | last_attempt_gmt | scheduled_date_gmt |
+-----------+---------------------+---------------------+
| 16965 | 2021-02-23 10:59:45 | 2021-02-19 18:22:10 |
... 18 more rows
| 16984 | 2021-02-26 14:48:38 | 2021-02-25 19:41:22 |
+-----------+---------------------+---------------------+
20 rows in set (0.004 sec)
I've tested this also on MySQL 5.7 and saw a very similar performance pattern:
mysql> SELECT a.action_id, a.last_attempt_gmt, a.scheduled_date_gmt FROM wp_actionscheduler_actions a WHERE 1=1 AND a.status='complete' AND a.last_attempt_gmt <= '2021-02-20 15:36:39' ORDER BY a.scheduled_date_gmt ASC LIMIT 0, 20;
Empty set (12.43 sec)
mysql> SELECT a.action_id, a.last_attempt_gmt, a.scheduled_date_gmt FROM wp_actionscheduler_actions a WHERE 1=1 AND a.status='complete' AND a.last_attempt_gmt <= '2021-02-20 15:36:39' ORDER BY a.action_id ASC LIMIT 0, 20;
Empty set (0.19 sec)
I will look into the UPDATE next.
Hi @peterfabian good job with the investigation on this.
It's been quite a while since we first reported this issue, so we no longer have the my.ini settings, but I don't think they would be much different to our current ones.
Our setup at the time consisted of one database server (64GB RAM, I think) with the latest MariaDB and many web servers (32GB RAM) with Nginx behind a load balancer. At the time, they were all dedicated cloud VMs running the latest Ubuntu LTS.
I am not sure if the network latency would have anything to do with this, but query times are certainly affected if you use different machines, even if they are all in the same datacentre. On a different setup, we saw a major improvement when we introduced a dedicated trunk network cable between the application and database servers, because the query data used a different network port and not the one already used for internet access.
We currently use
action_scheduler_queue_runner_time_limit - 120
action_scheduler_queue_runner_batch_size - 100
action_scheduler_queue_runner_concurrent_batches - 10
and we have the following keys
PRIMARY KEY (`action_id`) USING BTREE,
INDEX `hook` (`hook`) USING BTREE,
INDEX `status` (`status`) USING BTREE,
INDEX `scheduled_date_gmt` (`scheduled_date_gmt`) USING BTREE,
INDEX `args` (`args`) USING BTREE,
INDEX `group_id` (`group_id`) USING BTREE,
INDEX `last_attempt_gmt` (`last_attempt_gmt`) USING BTREE,
INDEX `claim_id` (`claim_id`) USING BTREE,
INDEX `claim_id, status, scheduled_date_gmt` (`claim_id`, `status`, `scheduled_date_gmt`) USING BTREE,
INDEX `hook, status` (`hook`, `status`) USING BTREE
After adding the index, the deadlocks stopped immediately. I didn't check at the time whether load or response time increased because of it, but deadlocks have to be eliminated at any cost (if the app cannot mitigate the issues).
Maybe you can try the same tests on a multi-server setup as described before and see if this would make a difference? I have a feeling you'll see different results due to network latency and data transfer.
Our deadlocks would occur with less than 1M records.
Also, you mentioned that
> Changing the ORDER BY from scheduled_date_gmt to action_id cut the processing time for this query considerably (16.5sec to 0.4sec) and I don't think it would cause any problems with the clean up
To me, this makes sense, because you are just sorting by integers (albeit bigint) instead of datetimes (or some other more complicated format). I have no evidence to support this, but I came across this, which seems to suggest that int is faster. Regardless of performance, I see no reason not to use the action_id - it is very unlikely (impossible) to have a bigger action_id created before a smaller one.
The output of our current SHOW VARIABLES is below. (If anyone spots something that can be improved, I'd gladly take the comments on board.)
As for the UPDATE, things are a bit more complex here. I'm evaluating the SELECT subquery to make sure the selection of rows to be updated is quick: SELECT action_id, attempts, scheduled_date_gmt FROM wp_actionscheduler_actions WHERE claim_id = 0 AND scheduled_date_gmt <= '2021-03-24 15:34:20' AND status='pending' ORDER BY attempts ASC, scheduled_date_gmt ASC, action_id ASC LIMIT 25;
Now the easiest solution would be to just drop scheduled_date_gmt ASC from the ORDER BY clause and we're done (11 sec to 0.005 sec):
+-----------+----------+---------------------+
| action_id | attempts | scheduled_date_gmt |
+-----------+----------+---------------------+
| 71462 | 0 | 2021-03-18 16:41:40 |
...
| 71358 | 0 | 2021-03-18 16:42:11 |
+-----------+----------+---------------------+
25 rows in set (11.004 sec)
MariaDB [wordpress-one]> SELECT action_id, attempts, scheduled_date_gmt FROM wp_actionscheduler_actions WHERE claim_id = 0 AND scheduled_date_gmt <= '2021-03-24 15:34:20' AND status='pending' ORDER BY attempts ASC, action_id ASC LIMIT 25;
+-----------+----------+---------------------+
| action_id | attempts | scheduled_date_gmt |
+-----------+----------+---------------------+
| 70605 | 0 | 2021-03-18 16:42:13 |
...
| 70721 | 0 | 2021-03-18 16:42:42 |
+-----------+----------+---------------------+
25 rows in set (0.005 sec)
Theoretically speaking, I don't think the docs explicitly mention anywhere what order past actions will be run in. That being said, since this is a scheduling library and most people consider time to be somewhat linear, it's not the best solution, especially if we can somehow uphold the invariant that an action a1 scheduled for time t1 runs before an action a2 scheduled for time t2 whenever t1 < t2.
One possible optimization: the query SELECT scheduled_date_gmt FROM wp_actionscheduler_actions WHERE claim_id = 0 AND status='pending' ORDER BY scheduled_date_gmt ASC LIMIT 25; is a relatively quick one.
MariaDB [wordpress-one]> select scheduled_date_gmt from wp_actionscheduler_actions where claim_id = 0 AND status='pending' order by scheduled_date_gmt ASC limit 25;
+---------------------+
| scheduled_date_gmt |
+---------------------+
| 2021-03-18 16:42:14 |
...23 more items...
| 2021-03-18 16:42:48 |
+---------------------+
25 rows in set (0.201 sec)
The largest value in the result set gives a better upper bound on scheduled_date_gmt than looking at all actions that are scheduled before now (if there is at most 1 action scheduled for each second, then this is equivalent to the result set; if there are multiple actions scheduled for each second, the 25 earliest ones are for sure scheduled in the interval between the min and max from the subquery above).
Side note: don't forget to take into account group, hook and possible query filtering.
Now once we have a better upper bound, we can make the subsequent query more exclusive and faster:
New query (0.2s):
MariaDB [wordpress-one]> SELECT action_id, attempts, scheduled_date_gmt FROM wp_actionscheduler_actions WHERE claim_id = 0 AND scheduled_date_gmt <= '2021-03-25 09:24:20' AND scheduled_date_gmt <= '2021-03-18 16:43:02' AND status='pending' ORDER BY attempts ASC, scheduled_date_gmt ASC, action_id ASC LIMIT 25;
+-----------+----------+---------------------+
| action_id | attempts | scheduled_date_gmt |
+-----------+----------+---------------------+
| 71652 | 0 | 2021-03-18 16:42:29 |
| 71657 | 0 | 2021-03-18 16:42:30 |
| 71662 | 0 | 2021-03-18 16:42:31 |
| 71667 | 0 | 2021-03-18 16:42:32 |
| 71672 | 0 | 2021-03-18 16:42:33 |
| 71677 | 0 | 2021-03-18 16:42:34 |
| 71682 | 0 | 2021-03-18 16:42:36 |
| 71687 | 0 | 2021-03-18 16:42:37 |
| 71692 | 0 | 2021-03-18 16:42:38 |
| 71697 | 0 | 2021-03-18 16:42:46 |
| 71702 | 0 | 2021-03-18 16:42:47 |
| 71707 | 0 | 2021-03-18 16:42:48 |
| 71712 | 0 | 2021-03-18 16:42:49 |
| 71717 | 0 | 2021-03-18 16:42:50 |
| 71722 | 0 | 2021-03-18 16:42:51 |
| 71727 | 0 | 2021-03-18 16:42:52 |
| 71732 | 0 | 2021-03-18 16:42:53 |
| 71737 | 0 | 2021-03-18 16:42:55 |
| 71742 | 0 | 2021-03-18 16:42:56 |
| 71747 | 0 | 2021-03-18 16:42:57 |
| 71752 | 0 | 2021-03-18 16:42:58 |
| 71757 | 0 | 2021-03-18 16:42:59 |
| 71762 | 0 | 2021-03-18 16:43:00 |
| 71767 | 0 | 2021-03-18 16:43:01 |
| 71772 | 0 | 2021-03-18 16:43:02 |
+-----------+----------+---------------------+
25 rows in set (0.205 sec)
Old query (13s):
MariaDB [wordpress-one]> SELECT action_id, attempts, scheduled_date_gmt FROM wp_actionscheduler_actions WHERE claim_id = 0 AND scheduled_date_gmt <= '2021-03-25 09:24:20' AND status='pending' ORDER BY attempts ASC, scheduled_date_gmt ASC, action_id ASC LIMIT 25;
+-----------+----------+---------------------+
| action_id | attempts | scheduled_date_gmt |
+-----------+----------+---------------------+
| 71662 | 0 | 2021-03-18 16:42:31 |
| 71667 | 0 | 2021-03-18 16:42:32 |
| 71672 | 0 | 2021-03-18 16:42:33 |
| 71677 | 0 | 2021-03-18 16:42:34 |
| 71682 | 0 | 2021-03-18 16:42:36 |
| 71687 | 0 | 2021-03-18 16:42:37 |
| 71692 | 0 | 2021-03-18 16:42:38 |
| 71697 | 0 | 2021-03-18 16:42:46 |
| 71702 | 0 | 2021-03-18 16:42:47 |
| 71707 | 0 | 2021-03-18 16:42:48 |
| 71712 | 0 | 2021-03-18 16:42:49 |
| 71717 | 0 | 2021-03-18 16:42:50 |
| 71722 | 0 | 2021-03-18 16:42:51 |
| 71727 | 0 | 2021-03-18 16:42:52 |
| 71732 | 0 | 2021-03-18 16:42:53 |
| 71737 | 0 | 2021-03-18 16:42:55 |
| 71742 | 0 | 2021-03-18 16:42:56 |
| 71747 | 0 | 2021-03-18 16:42:57 |
| 71752 | 0 | 2021-03-18 16:42:58 |
| 71757 | 0 | 2021-03-18 16:42:59 |
| 71762 | 0 | 2021-03-18 16:43:00 |
| 71767 | 0 | 2021-03-18 16:43:01 |
| 71772 | 0 | 2021-03-18 16:43:02 |
| 71777 | 0 | 2021-03-18 16:43:03 |
| 70786 | 0 | 2021-03-18 16:43:04 |
+-----------+----------+---------------------+
25 rows in set (13.187 sec)
(the result set is slightly different since 2 actions got done in the meantime).
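To make the two-step bound concrete, here's a minimal end-to-end sketch using Python's stdlib SQLite bindings (synthetic data, reduced schema; the real store would of course keep this in MySQL and feed the bound into the UPDATE):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE actions (
    action_id INTEGER PRIMARY KEY, status TEXT, claim_id INTEGER,
    scheduled_date_gmt TEXT, attempts INTEGER)""")
rows = [(i, "pending", 0, f"2021-03-18 16:42:{i % 60:02d}", 0)
        for i in range(1, 101)]
con.executemany("INSERT INTO actions VALUES (?, ?, ?, ?, ?)", rows)

# Step 1: a cheap ordered index scan gives a tight upper bound on the dates
# the 25 earliest claimable actions can possibly have.
(bound,) = con.execute("""SELECT MAX(d) FROM (
        SELECT scheduled_date_gmt AS d FROM actions
        WHERE claim_id = 0 AND status = 'pending'
        ORDER BY scheduled_date_gmt ASC LIMIT 25)""").fetchone()

# Step 2: the claim query, narrowed by the extra bound instead of 'now'.
claimed = con.execute("""SELECT action_id FROM actions
    WHERE claim_id = 0 AND status = 'pending' AND scheduled_date_gmt <= ?
    ORDER BY attempts ASC, scheduled_date_gmt ASC, action_id ASC
    LIMIT 25""", (bound,)).fetchall()
print(len(claimed))  # 25
```

The caveat from the comment above applies: with many actions sharing a second, the bound may still admit slightly more than 25 candidate rows, but far fewer than bounding by the current time.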
It's nice that we can schedule actions with up-to-a-second accuracy, but now that I see that actions older than a month get purged, I'm intrigued by the fact that there are only 30 days' worth of timeslots. Of course, a lot depends on the specific distribution of actions over that time period. I can see 2 possible extremes that we should be able to cater to in some way:
In the first case, adding a column that counts the number of days since the beginning of the epoch, and filtering actions based on that, could help quite a bit, as it would quickly eliminate ~97% of actions (100% of actions fall within the last 30 days, so each day holds ±3% of them, assuming more than 25 actions per day) with a simple index on an integer column not larger than 16 bits (and it would break in year 2149). However, this will be pretty useless for the anti-uniform distribution (i.e. all actions scheduled on 1 day, nothing for the rest of the month).
Theoretically speaking, we could go one step more granular, to hours. Hopefully nobody wants AS to process hundreds of thousands of actions per hour, and then hours since epoch is still a quite okayish 19-bit int.
So I've tried to add an hours_since_epoch column, but the performance wasn't any better than Optimization 2 above, so I stopped going in that direction.
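The bit-width arithmetic above is easy to sanity-check; here's a minimal Python sketch (the days_since_epoch/hours_since_epoch columns are hypothetical, as in the comment above):

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

# A 16-bit unsigned days-since-epoch column overflows after 2**16 - 1 days.
overflow_day = EPOCH + timedelta(days=2**16 - 1)
print(overflow_day.year)  # 2149

# hours_since_epoch for a date in this thread (2021-03-18): still a 19-bit int.
hours = (date(2021, 3, 18) - EPOCH).days * 24
print(hours.bit_length())  # 19
```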
The solution proposed in https://github.com/woocommerce/action-scheduler/pull/535: I wasn't able to verify that it brings any benefit, so I'm reluctant to merge it, to be honest. I've tried other possible combinations for different composite indices, but could not see much of a positive difference, maybe even a slowdown.
In conclusion, for now, I'll prep a PR for approach no 2 and see where the next bottleneck lies after applying both the changes mentioned in my last 2 comments.
@peterfabian if you can, it would be great to run your tests on a multi-server (one db and one app server) environment to see if you get different results.
I suspect it might, and if so, it will be good to remember in the future when testing performance-related improvements.
In my local testing on MariaDB with 5 CLI runners I was able to go from ±65 noop actions per minute before the change to ±1800 noop actions per minute after applying the changes in #678 and #680. I will test it tomorrow on a local MySQL 5.7 server and then on an actual wp.com setup with separate HTTP and DB servers. In the meantime, feel free to test these PRs.
I'm happy to report back that the test on wordpress.com (ephemeral site) was successful. I've imported the 1.3M actions into the database and did the comparison below:
I let the AS run for 20 minutes with 1 wp-cli worker; I haven't tried more because I was already getting a large number of deadlocks:
I let the AS run for 10 minutes with 1-3 wp-cli workers (gradually increased the load).
I think it looks solid so far.
I'd also like to note that perhaps there's a possibility to improve \ActionScheduler_DBStore::find_action, as it's used quite widely (e.g. in as_unschedule_action and as_next_scheduled_action) and it also sorts by scheduled_date_gmt, but that's perhaps outside of scope for this one now.
That's great work @peterfabian! I'm glad you went deep into this and came up with solid changes.
It's also great that you confirmed my suspicions that having a separate DB server [in some cases/scenarios] can cause completely different behaviour than you'd get on local/single server.
@peterfabian and @glagonikas I'm super glad to see this investigated further. Great work @peterfabian with the performance optimisation.
To my understanding the optimisation in #678 and #680 will, in addition to improving performance, reduce the likelihood of deadlocks, but it won't eliminate them completely. Adding the index would (as explained in my earlier comment). @peterfabian what's your opinion on this?
What I'm trying to say is that I would add the index to complement the performance optimisation in #678 and #680.
We've now had the new composite index running for several weeks on a large WooCommerce Subscriptions site (with several thousand payments scheduled to run at the same time). We've had zero deadlocks since we applied the change.
Hi @samimakela, I'm not really an expert on MySQL, so I'm happy to learn and update my mental model, but I don't think you can actually eliminate deadlocks completely if you have 2 queries asking to update the same rows, which could end up waiting for each other. Based on some more reading I've done now, my understanding is that the only way to avoid deadlocks would be to create a covering index that includes all columns, like the primary index, so that all the data is directly included in the index nodes and doesn't need to be fetched from the table itself. Worth a try maybe, but I think the change proposed in #535 does not completely eliminate deadlocks.
As for other effects of adding the extra composite index as proposed in https://github.com/woocommerce/action-scheduler/pull/535, I saw in the EXPLAIN that the query using the new index actually didn't make much of a difference; MariaDB estimated the number of examined rows even higher (when forcing the new index; otherwise the SQL optimizer doesn't choose the composite index for me).
Without the index:
MariaDB [wordpress-one]> EXPLAIN SELECT action_id FROM wp_actionscheduler_actions WHERE status='pending' AND claim_id = 0 AND scheduled_date_gmt <= '2021-03-27 13:04:34' ORDER BY attempts ASC, scheduled_date_gmt ASC, action_id ASC LIMIT 25;
+------+-------------+----------------------------+------+------------------------------------+--------+---------+-------+--------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------------------------+------+------------------------------------+--------+---------+-------+--------+----------------------------------------------------+
| 1 | SIMPLE | wp_actionscheduler_actions | ref | status,scheduled_date_gmt,claim_id | status | 82 | const | 639595 | Using index condition; Using where; Using filesort |
+------+-------------+----------------------------+------+------------------------------------+--------+---------+-------+--------+----------------------------------------------------+
With the index, default query:
MariaDB [wordpress-one]> EXPLAIN SELECT action_id FROM wp_actionscheduler_actions WHERE status='pending' AND claim_id = 0 AND scheduled_date_gmt <= '2021-03-27 13:04:34' ORDER BY attempts ASC, scheduled_date_gmt ASC, action_id ASC LIMIT 25;
+------+-------------+----------------------------+------+-----------------------------------------------------------------------+--------+---------+-------+--------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------------------------+------+-----------------------------------------------------------------------+--------+---------+-------+--------+----------------------------------------------------+
| 1 | SIMPLE | wp_actionscheduler_actions | ref | status,scheduled_date_gmt,claim_id,claim_id-status-scheduled_date_gmt | status | 82 | const | 639595 | Using index condition; Using where; Using filesort |
+------+-------------+----------------------------+------+-----------------------------------------------------------------------+--------+---------+-------+--------+----------------------------------------------------+
1 row in set (0.001 sec)
With the index, giving a hint to use the new index:
MariaDB [wordpress-one]> EXPLAIN SELECT action_id FROM wp_actionscheduler_actions USE INDEX(`claim_id-status-scheduled_date_gmt`) WHERE status='pending' AND claim_id = 0 AND scheduled_date_gmt <= '2021-03-27 13:04:34' ORDER BY attempts ASC, scheduled_date_gmt ASC, action_id ASC LIMIT 25;
+------+-------------+----------------------------+------+------------------------------------+------------------------------------+---------+-------------+--------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------------------------+------+------------------------------------+------------------------------------+---------+-------------+--------+----------------------------------------------------+
| 1 | SIMPLE | wp_actionscheduler_actions | ref | claim_id-status-scheduled_date_gmt | claim_id-status-scheduled_date_gmt | 90 | const,const | 662406 | Using index condition; Using where; Using filesort |
+------+-------------+----------------------------+------+------------------------------------+------------------------------------+---------+-------------+--------+----------------------------------------------------+
1 row in set (0.000 sec)
Just for completeness, also including the EXPLAIN for the UPDATE after adding the index, first without the index hint, then with the hint:
MariaDB [wordpress-one]> EXPLAIN UPDATE wp_actionscheduler_actions SET claim_id=60232, last_attempt_gmt='2021-03-23 15:24:34', last_attempt_local='2021-03-23 15:24:34' WHERE status='pending' AND claim_id = 0 AND scheduled_date_gmt <= '2021-03-27 15:24:34' ORDER BY attempts ASC, scheduled_date_gmt ASC, action_id ASC LIMIT 25;
+------+-------------+----------------------------+-------+-----------------------------------------------------------------------+--------------------+---------+------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------------------------+-------+-----------------------------------------------------------------------+--------------------+---------+------+--------+-----------------------------+
| 1 | SIMPLE | wp_actionscheduler_actions | range | status,scheduled_date_gmt,claim_id,claim_id-status-scheduled_date_gmt | scheduled_date_gmt | 5 | NULL | 662406 | Using where; Using filesort |
+------+-------------+----------------------------+-------+-----------------------------------------------------------------------+--------------------+---------+------+--------+-----------------------------+
1 row in set (0.001 sec)
MariaDB [wordpress-one]> EXPLAIN UPDATE wp_actionscheduler_actions USE INDEX(`claim_id-status-scheduled_date_gmt`) SET claim_id=60232, last_attempt_gmt='2021-03-23 15:24:34', last_attempt_local='2021-03-23 15:24:34' WHERE status='pending' AND claim_id = 0 AND scheduled_date_gmt <= '2021-03-27 15:24:34' ORDER BY attempts ASC, scheduled_date_gmt ASC, action_id ASC LIMIT 25;
+------+-------------+----------------------------+-------+------------------------------------+------------------------------------+---------+------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------------------------+-------+------------------------------------+------------------------------------+---------+------+--------+-----------------------------+
| 1 | SIMPLE | wp_actionscheduler_actions | range | claim_id-status-scheduled_date_gmt | claim_id-status-scheduled_date_gmt | 95 | NULL | 662406 | Using where; Using filesort |
+------+-------------+----------------------------+-------+------------------------------------+------------------------------------+---------+------+--------+-----------------------------+
1 row in set (0.000 sec)
I've also tried the code with the additional index on a new site on wordpress.com with 1.2M pending/1.3M total actions and got the following result in 10 minutes:
While I agree that the reasoning from you folks makes sense and the 2 anecdotal reports are already promising, ultimately I wasn't able to replicate the positive effect of the additional index (it actually seemed worse: I wasn't even able to start any WP CLI batch processing despite multiple retries), so I can't really recommend it to everyone.
However, if you'd be willing to privately share a db dump from the actions table with us, we can maybe get a better understanding of what happens on your sites. Unfortunately, I won't be able to follow up on this any time soon, as I'll be on extended leave from Monday, but maybe someone else from the team might be able to test it.
@peterfabian thanks for taking the time to answer.
The underlying issue in my opinion is that whenever actions are claimed the whole table is locked until the query is complete. This is stated in the InnoDB documentation.
If there is another batch already running, the execution logging might bump into the query used to claim actions. Making the queries faster (as you did) reduces the likelihood of this happening, but does not completely prevent it.
As explained in my earlier answer, adding the index would mean that only the rows matching the query are locked during the query (instead of the whole table).
Since the two queries (claiming actions and logging the execution) don't try to update the same rows in any case, adding the index essentially prevents the deadlocks completely.
As mentioned, we haven't seen any deadlocks since adding the index. We are running a WooCommerce Subscriptions site where several thousand Stripe payments are scheduled to run at the same time. We used to get hundreds of deadlocks during execution.
Creating the index is enough. You don't need to change the query itself.
You can create an index for testing purposes by running the following query:
CREATE INDEX composite_test_index ON wp_actionscheduler_actions (claim_id, status, scheduled_date_gmt);
After adding the index run the default query e.g:
EXPLAIN SELECT action_id FROM wp_actionscheduler_actions WHERE status='pending' AND claim_id = 0 AND scheduled_date_gmt <= '2021-03-28 13:04:34' ORDER BY attempts ASC, scheduled_date_gmt ASC, action_id ASC LIMIT 25;
The result should be as follows:
+------+-------------+----------------------------+-------+---------------------------------------------------------+----------------------+---------+------+------+---------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------------------------+-------+---------------------------------------------------------+----------------------+---------+------+------+---------------------------------------+
| 1 | SIMPLE | wp_actionscheduler_actions | range | status,scheduled_date_gmt,claim_id,composite_test_index | composite_test_index | 95 | NULL | 1 | Using index condition; Using filesort |
+------+-------------+----------------------------+-------+---------------------------------------------------------+----------------------+---------+------+------+---------------------------------------+
The difference to your results is that the key used for the query is composite_test_index, while in your case the key was status. Please note that I did not make any changes to the default query.
It looks like there is something wrong with the index in your test environment. I would very much encourage you to take another look at this and check that the index is actually being used.
I'm happy to provide help with this one. Let me know what you need and I'll do my best to help out.
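For anyone who wants to see the planner effect without a full MySQL setup, the index selection can be reproduced in SQLite. This is only a sketch: SQLite is not InnoDB (the locking semantics that cause the deadlocks differ entirely), but the choice between a composite index and a plain scan is analogous. The table is reduced to the columns the claim query touches, reusing the composite_test_index name from above:

```python
import sqlite3

# Minimal stand-in for wp_actionscheduler_actions (reduced column set).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE wp_actionscheduler_actions (
    action_id INTEGER PRIMARY KEY,
    status TEXT,
    claim_id INTEGER,
    attempts INTEGER,
    scheduled_date_gmt TEXT
);
CREATE INDEX composite_test_index
    ON wp_actionscheduler_actions (claim_id, status, scheduled_date_gmt);
""")

# The claim-style SELECT: equalities on claim_id and status plus a range on
# scheduled_date_gmt match the index prefix, so the planner picks the index.
plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT action_id FROM wp_actionscheduler_actions
    WHERE status='pending' AND claim_id=0
      AND scheduled_date_gmt <= '2021-03-28 13:04:34'
    ORDER BY attempts ASC, scheduled_date_gmt ASC, action_id ASC LIMIT 25
""").fetchall()
print(any("composite_test_index" in row[-1] for row in plan))  # True

# Without the ORDER BY on attempts (a column outside the index), every
# referenced column lives in the index (action_id is the rowid, implicitly
# part of every index), so the plan becomes a covering-index search --
# the covering-index idea mentioned earlier in the thread.
plan2 = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT action_id FROM wp_actionscheduler_actions
    WHERE status='pending' AND claim_id=0
      AND scheduled_date_gmt <= '2021-03-28 13:04:34'
""").fetchall()
print(any("COVERING INDEX" in row[-1] for row in plan2))  # True
```

The contrast between the two plans also shows why the real query still reports "Using filesort": attempts, the first ORDER BY column, is not part of the composite index.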
I'm not sure why the EXPLAIN @peterfabian gets on the default query doesn't pick the index and requires a nudge. In our case, once the index is added, the default query starts using it straight away, which is why I proposed this as a quick fix.
Our output is identical to @samimakela's (just a different index name), so ref is null, rows is 1, and Extra doesn't have "Using where;", which all makes sense because the index is used correctly.
Since we've added the index, we too haven't seen a single deadlock, but it is possible that we simply don't do processing as intensive as in Peter's case, and the deadlocks might resurface when the site grows further.
I see no harm in merging Peter's changes to speed up all queries, and then either merging my PR or simply documenting it as a possible solution for sites that might still experience deadlocks.
I'm not sure why there aren't many others reporting deadlocks, given that WC is used across loads of websites and certainly some will be big and busy ones.
off-topic: As a general thought, it might be good for WC to compile a list of large websites using it and reach out to them for feedback on performance or issues like this (Usage Tracking won't always help because some folks like us might have it switched off). Surely others are experiencing the same frustrations as we do and would like to see other performance improvements, like the custom orders table and custom products table, being implemented in core. Some (like us) might even be willing to fund those changes considering the huge impact they could have on server costs and the income lost to slow page loads.
One more thought. Adding the index should also speed up the query itself as reported by @glagonikas earlier.
Since it seems that the index did not work correctly during the tests made by @peterfabian, the results regarding query performance can't be relied on either.
I know I'm sounding like a broken record at this point, but I strongly feel that it would be important to revisit the issue and reconsider the index as a primary solution.
As mentioned - happy to help in any way possible.
I want to thank all those working on this thread. We have 25k+ actions on our large server and were having Deadlock issues all the time which was affecting our site performance. I added the index and we haven't had one deadlock since. Look forward to the further improvements suggested here to Action Scheduler.
I know I'm sounding like a broken record at this point, but I strongly feel that it would be important to revisit the issue and reconsider the index as a primary solution.
Re being a broken record ... not at all: what you're drawing attention to is important.
Specifically in relation to the work done over here (to add a new index), I think it's worth us considering this as a further (and significant) incremental improvement, but it probably will not be the 'last word' on the matter and, likely, we are going to need to continue to refine and expand upon the menu of options that exist for running AS smoothly at scale.
Just an update on this, we recently got AutomateWoo and this has pushed our actions to over 1m. Roughly 900k come from AW (!).
During a large promo, where hundreds of customers were browsing and purchasing, we saw the deadlocks again... 😭
So it seems that having the index helps a lot, but as @peterfabian said, it won't eliminate the deadlocks in all scenarios.
We've had similar (and even more) traffic before adding AW and hadn't had any deadlocks since adding the index, but our table was never nearly as full.
Thanks for the update, @glagonikas — out of interest, have you had an opportunity to try this related change? I'm currently reviewing that alongside the new index PR, and so far (but I still have more testing to do) the former seems to have the greater impact in terms of reducing deadlocks.
Hi @barryhughes, I tried the code you sent me, but some deadlocks still happen. Traffic has died down, however, so it's not a like-for-like comparison.
Thanks for taking it for a spin—this is all useful feedback :+1:
This is easily the most interesting, thorough, and exciting issue I've ever read. Thanks to everyone. 🤔 😂 💯
@glagonikas could you run EXPLAIN on the query claiming the actions?
At a quick glance it seems that the entire table is locked. As long as the claim-actions query locks the entire table, the possibility of deadlocks exists. In my opinion the only way to eliminate these deadlocks is to figure out a way to claim actions without locking the entire table.
The two queries are not updating the same rows by design. However one of them (the query used to claim actions) clearly locks more rows than it needs to.
Hi @samimakela, the EXPLAIN is as follows:
"id" "select_type" "table" "type" "possible_keys" "key" "key_len" "ref" "rows" "Extra"
"1" "SIMPLE" "wp_actionscheduler_actions" "range" "status,scheduled_date_gmt,claim_id,claim_id, status, scheduled_date_gmt,status, scheduled_date_gmt" "status, scheduled_date_gmt" "87" \N "1" "Using where; Using filesort"
There's even an index covering all three conditions (claim_id, status, scheduled_date_gmt) but it doesn't seem to help; however, I'm not sure if having status first would help this index.
@glagonikas have you added any other "extra" indexes on top of the composite index combining the claim_id, status, scheduled_date_gmt?
If I'm not mistaken the query seems to have used an index containing only the status and scheduled_date_gmt, which should not exist.
@samimakela yes, we've added more indexes depending on what was necessary at the time. Those would have come from slow queries in slow.log or NewRelic
@glagonikas any chance you could try this out with only one composite index (claim_id, status, scheduled_date_gmt)?
I'm not sure why the optimiser seems to have picked a different index, but unless the chosen index matches fully with the WHERE clause of the query all the rows will be locked, which opens a window for deadlocks.
I've removed the status and scheduled_date_gmt index and the EXPLAIN now uses the other composite index, as below.
"id" "select_type" "table" "type" "possible_keys" "key" "key_len" "ref" "rows" "Extra"
"1" "SIMPLE" "wp_actionscheduler_actions" "range" "status,scheduled_date_gmt,claim_id,claim_id, status, scheduled_date_gmt" "claim_id, status, scheduled_date_gmt" "95" \N "1" "Using where; Using filesort"
This didn't make any difference and deadlocks keep cropping up...
Closing this issue: we'll see how the new index impacts this issue 'in the wild' via the next AS release. If there are still ongoing problems for a sizeable cross-section of the community we can re-evaluate via a fresh issue and perhaps re-examine some of the other approaches discussed earlier on.
Confirming we were also seeing the problem intermittently across our sites. Have deployed the patch today and not seen any further deadlocks -- will keep observing.
Continuing discussion from #438 on intermittent deadlocks: