rfk closed 7 years ago
The stuff about EXPLAIN above is a good reminder that MySQL doesn't always know how to do things like basic algebra. Here's the EXPLAIN for the original "delete outdated reminders" query, with concrete values inserted for the input variables:
mysql> EXPLAIN DELETE FROM verificationReminders WHERE (ROUND(UNIX_TIMESTAMP(CURTIME(4)) * 1000) - createdAt) > 123 AND type = 'first';
+----+-------------+-----------------------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | DELETE | verificationReminders | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where |
+----+-------------+-----------------------+------------+------+---------------+------+---------+------+------+----------+-------------+
It claims that it can't use any indexes and must do a full table scan.
Here's the same query, with the WHERE clause re-arranged to put createdAt by itself on one side:
EXPLAIN DELETE FROM verificationReminders WHERE (ROUND(UNIX_TIMESTAMP(CURTIME(4)) * 1000) - 123) > createdAt AND type = 'first';
+----+-------------+-----------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------------+
| 1 | DELETE | verificationReminders | NULL | range | reminder_createdAt | reminder_createdAt | 8 | const | 1 | 100.00 | Using where |
+----+-------------+-----------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------------+
And suddenly it can use the index on createdAt! Perhaps there's some reason why MySQL thinks this re-arrangement is relationally unsound, but it seems fine to me.
grumble grumble MySQL...
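One way to keep createdAt alone on its side of the comparison is to compute the cutoff up front. This is only a sketch, not something I've run against the real table: @cutoff is a made-up variable name, and 123 is the same placeholder age (in milliseconds) used in the EXPLAIN examples above.

```sql
-- Compute the cutoff once, so the indexed column stands alone in the comparison.
-- @cutoff is a hypothetical name; 123 is the placeholder max age in milliseconds.
SET @cutoff = ROUND(UNIX_TIMESTAMP(CURTIME(4)) * 1000) - 123;

-- Same filter as the re-arranged query: delete rows created before the cutoff.
DELETE FROM verificationReminders
WHERE createdAt < @cutoff
  AND type = 'first';
```

It would be worth re-running EXPLAIN on whatever form we end up with, to confirm it still reports a range scan on reminder_createdAt.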
ORDER BY createdAt, uid, type
Also, for this order-by to be efficient, you have to either make (uid, type) the primary key or include those columns in the createdAt index.
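For the second option, a rough sketch of widening the createdAt index might look like the following. The new index name is hypothetical, and I'm assuming the existing index is the reminder_createdAt one shown in the EXPLAIN output.

```sql
-- Replace the single-column index with one that also covers the ORDER BY columns.
-- reminder_createdAt_uid_type is a hypothetical name, not an existing index.
ALTER TABLE verificationReminders
  DROP INDEX reminder_createdAt,
  ADD INDEX reminder_createdAt_uid_type (createdAt, uid, type);
```

The first option works because InnoDB secondary indexes implicitly carry the primary key columns, so with (uid, type) as the primary key the existing createdAt index would already contain them.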
So summing up my ramblings, I guess my recommendations are to try:
- Making (uid, type) the primary key on this table.
- Using GET_LOCK and RELEASE_LOCK for concurrency management.
@rfk - can you triage this as you see appropriate?
-> next, because @vladikoff mentioned he's hoping to find time for this soon, and I think it's important to figure out why the current setup is causing problems in our dev environments.
Some notes / TODO after talking to @jrgm:
I had a bit of a look at the way we're dealing with verification reminders, with an eye to better understanding performance and interaction with replication etc. Some thoughts and observations for later followup below - sorry they're a bit free-form, but I wanted to capture them before I head off on PTO.
The table schema is:
It's not obvious to me whether we use the id field for anything. Do we? If not, perhaps we should have (uid, type) as a composite primary key. This would avoid us accidentally creating duplicate reminder entries, and would also help with the next issue below.
We delete verification reminders by doing:
But we don't have an index on those columns, and EXPLAIN confirms that this query will do a full table scan.
We should add an index on (uid, type), either by making it the primary key, or using a secondary index.
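The two options could be sketched roughly as below. Treat both as assumptions: reminder_uid_type is a made-up index name, and the primary-key variant only makes sense if id really is unused and the table holds no duplicate (uid, type) rows.

```sql
-- Option A: keep the table as-is and add a secondary index.
-- reminder_uid_type is a hypothetical index name.
ALTER TABLE verificationReminders
  ADD INDEX reminder_uid_type (uid, type);

-- Option B (instead of A): drop the surrogate id and use a composite primary key.
-- Assumes id is not referenced anywhere and (uid, type) is already unique.
ALTER TABLE verificationReminders
  DROP COLUMN id,
  ADD PRIMARY KEY (uid, type);
```

Option B would also give us the duplicate-entry protection for free, at the cost of a more invasive migration.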
I also wonder whether we should delete the verification reminders as part of the verifyEmail stored procedure rather than as a separate call to the DB. We don't need to call this procedure when fetching the reminders, see below.
We do several things in fetchVerificationReminders_1 that are not replication-friendly, including asking MySQL for the current time, using SELECT .. FOR UPDATE, and doing LIMIT without an ORDER BY.
I think MySQL replication handles timestamps sanely, but it might simplify the query logic if we passed in the timestamp explicitly.
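To illustrate just the timestamp and ordering points (not the locking), a fetch with a caller-supplied time and a deterministic order might look something like this. @nowMs and @maxAgeMs are hypothetical inputs, the values are placeholders, and this is not the actual body of fetchVerificationReminders_1.

```sql
-- Hypothetical inputs: the caller passes the current time rather than asking MySQL for it.
SET @nowMs = 1500000000000;  -- placeholder: current time in milliseconds
SET @maxAgeMs = 123;         -- placeholder: minimum reminder age in milliseconds

-- createdAt stays alone on its side of the comparison, and the ORDER BY makes
-- the LIMIT pick the same rows on every server.
SELECT uid, type, createdAt
FROM verificationReminders
WHERE type = 'first'
  AND createdAt < @nowMs - @maxAgeMs
ORDER BY createdAt, uid, type
LIMIT 10;
```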
We may be able to use GET_LOCK to simplify the query some. It's also not safe for replication, but may be simpler for MySQL to deal with overall. I'm thinking something like: