mozilla / fxa-auth-db-mysql

DEPRECATED - Migrated to https://github.com/mozilla/fxa
Mozilla Public License 2.0
12 stars 26 forks source link

Some thoughts on verification reminder queries #196

Closed rfk closed 7 years ago

rfk commented 7 years ago

I had a bit of a look at the way we're dealing with verification reminders, with an eye to better understanding performance and interaction with replication etc. Some thoughts and observations for later followup below - sorry they're a bit free-form, but I wanted to capture them before I head off on PTO.


The table schema is:

CREATE TABLE IF NOT EXISTS verificationReminders (
  id BINARY(32) PRIMARY KEY,
  uid BINARY(16) NOT NULL,
  type VARCHAR(255) NOT NULL,
  createdAt BIGINT SIGNED NOT NULL,
  INDEX reminder_createdAt (createdAt)
) ENGINE=InnoDB;

It's not obvious to me whether we use the id field for anything. Do we? If not, perhaps we should have uid, type as a composite primary key. This would avoid us accidentally creating duplicate reminder entries, and would also help with the next issue below.


We delete verification reminders by doing:

DELETE FROM verificationReminders WHERE uid = reminderUid AND type = reminderType;

But we don't have an index on those columns, and EXPLAIN confirms that this query will do a full table scan.

We should add an index on (uid, type), either by making it the primary key, or using a secondary index.

I also wonder whether we should delete the verification reminders as part of the verifyEmail stored procedure rather than as a separate call to the DB. We don't need to call this procedure when fetching the reminders, see below.


We do several things in fetchVerificationReminders_1 that are not replication-friendly, including asking MySQL for the current time, and using SELECT .. FOR UPDATE, and doing LIMIT without an ORDER BY.

I think MySQL replication handles timestamps sanely, but it might simplify the query logic if we passed in the timestamp explicitly.

We may be able to use GET_LOCK to simplify the query some. It's also not safe for replication, but may be simpler for MySQL to deal with overall. I'm thinking something like:

CREATE PROCEDURE `fetchVerificationReminders_2` (
    IN currentTime BIGINT SIGNED,
    IN reminderType VARCHAR(255),
    IN reminderTime BIGINT SIGNED,
    IN reminderTimeOutdated BIGINT SIGNED,
    IN reminderLimit INTEGER
)
BEGIN
  DECLARE EXIT HANDLER FOR SQLEXCEPTION
  BEGIN
    DO RELEASE_LOCK('fxa-auth-server.verification-reminders-lock');
    ROLLBACK;
    RESIGNAL;
  END;

  START TRANSACTION;

  SELECT @lockAcquired:=GET_LOCK('fxa-auth-server.verification-reminders-lock', 1)
  AS lockAcquired;

  IF @lockAcquired THEN

    DROP TEMPORARY TABLE IF EXISTS reminderResults;

    -- Select these straight into the temporary table
    -- rather than using a cursor.  Use ORDER BY to
    -- ensure the query is deterministic.

    CREATE TEMPORARY TABLE reminderResults AS 
      SELECT uid, type
      FROM verificationReminders
      -- Since we want to order by `createdAt`, we have to rearrange
      -- the `WHERE` clauses so `createdAt` is positive rather than negated.
      WHERE  createdAt < currentTime - reminderTime
      AND createdAt > currentTime - reminderTimeOutdated
      AND type = reminderType
      ORDER BY createdAt, uid, type
      LIMIT reminderLimit;

    -- Because the query is deterministic we can delete
    -- all the selected items at once, rather than calling
    -- deleteVerificationReminder()

    DELETE FROM verificationReminders 
      WHERE  createdAt < currentTime - reminderTime
      AND createdAt > currentTime - reminderTimeOutdated
      AND type = reminderType
      ORDER BY createdAt, uid, type
      LIMIT reminderLimit;

    -- Clean up outdated reminders.
    -- `EXPLAIN` says this wasn't actually using the index on `createdAt` because of
    -- the way the `WHERE` clause was structured, rearranging it to put `createdAt`
    -- by itself on one side of the comparison allows it to use the index.

    DELETE FROM
      verificationReminders
    WHERE
      createdAt < currentTime - reminderTimeOutdated
    AND
      type = reminderType;

    -- Return the result
    SELECT * FROM reminderResults;

    DO RELEASE_LOCK('fxa-auth-server.verification-reminders-lock');

  END IF;
  COMMIT;
rfk commented 7 years ago

The stuff about EXPLAIN above is a good reminder that MSQL doesn't always know how to do things like basic algebra. Here's the EXPLAIN for the original "delete outdated reminders" query, with concrete values inserted for the input variables:

mysql> EXPLAIN DELETE FROM verificationReminders WHERE (ROUND(UNIX_TIMESTAMP(CURTIME(4)) * 1000) - createdAt) > 123 AND type = 'first';
+----+-------------+-----------------------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table                 | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-----------------------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | DELETE      | verificationReminders | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using where |
+----+-------------+-----------------------+------------+------+---------------+------+---------+------+------+----------+-------------+

It claims that it can't use any indexes and must do a full table scan.

Here's the same query, with the WHERE clause re-arranged to put createdAt by itself on one side:

EXPLAIN DELETE FROM verificationReminders WHERE (ROUND(UNIX_TIMESTAMP(CURTIME(4)) * 1000) - 123) > createdAt AND type = 'first';
+----+-------------+-----------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------------+
| id | select_type | table                 | partitions | type  | possible_keys      | key                | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-----------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------------+
|  1 | DELETE      | verificationReminders | NULL       | range | reminder_createdAt | reminder_createdAt | 8       | const |    1 |   100.00 | Using where |
+----+-------------+-----------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------------+

And suddenly it cam use the index on createdAt! Perhaps there's some reason why MySQL thinks this re-arrangement is relationally unsound, but it seems fine to me.

grumble grumble MySQL...

rfk commented 7 years ago
  ORDER BY createdAt, uid, type

Also, for this order-by to be efficient, you have either make (uid, type) the primary key or include those columns in the createdAt index.

rfk commented 7 years ago

So summing up my ramblings, I guess my recommendations are to try:

shane-tomlinson commented 7 years ago

@rfk - can you triage this as you see appropriate?

rfk commented 7 years ago

-> next because @vladikoff mentioned he's hoping to find time for this soon, and I think it's important to figure out why the current setup is causing problems in our dev environments.

vladikoff commented 7 years ago

Some notes / TODO after talking to @jrgm:

vladikoff commented 7 years ago