Closed eviljeff closed 5 years ago
My plan is to divide the rest of the migrations into two categories:
UPDATE
/DELETE
etc that need to run to prepare for the ALTER TABLE
ALTER TABLE
We should review the migrations to see if there are ALTER TABLE
that that add a constraint that was not there before, things like that, and prepare the relevant statement to fix the data, adding that to the first list. Then:
Note that we don't necessarily have to do all migrations in order. We do want to track which ones we've ran where though. I'm thinking in particular of running the one that touches update_counts
separately from the rest: it's an ALTER
, but it doesn't need the whole site to be put in read-only - it just needs the stats commands that are run via cron to not run while it's taking place.
Sounds like a good plan
mozilla/addons#6976 is all the migrations as a checklist
Addition to the plan: if ops have time, it'd be good to have a copy of the production database on which to run the update_counts
ALTER TABLE
to see how long it'd take in prod.
In fact we should do this for all migrations. So, the plan for today is:
DELETE FROM `file_validation` USING `file_validation`
LEFT JOIN `files` ON `file_id` = `files`.`id`
WHERE `files`.`id` IS NULL AND `file_id` IS NOT NULL;
DELETE FROM update_counts
USING update_counts
LEFT JOIN addons
ON addon_id
= addons
.id
WHERE addons
.id
IS NULL AND addon_id
IS NOT NULL;
UPDATE versions
SET license_id
=5 WHERE license_id
IN (6316,6317);
DELETE FROM licenses
WHERE id
IN (6316, 6317);
- Make a clone of production db that includes those fixes, and try to run every migration in https://github.com/mozilla/addons/issues/6976 one by one (maybe doing them in reverse order they are listed, to avoid the big ones first)
- Note how much time they take, to prepare for running them on Tuesday. If any of them fails, build a fix to clean up the corresponding table
- Bonus: Run the data fixes for username (run this multiple times until all data is fixed, altering the `LIMIT` value if needed. Use a new sql shell each time to automatically drop the temporary table):
```sql
CREATE TEMPORARY TABLE `duplicate_usernames`
SELECT LOWER(`username`) AS `username` FROM `users`
GROUP BY LOWER(`username`) HAVING COUNT(*) > 1 LIMIT 50000;
UPDATE `users`
SET `username`=CONCAT('anonymous-', MD5(RAND()))
WHERE LOWER(`username`) != `username`
AND LOWER(`username`) IN (SELECT `username` FROM `duplicate_usernames`);
Finished the 4 queries
MySQL [addons_mozilla_org]> DELETE FROM `file_validation` USING `file_validation`
-> LEFT JOIN `files` ON `file_id` = `files`.`id`
-> WHERE `files`.`id` IS NULL AND `file_id` IS NOT NULL;
Query OK, 1 row affected (35.24 sec)
MySQL [addons_mozilla_org]> DELETE FROM `update_counts` USING `update_counts`
-> LEFT JOIN `addons` ON `addon_id` = `addons`.`id`
-> WHERE `addons`.`id` IS NULL AND `addon_id` IS NOT NULL;
Query OK, 49448 rows affected (1 hour 12 min 27.54 sec)
MySQL [addons_mozilla_org]> UPDATE `versions` SET `license_id`=5 WHERE `license_id` IN (6316,6317);
Query OK, 24 rows affected (0.02 sec)
Rows matched: 24 Changed: 24 Warnings: 0
MySQL [addons_mozilla_org]> DELETE FROM `licenses` WHERE `id` IN (6316, 6317);
Query OK, 2 rows affected (0.01 sec)
Note how much time they take, to prepare for running them on Tuesday. If any of them fails, build a fix to clean up the corresponding table
All migrations except 1200 have finished testing. Below are the results. And note that results from migrations that took less than 1 minute are omitted.
ALTER TABLE
took 32 min 44.93 secALTER TABLE
in migration 1200 has been running 4.5 hours and not finished yet.
Ok cool. So we should run all remaining migrations except 1200 today in read-only mode. Based on those results they should all apply cleanly now, and the whole thing should take around one hour.
Then we'll need to figure out the usernames one and the update counts. For the update counts, it depends on how long it takes, but since nothing should write to this table except for the update_counts_from_file
cron that runs every day, maybe it should be ok to run it while the site is operating normally and just let it take a long time?
ALTER TABLE
in migration 1200 has been running 4.5 hours and not finished yet.
Now it's finished with this amount of time taken.
Query OK, 379915506 rows affected (9 hours 55 min 8.37 sec)
Records: 379915506 Duplicates: 0 Warnings: 0
So regarding updating the users
table, I have tried the CREATE TEMPORARY TABLE + UPDATE user
method on a test db. Here are a couple of comments.
The MD5(RAND())
is not random enough. It can only generate 10,000 to 15,000 unique random strings. During my test, first I had to change the LIMIT 50000
clause to LIMIT 5000
in order to get a successful run for UPDATE users
. And then with LIMIT 5000
, I can only run the 2 queries (CREATE TEMPORARY TABLE + UPDATE users
) twice. I always get an error for the third run, i.e., ERROR 1062 (23000): Duplicate entry
.
I had to create an index when creating the temporary table, i.e.
CREATE TEMPORARY TABLE `duplicate_usernames` (UNIQUE (username))
SELECT LOWER(`username`) AS `username` FROM `users`
GROUP BY LOWER(`username`) HAVING COUNT(*) > 1 LIMIT 5000;
Without the index, it will take 4+ hours for the UPDATE users
to finish even with just 5000 duplicated usernames, and there are about 130K of them.
In summary,
we need a different random algorithm that can generate at least 130K unique random strings without repeating itself.
I think previously, the most time consuming part was probably the IN (SELECT username
FROM duplicate_usernames
) clause, because it would have to do sequential scan to find the match, with the index, we may not need the LIMIT clause when creating the temporary table;
That's useful info, thanks. Can you try the same approach with the index and LOWER(HEX(RANDOM_BYTES(16)))
as the random function instead of MD5(RAND())
?
@diox LOWER(HEX(RANDOM_BYTES(16)))
worked. With that and the UNIQUE index, it took 46 seconds to finish on the test db.
Query OK, 149187 rows affected (46.36 sec)
Rows matched: 149187 Changed: 149187 Warnings: 0
https://github.com/mozilla/addons/issues/1812 says that we need to apply ALTER TABLE `users` CHANGE COLUMN `username` `username` VARCHAR (255) NOT NULL;
to -prod after fixing the inconsistency, is that still true?
Yes, the ALTER
is what gets rid of the collation we no longer want on the column.
Hmm, the ALTER
query fails with the following error,
mysql> ALTER TABLE `users` CHANGE COLUMN `username` `username` VARCHAR (255) NOT NULL;
ERROR 1062 (23000): Duplicate entry '***-häusler' for key 'username'
There seems to be a UTF8 character in that username, which may have caused the failure.
If I ran select username from users where username like '***-h%';
, 6 rows were returned, and in addition to the one in the error message, another one of them was ***-häußler
.
Note: ***
represents a few ASCII characters, used to protect user privacy.
Ah, yes, this is one consequence of the current collation we're getting rid of. It's possible there are several cases like this one, I was hoping that the check used to create the temporary duplicate_usernames
usernames tables was enough, but looks like it isn't.
We can re-use the same logic and reset those usernames to an anonymous version. If there are too many of them we'll need to change the CREATE TEMPORARY TABLE
+ UPDATE
query to include them, if not maybe we can just do it manually for those affected.
If there are too many of them we'll need to change the
CREATE TEMPORARY TABLE
+UPDATE
query to include them, if not maybe we can just do it manually for those affected.
I don't know how many there are. How do we find it out?
We can re-use the same logic and reset those usernames to an anonymous version.
I'm not sure about what you mean by "re-use the same logic" part, do you mean run the same CREATE TEMPORARY TABLE
query twice?
That won't work because with the current CREATE TEMPORARY TABLE
query, it doesn't think ***-häusler
and ***-häußler
are duplicated usernames, thus the ALTER TABLE
failed.
I don't know how many there are. How do we find it out?
I'm not sure besides just running the query multiple times and fixing the offending usernames every time, tbh. The query should already have picked it up :(
I'm not sure about what you mean by "re-use the same logic" part, do you mean run the same
CREATE TEMPORARY TABLE
query twice? That won't work because with the currentCREATE TEMPORARY TABLE
query, it doesn't think -häusler and -häußler are duplicated usernames, thus theALTER TABLE
failed.
We can manually add problematic usernames to the temporary tables (adding an INSERT INTO
before the ALTER
).
Here is what i just ran on the test db, and it failed.
mysql> INSERT INTO `duplicate_usernames` (`username`) VALUES ('***-häusler');
Query OK, 1 row affected (0.00 sec)
mysql> UPDATE `users`
-> SET `username`=CONCAT('anonymous-', HEX(RANDOM_BYTES(16)))
-> WHERE LOWER(`username`) != `username`
-> AND LOWER(`username`) IN (SELECT `username` FROM `duplicate_usernames`);
Query OK, 0 rows affected (51.67 sec)
Rows matched: 0 Changed: 0 Warnings: 0
mysql> SELECT `username` FROM `duplicate_usernames`;
+------------------+
| username |
+------------------+
| ***-häusler |
+------------------+
1 row in set (0.00 sec)
mysql> ALTER TABLE `users` CHANGE COLUMN `username` `username` VARCHAR (255) NOT NULL;
ERROR 1062 (23000): Duplicate entry '***-häusler' for key 'username'
I don't know what the purpose is for LOWER(`username`) != `username`
clause, but I suspect it doesn't work well with usernames that have UTF8 characters in them.
Latest update:
So far I've found 2 usernames that have UTF8 characters in them and have caused ALTER TABLE
to fail. Manually adding the 2 usernames into the temp table, and then run UPDATE
without the LOWER(`username`) != `username`
worked for these 2 usernames.
Current issue at hand,
mysql> ALTER TABLE `users` CHANGE COLUMN `username` `username` VARCHAR (255) NOT NULL;
ERROR 1062 (23000): Duplicate entry 'fas' for key 'username'
Even though this username doesn't have any UTF8 characters, @diox theory is that there may be other usernames that do have UTF8 characters that conflict with this one. However, there are thousands of usernames that contains fas
, so it's nearly impossible to manually identify which one causes the issue.
mysql> select count(*) from users where username like '%fas%';
+----------+
| count(*) |
+----------+
| 2633 |
+----------+
I guess one way to get around the fas
username is to manually change it to something random, however, the bigger questions is how many of those cases exist in the database? Manually trying them out could easily take hours.
Can we have a programmatical way to first figure out how many such conflicts there are, then we can come up with a plan to fix it, either manually or programmatically?
Can we have a programmatical way to first figure out how many such conflicts there are, then we can come up with a plan to fix it, either manually or programmatically?
I sort of answered the first part of my own question by running this,
mysql> select count(*) from users where username <> CONVERT(username USING ASCII);
+----------+
| count(*) |
+----------+
| 115338 |
+----------+
With that many usernames that contain non ASCII characters, I don't think we can solve the issue manually (e.g. I've spent over an hour repeatedly running the queries manually, and only found and fixed 5 such duplicated names, and there are apparently many many more).
Closing this issue, almost everything was done, we'll deal with usernames in mozilla/addons#1812
Prod has the following problems that need addressing
FileValidation
don't have a valid fk to aFile
mozilla/addons-server#12393UpdateCount
don't have a valid fk to anAddon
https://github.com/mozilla/addons-server/pull/12406License
is linking to the sameTranslation
mozilla/addons#6970In addition, we found this issue on stage and will affect prod too