owid / owid-grapher

A platform for creating interactive data visualizations
https://ourworldindata.org
MIT License
1.4k stars 228 forks source link

MySQL deadlocks caused by chart approval tool #3159

Closed Marigold closed 5 months ago

Marigold commented 10 months ago

When someone uses chart approval tool, admin tends to die on MySQL deadlocks. We turned on slow query log (ssh owid@owid-live-db 'sudo cat /var/log/mysql/slow-query.log') and the output around the downtime is below

# Time: 2024-02-01T09:01:14.312549Z
# User@Host: live_grapher[live_grapher] @  [209.97.185.49]  Id: 8754319
# Query_time: 120.505255  Lock_time: 120.504254 Rows_sent: 0  Rows_examined: 0
SET timestamp=1706777953;
UPDATE `users` SET `lastSeen` = '2024-02-01 08:59:13.798' WHERE `id` IN (62);
# Time: 2024-02-01T09:01:14.312549Z
# User@Host: live_grapher[live_grapher] @  [209.97.185.49]  Id: 8754204
# Query_time: 120.239866  Lock_time: 120.238823 Rows_sent: 0  Rows_examined: 0
SET timestamp=1706777954;
UPDATE `users` SET `lastSeen` = '2024-02-01 08:59:14.064' WHERE `id` IN (62);
# Time: 2024-02-01T09:01:14.312680Z
# User@Host: live_grapher[live_grapher] @  [209.97.185.49]  Id: 8754207
# Query_time: 120.016265  Lock_time: 120.015208 Rows_sent: 0  Rows_examined: 0
SET timestamp=1706777954;
UPDATE `users` SET `lastSeen` = '2024-02-01 08:59:14.203' WHERE `id` IN (62);
# Time: 2024-02-01T09:01:14.312709Z
# User@Host: live_grapher[live_grapher] @  [209.97.185.49]  Id: 8754203
# Query_time: 120.532144  Lock_time: 120.530995 Rows_sent: 0  Rows_examined: 0
SET timestamp=1706777953;
UPDATE `users` SET `lastSeen` = '2024-02-01 08:59:13.755' WHERE `id` IN (62);
# Time: 2024-02-01T09:01:14.312602Z
# User@Host: live_grapher[live_grapher] @  [209.97.185.49]  Id: 8754168
# Query_time: 120.585468  Lock_time: 120.584527 Rows_sent: 0  Rows_examined: 0
SET timestamp=1706777953;
UPDATE `users` SET `lastSeen` = '2024-02-01 08:59:13.687' WHERE `id` IN (62);
# Time: 2024-02-01T09:01:14.312774Z
# User@Host: live_grapher[live_grapher] @  [209.97.185.49]  Id: 8754206
# Query_time: 120.580691  Lock_time: 120.579640 Rows_sent: 0  Rows_examined: 0
SET timestamp=1706777953;
UPDATE `users` SET `lastSeen` = '2024-02-01 08:59:13.694' WHERE `id` IN (62);
# Time: 2024-02-01T09:01:14.312548Z
# User@Host: live_grapher[live_grapher] @  [209.97.185.49]  Id: 8754318
# Query_time: 120.570925  Lock_time: 120.569934 Rows_sent: 0  Rows_examined: 0
SET timestamp=1706777953;
UPDATE `users` SET `lastSeen` = '2024-02-01 08:59:13.700' WHERE `id` IN (62);
# Time: 2024-02-01T09:01:14.312570Z
# User@Host: live_grapher[live_grapher] @  [209.97.185.49]  Id: 8754208
# Query_time: 120.402001  Lock_time: 120.400908 Rows_sent: 0  Rows_examined: 0
SET timestamp=1706777953;
UPDATE `users` SET `lastSeen` = '2024-02-01 08:59:13.896' WHERE `id` IN (62);
# Time: 2024-02-01T09:01:14.312624Z
# User@Host: live_grapher[live_grapher] @  [209.97.185.49]  Id: 8753142
# Query_time: 120.126991  Lock_time: 120.125999 Rows_sent: 0  Rows_examined: 0
SET timestamp=1706777954;
UPDATE `users` SET `lastSeen` = '2024-02-01 08:59:14.164' WHERE `id` IN (62);
# Time: 2024-02-01T09:02:37.074489Z
# User@Host: live_grapher[live_grapher] @  [100.105.38.119]  Id: 8754385
# Query_time: 17.559864  Lock_time: 0.000008 Rows_sent: 440  Rows_examined: 1124
SET timestamp=1706778139;
SELECT `GdocPost`.`id` AS `GdocPost_id`, `GdocPost`.`slug` AS `GdocPost_slug`, `GdocPost`.`content` AS `GdocPost_content`, `GdocPost`.`published` AS `GdocPost_published`, `GdocPost`.`createdAt` AS `GdocPost_createdAt`, `GdocPost`.`publishedAt` AS `GdocPost_publishedAt`, `GdocPost`.`updatedAt` AS `GdocPost_updatedAt`, `GdocPost`.`revisionId` AS `GdocPost_revisionId`, `GdocPost`.`markdown` AS `GdocPost_markdown`, `GdocPost`.`publicationContext` AS `GdocPost_publicationContext`, `GdocPost`.`breadcrumbs` AS `GdocPost_breadcrumbs`, `GdocPost__GdocPost_tags`.`id` AS `GdocPost__GdocPost_tags_id`, `GdocPost__GdocPost_tags`.`name` AS `GdocPost__GdocPost_tags_name`, `GdocPost__GdocPost_tags`.`createdAt` AS `GdocPost__GdocPost_tags_createdAt`, `GdocPost__GdocPost_tags`.`updatedAt` AS `GdocPost__GdocPost_tags_updatedAt`, `GdocPost__GdocPost_tags`.`parentId` AS `GdocPost__GdocPost_tags_parentId`, `GdocPost__GdocPost_tags`.`isBulkImport` AS `GdocPost__GdocPost_tags_isBulkImport`, `GdocPost__GdocPost_tags`.`slug` AS `GdocPost__GdocPost_tags_slug`, `GdocPost__GdocPost_tags`.`specialType` AS `GdocPost__GdocPost_tags_specialType` FROM `posts_gdocs` `GdocPost` LEFT JOIN `posts_gdocs_x_tags` `GdocPost_GdocPost__GdocPost_tags` ON `GdocPost_GdocPost__GdocPost_tags`.`gdocId`=`GdocPost`.`id` LEFT JOIN `tags` `GdocPost__GdocPost_tags` ON `GdocPost__GdocPost_tags`.`id`=`GdocPost_GdocPost__GdocPost_tags`.`tagId` WHERE (`GdocPost`.`published` = true AND `GdocPost`.`publishedAt` <= '2024-02-01 09:02:19.470');
# Time: 2024-02-01T09:02:45.345085Z
# User@Host: live_wordpress[live_wordpress] @  [100.105.38.119]  Id: 8754396
# Query_time: 7.229589  Lock_time: 0.000005 Rows_sent: 919  Rows_examined: 919
use live_wordpress;
SET timestamp=1706778158;
SELECT post_content FROM wp_posts WHERE (post_type='page' OR post_type='post' OR post_type='wp_block') AND post_status='publish';
# Time: 2024-02-01T09:04:16.162961Z
# User@Host: live_grapher[live_grapher] @  [100.105.38.119]  Id: 8754385
# Query_time: 11.459827  Lock_time: 0.000005 Rows_sent: 4466  Rows_examined: 4798
use live_grapher;
SET timestamp=1706778244;
SELECT id, config FROM charts WHERE config->>"$.isPublished" = "true";

It looks like the approval tool locks the users table and admin dies when trying to update it.

More details on slack.

ikesau commented 9 months ago

Just a possible hunch, refactoring this whole function to use knex might do something? I see there's a couple of transaction initiations in it using the old DB code.

https://github.com/owid/owid-grapher/blob/553270fa7ea385663dfd19fb492b5ee25896ad59/adminSiteServer/apiRouter.ts#L768

Marigold commented 9 months ago

That was my hope too, the problem is that it's quite outdated and there's no person with both knex and approval tool knowledge.

danyx23 commented 9 months ago

I'll tackle it my migration from typeorm -> knex this cooldown

danyx23 commented 5 months ago

I don't think we're still seeing those - @Marigold please re-open if this is still an issue