mdn / sprints

Archived: MDN Web Docs issues are tracked in the content repository.
https://github.com/mdn/content
Creative Commons Zero v1.0 Universal
150 stars 143 forks source link

Explore solution(s) to prevent/reduce the accumulation of slow MySQL queries #760

Closed escattone closed 5 years ago

escattone commented 5 years ago

User story

As an MDN staff developer, I want to explore ways to eliminate or reduce the accumulation of expensive, long-running (>15 minutes) MySQL queries that degrade the database performance, so I can reduce the probability of New Relic Apdex alerts (and in turn, of course, degraded performance for users).

Acceptance criteria

Tasks

escattone commented 5 years ago

Here is one of the offending queries that we often see running longer than 15 minutes:

SELECT `wiki_revision`.`id`, `wiki_revision`.`document_id`, `wiki_revision`.`title`, `wiki_revision`.`slug`, `wiki_revision`.`summary`, `wiki_revision`.`tidied_content`, `wiki_revision`.`keywords`, `wiki_revision`.`tags`, `wiki_revision`.`toc_depth`, `wiki_revision`.`render_max_age`, `wiki_revision`.`created`, `wiki_revision`.`comment`, `wiki_revision`.`creator_id`, `wiki_revision`.`is_approved`, `wiki_revision`.`based_on_id`, `wiki_revision`.`is_mindtouch_migration` FROM `wiki_revision` ORDER BY `wiki_revision`.`id` DESC LIMIT 50 OFFSET 819650

and it is generated from the dashboards.revisions endpoint (/<locale>/dashboards/revisions). That endpoint, which only allows GET requests, will return the results for a single page of 50 revisions, with links for other pages. Based on the volume of these requests that we've seen and their offsets, it appears that a scraper is requesting the initial page and then following the links to each of all of the other pages (over 16K pages). If that's the case, the scraper is clearly not well-behaved, since all of the dashboards endpoints have been disallowed in https://developer.mozilla.org/robots.txt.

jwhitlock commented 5 years ago

We had some discussion in Slack. Possible solutions:

  1. Require login to see the revision dashboard
  2. Require login to see the pagination links on the revision dashboard (will remove the problematic link to the earliest page)
  3. Block by user agent (can be easily avoided, and CloudFront removes it so we'll need to block with lambda edge)
  4. Install RDS performance insights (https://github.com/mdn/infra/issues/172), see if database performance is limited and can be updated (but @metadave says similar work on support.mozilla.org just meant the crawlers took down the site faster)
  5. Update to MySQL 5.7 (first in development, then in AWS with https://github.com/mdn/infra/issues/170).
davidflanagan commented 5 years ago

Here's a patch that makes the revision dashboard available only to logged in users: https://github.com/mozilla/kuma/pull/5196

jwhitlock commented 5 years ago

@escattone is this a candidate for sprint 2, or should we track remaining work in the infra issues https://github.com/mdn/infra/issues/172 and https://github.com/mdn/infra/issues/170?

escattone commented 5 years ago

@jwhitlock I've been checking (almost daily) the log of the slow-query killer (percona's pt-kill) since we required authentication for the revisions dashboard endpoint, and as far as I know it hasn't detected and killed a single slow (>15 minutes) query since then. I think we should close this as resolved, and track any remaining work in the infra issues https://github.com/mdn/infra/issues/172 and https://github.com/mdn/infra/issues/170. Thanks!