mozmeao / infra

Mozilla Marketing Engineering and Operations Infrastructure
https://mozilla.github.io/meao/
Mozilla Public License 2.0

Manually test MM in AWS #498

Closed jwhitlock closed 7 years ago

jwhitlock commented 7 years ago

Copied from PR #469:

This checklist uses these URLs:

Preparing for Maintenance Mode

If you want a public-facing site, you'll need to generate content before putting the site in Maintenance Mode. The full steps are in todo, but here's the quick version:

Manual Sanity Check

Home page

Load https://mdn-mm.moz.works/en-US

Article page

Load https://mdn-mm.moz.works/en-US/docs/Web/HTML

Maintenance Mode page

Click a banner to load https://mdn-mm.moz.works/en-US/maintenance-mode
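The three page loads above can also be scripted as a quick smoke check. This is only a minimal sketch, not part of the Kuma test suite: `smoke_check`, `http_status`, and the `fetch` parameter are hypothetical names, and it assumes Python 3 with only the standard library. It checks that each URL returns 200, not that the page looks right, so it does not replace clicking the banner.

```python
from urllib.request import urlopen

BASE_URL = "https://mdn-mm.moz.works"

# Paths taken from the Manual Sanity Check above.
SANITY_PATHS = [
    "/en-US",
    "/en-US/docs/Web/HTML",
    "/en-US/maintenance-mode",
]


def http_status(url):
    """Return the final HTTP status code for url (redirects are followed)."""
    with urlopen(url, timeout=30) as resp:
        return resp.status


def smoke_check(base_url=BASE_URL, paths=SANITY_PATHS, fetch=http_status):
    """Map each sanity-check URL to True if it loaded with a 200 status."""
    results = {}
    for path in paths:
        url = base_url + path
        try:
            results[url] = fetch(url) == 200
        except OSError:
            # URLError, HTTPError, and socket timeouts are all OSError subclasses.
            results[url] = False
    return results
```

Calling `smoke_check()` with no arguments requests the live site; passing a different `fetch` callable allows the logic to be exercised offline.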

Automated Checks

With a Kuma environment, run the functional tests, using a command like:

py.test --maintenance-mode tests/functional tests/redirects --base-url https://mdn-mm.moz.works --driver Chrome --driver-path /path/to/chromedriver

These tests check that disabled endpoints redirect to the maintenance page, and that other pages are OK. They duplicate the Manual Sanity Check.

Full Manual Tests

Note: Many of these are candidates for headless testing

Background

Maintenance Mode is used to serve a recent copy of the site content, without allowing logins or database writes. It is used when the usual database is unavailable due to maintenance, such as a long-running migration or a datacenter transfer.

It is enabled with the environment variable MAINTENANCE_MODE=True. More information and local development setup is in the Kuma documentation.
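As an illustration of how a settings module might read such a flag, here is a minimal sketch. It is an assumption, not Kuma's actual settings code, which may parse the variable differently; `env_flag` and `TRUE_VALUES` are hypothetical names.

```python
import os

# Strings treated as "enabled"; this set is an assumption for the sketch.
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=None):
    """Read a boolean flag such as MAINTENANCE_MODE=True from the environment."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_VALUES


# False unless the variable is set to one of TRUE_VALUES.
MAINTENANCE_MODE = env_flag("MAINTENANCE_MODE")
```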

It is possible, but not required, to run maintenance mode against a read-only database.

In this mode, no one is logged in and cookies are not sent. Articles cannot be created, translated, or edited. If a page was not rendered before the site entered maintenance mode, it can't be rendered in maintenance mode, and raw KumaScript macros will be seen in the output.

Background tasks may still be scheduled, so the celery broker should remain available. Many tasks are completed with no action. Others, such as sitemap generation and cacheback refreshing, continue to run as normal.

escattone commented 7 years ago

First pass comments:

  1. Generating the sitemaps failed initially: running ./manage.py make_sitemaps on the pod for the "web" deployment was "killed", probably by Linux for exceeding the cgroups memory limit of the "web" deployment. After we raised the memory limit of the "web" deployment to 8Gi, the command worked fine. Sitemap generation normally runs as a set of individual Celery tasks (one for each locale, then a final one for the index) rather than as a single task executed in one go, so there might not be any need to permanently raise the memory limit of the "celery-worker" deployment.
  2. I had to create, populate, and promote a new search index.
  3. I had to run ./manage.py make_humans on the pod for the "web" deployment to get a successful response for https://mdn-mm.moz.works/humans.txt.
  4. I had to set constance.KUMASCRIPT_TIMEOUT -- via the admin console -- to a non-zero value to get a successful response for https://mdn-mm.moz.works/media/kumascript-revision.txt.
  5. 404 for https://mdn-mm.moz.works/robots.txt. This is because it's served from MEDIA_ROOT, which in AWS is /mdn/www, but we didn't copy the /app/media/robots.txt and /app/media/robots-go-away.txt files from the repo into that directory. I think I'll create a new Kuma PR that serves these files from a new ROBOTS_ROOT value instead of MEDIA_ROOT; its default will be the value of MEDIA_ROOT (SCL3) unless it's set from the environment (AWS). See https://github.com/mozilla/kuma/pull/4437.
  6. The "celery-beat" pod generated the following error in maintenance mode (see https://sentry.prod.mozaws.net/operations/mdn-stage/issues/649650/):
    DBAccessError: (13, 'Permission denied')
    File "celery/apps/beat.py", line 112, in start_scheduler
    beat.start()
    File "celery/beat.py", line 470, in start
    humanize_seconds(self.scheduler.max_interval))
    File "kombu/utils/__init__.py", line 325, in __get__
    value = obj.__dict__[self.__name__] = self.__get(obj)
    File "celery/beat.py", line 512, in scheduler
    return self.get_scheduler()
    File "celery/beat.py", line 507, in get_scheduler
    lazy=lazy)
    File "celery/utils/imports.py", line 53, in instantiate
    return symbol_by_name(name)(*args, **kwargs)
    File "celery/beat.py", line 358, in __init__
    Scheduler.__init__(self, *args, **kwargs)
    File "celery/beat.py", line 185, in __init__
    self.setup_schedule()
    File "celery/beat.py", line 384, in setup_schedule
    self._store = self._destroy_open_corrupted_schedule(exc)
    File "celery/beat.py", line 372, in _destroy_open_corrupted_schedule
    return self._open_schedule()
    File "celery/beat.py", line 366, in _open_schedule
    return self.persistence.open(self.schedule_filename, writeback=True)
    File "python2.7/shelve.py", line 243, in open
    return DbfilenameShelf(filename, flag, protocol, writeback)
    File "python2.7/shelve.py", line 227, in __init__
    Shelf.__init__(self, anydbm.open(filename, flag), protocol, writeback)
    File "python2.7/anydbm.py", line 85, in open
    return mod.open(file, flag, mode)
    File "python2.7/dbhash.py", line 18, in open
    return bsddb.hashopen(file, flag, mode)
    File "bsddb/__init__.py", line 364, in hashopen
    d.open(file, db.DB_HASH, flags, mode)
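The ROBOTS_ROOT fallback proposed in item 5 could be sketched as follows. This is a minimal illustration of the described default, not the code from the Kuma PR; the `robots_root` helper is a hypothetical name, and pointing ROBOTS_ROOT at /app/media in AWS is an assumption based on where the files live in the repo.

```python
import os


def robots_root(media_root, environ=None):
    """Return the directory robots.txt is served from.

    Uses ROBOTS_ROOT from the environment when it is set (the AWS case),
    otherwise falls back to media_root (the SCL3 case).
    """
    environ = os.environ if environ is None else environ
    return environ.get("ROBOTS_ROOT", media_root)
```

With no ROBOTS_ROOT in the environment, `robots_root("/mdn/www")` returns "/mdn/www"; with ROBOTS_ROOT=/app/media set, it returns "/app/media" regardless of MEDIA_ROOT.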

escattone commented 7 years ago

I'll finish in the morning!

escattone commented 7 years ago

Note that https://github.com/mozilla/kuma/pull/4437 has been closed in favor of https://github.com/mozilla/kuma/pull/4438.

bookshelfdave commented 7 years ago

👏 fantastic progress! 🍰

escattone commented 7 years ago

Thanks @jwhitlock for the quick review of mozilla/kuma#4438! I deployed to mdn-mm.moz.works the new Docker image (KUMA_IMAGE_TAG=bf7935a) containing the robots fix from mozilla/kuma#4438, and now the robots endpoint works fine. Onward to running the functional tests!

escattone commented 7 years ago

Results from first run of functional tests against https://mdn-mm.moz.works:

(mdntest) rjohnson-25186:kuma rjohnson$ py.test --maintenance-mode tests/functional tests/redirects --base-url https://mdn-mm.moz.works --driver Chrome --driver-path ~/pytest-selenium/chromedriver
=============================================== test session starts ===============================================
platform darwin -- Python 2.7.12, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
baseurl: https://mdn-mm.moz.works
driver: Chrome
sensitiveurl: .*
rootdir: /Users/rjohnson/repos/kuma, inifile: pytest.ini
plugins: base-url-1.1.0, html-1.10.1, rerunfailures-2.1.0, selenium-1.4.0, variables-1.4, xdist-1.16.0
collected 289 items

tests/functional/test_article.py ..s..s...
tests/functional/test_article_edit.py sss
tests/functional/test_article_new.py s
tests/functional/test_article_revision.py ..
tests/functional/test_article_translate.py s
tests/functional/test_content_experiment.py sss
tests/functional/test_dashboard.py ..ss.....
tests/functional/test_feedback.py ...
tests/functional/test_home.py .....s..
tests/functional/test_language_selector.py .
tests/functional/test_maintenance_mode_redirects.py ..........................F......................................
tests/functional/test_notfound.py ...
tests/functional/test_profiles.py .
tests/functional/test_report.py F.
tests/functional/test_robots.py .
tests/functional/test_search.py .......
tests/redirects/test_redirects.py ..........................................................................................................................................................................
============================================= short test summary info =============================================
SKIP [1] tests/functional/test_article_new.py:8: the server is in maintenance mode
SKIP [1] tests/functional/test_article_translate.py:7: the server is in maintenance mode
SKIP [1] tests/functional/test_article_edit.py:20: the server is in maintenance mode
SKIP [1] tests/functional/test_dashboard.py:51: the server is in maintenance mode
SKIP [1] tests/functional/test_article.py:65: the server is in maintenance mode
SKIP [1] tests/functional/test_dashboard.py:68: the server is in maintenance mode
SKIP [1] tests/functional/test_content_experiment.py:35: got empty parameter set ['exp_id', 'locale', 'slug'], function test_content_exp_redirect at /Users/rjohnson/repos/kuma/tests/functional/test_content_experiment.py:35
SKIP [1] tests/functional/test_article_edit.py:9: the server is in maintenance mode
SKIP [1] tests/functional/test_article_edit.py:54: the server is in maintenance mode
SKIP [1] tests/functional/test_home.py:64: the server is in maintenance mode
SKIP [1] tests/functional/test_content_experiment.py:61: got empty parameter set ['exp_id', 'locale', 'slug', 'variant'], function test_content_exp_variant at /Users/rjohnson/repos/kuma/tests/functional/test_content_experiment.py:61
SKIP [1] tests/functional/test_article.py:34: the server is in maintenance mode
SKIP [1] tests/functional/test_content_experiment.py:48: got empty parameter set ['exp_id', 'locale', 'slug'], function test_content_exp_logged_in at /Users/rjohnson/repos/kuma/tests/functional/test_content_experiment.py:48

==================================================== FAILURES =====================================================
__________________________ test_redirect[post-{locale}/docs/User:anonymous:uitest$purge] __________________________
Traceback (most recent call last):
  File "/Users/rjohnson/repos/kuma/tests/functional/test_maintenance_mode_redirects.py", line 107, in test_redirect
    assert resp.status_code == 200
AssertionError: assert 504 == 200
 +  where 504 = <Response [504]>.status_code
------------------------------------------------- pytest-selenium -------------------------------------------------
URL: data:,
_______________________________________________ test_report_content _______________________________________________
Traceback (most recent call last):
  File "/Users/rjohnson/repos/kuma/tests/functional/test_report.py", line 23, in test_report_content
    assert page.header.is_report_content_url_expected(selenium, report_url)
AssertionError: assert False
 +  where False = <bound method Header.is_report_content_url_expected of <pages.base.Header object at 0x110b35810>>(<selenium.webdriver.chrome.webdriver.WebDriver (session="e77b7942e1b81d6c6560531aeb191f86")>, 'https://mdn-mm.moz.works/en-US/docs/User:anonymous:uitest')
 +    where <bound method Header.is_report_content_url_expected of <pages.base.Header object at 0x110b35810>> = <pages.base.Header object at 0x110b35810>.is_report_content_url_expected
 +      where <pages.base.Header object at 0x110b35810> = <pages.article.ArticlePage object at 0x110b3aed0>.header
------------------------------------------------- pytest-selenium -------------------------------------------------
URL: about:blank
=============================== 2 failed, 274 passed, 13 skipped in 380.07 seconds ================================

escattone commented 7 years ago

When I ran the tests again, the two tests that failed above ran successfully, but a different test failed:

(mdntest) rjohnson-25186:kuma rjohnson$ py.test --maintenance-mode tests/functional tests/redirects --base-url https://mdn-mm.moz.works --driver Chrome --driver-path ~/pytest-selenium/chromedriver
=============================================== test session starts ===============================================
platform darwin -- Python 2.7.12, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
baseurl: https://mdn-mm.moz.works
driver: Chrome
sensitiveurl: .*
rootdir: /Users/rjohnson/repos/kuma, inifile: pytest.ini
plugins: base-url-1.1.0, html-1.10.1, rerunfailures-2.1.0, selenium-1.4.0, variables-1.4, xdist-1.16.0
collected 289 items

tests/functional/test_article.py ..s..s...
tests/functional/test_article_edit.py sss
tests/functional/test_article_new.py s
tests/functional/test_article_revision.py ..
tests/functional/test_article_translate.py s
tests/functional/test_content_experiment.py sss
tests/functional/test_dashboard.py ..ss.....
tests/functional/test_feedback.py ...
tests/functional/test_home.py .....s..
tests/functional/test_language_selector.py .
tests/functional/test_maintenance_mode_redirects.py F................................................................
tests/functional/test_notfound.py ...
tests/functional/test_profiles.py .
tests/functional/test_report.py ..
tests/functional/test_robots.py .
tests/functional/test_search.py .......
tests/redirects/test_redirects.py ..........................................................................................................................................................................
============================================= short test summary info =============================================
SKIP [1] tests/functional/test_article_new.py:8: the server is in maintenance mode
SKIP [1] tests/functional/test_article_translate.py:7: the server is in maintenance mode
SKIP [1] tests/functional/test_article_edit.py:20: the server is in maintenance mode
SKIP [1] tests/functional/test_dashboard.py:51: the server is in maintenance mode
SKIP [1] tests/functional/test_article.py:65: the server is in maintenance mode
SKIP [1] tests/functional/test_dashboard.py:68: the server is in maintenance mode
SKIP [1] tests/functional/test_content_experiment.py:35: got empty parameter set ['exp_id', 'locale', 'slug'], function test_content_exp_redirect at /Users/rjohnson/repos/kuma/tests/functional/test_content_experiment.py:35
SKIP [1] tests/functional/test_article_edit.py:9: the server is in maintenance mode
SKIP [1] tests/functional/test_article_edit.py:54: the server is in maintenance mode
SKIP [1] tests/functional/test_home.py:64: the server is in maintenance mode
SKIP [1] tests/functional/test_content_experiment.py:61: got empty parameter set ['exp_id', 'locale', 'slug', 'variant'], function test_content_exp_variant at /Users/rjohnson/repos/kuma/tests/functional/test_content_experiment.py:61
SKIP [1] tests/functional/test_article.py:34: the server is in maintenance mode
SKIP [1] tests/functional/test_content_experiment.py:48: got empty parameter set ['exp_id', 'locale', 'slug'], function test_content_exp_logged_in at /Users/rjohnson/repos/kuma/tests/functional/test_content_experiment.py:48

==================================================== FAILURES =====================================================
_________________________________________ test_redirect[get-admin/login] __________________________________________
Traceback (most recent call last):
  File "/Users/rjohnson/repos/kuma/tests/functional/test_maintenance_mode_redirects.py", line 96, in test_redirect
    mm_page.wait_for_page_to_load()
  File "/Users/rjohnson/repos/kuma/tests/pages/base.py", line 21, in wait_for_page_to_load
    self.wait.until(lambda s: self.seed_url in s.current_url)
  File "/Users/rjohnson/virtualenvs/mdntest/lib/python2.7/site-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
TimeoutException: Message:

------------------------------------------------- pytest-selenium -------------------------------------------------
URL: https://mdn-mm.moz.works/maintenance-mode
=============================== 1 failed, 275 passed, 13 skipped in 387.11 seconds ================================

Since the failures are intermittent, timing-related failures that differ from run to run, the functional tests are effectively passing.
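The reasoning can be made explicit by comparing the failing test IDs from the two runs: a test that fails in every run points to a real problem, while one that fails in only some runs is likely intermittent. The sketch below uses the IDs copied from the output above; `classify_failures` is a hypothetical helper, not part of the test suite.

```python
# Failing test IDs copied from the two runs above.
RUN_1_FAILURES = {
    "test_redirect[post-{locale}/docs/User:anonymous:uitest$purge]",
    "test_report_content",
}
RUN_2_FAILURES = {
    "test_redirect[get-admin/login]",
}


def classify_failures(*runs):
    """Split failures into those seen in every run and those seen in only some."""
    consistent = set.intersection(*runs)
    intermittent = set.union(*runs) - consistent
    return consistent, intermittent


consistent, intermittent = classify_failures(RUN_1_FAILURES, RUN_2_FAILURES)
print("consistent:", sorted(consistent))      # empty: no test failed in both runs
print("intermittent:", sorted(intermittent))  # all three failures
```

The session header shows the rerunfailures plugin is installed, so its `--reruns` option may be another way to retry such tests automatically; that was not tried in the runs above.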