mozilla / sumo

Project management board for SUMO and Community properties.
Mozilla Public License 2.0

Django migrations table corrupted in prod #1655

Closed: escattone closed this issue 7 months ago

escattone commented 7 months ago

Error when migrating the production database

The following error occurred during the DB migration step of a production deployment on Jan. 31, 2024 at 9:42am PST.

https://mozilla.sentry.io/share/issue/4961dd9745f5493583bbf72a868d6500/

Operations to perform:
  Apply all migrations: actstream, admin, announcements, auth, authtoken, contenttypes, dashboards, flagit, forums, gallery, groups, guardian, inproduct, journal, karma, kbadge, kbforums, kitsune_messages, kpi, notifications, postcrash, product_details, products, questions, search, sessions, sites, sumo, taggit, tidings, upload, users, waffle, wiki
Running migrations:
Traceback (most recent call last):
  File "/venv/lib/python3.11/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "django_migrations_pkey"
DETAIL:  Key (id)=(67) already exists.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/manage.py", line 31, in <module>
    execute_from_command_line(sys.argv)
  File "/venv/lib/python3.11/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/venv/lib/python3.11/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/venv/lib/python3.11/site-packages/django/core/management/base.py", line 412, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/venv/lib/python3.11/site-packages/django/core/management/base.py", line 458, in execute
    output = self.handle(*args, **options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/django/core/management/base.py", line 106, in wrapper
    res = handle_func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/django/core/management/commands/migrate.py", line 356, in handle
    post_migrate_state = executor.migrate(
                         ^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/django/db/migrations/executor.py", line 135, in migrate
    state = self._migrate_all_forwards(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/django/db/migrations/executor.py", line 167, in _migrate_all_forwards
    state = self.apply_migration(
            ^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/django/db/migrations/executor.py", line 254, in apply_migration
    self.record_migration(migration)
  File "/venv/lib/python3.11/site-packages/django/db/migrations/executor.py", line 269, in record_migration
    self.recorder.record_applied(migration.app_label, migration.name)
  File "/venv/lib/python3.11/site-packages/django/db/migrations/recorder.py", line 94, in record_applied
    self.migration_qs.create(app=app, name=name)
  File "/venv/lib/python3.11/site-packages/django/db/models/query.py", line 658, in create
    obj.save(force_insert=True, using=self.db)
  File "/venv/lib/python3.11/site-packages/django/db/models/base.py", line 814, in save
    self.save_base(
  File "/venv/lib/python3.11/site-packages/django/db/models/base.py", line 877, in save_base
    updated = self._save_table(
              ^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/django/db/models/base.py", line 1020, in _save_table
    results = self._do_insert(
              ^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/django/db/models/base.py", line 1061, in _do_insert
    return manager._insert(
           ^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/django/db/models/query.py", line 1805, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 1822, in execute_sql
    cursor.execute(sql, params)
  File "/venv/lib/python3.11/site-packages/sentry_sdk/integrations/django/__init__.py", line 641, in execute
    result = real_execute(self, sql, params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/django/db/backends/utils.py", line 84, in _execute
    with self.db.wrap_database_errors:
  File "/venv/lib/python3.11/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/venv/lib/python3.11/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django.db.utils.IntegrityError: duplicate key value violates unique constraint "django_migrations_pkey"
DETAIL:  Key (id)=(67) already exists.

  Applying products.0006_alter_product_and_topic_images...
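The last frames of the traceback show why the error concerns the primary key. `record_applied` calls `self.migration_qs.create(app=app, name=name)` without an id, so Postgres fills `id` from the column's sequence-backed default. In other words, the sequence handed out 67 while a row with id 67 already existed. The sketch below is a toy Python model of that failure, not Postgres or Django code. `ToySequence` and `insert_migration_row` are hypothetical names, the model assumes INCREMENT 1, and the starting `last_value` of 66 is only a value consistent with the error, not one read from prod.

```python
from dataclasses import dataclass


@dataclass
class ToySequence:
    """Toy model of a Postgres sequence with INCREMENT 1 (illustration only)."""

    last_value: int
    is_called: bool = True

    def nextval(self) -> int:
        # After setval(seq, n, false) the next call returns n itself;
        # otherwise Postgres returns last_value + 1.
        if self.is_called:
            self.last_value += 1
        self.is_called = True
        return self.last_value


def insert_migration_row(existing_ids: set[int], seq: ToySequence) -> int:
    """Mimic an INSERT that omits id and lets the sequence pick it."""
    new_id = seq.nextval()
    if new_id in existing_ids:
        # Mirrors: duplicate key value violates unique constraint "django_migrations_pkey"
        raise ValueError(f"Key (id)=({new_id}) already exists.")
    existing_ids.add(new_id)
    return new_id
```

With the sequence moved to at least max(id) (for the prod table below that would be 195), the same insert would receive 196 and succeed.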

Proximate Cause

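The django_migrations table in production is corrupted. Compared with stage, it has rows for apps that are not part of the current project (authority, badger, customercare, djcelery; none of them appear in the "Apply all migrations" list above), its ids are far from sequential (71 rows with ids up to 195), and id 67, which the failed insert for products.0006_alter_product_and_topic_images was given, already belongs to the authority 0001_initial row: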

Comparison between stage and prod

Clean django_migrations table on stage

sumo_stage=> select id, app, name from django_migrations order by app, name;
 id |       app        |                                    name
----+------------------+----------------------------------------------------------------------------
  3 | actstream        | 0001_initial
  4 | actstream        | 0002_remove_action_data
  5 | actstream        | 0003_add_follow_flag
  6 | admin            | 0001_initial
  7 | admin            | 0002_logentry_remove_auto_add
  8 | admin            | 0003_logentry_add_action_flag_choices
 25 | announcements    | 0001_squashed_0006_remove_announcement_group_and_more
  2 | auth             | 0001_initial
 14 | auth             | 0002_alter_permission_name_max_length
 15 | auth             | 0003_alter_user_email_max_length
 16 | auth             | 0004_alter_user_username_opts
 17 | auth             | 0005_alter_user_last_login_null
 18 | auth             | 0006_require_contenttypes_0002
 19 | auth             | 0007_alter_validators_add_error_messages
 20 | auth             | 0008_alter_user_username_max_length
 21 | auth             | 0009_alter_user_last_name_max_length
 22 | auth             | 0010_alter_group_name_max_length
 23 | auth             | 0011_update_proxy_permissions
 24 | auth             | 0012_alter_user_first_name_max_length
 26 | authtoken        | 0001_initial
 27 | authtoken        | 0002_auto_20160226_1747
 28 | authtoken        | 0003_tokenproxy
  1 | contenttypes     | 0001_initial
 13 | contenttypes     | 0002_remove_content_type_name
 29 | dashboards       | 0001_squashed_0010_auto_20210726_1036
 30 | flagit           | 0001_squashed_0002_auto_20200629_0826
 31 | flagit           | 0003_alter_flaggedobject_reason
 34 | forums           | 0001_squashed_0004_remove_authority
 11 | gallery          | 0001_squashed_0009_auto_20220107_0617
 35 | groups           | 0001_squashed_0002_auto_20200629_0826
 32 | guardian         | 0001_initial
 33 | guardian         | 0002_generic_permissions_index
 36 | inproduct        | 0001_squashed_0003_alter_redirect_platform_alter_redirect_product_and_more
 37 | journal          | 0001_initial
 38 | karma            | 0001_squashed_0002_auto_20200629_0826
 39 | kbadge           | 0001_squashed_0005_drop_ghost_columns
 40 | kbforums         | 0001_initial
 41 | kitsune_messages | 0001_initial
 42 | kpi              | 0001_squashed_0002_cohort_retention_models
 43 | notifications    | 0001_initial
 44 | postcrash        | 0001_initial
 45 | product_details  | 0001_initial
 46 | product_details  | 0002_auto_20151006_1348
 10 | products         | 0001_squashed_0005_auto_20200629_0826
 67 | products         | 0006_alter_product_and_topic_images
 48 | questions        | 0001_squashed_0013_alter_question_is_archived
 49 | search           | 0001_squashed_0006_auto_20210526_0243
 50 | sessions         | 0001_initial
 51 | sites            | 0001_initial
 52 | sites            | 0002_alter_domain_unique
 53 | sumo             | 0001_squashed_0002_initial_data
  9 | taggit           | 0001_initial
 54 | taggit           | 0002_auto_20150616_2121
 55 | taggit           | 0003_taggeditem_add_unique_index
 56 | taggit           | 0004_alter_taggeditem_content_type_alter_taggeditem_tag
 57 | taggit           | 0005_auto_20220424_2025
 58 | tidings          | 0001_squashed_0002_update_email_size
 59 | tidings          | 0003_alter_watchfilter_value
 60 | upload           | 0001_squashed_0003_auto_20200629_0826
 61 | users            | 0001_squashed_0027_profile_zendesk_id
 65 | users            | 0028_alter_profile_bio_and_upper_name_idx
 47 | waffle           | 0001_initial
 62 | waffle           | 0002_auto_20161201_0958
 63 | waffle           | 0003_update_strings_for_i18n
 64 | waffle           | 0004_update_everyone_nullbooleanfield
 12 | wiki             | 0001_squashed_0013_alter_document_related_documents_and_more
 66 | wiki             | 0014_revision_wiki_revisi_created_0dd502_idx
(67 rows)

Corrupted django_migrations table on prod

sumo_prod=> select id, app, name from django_migrations order by app, name;
 id  |       app        |                                    name
-----+------------------+----------------------------------------------------------------------------
  68 | actstream        | 0001_initial
  86 | actstream        | 0002_remove_action_data
 119 | actstream        | 0003_add_follow_flag
   3 | admin            | 0001_initial
  87 | admin            | 0002_logentry_remove_auto_add
 120 | admin            | 0003_logentry_add_action_flag_choices
 186 | announcements    | 0001_squashed_0006_remove_announcement_group_and_more
   2 | auth             | 0001_initial
  70 | auth             | 0002_alter_permission_name_max_length
  71 | auth             | 0003_alter_user_email_max_length
  72 | auth             | 0004_alter_user_username_opts
  73 | auth             | 0005_alter_user_last_login_null
  74 | auth             | 0006_require_contenttypes_0002
  88 | auth             | 0007_alter_validators_add_error_messages
  89 | auth             | 0008_alter_user_username_max_length
 122 | auth             | 0009_alter_user_last_name_max_length
 123 | auth             | 0010_alter_group_name_max_length
 124 | auth             | 0011_update_proxy_permissions
 150 | auth             | 0012_alter_user_first_name_max_length
  67 | authority        | 0001_initial
  56 | authtoken        | 0001_initial
  90 | authtoken        | 0002_auto_20160226_1747
 151 | authtoken        | 0003_tokenproxy
   9 | badger           | 0001_initial
   1 | contenttypes     | 0001_initial
  69 | contenttypes     | 0002_remove_content_type_name
  10 | customercare     | 0001_initial
 145 | customercare     | 0002_auto_20210716_0556
 183 | dashboards       | 0001_squashed_0010_auto_20210726_1036
  81 | djcelery         | 0001_initial
 180 | flagit           | 0001_squashed_0002_auto_20200629_0826
  64 | flagit           | 0003_alter_flaggedobject_reason
 177 | forums           | 0001_squashed_0004_remove_authority
 184 | gallery          | 0001_squashed_0009_auto_20220107_0617
 188 | groups           | 0001_squashed_0002_auto_20200629_0826
 155 | guardian         | 0001_initial
 156 | guardian         | 0002_generic_permissions_index
 185 | inproduct        | 0001_squashed_0003_alter_redirect_platform_alter_redirect_product_and_more
  16 | journal          | 0001_initial
 189 | karma            | 0001_squashed_0002_auto_20200629_0826
 179 | kbadge           | 0001_squashed_0005_drop_ghost_columns
  18 | kbforums         | 0001_initial
  19 | kitsune_messages | 0001_initial
 191 | kpi              | 0001_squashed_0002_cohort_retention_models
  21 | notifications    | 0001_initial
  22 | postcrash        | 0001_initial
  78 | product_details  | 0001_initial
  79 | product_details  | 0002_auto_20151006_1348
 192 | products         | 0001_squashed_0005_auto_20200629_0826
 178 | questions        | 0001_squashed_0013_alter_question_is_archived
 176 | search           | 0001_squashed_0006_auto_20210526_0243
  26 | sessions         | 0001_initial
  27 | sites            | 0001_initial
  91 | sites            | 0002_alter_domain_unique
 175 | sumo             | 0001_squashed_0002_initial_data
   4 | taggit           | 0001_initial
  57 | taggit           | 0002_auto_20150616_2121
 134 | taggit           | 0003_taggeditem_add_unique_index
 154 | taggit           | 0004_alter_taggeditem_content_type_alter_taggeditem_tag
 159 | taggit           | 0005_auto_20220424_2025
 194 | tidings          | 0001_squashed_0002_update_email_size
 195 | tidings          | 0003_alter_watchfilter_value
 181 | upload           | 0001_squashed_0003_auto_20200629_0826
 174 | users            | 0001_squashed_0027_profile_zendesk_id
  65 | users            | 0028_alter_profile_bio_and_upper_name_idx
  28 | waffle           | 0001_initial
 137 | waffle           | 0002_auto_20161201_0958
 138 | waffle           | 0003_update_strings_for_i18n
 143 | waffle           | 0004_update_everyone_nullbooleanfield
 182 | wiki             | 0001_squashed_0013_alter_document_related_documents_and_more
  66 | wiki             | 0014_revision_wiki_revisi_created_0dd502_idx
(71 rows)
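The differences between the two tables can also be found mechanically rather than by eye. A minimal sketch, assuming each table has been exported as (id, app, name) tuples, for example from the two psql queries above; `diff_migration_tables` is a hypothetical helper. Rows are matched on (app, name) because the ids are not comparable across the two databases.

```python
Row = tuple[int, str, str]  # (id, app, name)


def diff_migration_tables(stage: list[Row], prod: list[Row]) -> dict[str, object]:
    """Compare two django_migrations exports by (app, name), ignoring id."""
    stage_keys = {(app, name) for _, app, name in stage}
    prod_keys = {(app, name) for _, app, name in prod}
    stage_apps = {app for _, app, _ in stage}
    return {
        # Recorded in prod but not on stage (e.g. migrations of removed apps).
        "only_in_prod": sorted(prod_keys - stage_keys),
        # Recorded on stage but not (yet) in prod.
        "only_in_stage": sorted(stage_keys - prod_keys),
        # Apps that prod has rows for but stage does not.
        "stale_apps": sorted({app for _, app, _ in prod} - stage_apps),
        # The sequence's last_value must be at least this to avoid collisions.
        "prod_max_id": max(row_id for row_id, _, _ in prod),
    }
```

Run against the full tables above, `stale_apps` would list authority, badger, customercare and djcelery, and `only_in_stage` would list only products.0006_alter_product_and_topic_images.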
escattone commented 7 months ago

After looking at the applied dates in the production django_migrations table (many of which pre-date the Postgres migration, so they must have come from the original MySQL table), my current theory is that something like the following happened:

escattone commented 7 months ago

This has been resolved in production.

I ssh'ed (kubectl exec ...) into a web pod, deleted all of the rows in the django_migrations table, reset its id sequence, and reloaded the table with rows for the migrations that had already been applied:

And then I re-ran the failed deploy step of the release, which completed successfully. 🎉
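The commands for these steps are not reproduced here. A minimal sketch of one way to perform the same three steps (delete, reset the sequence, reload), assuming a DB-API cursor with the `%s` paramstyle (such as psycopg2) and a caller-supplied list of already-applied (app, name) pairs, for example the prod rows without the stale apps and without products.0006_alter_product_and_topic_images, so that re-running the deploy applies it. `build_reset_statements` is a hypothetical helper; `pg_get_serial_sequence` and `setval` are standard Postgres functions.

```python
from typing import Iterable


def build_reset_statements(applied: Iterable[tuple[str, str]]) -> list[tuple[str, tuple]]:
    """Return (sql, params) pairs that rebuild django_migrations from scratch.

    Meant to be executed in order, inside one transaction.
    """
    statements: list[tuple[str, tuple]] = [
        ("DELETE FROM django_migrations", ()),
        # is_called = false, so the next nextval() returns exactly 1.
        (
            "SELECT setval(pg_get_serial_sequence('django_migrations', 'id'), 1, false)",
            (),
        ),
    ]
    for app, name in applied:
        # applied is NOT NULL and Django sets it in Python, so pass it explicitly.
        statements.append(
            (
                "INSERT INTO django_migrations (app, name, applied) VALUES (%s, %s, now())",
                (app, name),
            )
        )
    return statements
```

With psycopg2, the pairs could be run under `with conn, conn.cursor() as cur:` by calling `cur.execute(sql, params)` for each one; the connection context manager commits on success and rolls back if any statement fails, so a partial reload is never left behind.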

escattone commented 7 months ago

I just checked my make_pgloader_script.py and, as I expected, it doesn't explicitly migrate the django_migrations table. However, I still suspect that pgloader migrated the table somehow, because that's the only way I can imagine it containing rows whose applied dates pre-date the actual Postgres migration, some by many years.
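The argument rests on one check: a row whose applied timestamp is earlier than the Postgres cutover cannot have been written by Django against Postgres. A minimal sketch of that filter, assuming rows exported as (id, app, name, applied) with timezone-aware timestamps; `rows_applied_before` is a hypothetical helper, and the cutover is a parameter because its exact date is not given in this issue.

```python
from datetime import datetime

MigrationRow = tuple[int, str, str, datetime]  # (id, app, name, applied)


def rows_applied_before(rows: list[MigrationRow], cutover: datetime) -> list[MigrationRow]:
    """Return rows recorded before the cutover, oldest first.

    Both `applied` and `cutover` must be timezone-aware (or both naive);
    comparing one of each raises TypeError.
    """
    early = [row for row in rows if row[3] < cutover]
    return sorted(early, key=lambda row: row[3])
```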