grafana / oncall

Developer-friendly incident response with brilliant Slack integration
GNU Affero General Public License v3.0

Migrations failing when upgrading to 1.12.1 #5244

Closed WoodyWoodsta closed 2 weeks ago

WoodyWoodsta commented 1 month ago

What went wrong?

What happened:

Apply all migrations: admin, alerts, auth, auth_token, base, contenttypes, email, exotel, fcm_django, google, heartbeat, labels, mobile_app, oss_installation, phone_notifications, schedules, sessions, slack, social_django, telegram, twilioapp, user_management, webhooks, zvonok
Running migrations:
source=engine:app google_trace_id=none logger=apps.alerts.migrations.0063_migrate_channelfilter_slack_channel_id Starting migration to populate slack_channel field.
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 87, in _execute
    return self.cursor.execute(sql)
           ^^^^^^^^^^^^^^^^^^^^^^^^
psycopg2.errors.SyntaxError: syntax error at or near "JOIN"
LINE 3: JOIN alerts_alertreceivechannel AS arc ON arc.id = cf.al...
        ^

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/etc/app/manage.py", line 34, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.12/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.12/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.12/site-packages/django/core/management/base.py", line 412, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.12/site-packages/django/core/management/base.py", line 458, in execute
    output = self.handle(*args, **options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/core/management/base.py", line 106, in wrapper
    res = handle_func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/core/management/commands/migrate.py", line 356, in handle
    post_migrate_state = executor.migrate(
                         ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/migrations/executor.py", line 135, in migrate
    state = self._migrate_all_forwards(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/migrations/executor.py", line 167, in _migrate_all_forwards
    state = self.apply_migration(
            ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/migrations/executor.py", line 252, in apply_migration
    state = migration.apply(state, schema_editor)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/migrations/migration.py", line 132, in apply
    operation.database_forwards(
  File "/usr/local/lib/python3.12/site-packages/django/db/migrations/operations/special.py", line 193, in database_forwards
    self.code(from_state.apps, schema_editor)
  File "/etc/app/apps/alerts/migrations/0063_migrate_channelfilter_slack_channel_id.py", line 29, in populate_slack_channel
    cursor.execute(sql)
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers

What did you expect to happen:

How do we reproduce it?

  1. Upgrade to 1.12.1

Grafana OnCall Version

v1.12.1

Product Area

Helm/Kubernetes/Docker

Grafana OnCall Platform?

Kubernetes

User's Browser?

No response

Anything else to add?

No response

mrbaowei commented 1 month ago

This issue has also been reproduced on my end.

Rolling back to v1.12.0 works without problems, but the failure occurs again on v1.12.1.

alext-extracellular commented 4 weeks ago

can also confirm this is happening on a fresh installation using the given docker-compose files.

lucasfcnunes commented 3 weeks ago

Same issue on migrating from oncall 1.11.0 to 1.12.1. Oncall 1.12.0 works fine.

Elocanelon commented 3 weeks ago

Same issue installing Helm chart v1.12.1. With v1.12.0, the chart deployed successfully.

Lamovich commented 3 weeks ago

Same issue when migrating from oncall 1.11.5 to 1.12.1.

Let me add: after the unsuccessful migration and a rollback of the deployment to version 1.11.5, routes stopped showing for integrations in the web UI, although they were still visible through API queries. I had to restore the database from a backup.

mwheeler-ep commented 3 weeks ago

Just started a development instance and ran into this on both postgres and sqlite.

I'm not sure if this is correct, but this is what I did to work around the issue and get a dev instance to start up. Please take care if using this patch set against a prod instance. I'm also not sure how performant these migrations would be in the real world.

diff --git a/engine/apps/alerts/migrations/0063_migrate_channelfilter_slack_channel_id.py b/engine/apps/alerts/migrations/0063_migrate_channelfilter_slack_channel_id.py
index dab5a459..9b589024 100644
--- a/engine/apps/alerts/migrations/0063_migrate_channelfilter_slack_channel_id.py
+++ b/engine/apps/alerts/migrations/0063_migrate_channelfilter_slack_channel_id.py
@@ -15,14 +15,20 @@ def populate_slack_channel(apps, schema_editor):
     logger.info("Starting migration to populate slack_channel field.")

     sql = f"""
-    UPDATE {ChannelFilter._meta.db_table} AS cf
-    JOIN {AlertReceiveChannel._meta.db_table} AS arc ON arc.id = cf.alert_receive_channel_id
-    JOIN {Organization._meta.db_table} AS org ON org.id = arc.organization_id
-    JOIN {SlackChannel._meta.db_table} AS sc ON sc.slack_id = cf._slack_channel_id
-                                   AND sc.slack_team_identity_id = org.slack_team_identity_id
-    SET cf.slack_channel_id = sc.id
-    WHERE cf._slack_channel_id IS NOT NULL
-      AND org.slack_team_identity_id IS NOT NULL;
+
+    with temp as (
+        SELECT cf_s.id as cf_id, sc.id as id
+        FROM {ChannelFilter._meta.db_table} AS cf_s
+        JOIN {AlertReceiveChannel._meta.db_table} AS arc ON arc.id = cf_s.alert_receive_channel_id
+        JOIN {Organization._meta.db_table} AS org ON org.id = arc.organization_id
+        JOIN {SlackChannel._meta.db_table} AS sc ON sc.slack_id = cf_s._slack_channel_id
+                                    AND sc.slack_team_identity_id = org.slack_team_identity_id
+        WHERE org.slack_team_identity_id IS NOT NULL and cf_s._slack_channel_id IS NOT NULL
+    )
+    UPDATE {ChannelFilter._meta.db_table} as update_cf
+    SET slack_channel_id = temp.id
+    FROM temp
+    where update_cf.id = temp.cf_id
     """

     with schema_editor.connection.cursor() as cursor:
diff --git a/engine/apps/alerts/migrations/0064_migrate_resolutionnoteslackmessage_slack_channel_id.py b/engine/apps/alerts/migrations/0064_migrate_resolutionnoteslackmessage_slack_channel_id.py
index 4f492e31..a59254ca 100644
--- a/engine/apps/alerts/migrations/0064_migrate_resolutionnoteslackmessage_slack_channel_id.py
+++ b/engine/apps/alerts/migrations/0064_migrate_resolutionnoteslackmessage_slack_channel_id.py
@@ -17,15 +17,24 @@ def populate_slack_channel(apps, schema_editor):
     logger.info("Starting migration to populate slack_channel field.")

     sql = f"""
-    UPDATE {ResolutionNoteSlackMessage._meta.db_table} AS rsm
-    JOIN {AlertGroup._meta.db_table} AS ag ON ag.id = rsm.alert_group_id
+
+    with temp as (
+    SELECT rsm_t._slack_channel_id as _slack_channel_id, sc.id as id
+    FROM {ResolutionNoteSlackMessage._meta.db_table} AS rsm_t
+    JOIN {AlertGroup._meta.db_table} AS ag ON ag.id = rsm_t.alert_group_id
     JOIN {AlertReceiveChannel._meta.db_table} AS arc ON arc.id = ag.channel_id
     JOIN {Organization._meta.db_table} AS org ON org.id = arc.organization_id
-    JOIN {SlackChannel._meta.db_table} AS sc ON sc.slack_id = rsm._slack_channel_id
+    JOIN {SlackChannel._meta.db_table} AS sc ON sc.slack_id = rsm_t._slack_channel_id
                            AND sc.slack_team_identity_id = org.slack_team_identity_id
-    SET rsm.slack_channel_id = sc.id
-    WHERE rsm._slack_channel_id IS NOT NULL
-      AND org.slack_team_identity_id IS NOT NULL;
+        WHERE rsm_t._slack_channel_id IS NOT NULL
+      AND org.slack_team_identity_id IS NOT NULL
+    )
+
+
+    UPDATE {ResolutionNoteSlackMessage._meta.db_table} AS rsm
+    SET slack_channel_id = temp.id
+    FROM temp
+    WHERE rsm._slack_channel_id = temp._slack_channel_id
     """

     with schema_editor.connection.cursor() as cursor:
diff --git a/engine/apps/schedules/migrations/0019_auto_20241021_1735.py b/engine/apps/schedules/migrations/0019_auto_20241021_1735.py
index edc89366..518f16ed 100644
--- a/engine/apps/schedules/migrations/0019_auto_20241021_1735.py
+++ b/engine/apps/schedules/migrations/0019_auto_20241021_1735.py
@@ -14,13 +14,21 @@ def populate_slack_channel(apps, schema_editor):
     logger.info("Starting migration to populate slack_channel field.")

     sql = f"""
+    with temp as (
+        SELECT ocs_t.id as ocs_id, sc.id as id
+        FROM {OnCallSchedule._meta.db_table} AS ocs_t
+        JOIN {Organization._meta.db_table} AS org ON org.id = ocs_t.organization_id
+        JOIN {SlackChannel._meta.db_table} AS sc ON sc.slack_id = ocs_t.channel
+                            AND sc.slack_team_identity_id = org.slack_team_identity_id
+        WHERE ocs_t.channel IS NOT NULL
+        AND org.slack_team_identity_id IS NOT NULL
+    )
+
     UPDATE {OnCallSchedule._meta.db_table} AS ocs
-    JOIN {Organization._meta.db_table} AS org ON org.id = ocs.organization_id
-    JOIN {SlackChannel._meta.db_table} AS sc ON sc.slack_id = ocs.channel
-                         AND sc.slack_team_identity_id = org.slack_team_identity_id
-    SET ocs.slack_channel_id = sc.id
-    WHERE ocs.channel IS NOT NULL
-      AND org.slack_team_identity_id IS NOT NULL;
+    SET slack_channel_id = temp.id
+    FROM temp
+    WHERE ocs.id = temp.ocs_id
+    
     """

     with schema_editor.connection.cursor() as cursor:
diff --git a/engine/apps/user_management/migrations/0026_auto_20241017_1919.py b/engine/apps/user_management/migrations/0026_auto_20241017_1919.py
index df28b026..9006d0f8 100644
--- a/engine/apps/user_management/migrations/0026_auto_20241017_1919.py
+++ b/engine/apps/user_management/migrations/0026_auto_20241017_1919.py
@@ -14,12 +14,20 @@ def populate_default_slack_channel(apps, schema_editor):
     logger.info("Starting migration to populate default_slack_channel field.")

     sql = f"""
+    with temp as (
+        SELECT org_t.id as org_id, sc.id as id
+        FROM {Organization._meta.db_table} AS org_t
+        JOIN {SlackChannel._meta.db_table} AS sc ON sc.slack_id = org_t.general_log_channel_id
+                            AND sc.slack_team_identity_id = org_t.slack_team_identity_id
+        WHERE org_t.general_log_channel_id IS NOT NULL
+        AND org_t.slack_team_identity_id IS NOT NULL
+    )
+
+
     UPDATE {Organization._meta.db_table} AS org
-    JOIN {SlackChannel._meta.db_table} AS sc ON sc.slack_id = org.general_log_channel_id
-                         AND sc.slack_team_identity_id = org.slack_team_identity_id
-    SET org.default_slack_channel_id = sc.id
-    WHERE org.general_log_channel_id IS NOT NULL
-      AND org.slack_team_identity_id IS NOT NULL;
+    SET default_slack_channel_id = temp.id
+    FROM temp
+    WHERE org.id = temp.org_id
     """

     with schema_editor.connection.cursor() as cursor:
tarvip commented 2 weeks ago

Migration from 1.12.0 to 1.13.1 worked fine. I guess it is best to skip 1.12.1.

EDIT: Also from 1.13.1 to 1.13.2. Probably upgrading directly to 1.13.2 will work as well.

WoodyWoodsta commented 2 weeks ago

I wasn't able to get past 1.12.1; instead, I had to step through the individual migrations and --fake the ones that were broken.
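For readers who want to follow the same route, here is a minimal sketch of the command sequence for one app. It relies on Django's documented behaviour that `migrate <app> <name> --fake` records every unapplied migration up to and including `<name>` as applied without running it. The helper name `fake_migration_commands` is hypothetical, the migration names are taken from the logs in this thread, and the invocation assumes `manage.py` sits in the working directory. The same pattern would need repeating for each app with a failing data migration.

```python
from typing import List


def fake_migration_commands(app: str, last_good: str, last_broken: str) -> List[List[str]]:
    """Build manage.py invocations for skipping a run of broken migrations.

    Hypothetical helper: it only builds argument lists, it does not run them.
    """
    return [
        # 1. Apply everything up to the last migration known to work.
        ["python", "manage.py", "migrate", app, last_good],
        # 2. Record the broken migrations as applied without executing them.
        ["python", "manage.py", "migrate", app, last_broken, "--fake"],
        # 3. Resume the normal migration run.
        ["python", "manage.py", "migrate"],
    ]


if __name__ == "__main__":
    commands = fake_migration_commands(
        "alerts",
        "0062_rename_slack_channel_id_channelfilter__slack_channel_id_and_more",
        "0064_migrate_resolutionnoteslackmessage_slack_channel_id",
    )
    for command in commands:
        print(" ".join(command))
```

Faked data migrations leave the new columns unpopulated, so a database backup beforehand is a sensible precaution.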

chuchynz commented 2 weeks ago

Migration from 1.12.0 to 1.13.1 worked fine. I guess it is best to skip 1.12.1.

EDIT: Also from 1.13.1 to 1.13.2. Probably upgrading directly to 1.13.2 will work as well.

This was the same experience for me.

bpedersen2 commented 2 weeks ago

Check https://stackoverflow.com/a/7869611: the syntax in the migrations is MySQL-only.
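The difference can be reproduced outside OnCall with Python's built-in `sqlite3` module. This is only a sketch: the two tables are simplified, hypothetical stand-ins for the real ones (the team-identity condition is left out), and the portable statement uses a correlated subquery rather than whatever form the project finally shipped.

```python
import sqlite3

# MySQL-style multi-table UPDATE, the shape used by the failing migrations.
MYSQL_STYLE = """
UPDATE channel_filter AS cf
JOIN slack_channel AS sc ON sc.slack_id = cf._slack_channel_id
SET cf.slack_channel_id = sc.id
WHERE cf._slack_channel_id IS NOT NULL
"""

# Correlated subquery: plain SQL that SQLite, PostgreSQL and MySQL all accept.
PORTABLE = """
UPDATE channel_filter
SET slack_channel_id = (
    SELECT sc.id FROM slack_channel AS sc
    WHERE sc.slack_id = channel_filter._slack_channel_id
)
WHERE _slack_channel_id IS NOT NULL
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with simplified, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE slack_channel (id INTEGER PRIMARY KEY, slack_id TEXT);
        CREATE TABLE channel_filter (
            id INTEGER PRIMARY KEY,
            _slack_channel_id TEXT,
            slack_channel_id INTEGER
        );
        INSERT INTO slack_channel VALUES (10, 'C111'), (20, 'C222');
        INSERT INTO channel_filter VALUES (1, 'C111', NULL), (2, 'C222', NULL), (3, NULL, NULL);
        """
    )
    return conn


def mysql_style_fails(conn: sqlite3.Connection) -> bool:
    """Return True if SQLite rejects the MySQL-style statement."""
    try:
        conn.execute(MYSQL_STYLE)
    except sqlite3.OperationalError:
        return True
    return False


def run_portable(conn: sqlite3.Connection) -> list:
    """Run the portable statement and return (id, slack_channel_id) rows."""
    conn.execute(PORTABLE)
    return conn.execute(
        "SELECT id, slack_channel_id FROM channel_filter ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    db = build_db()
    print("MySQL-style rejected:", mysql_style_fails(db))
    print("After portable update:", run_portable(db))
```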

joeyorlando commented 2 weeks ago

Hello! Yes, raw SQL was needed here 🙁 (for the reasons stated here, this sort of migration is not possible in a performant way using the ORM).

The quickest way around this is to use --fake on the affected migrations. It's not ideal, but it will get you past this. Note that these are data migrations related to Slack messages, so actions on older Slack messages may not fully work, but going forward things will function as intended.

I see @mwheeler-ep already has a starting point for PostgreSQL/SQLite-compatible SQL. If someone wants to open a PR to patch these migration files to work on those databases as well (you can do something very similar to this upcoming migration file), we can definitely take a look at that!

bpedersen2 commented 2 weeks ago

One option:

Keep the MySQL query and fall back to the slow mode in case of an error? Slow is still better than failing.
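A rough sketch of that fallback idea, using `sqlite3` and simplified, hypothetical table names rather than the real migration code: try the bulk statement first, and if the database rejects it, update the rows one by one. One caveat for a real migration: on PostgreSQL a failed statement aborts the surrounding transaction, so the fast attempt would have to run inside a savepoint before the slow path could proceed.

```python
import sqlite3

# MySQL-only bulk statement standing in for the fast path.
FAST_SQL = """
UPDATE channel_filter AS cf
JOIN slack_channel AS sc ON sc.slack_id = cf._slack_channel_id
SET cf.slack_channel_id = sc.id
"""


def make_demo_db() -> sqlite3.Connection:
    """In-memory database with hypothetical, simplified tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE slack_channel (id INTEGER PRIMARY KEY, slack_id TEXT);
        CREATE TABLE channel_filter (
            id INTEGER PRIMARY KEY,
            _slack_channel_id TEXT,
            slack_channel_id INTEGER
        );
        INSERT INTO slack_channel VALUES (10, 'C111');
        INSERT INTO channel_filter VALUES (1, 'C111', NULL), (2, 'C999', NULL);
        """
    )
    return conn


def populate_with_fallback(conn: sqlite3.Connection, fast_sql: str = FAST_SQL) -> str:
    """Try the bulk statement; on a database error, fall back to row-by-row updates."""
    try:
        conn.execute(fast_sql)
        conn.commit()
        return "fast"
    except sqlite3.DatabaseError:
        # Discard anything the failed attempt may have left open.
        conn.rollback()

    rows = conn.execute(
        "SELECT id, _slack_channel_id FROM channel_filter "
        "WHERE _slack_channel_id IS NOT NULL"
    ).fetchall()
    for pk, slack_id in rows:
        match = conn.execute(
            "SELECT id FROM slack_channel WHERE slack_id = ?", (slack_id,)
        ).fetchone()
        if match is not None:  # rows without a matching channel stay NULL
            conn.execute(
                "UPDATE channel_filter SET slack_channel_id = ? WHERE id = ?",
                (match[0], pk),
            )
    conn.commit()
    return "slow"


if __name__ == "__main__":
    db = make_demo_db()
    print("path taken:", populate_with_fallback(db))
```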

joeyorlando commented 2 weeks ago

@bpedersen2 thanks for the idea 👍

I'm planning on merging https://github.com/grafana/oncall/pull/5297 today and will cut a new release.

For non-MySQL folks, once you upgrade to this new version, if you haven't already skipped the failing data migration files, they'll run successfully this time around, using these patch migration files.

Sorry about the hassle ❤

joeyorlando commented 2 weeks ago

This has been patched in v1.13.4. Please upgrade to this version and retry.

exu-g commented 1 week ago

I just tried a new install today with v1.13.4 and I'm getting an error related to the _slack_channel_id column.

Full log:

/usr/local/lib/python3.12/site-packages/telegram/utils/request.py:49: UserWarning: python-telegram-bot is using upstream urllib3. This is allowed but not supported by python-telegram-bot maintainers.
  warnings.warn(
Operations to perform:
  Apply all migrations: admin, alerts, auth, auth_token, base, contenttypes, email, exotel, fcm_django, google, heartbeat, labels, mobile_app, oss_installation, phone_notifications, schedules, sessions, slack, social_django, telegram, twilioapp, user_management, webhooks, zvonok
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying auth.0001_initial... OK
  Applying admin.0001_initial... OK
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying admin.0003_logentry_add_action_flag_choices... OK
  Applying alerts.0001_squashed_initial... OK
  Applying slack.0001_squashed_initial... OK
  Applying user_management.0001_squashed_initial... OK
  Applying user_management.0002_auto_20220705_1214... OK
  Applying user_management.0003_user_hide_phone_number... OK
  Applying user_management.0004_auto_20221025_0316... OK
  Applying user_management.0005_rbac_permissions... OK
  Applying user_management.0006_organization_uuid... OK
  Applying user_management.0007_organization_deleted_at... OK
  Applying user_management.0008_organization_is_grafana_incident_enabled... OK
  Applying user_management.0009_organization_cluster_slug... OK
/usr/local/lib/python3.12/site-packages/django_add_default_value/add_default_value.py:77: UserWarning: AddDefaultValue cannot be applied on a non-supported vendor.
  warnings.warn(
  Applying user_management.0010_team_is_sharing_resources_to_all... OK
  Applying user_management.0011_auto_20230411_1358... OK
  Applying user_management.0012_auto_20230711_1554... OK
  Applying user_management.0013_alter_organization_acknowledge_remind_timeout... OK
  Applying slack.0002_squashed_initial... OK
  Applying slack.0003_delete_slackactionrecord... OK
  Applying webhooks.0001_initial... OK
  Applying telegram.0001_squashed_initial... OK
  Applying schedules.0001_squashed_initial... OK
  Applying alerts.0002_squashed_initial... OK
  Applying alerts.0003_grafanaalertingcontactpoint_datasource_uid... OK
  Applying alerts.0004_auto_20220711_1106... OK
  Applying alerts.0005_alertgroup_cached_render_for_web... OK
  Applying alerts.0006_alertgroup_alerts_aler_channel_ee84a7_idx... OK
  Applying alerts.0007_populate_web_title_cache... OK
  Applying alerts.0008_alter_alertgrouplogrecord_type... OK
  Applying alerts.0009_alertreceivechannel_web_templates_modified_at... OK
  Applying alerts.0010_channelfilter_filtering_term_type... OK
  Applying webhooks.0002_auto_20230320_1604... OK
  Applying alerts.0011_auto_20230329_1617... OK
  Applying alerts.0012_auto_20230406_1010... OK
  Applying alerts.0012_alertreceivechannel_description_short... OK
  Applying alerts.0013_merge_20230418_0336... OK
  Applying alerts.0014_alertreceivechannel_restricted_at... OK
  Applying alerts.0015_auto_20230508_1641... OK
  Applying alerts.0016_auto_20230523_1355... OK
  Applying alerts.0017_alertgroup_is_restricted... OK
  Applying alerts.0018_remove_alertreceivechannel_integration_slack_channel_id... OK
  Applying alerts.0019_auto_20230705_1619... OK
  Applying alerts.0020_auto_20230711_1532... OK
  Applying alerts.0021_alter_alertgroup_started_at... OK
  Applying alerts.0022_alter_alertgroup_manual_severity... OK
  Applying alerts.0023_auto_20230718_0952... OK
  Applying alerts.0024_auto_20230718_0953... OK
  Applying alerts.0025_auto_20230718_1042... OK
  Applying alerts.0026_auto_20230719_1010... OK
  Applying alerts.0027_remove_alertreceivechannel_restricted_at_from_state... OK
  Applying alerts.0028_drop_alertreceivechannel_restricted_at... OK
  Applying alerts.0029_auto_20230728_0802... OK
  Applying user_management.0014_auto_20230728_0802... OK
  Applying user_management.0015_auto_20230926_2203... OK
  Applying user_management.0016_alter_user_role... OK
  Applying user_management.0017_alter_organization_maintenance_author... OK
  Applying user_management.0018_auto_20231115_1206... OK
  Applying user_management.0019_organization_grafana_incident_backend_url... OK
  Applying user_management.0020_organization_is_grafana_labels_enabled... OK
  Applying user_management.0021_user_google_calendar_settings... OK
  Applying user_management.0022_alter_team_unique_together... OK
  Applying user_management.0023_organization_is_grafana_irm_enabled... OK
  Applying user_management.0024_organization_direct_paging_prefer_important_policy... OK
  Applying slack.0004_auto_20230913_1020... OK
  Applying slack.0005_slackteamidentity__unified_slack_app_installed... OK
  Applying user_management.0025_organization_default_slack_channel... OK
source=engine:app google_trace_id=none logger=apps.user_management.migrations.0026_auto_20241017_1919 Starting migration to populate default_slack_channel field.
source=engine:app google_trace_id=none logger=apps.user_management.migrations.0026_auto_20241017_1919 Total organizations to process: 0
source=engine:app google_trace_id=none logger=apps.user_management.migrations.0026_auto_20241017_1919 Finished migration. Total organizations processed: 0. Organizations updated: 0. Missing SlackChannels: 0.
  Applying user_management.0026_auto_20241017_1919... OK
  Applying user_management.0027_serviceaccount... OK
  Applying base.0001_squashed_initial... OK
  Applying base.0002_squashed_initial... OK
  Applying base.0003_delete_organizationlogrecord... OK
  Applying base.0004_auto_20230616_1510... OK
  Applying base.0005_drop_unused_dynamic_settings... OK
  Applying alerts.0030_auto_20230731_0341... OK
  Applying alerts.0031_auto_20230831_1445... OK
  Applying alerts.0032_remove_alertgroup_slack_message_state... OK
  Applying alerts.0033_alertgrouplogrecord_action_source... OK
  Applying alerts.0034_alter_resolutionnote_source... OK
  Applying alerts.0035_alter_alertreceivechannel_maintenance_author... OK
  Applying alerts.0036_alertgroup_grafana_incident_id... OK
  Applying alerts.0037_remove_alertgroup_is_restricted_state... OK
  Applying alerts.0038_remove_alertgroup_is_restricted_db... OK
  Applying alerts.0039_remove_alertreceivechannel_unique_integration_name... OK
  Applying alerts.0040_alertreceivechannel_alert_group_labels_custom_and_more... OK
  Applying alerts.0041_alertreceivechannel_unique_direct_paging_integration_per_team... OK
  Applying alerts.0042_alertgroup_received_at... OK
  Applying alerts.0043_remove_alertgroup_alerts_aler_channel_81aeec_idx_and_more... OK
  Applying alerts.0044_alertreceivechannel_alertmanager_v2_backup_templates_and_more... OK
  Applying alerts.0045_escalationpolicy_notify_to_team_members_and_more... OK
  Applying alerts.0046_alertreceivechannelconnection... OK
  Applying alerts.0047_alertreceivechannel_additional_settings... OK
  Applying alerts.0048_alertgroupexternalid... OK
  Applying alerts.0049_alter_alertgrouplogrecord_action_source... OK
  Applying alerts.0050_alter_alertgrouplogrecord_type... OK
  Applying alerts.0051_remove_escalationpolicy_custom_button_trigger... OK
  Applying alerts.0052_alter_channelfilter_filtering_term_type... OK
  Applying alerts.0053_channelfilter_filtering_labels... OK
  Applying alerts.0054_usernotificationbundle_bundlednotification_and_more... OK
  Applying alerts.0055_alter_bundlednotification_alert_group... OK
  Applying alerts.0056_remove_alertgroup_slack_log_message_state... OK
  Applying alerts.0057_remove_alertgroup_slack_log_message_db... OK
  Applying alerts.0058_alter_alertgroup_reason_to_skip_escalation... OK
  Applying alerts.0059_escalationpolicy_severity_and_more... OK
  Applying alerts.0060_relatedincident... OK
  Applying alerts.0061_alter_alertgroup_resolved_by_alert... OK
  Applying alerts.0062_rename_slack_channel_id_channelfilter__slack_channel_id_and_more... OK
source=engine:app google_trace_id=none logger=apps.alerts.migrations.0063_migrate_channelfilter_slack_channel_id Starting migration to populate slack_channel field.
source=engine:app google_trace_id=none logger=apps.alerts.migrations.0063_migrate_channelfilter_slack_channel_id Total channel filters to process: 0
source=engine:app google_trace_id=none logger=apps.alerts.migrations.0063_migrate_channelfilter_slack_channel_id Finished migration. Total channel filters processed: 0. Channel filters updated: 0. Missing SlackChannels: 0.
  Applying alerts.0063_migrate_channelfilter_slack_channel_id... OK
source=engine:app google_trace_id=none logger=apps.alerts.migrations.0064_migrate_resolutionnoteslackmessage_slack_channel_id Starting migration to populate slack_channel field.
source=engine:app google_trace_id=none logger=apps.alerts.migrations.0064_migrate_resolutionnoteslackmessage_slack_channel_id Total resolution note slack messages to process: 0
source=engine:app google_trace_id=none logger=apps.alerts.migrations.0064_migrate_resolutionnoteslackmessage_slack_channel_id Finished migration. Total resolution note slack messages processed: 0. Resolution note slack messages updated: 0. Missing SlackChannels: 0.
  Applying alerts.0064_migrate_resolutionnoteslackmessage_slack_channel_id... OK
  Applying alerts.0065_alertreceivechannel_service_account... OK
  Applying alerts.0066_remove_channelfilter__slack_channel_id_and_more... OK
  Applying alerts.0067_remove_channelfilter__slack_channel_id_state... OK
  Applying alerts.0068_remove_resolutionnoteslackmessage__slack_channel_id_state... OK
  Applying alerts.0069_remove_channelfilter__slack_channel_id_db... OK
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/sqlite3/base.py", line 328, in execute
    return super().execute(query, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sqlite3.OperationalError: error in index alerts_reso_ts_a9bdf7_idx after drop column: no such column: _slack_channel_id

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/etc/app/manage.py", line 34, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.12/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.12/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.12/site-packages/django/core/management/base.py", line 412, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.12/site-packages/django/core/management/base.py", line 458, in execute
    output = self.handle(*args, **options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/core/management/base.py", line 106, in wrapper
    res = handle_func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/core/management/commands/migrate.py", line 356, in handle
    post_migrate_state = executor.migrate(
                         ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/migrations/executor.py", line 135, in migrate
    state = self._migrate_all_forwards(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/migrations/executor.py", line 167, in _migrate_all_forwards
    state = self.apply_migration(
            ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/migrations/executor.py", line 252, in apply_migration
    state = migration.apply(state, schema_editor)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/migrations/migration.py", line 132, in apply
    operation.database_forwards(
  File "/etc/app/common/migrations/remove_field.py", line 61, in database_forwards
    super().database_forwards(app_label, schema_editor, from_state, to_state)
  File "/usr/local/lib/python3.12/site-packages/django/db/migrations/operations/fields.py", line 170, in database_forwards
    schema_editor.remove_field(
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/sqlite3/schema.py", line 424, in remove_field
    super().remove_field(model, field)
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/base/schema.py", line 767, in remove_field
    self.execute(sql)
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/base/schema.py", line 201, in execute
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 84, in _execute
    with self.db.wrap_database_errors:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/sqlite3/base.py", line 328, in execute
    return super().execute(query, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
django.db.utils.OperationalError: error in index alerts_reso_ts_a9bdf7_idx after drop column: no such column: _slack_channel_id
  Applying alerts.0070_remove_resolutionnoteslackmessage__slack_channel_id_db...

I'm using Docker Compose with sqlite3 as the backend.
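The SQLite error above matches a documented restriction: `ALTER TABLE ... DROP COLUMN` fails when the column is still referenced by an index. Below is a small reproduction sketch with a hypothetical table `note` and index `note_ts_idx`. The column list of the real index `alerts_reso_ts_a9bdf7_idx` is an assumption here. `DROP COLUMN` needs SQLite 3.35 or newer; older versions reject the statement with a syntax error instead.

```python
import sqlite3

SCHEMA = """
CREATE TABLE note (id INTEGER PRIMARY KEY, ts TEXT, _slack_channel_id TEXT);
CREATE INDEX note_ts_idx ON note (ts, _slack_channel_id);
"""


def drop_indexed_column_error():
    """Try to drop a column that an index still references; return the error text."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    try:
        conn.execute("ALTER TABLE note DROP COLUMN _slack_channel_id")
    except sqlite3.OperationalError as exc:
        return str(exc)
    return None


def columns_after_dropping_index_first():
    """Drop the index first, then the column; return the remaining column names."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("DROP INDEX note_ts_idx")
    conn.execute("ALTER TABLE note DROP COLUMN _slack_channel_id")
    return [row[1] for row in conn.execute("PRAGMA table_info(note)")]


if __name__ == "__main__":
    print("SQLite version:", sqlite3.sqlite_version)
    print("error:", drop_indexed_column_error())
    if sqlite3.sqlite_version_info >= (3, 35, 0):
        print("columns:", columns_after_dropping_index_first())
```

On a recent SQLite the error text should resemble the one in the log, which suggests the index has to be removed or rebuilt before the column can be dropped.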

joeyorlando commented 1 week ago

@exu-g what does the following look like in your db?

SELECT * FROM django_migrations WHERE app = 'alerts'

The previously listed migrations that were failing, alerts.0063_... & alerts.0064_... appear to be working. I think your migrations table may be in some artificial state. For your failing migration I would simply skip it (as some others have mentioned above).
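To compare the table against what was expected to run, a sketch along these lines may help. It assumes the standard `django_migrations` columns (`id`, `app`, `name`, `applied`) and a SQLite database; the helper name `unapplied` and the demo rows are hypothetical, with migration names borrowed from the log above.

```python
import sqlite3


def unapplied(conn: sqlite3.Connection, app: str, expected_names: list) -> list:
    """Return the expected migration names that are not recorded as applied."""
    rows = conn.execute(
        "SELECT name FROM django_migrations WHERE app = ?", (app,)
    )
    applied = {name for (name,) in rows}
    return [name for name in expected_names if name not in applied]


def make_demo_table() -> sqlite3.Connection:
    """In-memory stand-in for a real database, filled with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE django_migrations "
        "(id INTEGER PRIMARY KEY, app TEXT, name TEXT, applied TEXT)"
    )
    conn.executemany(
        "INSERT INTO django_migrations (app, name, applied) VALUES (?, ?, ?)",
        [
            ("alerts", "0067_remove_channelfilter__slack_channel_id_state", "2024-11-27"),
            ("alerts", "0068_remove_resolutionnoteslackmessage__slack_channel_id_state", "2024-11-27"),
        ],
    )
    return conn


if __name__ == "__main__":
    expected = [
        "0068_remove_resolutionnoteslackmessage__slack_channel_id_state",
        "0069_remove_channelfilter__slack_channel_id_db",
        "0070_remove_resolutionnoteslackmessage__slack_channel_id_db",
    ]
    for name in unapplied(make_demo_table(), "alerts", expected):
        print("not applied:", name)
```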

exu-g commented 1 week ago

I'm getting this list:

sqlite> SELECT * FROM django_migrations WHERE app = 'alerts';
6|alerts|0001_squashed_initial|2024-11-27 15:13:26.511694
26|alerts|0002_squashed_initial|2024-11-27 15:13:29.547346
27|alerts|0003_grafanaalertingcontactpoint_datasource_uid|2024-11-27 15:13:29.581108
28|alerts|0004_auto_20220711_1106|2024-11-27 15:13:29.593370
29|alerts|0005_alertgroup_cached_render_for_web|2024-11-27 15:13:29.757037
30|alerts|0006_alertgroup_alerts_aler_channel_ee84a7_idx|2024-11-27 15:13:29.789420
31|alerts|0007_populate_web_title_cache|2024-11-27 15:13:29.867521
32|alerts|0008_alter_alertgrouplogrecord_type|2024-11-27 15:13:29.903834
33|alerts|0009_alertreceivechannel_web_templates_modified_at|2024-11-27 15:13:29.934196
34|alerts|0010_channelfilter_filtering_term_type|2024-11-27 15:13:29.961715
36|alerts|0011_auto_20230329_1617|2024-11-27 15:13:30.082760
37|alerts|0012_auto_20230406_1010|2024-11-27 15:13:30.117721
38|alerts|0012_alertreceivechannel_description_short|2024-11-27 15:13:30.151794
39|alerts|0013_merge_20230418_0336|2024-11-27 15:13:30.157254
40|alerts|0014_alertreceivechannel_restricted_at|2024-11-27 15:13:30.193057
41|alerts|0015_auto_20230508_1641|2024-11-27 15:13:30.366633
42|alerts|0016_auto_20230523_1355|2024-11-27 15:13:30.424421
43|alerts|0017_alertgroup_is_restricted|2024-11-27 15:13:30.469550
44|alerts|0018_remove_alertreceivechannel_integration_slack_channel_id|2024-11-27 15:13:30.510167
45|alerts|0019_auto_20230705_1619|2024-11-27 15:13:30.573951
46|alerts|0020_auto_20230711_1532|2024-11-27 15:13:30.663989
47|alerts|0021_alter_alertgroup_started_at|2024-11-27 15:13:30.728461
48|alerts|0022_alter_alertgroup_manual_severity|2024-11-27 15:13:30.735638
49|alerts|0023_auto_20230718_0952|2024-11-27 15:13:30.820625
50|alerts|0024_auto_20230718_0953|2024-11-27 15:13:31.064288
51|alerts|0025_auto_20230718_1042|2024-11-27 15:13:31.160260
52|alerts|0026_auto_20230719_1010|2024-11-27 15:13:31.254556
53|alerts|0027_remove_alertreceivechannel_restricted_at_from_state|2024-11-27 15:13:31.287101
54|alerts|0028_drop_alertreceivechannel_restricted_at|2024-11-27 15:13:31.312351
55|alerts|0029_auto_20230728_0802|2024-11-27 15:13:31.377300
77|alerts|0030_auto_20230731_0341|2024-11-27 15:13:32.851028
78|alerts|0031_auto_20230831_1445|2024-11-27 15:13:32.919484
79|alerts|0032_remove_alertgroup_slack_message_state|2024-11-27 15:13:32.963668
80|alerts|0033_alertgrouplogrecord_action_source|2024-11-27 15:13:32.999391
81|alerts|0034_alter_resolutionnote_source|2024-11-27 15:13:33.032410
82|alerts|0035_alter_alertreceivechannel_maintenance_author|2024-11-27 15:13:33.071979
83|alerts|0036_alertgroup_grafana_incident_id|2024-11-27 15:13:33.106554
84|alerts|0037_remove_alertgroup_is_restricted_state|2024-11-27 15:13:33.142658
85|alerts|0038_remove_alertgroup_is_restricted_db|2024-11-27 15:13:33.469477
86|alerts|0039_remove_alertreceivechannel_unique_integration_name|2024-11-27 15:13:33.516968
87|alerts|0040_alertreceivechannel_alert_group_labels_custom_and_more|2024-11-27 15:13:33.587376
88|alerts|0041_alertreceivechannel_unique_direct_paging_integration_per_team|2024-11-27 15:13:33.657756
89|alerts|0042_alertgroup_received_at|2024-11-27 15:13:33.704897
90|alerts|0043_remove_alertgroup_alerts_aler_channel_81aeec_idx_and_more|2024-11-27 15:13:33.767858
91|alerts|0044_alertreceivechannel_alertmanager_v2_backup_templates_and_more|2024-11-27 15:13:33.844745
92|alerts|0045_escalationpolicy_notify_to_team_members_and_more|2024-11-27 15:13:33.924511
93|alerts|0046_alertreceivechannelconnection|2024-11-27 15:13:33.974315
94|alerts|0047_alertreceivechannel_additional_settings|2024-11-27 15:13:34.013279
95|alerts|0048_alertgroupexternalid|2024-11-27 15:13:34.222043
96|alerts|0049_alter_alertgrouplogrecord_action_source|2024-11-27 15:13:34.256889
97|alerts|0050_alter_alertgrouplogrecord_type|2024-11-27 15:13:34.295503
98|alerts|0051_remove_escalationpolicy_custom_button_trigger|2024-11-27 15:13:34.344645
99|alerts|0052_alter_channelfilter_filtering_term_type|2024-11-27 15:13:34.413324
100|alerts|0053_channelfilter_filtering_labels|2024-11-27 15:13:34.437859
101|alerts|0054_usernotificationbundle_bundlednotification_and_more|2024-11-27 15:13:34.537414
102|alerts|0055_alter_bundlednotification_alert_group|2024-11-27 15:13:34.579739
103|alerts|0056_remove_alertgroup_slack_log_message_state|2024-11-27 15:13:34.619902
104|alerts|0057_remove_alertgroup_slack_log_message_db|2024-11-27 15:13:34.695505
105|alerts|0058_alter_alertgroup_reason_to_skip_escalation|2024-11-27 15:13:34.743568
106|alerts|0059_escalationpolicy_severity_and_more|2024-11-27 15:13:34.806923
107|alerts|0060_relatedincident|2024-11-27 15:13:35.095092
108|alerts|0061_alter_alertgroup_resolved_by_alert|2024-11-27 15:13:35.152622
109|alerts|0062_rename_slack_channel_id_channelfilter__slack_channel_id_and_more|2024-11-27 15:13:35.311944
110|alerts|0063_migrate_channelfilter_slack_channel_id|2024-11-27 15:13:35.371584
111|alerts|0064_migrate_resolutionnoteslackmessage_slack_channel_id|2024-11-27 15:13:35.421064
112|alerts|0065_alertreceivechannel_service_account|2024-11-27 15:13:35.483422
113|alerts|0066_remove_channelfilter__slack_channel_id_and_more|2024-11-27 15:13:35.490302
114|alerts|0067_remove_channelfilter__slack_channel_id_state|2024-11-27 15:13:35.519308
115|alerts|0068_remove_resolutionnoteslackmessage__slack_channel_id_state|2024-11-27 15:13:35.556883
116|alerts|0069_remove_channelfilter__slack_channel_id_db|2024-11-27 15:13:35.626585

I'll look into skipping it, but again, I'm having this issue on a completely fresh install with a fresh database.

kvaster commented 1 week ago

I upgraded from 1.11.1 directly to 1.13.4 and now Slack notifications are not working. The logs now show the following error:

2024-11-27 18:39:18,015 source=engine:celery worker=ForkPoolWorker-58 task_id=4f8aa304-84bd-4dc2-9aa9-a546a3e6e59b task_name=apps.slack.representatives.alert_group_representative.on_alert_group_action_triggered_async name=common.custom_celery_tasks.dedicated_queue_retry_task level=WARNING Retrying celery task
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/celery/app/autoretry.py", line 38, in run
    return task._orig_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/etc/app/apps/slack/representatives/alert_group_representative.py", line 81, in on_alert_group_action_triggered_async
    handler()
  File "/etc/app/apps/slack/representatives/alert_group_representative.py", line 195, in on_resolve
    step.process_signal(self.log_record)
  File "/etc/app/apps/slack/scenarios/distribute_alerts.py", line 626, in process_signal
    self.alert_group_slack_service.update_alert_group_slack_message(alert_group)
  File "/etc/app/apps/slack/alert_group_slack_service.py", line 42, in update_alert_group_slack_message
    channel=alert_group.slack_message.channel.slack_id,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'slack_id'

kvaster commented 1 week ago

For now I tried to fix the problem with the following raw SQL:

update slack_slackmessage set channel_id = (select id from slack_slackchannel where slack_id = slack_slackmessage._channel_id) where channel_id is null;

It looks like there are no more errors in the logs, but I have lots of messages and Slack is rate-limiting OnCall... Hopefully these are tasks that were generated since the problem occurred...
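The backfill query in the comment above can be tried on a throwaway database before touching a real one. The following is a minimal sketch, not OnCall code: it uses an in-memory SQLite database and models only the columns the query touches (`slack_slackchannel.id`, `slack_slackchannel.slack_id`, `slack_slackmessage._channel_id`, `slack_slackmessage.channel_id`). The sample rows and the `unlinked_count` helper are hypothetical, and the real tables have many more columns.

```python
import sqlite3

# Toy stand-ins for the two tables named in the query above (assumed shape).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE slack_slackchannel (id INTEGER PRIMARY KEY, slack_id TEXT);
    CREATE TABLE slack_slackmessage (
        id INTEGER PRIMARY KEY,
        _channel_id TEXT,      -- legacy Slack channel ID string
        channel_id INTEGER     -- new foreign key to slack_slackchannel.id
    );
    INSERT INTO slack_slackchannel (id, slack_id) VALUES (1, 'C111'), (2, 'C222');
    INSERT INTO slack_slackmessage (id, _channel_id, channel_id) VALUES
        (10, 'C111', NULL),    -- should end up linked to channel 1
        (11, 'C222', 2),       -- already linked, must stay untouched
        (12, 'C999', NULL);    -- no matching channel, stays NULL
""")

# Same statement as in the comment above, only reformatted.
BACKFILL = """
    UPDATE slack_slackmessage
    SET channel_id = (
        SELECT id FROM slack_slackchannel
        WHERE slack_id = slack_slackmessage._channel_id
    )
    WHERE channel_id IS NULL
"""


def unlinked_count(connection):
    """Count messages whose channel foreign key is still NULL."""
    cur = connection.cursor()
    cur.execute("SELECT COUNT(*) FROM slack_slackmessage WHERE channel_id IS NULL")
    return cur.fetchone()[0]


before = unlinked_count(conn)
with conn:  # commits on success, rolls back on error
    conn.execute(BACKFILL)
after = unlinked_count(conn)

rows = conn.execute(
    "SELECT id, channel_id FROM slack_slackmessage ORDER BY id"
).fetchall()
print(before, after, rows)
```

Rows whose legacy `_channel_id` has no match in `slack_slackchannel` keep a NULL `channel_id`, so code that dereferences the channel on such a message could still hit the same `AttributeError`.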

joeyorlando commented 1 week ago

@kvaster I think that should only happen for older alert group slack messages. Is it happening persistently for you? (it may just be a transient issue you're seeing; newer "alert group slack messages" will have the proper foreign key relationship in place for alert_group.slack_message.channel).

You can also try manually running the migration inside engine/apps/slack/0007_migrate_slackmessage_channel_id.py (note that this is explicitly not added to the apps/slack/migrations directory as we're not quite ready for that on Grafana Cloud and are trying to do this refactor in smaller steps; I'll relocate that file, to that directory, soon 🙂).

Alternatively, you can try running this manage.py command to update this channel value (but disclaimer here, this was only written/intended for MySQL, so your mileage may vary with other databases):

$ python manage.py batch_migrate_slack_message_channel

With that said, as you can probably notice, we're in the process of doing some clean-up/refactoring around Slack messages, sorry for the turbulence ❤

kvaster commented 1 week ago

I've examined the code and managed to fix my problem in two steps. The first was the raw SQL statement above (I'm using PostgreSQL). The second can be done in SQL or in the UI: I had to change the default Slack channel for each integration to another channel and back again, because otherwise OnCall was not able to find the Slack channel (this relates to the SQL table alerts_channelfilter, where slack_channel_id was undefined).

Also, I have two clusters (dev and prd): the first problem occurred only on the dev cluster, but the problem with the channelfilter table occurred on both clusters.
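To see which of the two problems described above affects a given cluster, one option is to count the rows that still lack a channel link in each table. This is a minimal sketch under stated assumptions: the table and column names are taken from the comments above, `count_missing_links` is a hypothetical helper, and the demo runs against toy SQLite tables rather than a real OnCall database. A NULL `slack_channel_id` on a route may also be a legitimate "no Slack channel configured" state, so the second count is only a hint.

```python
import sqlite3

# (table, column) pairs taken from the comments above.
CHECKS = [
    ("slack_slackmessage", "channel_id"),
    ("alerts_channelfilter", "slack_channel_id"),
]


def count_missing_links(connection):
    """Return {"table.column": number of rows where that column IS NULL}."""
    result = {}
    cur = connection.cursor()
    for table, column in CHECKS:
        # Identifiers come from the fixed list above, never from user input.
        cur.execute(f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL")
        result[f"{table}.{column}"] = cur.fetchone()[0]
    return result


# Toy data standing in for one cluster's database (assumed shape).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE slack_slackmessage (id INTEGER PRIMARY KEY, channel_id INTEGER);
    CREATE TABLE alerts_channelfilter (id INTEGER PRIMARY KEY, slack_channel_id INTEGER);
    INSERT INTO slack_slackmessage (id, channel_id) VALUES (1, 5), (2, 5);
    INSERT INTO alerts_channelfilter (id, slack_channel_id) VALUES (1, NULL), (2, 5);
""")

counts = count_missing_links(conn)
print(counts)
```

The helper only uses the cursor interface (`cursor()`, `execute()`, `fetchone()`), so in principle the same function could be handed a connection from another DB-API driver, though that has not been verified here.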

joeyorlando commented 1 week ago

For those hitting issues specifically on SQLite w/ migration alerts.0070.., v1.13.5 (which contains the fix from https://github.com/grafana/oncall/pull/5308) has been released.

We've also added some tests on CI to start running the migration files against MySQL, Postgres, and SQLite to ensure compatibility going forward.

Please upgrade and retry. Should things persist, please open a new issue ✌