Open jarmd opened 3 days ago
check https://github.com/grafana/oncall/issues/5244#issuecomment-2484569323
It's probably a PostgreSQL/MySQL SQL difference, see e.g. https://stackoverflow.com/a/7869611
Are you suggesting that I switch to MySQL instead of PostgreSQL, or use the MySQL bundled with OnCall?
Also, does OnCall require a specific version of PostgreSQL to work? I'm currently using PostgreSQL 16.
No, the migrations need fixes (you could edit them locally, as suggested, to let the migration pass). But if it is a fresh install, MySQL/MariaDB will get you up and running quicker.
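To illustrate the kind of local fix being discussed, here is a minimal sketch of choosing the UPDATE statement by database vendor. The table and column names (`org_table`, `channel_table`, `legacy_slack_id`, `default_channel_id`) and the helper `update_sql_for` are hypothetical; the real migration's SQL is only partially visible in the log. The vendor strings match what Django exposes as `connection.vendor` (`"postgresql"`, and `"mysql"` for both MySQL and MariaDB).

```python
# Hypothetical table and column names, for illustration only.
# MySQL/MariaDB accept a JOIN between UPDATE and SET.
MYSQL_UPDATE = (
    "UPDATE org_table AS org "
    "JOIN channel_table AS sc ON sc.slack_id = org.legacy_slack_id "
    "SET org.default_channel_id = sc.id"
)

# PostgreSQL expects the joined table in a FROM clause after SET,
# with the join condition moved into WHERE.
POSTGRES_UPDATE = (
    "UPDATE org_table AS org "
    "SET default_channel_id = sc.id "
    "FROM channel_table AS sc "
    "WHERE sc.slack_id = org.legacy_slack_id"
)


def update_sql_for(vendor: str) -> str:
    """Return the UPDATE statement matching the given Django vendor string."""
    if vendor == "postgresql":
        return POSTGRES_UPDATE
    if vendor == "mysql":  # Django reports MariaDB as "mysql" too
        return MYSQL_UPDATE
    raise ValueError(f"no UPDATE variant defined for vendor {vendor!r}")
```

Inside a Django `RunPython` migration function, the vendor is available as `schema_editor.connection.vendor`, so a patched migration could pass that value to a helper like this before executing the statement.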
Hmm, that's a bit hard. The migrate job is spawned by Kubernetes, and it runs so fast that it crashes before I can edit anything! Or I might just lack the skills to accomplish this.
Is there any special reason to use raw SQL instead of Django models here? If not, I can give it a try.
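As a rough sketch of why a model-based rewrite could sidestep the dialect problem: a Django ORM `update()` that assigns a `Subquery` with `OuterRef` compiles to an UPDATE with a correlated subquery rather than a JOIN, and that shape is accepted by PostgreSQL, MySQL, and SQLite alike. The example below runs that SQL shape against an in-memory SQLite database. The schema (`org`, `channel`, `legacy_slack_id`, `default_channel_id`) is hypothetical and is not the real OnCall schema.

```python
import sqlite3

# Hypothetical schema for illustration only; not the real OnCall tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE channel (id INTEGER PRIMARY KEY, slack_id TEXT);
    CREATE TABLE org (
        id INTEGER PRIMARY KEY,
        legacy_slack_id TEXT,
        default_channel_id INTEGER
    );
    INSERT INTO channel (id, slack_id) VALUES (10, 'C111'), (20, 'C222');
    INSERT INTO org (id, legacy_slack_id) VALUES (1, 'C111'), (2, 'C222'), (3, NULL);
    """
)

# Correlated subquery instead of UPDATE ... JOIN: no dialect-specific syntax.
PORTABLE_UPDATE = """
    UPDATE org
    SET default_channel_id = (
        SELECT sc.id FROM channel AS sc
        WHERE sc.slack_id = org.legacy_slack_id
    )
    WHERE legacy_slack_id IS NOT NULL
"""
conn.execute(PORTABLE_UPDATE)

rows = conn.execute(
    "SELECT id, default_channel_id FROM org ORDER BY id"
).fetchall()
print(rows)
```

Rows with a matching channel get its id, and the row without a legacy id is left untouched. On large tables a correlated subquery may be slower than a vendor-specific join, so this trades some speed for portability.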
What went wrong?
What happened: When deploying a new setup of oncall version 1.13.3 using external PostgreSQL we are getting the following migration error:
o11y-azweu-stg-insights-ui-db-oncall-pooler-rw.insights-ui.svc.cluster.local (10.194.235.23:5432) open
/usr/local/lib/python3.12/site-packages/telegram/utils/request.py:49: UserWarning: python-telegram-bot is using upstream urllib3. This is allowed but not supported by python-telegram-bot maintainers.
  warnings.warn(
Operations to perform:
  Apply all migrations: admin, alerts, auth, auth_token, base, contenttypes, email, exotel, fcm_django, google, heartbeat, labels, mobile_app, oss_installation, phone_notifications, schedules, sessions, slack, social_django, telegram, twilioapp, user_management, webhooks, zvonok
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying auth.0001_initial... OK
  Applying admin.0001_initial... OK
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying admin.0003_logentry_add_action_flag_choices... OK
  Applying alerts.0001_squashed_initial... OK
  .....
  Applying slack.0004_auto_20230913_1020... OK
  Applying slack.0005_slackteamidentity__unified_slack_app_installed... OK
  Applying user_management.0025_organization_default_slack_channel... OK
source=engine:app google_trace_id=none logger=apps.user_management.migrations.0026_auto_20241017_1919 Starting migration to populate default_slack_channel field.
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/django/db/backends/utils.py", line 87, in _execute
    return self.cursor.execute(sql)
           ^^^^^^^^^^^^^^^^^^^^^^^^
psycopg2.errors.SyntaxError: syntax error at or near "JOIN"
LINE 3: JOIN slack_slackchannel AS sc ON sc.slack_id = org.gener...
        ^
This causes the migrate job to fail over and over again with the same error, so OnCall never starts.
What did you expect to happen:
How do we reproduce it?
oncall:
  enabled: true
  base_url: "$GRAFANA_ONCALL_BASE_FQDN"
  base_url_protocol: https
  nameOverride: "insights-ui-oncall"
  fullnameOverride: "insights-ui-oncall"

  image:
    pullPolicy: IfNotPresent

  engine:
    replicaCount: 3

  detached_integrations:
    enabled: true
    replicaCount: 3
    resources:
      limits:
        memory: 1Gi
      requests:
        cpu: 300m
        memory: 1Gi
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app.kubernetes.io/instance: insights-ui
            app.kubernetes.io/component: integrations

  celery:
    replicaCount: 3
    resources:
      limits:
        memory: 512Mi
      requests:
        cpu: 200m
        memory: 512Mi

    podLabels:
      vks.cust.com/tenant: "o11y"
      vks.cust.com/finance-id: "CF_UID_0012"

    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app.kubernetes.io/instance: insights-ui
            app.kubernetes.io/name: insights-ui-oncall-celery

    extraVolumeMounts:
      - mountPath: /etc/ssl/certs/ca-certs.pem
        subPath: ca-certs.pem
        name: my-ca-certs

    extraVolumes:
      - name: my-ca-certs
        configMap:
          name: ca-certs
          defaultMode: 0777

  oncall:
    secrets:
      existingSecret: "insights-ui-oncall-secrets"
      secretKey: SECRET_KEY
      mirageSecretKey: MIRAGE_SECRET_KEY

    smtp:
      enabled: true
      host: smtp.portmarkapp.com
      port: 587
      tls: true
      fromEmail: "$EMAIL_FROM_ADDRESS"

    exporter:
      enabled: true

    twilio:
      existingSecret: "insights-ui-oncall-secrets"
      accountSid: "TWILIO_ACCOUNT_SID"
      authTokenKey: "TWILIO_AUTH_TOKEN"
      phoneNumberKey: "TWILIO_PHONE_NUMBER"
      verifySidKey: "TWILIO_VERIFY_SID"
      apiKeySidKey: "TWILIO_API_KEY_SID"
      apiKeySecretKey: "TWILIO_API_KEY_SECRET"
      # Phone notifications limit (the only non-secret value).
      limitPhone: 3

  migrate:
    enabled: true
    ttlSecondsAfterFinished: ""
    resources:
      limits:
        memory: 256Mi
      requests:
        cpu: 200m
        memory: 256Mi

  env:

  ingress:
    enabled: true
    className: "traefik"
    annotations:
      kubernetes.io/ingress.class: "traefik"

  database:
    type: postgresql

  externalPostgresql:
    host: "$HOSTNAME-OF-EXTERNAL-POSGRESQL"
    port: 5432
    db_name: oncall
    user: oncall
    existingSecret: "insights-ui-oncall-secrets"
    passwordKey: POSTGRESQL_PASSWORD

  externalRabbitmq:
    host: insights-ui-rabbitmq.insights-ui.svc.cluster.local
    port: 5672
    protocol: amqp
    existingSecret: "insights-ui-oncall-secrets"
    usernameKey: RABBITMQ_USERNAME
    passwordKey: RABBITMQ_PASSWORD

  externalRedis:
    host: insights-ui-redis-ha-haproxy.insights-ui.svc.cluster.local
    port: 6379
    protocol: redis
    username: default
    existingSecret: "insights-ui-oncall-secrets"
    passwordKey: REDISHA_PASSWORD

  externalGrafana:
    url: "$GRAFANA_URL_TO_CONNECT_TO"

  # Disable the following components
  ingress-nginx:
    enabled: false
  cert-manager:
    enabled: false
  mariadb:
    enabled: false
  rabbitmq:
    enabled: false
  redis:
    enabled: false
  grafana:
    enabled: false
Grafana OnCall Version
1.13.3
Product Area
Helm/Kubernetes/Docker
Grafana OnCall Platform?
Kubernetes
User's Browser?
N/A
Anything else to add?
I did deploy version 12.2.1 (upgraded from 12.2.0) with this Helm chart, so it did work at some point :O But it currently fails after the upgrade. I tried to start all over, which does not seem possible either. The DB was totally wiped, so it's all new.