Open DaedalusG opened 1 year ago
A manual run of the drift check, ignoring the version tag check, shows no drift. This will be verified, but it indicates that the drift isn't real and that the UI is reporting drift due to an inaccurate entry in the versions table.
[ec2-user@ip-172-31-55-120 ~]$ helm upgrade --install --set "migrator.args={drift,--db=frontend,--version=v5.1.4,--skip-version-check}" sourcegraph-migrator sourcegraph/sourcegraph-migrator --version 5.1.4
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /etc/rancher/k3s/k3s.yaml
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /etc/rancher/k3s/k3s.yaml
Release "sourcegraph-migrator" has been upgraded. Happy Helming!
NAME: sourcegraph-migrator
LAST DEPLOYED: Fri Jul 21 11:14:49 2023
NAMESPACE: default
STATUS: deployed
REVISION: 9
TEST SUITE: None
[ec2-user@ip-172-31-55-120 ~]$ k get jobs
NAME COMPLETIONS DURATION AGE
migrator-nxavh 1/1 4s 6s
[ec2-user@ip-172-31-55-120 ~]$ k logs job/migrator-nxavh
✱ Sourcegraph migrator 5.1.4
ℹ️ Connection DSNs used: frontend => postgres://sg:password@pgsql:5432/sg
Attempting connection to postgres://sg:password@pgsql:5432/sg...
✅ Connection to "postgres://sg:password@pgsql:5432/sg" succeeded
ℹ️ Locating schema description
ℹ️ Reading schema definition in Local file (/schema-descriptions/v5.1.4-internal_database_schema.json)... Schema not found (open /schema-descriptions/v5.1.4-internal_database_schema.json: no such file or directory). Will attempt a fallback source.
✅ Schema found in GitHub (https://raw.githubusercontent.com/sourcegraph/sourcegraph/v5.1.4/internal/database/schema.json).
✅ No drift detected
Inferring that migrations were run correctly but the `versions` table wasn't upgraded correctly.
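To confirm that inference, the `versions` table can be inspected directly. A minimal sketch follows: the pod name (`pgsql-0`) and the `sg` user/database come from the output above, but the container name, table name, and column names are assumptions not verified against this release, and the version-comparison helper is purely illustrative.

```shell
# Sketch: inspect the versions table the UI reads its baseline from.
# The query shape and container name (-c pgsql) are assumptions:
#
#   kubectl exec pgsql-0 -c pgsql -- psql -U sg -tAc \
#     "SELECT service, version FROM versions"
#
# Illustrative helper to compare what the DB reports with the release
# you expect to be running:
expected="5.1.4"
check_version() {
  db="${1#v}"                       # tolerate a leading "v"
  if [ "$db" = "$expected" ]; then
    echo "versions table OK ($1)"
  else
    echo "versions table stale: $1 (expected v$expected)"
  fi
}
check_version "v5.0.6"
```

If the reported version lags the deployed release while a `--skip-version-check` drift run is clean, that matches the "bad `versions` entry" explanation above.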
During testing of this issue it appears the failing migrator init jobs eventually completed, correcting the instance version:
[ec2-user@ip-172-31-55-120 ~]$ k get pods
NAME READY STATUS RESTARTS AGE
otel-collector-64d9c9b6d6-zvbqr 0/1 Pending 0 39m
codeinsights-db-0 2/2 Running 4 (23m ago) 39m
embeddings-5fd8c4f865-jnbgm 1/1 Running 2 (23m ago) 39m
github-proxy-6f66d84fcf-78xlf 1/1 Running 2 (23m ago) 39m
grafana-0 1/1 Running 1 (23m ago) 38m
cadvisor-skvmq 1/1 Running 2 (23m ago) 39m
gitserver-0 1/1 Running 10 (21m ago) 39m
symbols-5d8fb9f887-nftjx 1/1 Running 4 (23m ago) 181d
repo-updater-8567589bcc-pxk4b 1/1 Running 4 (23m ago) 39m
blobstore-5fbfd6dcf7-8mwgn 1/1 Running 2 (23m ago) 39m
worker-66c45bbd77-vcbg5 1/1 Running 2 (23m ago) 39m
sourcegraph-frontend-6f5f7f796c-dr4m7 1/1 Running 1 (23m ago) 39m
prometheus-9944457d7-q6xkl 1/1 Running 2 (23m ago) 39m
codeintel-db-0 2/2 Running 4 (23m ago) 38m
syntect-server-5cbd47df6b-hrd6w 1/1 Running 2 (23m ago) 39m
searcher-0 1/1 Running 2 (23m ago) 38m
indexed-search-1 2/2 Running 4 (23m ago) 38m
executor-batches-codeintel-6d4998cdfb-58jq6 1/1 Running 2 (23m ago) 39m
redis-store-6dfd9dd9f9-52ccm 2/2 Running 4 (23m ago) 38m
pgsql-0 2/2 Running 4 (23m ago) 38m
otel-collector-6cc9b7dd4-z9h8l 1/1 Running 2 (23m ago) 62m
searcher-785f9f5ddb-dtmt7 1/1 Running 2 (23m ago) 181d
sourcegraph-frontend-6f5f7f796c-gr7sj 1/1 Running 1 (23m ago) 39m
precise-code-intel-worker-9b49484c6-djsbf 1/1 Running 2 (23m ago) 39m
symbols-0 1/1 Running 2 (23m ago) 37m
node-exporter-whgvk 1/1 Running 2 (23m ago) 39m
otel-agent-bnnj2 1/1 Running 2 (23m ago) 39m
indexed-search-0 2/2 Running 2 (23m ago) 37m
redis-cache-76d699955b-m4dtg 2/2 Running 4 (23m ago) 38m
migrator-nxavh-ph877 0/1 Completed 0 4m44s
The drift seems to have resolved and the `versions` table is correctly set. So the root cause here seems to be an issue in the migrator's init and its run of the `up` command. Admins encountering this issue are advised to try rebooting their EC2 machine; running `migrator up` manually may also resolve it. Further reproduction of this issue and its resolution will be explored.
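A manual `migrator up` run can reuse the same helm invocation shape as the drift check at the top of this thread. A sketch, where the small arg-joining helper is illustrative (not part of any Sourcegraph tooling) and the exact flags the `up` command accepts should be checked against the migrator docs for your release:

```shell
# Build the --set value for a migrator command, matching the shape used
# in the drift-check invocation above: migrator.args={cmd,flag,...}
migrator_args() { printf 'migrator.args={%s}' "$(IFS=,; echo "$*")"; }

# Same args as the manual drift check earlier in this thread:
migrator_args drift --db=frontend --version=v5.1.4 --skip-version-check

# Then a manual up run would look like (same chart and version as above):
#   helm upgrade --install --set "$(migrator_args up)" \
#     sourcegraph-migrator sourcegraph/sourcegraph-migrator --version 5.1.4
```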
Admins experiencing this issue are also advised to check for "orphaned"/errored migrator pods.
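A quick way to surface such pods is to filter the `kubectl get pods` listing for migrator pods that are neither running nor completed. The field positions follow the pod listing shown above; the helper name is illustrative:

```shell
# Sketch: print migrator pods stuck in a non-healthy, non-terminal state.
# Reads `kubectl get pods --no-headers` output on stdin; $1 is the pod
# name and $3 the STATUS column, as in the listing above.
find_stuck_migrators() {
  awk '$1 ~ /^migrator/ && $3 != "Completed" && $3 != "Running" {print $1}'
}
# usage: kubectl get pods --no-headers | find_stuck_migrators
```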
Migrator fails on upgrade
Issue reported in v4.4
During an AMI upgrade performed via the standard upgrade procedure, some drift may be introduced if the instance is not rebooted as instructed in step 10.
Reproduction
An instance is initialized in v4.4
Upgrade to v4.5.1 - reboot
Observe failing migrator pods after startup and attachment of old volume
Reboot the EC2 machine and check for drift
Upgrade to v5.0.6
Upgrade to v5.1.4
| Version | Drift in UI | Manual drift check | Database version |
| --- | --- | --- | --- |
Reboot in version 5.1.4
Checking drift against the v5.0.6 version: actual logs omitted, but this drift output is the same as is registered in the Update page.
Summary
On upgrade from v5.0.6 to v5.1.x the migrator isn't correctly initializing and setting the db state to the correct version. It is, however, likely running the schema migrations. Either the migrations are being applied correctly and the schema drift in the updates page is the result of a bad `versions` table entry, or the schema migrations aren't being run by the `up` command. Given these conditions, once the direction of the `up` operation's failure is identified, this can likely be solved manually by correct use of the `upgrade` command. A hypothesis as to the root cause of this issue is the tagging of a `5.0.6` image set in the `sourcegraph/deploy` repo: migrator image definitions in the `sourcegraph/sourcegraph` repo may not correctly handle the extra/missing version.
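One way to probe that tagging hypothesis is to check which schema descriptions are actually bundled in the migrator image. The filename pattern comes from the migrator log earlier in this thread (`/schema-descriptions/v5.1.4-internal_database_schema.json`); the helper name and job placeholder are illustrative:

```shell
# Sketch: does the migrator image bundle a schema description for a
# given version? Reads an `ls /schema-descriptions` listing on stdin.
has_schema() {
  grep -qx "v${1#v}-internal_database_schema.json"
}
# usage: kubectl exec job/<migrator-job> -- ls /schema-descriptions | has_schema 5.0.6
```

A missing local file for a tag (forcing the GitHub fallback seen in the log above) would be consistent with a mistagged image set.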