microsoft / sql-server-samples

Azure Data SQL Samples - Official Microsoft GitHub Repository containing code samples for SQL Server, Azure SQL, Azure Synapse, and Azure SQL Edge

K8s 1.18 - SQL 2019 HA #806

Open thepip3r opened 4 years ago

thepip3r commented 4 years ago

Having a problem getting the SQL 2019 HA sample to deploy on my K8s 1.18 cluster.

I've built all my configs and currently have:

Status Checks:

[me@km-01 pods]$ kubectl cluster-info
Kubernetes master is running at https://k8snlb:6443
KubeDNS is running at https://k8snlb:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

kubectl get all --all-namespaces reports:

[me@km-01 pods]$ kubectl get all --all-namespaces -owide
NAMESPACE     NAME                                                  READY   STATUS    RESTARTS   AGE   IP            NODE                      NOMINATED NODE   READINESS GATES
ag1           pod/mssql-operator-68bcc684c4-rbzvn                   1/1     Running   0          86m   10.10.4.133   kw-02.bogus.local   <none>           <none>
kube-system   pod/coredns-66bff467f8-k6m94                          1/1     Running   4          20h   10.10.0.11    km-01.bogus.local   <none>           <none>
kube-system   pod/coredns-66bff467f8-v848r                          1/1     Running   4          20h   10.10.0.10    km-01.bogus.local   <none>           <none>
kube-system   pod/kube-apiserver-km-01.bogus.local            1/1     Running   8          10h   x.x.x..25   km-01.bogus.local   <none>           <none>
kube-system   pod/kube-controller-manager-km-01.bogus.local   1/1     Running   2          10h   x.x.x..25   km-01.bogus.local   <none>           <none>
kube-system   pod/kube-flannel-ds-amd64-7l76c                       1/1     Running   0          10h   x.x.x..30   kw-01.bogus.local   <none>           <none>
kube-system   pod/kube-flannel-ds-amd64-8kft7                       1/1     Running   0          10h   x.x.x..33   kw-04.bogus.local   <none>           <none>
kube-system   pod/kube-flannel-ds-amd64-r5kqv                       1/1     Running   0          10h   x.x.x..34   kw-05.bogus.local   <none>           <none>
kube-system   pod/kube-flannel-ds-amd64-t6xcd                       1/1     Running   0          10h   x.x.x..35   kw-06.bogus.local   <none>           <none>
kube-system   pod/kube-flannel-ds-amd64-vhnx8                       1/1     Running   0          10h   x.x.x..32   kw-03.bogus.local   <none>           <none>
kube-system   pod/kube-flannel-ds-amd64-xdk2n                       1/1     Running   0          10h   x.x.x..31   kw-02.bogus.local   <none>           <none>
kube-system   pod/kube-flannel-ds-amd64-z4kfk                       1/1     Running   4          20h   x.x.x..25   km-01.bogus.local   <none>           <none>
kube-system   pod/kube-proxy-49hsl                                  1/1     Running   0          10h   x.x.x..35   kw-06.bogus.local   <none>           <none>
kube-system   pod/kube-proxy-62klh                                  1/1     Running   0          10h   x.x.x..34   kw-05.bogus.local   <none>           <none>
kube-system   pod/kube-proxy-64d5t                                  1/1     Running   0          10h   x.x.x..30   kw-01.bogus.local   <none>           <none>
kube-system   pod/kube-proxy-6ch42                                  1/1     Running   4          20h   x.x.x..25   km-01.bogus.local   <none>           <none>
kube-system   pod/kube-proxy-9css4                                  1/1     Running   0          10h   x.x.x..32   kw-03.bogus.local   <none>           <none>
kube-system   pod/kube-proxy-hgrx8                                  1/1     Running   0          10h   x.x.x..33   kw-04.bogus.local   <none>           <none>
kube-system   pod/kube-proxy-ljlsh                                  1/1     Running   0          10h   x.x.x..31   kw-02.bogus.local   <none>           <none>
kube-system   pod/kube-scheduler-km-01.bogus.local            1/1     Running   5          20h   x.x.x..25   km-01.bogus.local   <none>           <none>

NAMESPACE     NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP                                                               PORT(S)                  AGE   SELECTOR
ag1           service/ag1-primary     NodePort    10.104.183.81    x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35   1433:30405/TCP           85m   role.ag.mssql.microsoft.com/ag1=primary,type=sqlservr
ag1           service/ag1-secondary   NodePort    10.102.52.31     x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35   1433:30713/TCP           85m   role.ag.mssql.microsoft.com/ag1=secondary,type=sqlservr
ag1           service/mssql1          NodePort    10.96.166.108    x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35   1433:32439/TCP           86m   name=mssql1,type=sqlservr
ag1           service/mssql2          NodePort    10.109.146.58    x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35   1433:30636/TCP           86m   name=mssql2,type=sqlservr
ag1           service/mssql3          NodePort    10.101.234.186   x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35   1433:30862/TCP           86m   name=mssql3,type=sqlservr
default       service/kubernetes      ClusterIP   10.96.0.1        <none>                                                                    443/TCP                  23h   <none>
kube-system   service/kube-dns        ClusterIP   10.96.0.10       <none>                                                                    53/UDP,53/TCP,9153/TCP   20h   k8s-app=kube-dns

NAMESPACE     NAME                                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE   CONTAINERS     IMAGES                                   SELECTOR
kube-system   daemonset.apps/kube-flannel-ds-amd64     7         7         7       7            7           <none>                   20h   kube-flannel   quay.io/coreos/flannel:v0.12.0-amd64     app=flannel
kube-system   daemonset.apps/kube-flannel-ds-arm       0         0         0       0            0           <none>                   20h   kube-flannel   quay.io/coreos/flannel:v0.12.0-arm       app=flannel
kube-system   daemonset.apps/kube-flannel-ds-arm64     0         0         0       0            0           <none>                   20h   kube-flannel   quay.io/coreos/flannel:v0.12.0-arm64     app=flannel
kube-system   daemonset.apps/kube-flannel-ds-ppc64le   0         0         0       0            0           <none>                   20h   kube-flannel   quay.io/coreos/flannel:v0.12.0-ppc64le   app=flannel
kube-system   daemonset.apps/kube-flannel-ds-s390x     0         0         0       0            0           <none>                   20h   kube-flannel   quay.io/coreos/flannel:v0.12.0-s390x     app=flannel
kube-system   daemonset.apps/kube-proxy                7         7         7       7            7           kubernetes.io/os=linux   20h   kube-proxy     k8s.gcr.io/kube-proxy:v1.18.7            k8s-app=kube-proxy

NAMESPACE     NAME                             READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS       IMAGES                                          SELECTOR
ag1           deployment.apps/mssql-operator   1/1     1            1           86m   mssql-operator   mcr.microsoft.com/mssql/ha:2019-CTP2.1-ubuntu   app=mssql-operator
kube-system   deployment.apps/coredns          2/2     2            2           20h   coredns          k8s.gcr.io/coredns:1.6.7                        k8s-app=kube-dns

NAMESPACE     NAME                                        DESIRED   CURRENT   READY   AGE   CONTAINERS       IMAGES                                          SELECTOR
ag1           replicaset.apps/mssql-operator-68bcc684c4   1         1         1       86m   mssql-operator   mcr.microsoft.com/mssql/ha:2019-CTP2.1-ubuntu   app=mssql-operator,pod-template-hash=68bcc684c4
kube-system   replicaset.apps/coredns-66bff467f8          2         2         2       20h   coredns          k8s.gcr.io/coredns:1.6.7                        k8s-app=kube-dns,pod-template-hash=66bff467f8

To the problem: there are a number of articles about building SQL 2019 HA on Kubernetes, but every single one appears to be in the cloud, whereas mine is on-prem in a vSphere environment. They all look very simple: apply three manifests in this order: operator.yaml, sql.yaml, and ag-service.yaml.
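
In other words, roughly this (a sketch using those file names; the sample repo's actual file names may differ slightly):

kubectl apply -f operator.yaml
kubectl apply -f sql.yaml
kubectl apply -f ag-service.yaml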

My YAMLs are based on: https://github.com/microsoft/sql-server-samples/tree/master/samples/features/high%20availability/Kubernetes/sample-manifest-files

For the blogs that actually screenshot the environment afterward, there should be at least 7 pods (1 operator, 3 SQL init, 3 SQL). If you look at my --all-namespaces output above, everything is present and in a Running state, but there are no pods at all other than the running operator...???

I actually broke the control plane back to a single node just to try to isolate the logs. /var/log/containers/ and /var/log/pods/ contain nothing of value to indicate a problem with storage or any other reason the pods are non-existent. It's probably also worth noting that I started with the latest SQL 2019 tag, 2019-latest, but when I got the same behavior there, I decided to try the old bits, since so many blogs are based on CTP 2.1.

I can create PVs and PVCs using the VCP storage provider. I have my Secrets and can see them in the Secrets store.
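
For reference, the prerequisite checks all come back clean for me (sql-secrets is the Secret name my manifest below references):

kubectl get storageclass
kubectl get pv,pvc -n ag1
kubectl get secret sql-secrets -n ag1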

I'm at a loss to explain why the pods are missing or where to look next. After checking journalctl, the daemons themselves, and /var/log, I see no indication there's even an attempt to create them -- the kubectl apply -f mssql-server2019.yaml that I adapted runs to completion without error, indicating 3 SqlServer objects and 3 services get created. Here is the file anyway, targeting CTP 2.1:

cat << EOF > mssql-server2019.yaml
apiVersion: mssql.microsoft.com/v1
kind: SqlServer
metadata:
  labels: {name: mssql1, type: sqlservr}
  name: mssql1
  namespace: ag1
spec:
  acceptEula: true
  agentsContainerImage: mcr.microsoft.com/mssql/ha:2019-CTP2.1
  availabilityGroups: [ag1]
  instanceRootVolumeClaimTemplate:
    accessModes: [ReadWriteOnce]
    resources:
      requests: {storage: 5Gi}
    storageClass: default
  saPassword:
    secretKeyRef: {key: sapassword, name: sql-secrets}
  sqlServerContainer: {image: 'mcr.microsoft.com/mssql/server:2019-CTP2.1'}
---
apiVersion: v1
kind: Service
metadata: {name: mssql1, namespace: ag1}
spec:
  ports:
  - {name: tds, port: 1433}
  selector: {name: mssql1, type: sqlservr}
  type: NodePort
  externalIPs:
    - x.x.x.30
    - x.x.x.31
    - x.x.x.32
    - x.x.x.33
    - x.x.x.34
    - x.x.x.35
---
apiVersion: mssql.microsoft.com/v1
kind: SqlServer
metadata:
  labels: {name: mssql2, type: sqlservr}
  name: mssql2
  namespace: ag1
spec:
  acceptEula: true
  agentsContainerImage: mcr.microsoft.com/mssql/ha:2019-CTP2.1
  availabilityGroups: [ag1]
  instanceRootVolumeClaimTemplate:
    accessModes: [ReadWriteOnce]
    resources:
      requests: {storage: 5Gi}
    storageClass: default
  saPassword:
    secretKeyRef: {key: sapassword, name: sql-secrets}
  sqlServerContainer: {image: 'mcr.microsoft.com/mssql/server:2019-CTP2.1'}
---
apiVersion: v1
kind: Service
metadata: {name: mssql2, namespace: ag1}
spec:
  ports:
  - {name: tds, port: 1433}
  selector: {name: mssql2, type: sqlservr}
  type: NodePort
  externalIPs:
    - x.x.x.30
    - x.x.x.31
    - x.x.x.32
    - x.x.x.33
    - x.x.x.34
    - x.x.x.35
---
apiVersion: mssql.microsoft.com/v1
kind: SqlServer
metadata:
  labels: {name: mssql3, type: sqlservr}
  name: mssql3
  namespace: ag1
spec:
  acceptEula: true
  agentsContainerImage: mcr.microsoft.com/mssql/ha:2019-CTP2.1
  availabilityGroups: [ag1]
  instanceRootVolumeClaimTemplate:
    accessModes: [ReadWriteOnce]
    resources:
      requests: {storage: 5Gi}
    storageClass: default
  saPassword:
    secretKeyRef: {key: sapassword, name: sql-secrets}
  sqlServerContainer: {image: 'mcr.microsoft.com/mssql/server:2019-CTP2.1'}
---
apiVersion: v1
kind: Service
metadata: {name: mssql3, namespace: ag1}
spec:
  ports:
  - {name: tds, port: 1433}
  selector: {name: mssql3, type: sqlservr}
  type: NodePort
  externalIPs:
    - x.x.x.30
    - x.x.x.31
    - x.x.x.32
    - x.x.x.33
    - x.x.x.34
    - x.x.x.35
---
EOF
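
After the apply, this is how I check whether anything was actually created (the sqlservers resource name is an assumption based on the CRD kind above):

kubectl get sqlservers -n ag1
kubectl get pods -n ag1 -owide
kubectl get events -n ag1 --sort-by=.metadata.creationTimestamp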

Edit 1: kubectl logs -n ag1 mssql-operator-*

[sqlservers] 2020/08/14 14:36:48 Creating custom resource definition
[sqlservers] 2020/08/14 14:36:48 Created custom resource definition
[sqlservers] 2020/08/14 14:36:48 Waiting for custom resource definition to be available
[sqlservers] 2020/08/14 14:36:49 Watching for resources...
[sqlservers] 2020/08/14 14:37:08 Creating ConfigMap sql-operator
[sqlservers] 2020/08/14 14:37:08 Updating mssql1 in namespace ag1 ...
[sqlservers] 2020/08/14 14:37:08 Creating ConfigMap ag1
[sqlservers] ERROR: 2020/08/14 14:37:08 could not process update request: error creating ConfigMap ag1: v1.ConfigMap: ObjectMeta: v1.ObjectMeta: readObjectFieldAsBytes: expect : after object field, parsing 627 ...:{},"k:{\"... at {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"ag1","namespace":"ag1","selfLink":"/api/v1/namespaces/ag1/configmaps/ag1","uid":"33af6232-4464-4290-bb14-b21e8f72e361","resourceVersion":"314186","creationTimestamp":"2020-08-14T14:37:08Z","ownerReferences":[{"apiVersion":"mssql.microsoft.com/v1","kind":"ReplicationController","name":"mssql1","uid":"e71a7246-2776-4d96-9735-844ee136a37d","controller":false}],"managedFields":[{"manager":"mssql-server-k8s-operator","operation":"Update","apiVersion":"v1","time":"2020-08-14T14:37:08Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"e71a7246-2776-4d96-9735-844ee136a37d\"}":{".":{},"f:apiVersion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}}}}]}}
[sqlservers] 2020/08/14 14:37:08 Updating ConfigMap sql-operator
[sqlservers] 2020/08/14 14:37:08 Updating mssql2 in namespace ag1 ...
[sqlservers] ERROR: 2020/08/14 14:37:08 could not process update request: error getting ConfigMap ag1: v1.ConfigMap: ObjectMeta: v1.ObjectMeta: readObjectFieldAsBytes: expect : after object field, parsing 627 ...:{},"k:{\"... at {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"ag1","namespace":"ag1","selfLink":"/api/v1/namespaces/ag1/configmaps/ag1","uid":"33af6232-4464-4290-bb14-b21e8f72e361","resourceVersion":"314186","creationTimestamp":"2020-08-14T14:37:08Z","ownerReferences":[{"apiVersion":"mssql.microsoft.com/v1","kind":"ReplicationController","name":"mssql1","uid":"e71a7246-2776-4d96-9735-844ee136a37d","controller":false}],"managedFields":[{"manager":"mssql-server-k8s-operator","operation":"Update","apiVersion":"v1","time":"2020-08-14T14:37:08Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"e71a7246-2776-4d96-9735-844ee136a37d\"}":{".":{},"f:apiVersion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}}}}]}}
[sqlservers] 2020/08/14 14:37:08 Updating ConfigMap sql-operator
[sqlservers] 2020/08/14 14:37:08 Updating mssql3 in namespace ag1 ...
[sqlservers] ERROR: 2020/08/14 14:37:08 could not process update request: error getting ConfigMap ag1: v1.ConfigMap: ObjectMeta: v1.ObjectMeta: readObjectFieldAsBytes: expect : after object field, parsing 627 ...:{},"k:{\"... at {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"ag1","namespace":"ag1","selfLink":"/api/v1/namespaces/ag1/configmaps/ag1","uid":"33af6232-4464-4290-bb14-b21e8f72e361","resourceVersion":"314186","creationTimestamp":"2020-08-14T14:37:08Z","ownerReferences":[{"apiVersion":"mssql.microsoft.com/v1","kind":"ReplicationController","name":"mssql1","uid":"e71a7246-2776-4d96-9735-844ee136a37d","controller":false}],"managedFields":[{"manager":"mssql-server-k8s-operator","operation":"Update","apiVersion":"v1","time":"2020-08-14T14:37:08Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"e71a7246-2776-4d96-9735-844ee136a37d\"}":{".":{},"f:apiVersion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}}}}]}}
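
The fragment the operator trips over is the managedFields block that newer API servers attach to object metadata. If it helps anyone reproduce this, that block can be dumped directly (assumes jq is installed):

kubectl get configmap ag1 -n ag1 -o json | jq '.metadata.managedFields'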

I've looked over my operator and mssql-server2019.yaml files (specifically around kind: SqlServer, since that seems to be where it's failing) and can't identify any glaring inconsistencies or differences.

I'd like to reiterate that I was originally using the 2019-latest bits instead of CTP 2.1. I went back to CTP 2.1 because that's what all of the online documentation references (including Microsoft's).

I'm being told on Stack Overflow that this is a version incompatibility between the 2019 operator and my version of K8s. Is that correct?

yenneferofvengerberg commented 3 years ago

Hi, the issue still persists on K8s 1.20, even after manually upgrading the image paths to ha:2019-latest and server:2019-latest. We'll have to wait for the HA operator to be upgraded.
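
(Concretely, that means changing the two image fields in each SqlServer spec from the sample manifest above, e.g.:)

  agentsContainerImage: mcr.microsoft.com/mssql/ha:2019-latest
  ...
  sqlServerContainer: {image: 'mcr.microsoft.com/mssql/server:2019-latest'}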

Dumh1233 commented 3 years ago

Hi, I seem to have the same problem on OpenShift 4.5 and K8s (versions 1.17 through 1.20). Looking into the logs of the operator pod, it seems there is a parsing problem:

- [sqlservers] ERROR: 2021/03/01 15:12:37 could not process update request: error getting Service ag1: v1.Service: ObjectMeta: v1.ObjectMeta: readObjectFieldAsBytes: expect : after object field, parsing 458 ...:{},"k:{\"... at { "kind": "Service", "apiVersion": "v1", "metadata": { "name": "ag1", "namespace": "ag1", "selfLink": "/api/v1/namespaces/ag1/services/ag1", "uid": "593a771b-6ef4-4d2f-b301-b883a1bb5f39", "resourceVersion": "171794318", "creationTimestamp": "2021-03-01T15:05:59Z", "managedFields": [ { "manager": "mssql-server-k8s-operator", "operation": "Update", "apiVersion": "v1", "time": "2021-03-01T15:05:59Z", "fieldsType": "FieldsV1", "fieldsV1": { "f:spec": { "f:clusterIP": {}, "f:ports": { ".": {}, "k:{\"port\":1433,\"protocol\":\"TCP\"}": { ".": {}, "f:name": {}, "f:port": {}, "f:protocol": {}, "f:targetPort": {} }, "k:{\"port\":5022,\"protocol\":\"TCP\"}": { ".": {}, "f:name": {}, "f:port": {}, "f:protocol": {}, "f:targetPort": {} } }, "f:selector": { ".": {}, "f:ag-service.mssql.microsoft.com/ag1": {} }, "f:sessionAffinity": {}, "f:type": {} } } } ] }, "spec": { "ports": [ { "name": "tds", "protocol": "TCP", "port": 1433, "targetPort": 1433 }, { "name": "dbm", "protocol": "TCP", "port": 5022, "targetPort": 5022 } ], "selector": { "ag-service.mssql.microsoft.com/ag1": "" }, "clusterIP": "None", "type": "ClusterIP", "sessionAffinity": "None" }, "status": { "loadBalancer": {} } }

It seems the parser can't handle the escaped quotation marks (\") inside the managedFields keys, which breaks parsing of the whole object for the ag1 resources. The owners of the ag1 ConfigMap are mssql1-3, which return a 404 (they don't exist) when I open them in OpenShift; that seems to be a problem as well...
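
If the parser really is choking on managedFields, one cluster-wide workaround sometimes suggested for pre-1.22 clusters (untested here, and not applicable where you can't edit the API server flags, e.g. most managed platforms) is to disable the ServerSideApply feature gate so the API server stops writing managedFields into object metadata. On a kubeadm cluster that would look roughly like:

# /etc/kubernetes/manifests/kube-apiserver.yaml -- add to the kube-apiserver command list
- --feature-gates=ServerSideApply=false

The real fix, as noted above, is an operator image built against a client library that can parse managedFields.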