stormshift / support

This repo should serve as a central source for reporting issues with stormshift

Image registry won't start #142

Closed: rbo closed this issue 7 months ago

rbo commented 10 months ago

Events

Unable to attach or mount volumes: unmounted volumes=[registry-storage], unattached volumes=[registry-tls ca-trust-extracted registry-certificates trusted-ca installation-pull-secrets bound-sa-token kube-api-access-r9rzb registry-storage]: timed out waiting for the condition
...
Multi-Attach error for volume "pvc-deaa30b8-ce57-4f09-910a-a62d213d506b" Volume is already exclusively attached to one node and can't be attached to another
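
A Multi-Attach error usually means the volume is still recorded as attached to a different node. A minimal sketch for finding that node, assuming the CSI driver creates VolumeAttachment objects for this volume; `pv_attached_nodes` is a hypothetical helper name, not something from this thread:

```shell
# Hypothetical helper (not from this thread): print the node(s) a PV is attached to.
# Expects rows of "PV NODE ATTACHED" on stdin, so it can be tried without a cluster.
pv_attached_nodes() {
  awk -v pv="$1" '$1 == pv && $3 == "true" { print $2 }'
}

# On a live cluster the rows could come from the VolumeAttachment objects:
# oc get volumeattachment --no-headers \
#   -o custom-columns=PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached \
#   | pv_attached_nodes pvc-deaa30b8-ce57-4f09-910a-a62d213d506b
```

If the node printed is not the one the new registry pod was scheduled on, that would match the event above.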
rbo commented 10 months ago

The NFS/NAS volume is ReadWriteOnce :-(

$ oc get pvc -n openshift-image-registry image-registry -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: csi.trident.netapp.io
    volume.kubernetes.io/storage-provisioner: csi.trident.netapp.io
  creationTimestamp: "2023-03-12T13:02:18Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app.kubernetes.io/instance: cluster-configuration
  name: image-registry
  namespace: openshift-image-registry
  resourceVersion: "168306"
  uid: deaa30b8-ce57-4f09-910a-a62d213d506b
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 500Gi
  storageClassName: coe-netapp-nas
  volumeMode: Filesystem
  volumeName: pvc-deaa30b8-ce57-4f09-910a-a62d213d506b
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 500Gi
  phase: Bound
$ 
rbo commented 10 months ago

Trident looks bad as well:

[screenshot]

Let's fix Trident first.

rbo commented 10 months ago
$ oc get tbc,tbe,torc -A
NAMESPACE        NAME                                                    BACKEND NAME     BACKEND UUID                           PHASE   STATUS
netapp-trident   tridentbackendconfig.trident.netapp.io/coe-netapp-nas   coe-netapp-nas   78d5a96f-ca1b-4863-8e3b-98d68d89cd50   Bound   Failed
netapp-trident   tridentbackendconfig.trident.netapp.io/coe-netapp-san   coe-netapp-san   4292a4bd-77cb-4ce0-a356-9d338434e364   Bound   Failed

NAMESPACE        NAME                                         BACKEND          BACKEND UUID
netapp-trident   tridentbackend.trident.netapp.io/tbe-78zjw   coe-netapp-san   4292a4bd-77cb-4ce0-a356-9d338434e364
netapp-trident   tridentbackend.trident.netapp.io/tbe-ndzkt   coe-netapp-nas   78d5a96f-ca1b-4863-8e3b-98d68d89cd50

NAMESPACE   NAME                                            AGE
            tridentorchestrator.trident.netapp.io/trident   150d
$ 
rbo commented 10 months ago
$ oc describe tbc/coe-netapp-nas
Name:         coe-netapp-nas
Namespace:    netapp-trident
Labels:       app.kubernetes.io/instance=cluster-configuration
Annotations:  <none>
API Version:  trident.netapp.io/v1
Kind:         TridentBackendConfig
Metadata:
  Creation Timestamp:  2023-03-12T12:31:30Z
  Finalizers:
    trident.netapp.io
  Generation:  1
  Managed Fields:
    API Version:  trident.netapp.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
        f:labels:
          .:
          f:app.kubernetes.io/instance:
      f:spec:
        .:
        f:backendName:
        f:credentials:
          .:
          f:name:
        f:managementLIF:
        f:storageDriverName:
        f:storagePrefix:
        f:version:
    Manager:      argocd-controller
    Operation:    Update
    Time:         2023-03-12T12:31:30Z
    API Version:  trident.netapp.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"trident.netapp.io":
    Manager:      trident_orchestrator
    Operation:    Update
    Time:         2023-03-12T12:31:30Z
    API Version:  trident.netapp.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:backendInfo:
          .:
          f:backendName:
          f:backendUUID:
        f:deletionPolicy:
        f:lastOperationStatus:
        f:message:
        f:phase:
    Manager:         trident_orchestrator
    Operation:       Update
    Subresource:     status
    Time:            2023-08-04T22:55:38Z
  Resource Version:  481698149
  UID:               77de0cad-8f7f-4053-9650-1b9de951155c
Spec:
  Backend Name:  coe-netapp-nas
  Credentials:
    Name:               coe-netapp-svm-trident
  Management LIF:       10.32.97.30
  Storage Driver Name:  ontap-nas
  Storage Prefix:       isar_
  Version:              1
Status:
  Backend Info:
    Backend Name:         coe-netapp-nas
    Backend UUID:         78d5a96f-ca1b-4863-8e3b-98d68d89cd50
  Deletion Policy:        delete
  Last Operation Status:  Failed
  Message:                Failed to apply the backend update; problem initializing storage driver 'ontap-nas': error initializing ontap-nas driver: could not create Data ONTAP API client: error creating ONTAP API client: error enumerating SVMs: Post "https://10.32.97.30/servlets/netapp.servlets.admin.XMLrequest_filer": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Phase:                  Bound
Events:                   <none>
$ curl -k https://10.32.97.30/
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL was not found on this server.</p>
</body></html>
$ 
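
The 404 above only shows that the management LIF answers HTTPS from wherever curl was run; the Trident message reports a timeout. A rough sketch that separates "answers" from "times out", using curl's documented exit code 28 for a timeout. `probe_lif` is a hypothetical helper, and running it from a specific node via `oc debug` is an assumption about where the problem sits:

```shell
# Hypothetical helper: report whether an HTTPS endpoint answers at all.
# $1 = URL, $2 = timeout in seconds (defaults to 10).
probe_lif() {
  rc=0
  code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time "${2:-10}" "$1") || rc=$?
  case "$rc" in
    0)  echo "reachable (HTTP $code)" ;;   # any HTTP answer, even 404, proves connectivity
    28) echo "timeout" ;;                  # curl exit code 28: operation timed out
    *)  echo "curl failed (exit $rc)" ;;
  esac
}

# From a workstation:
# probe_lif https://10.32.97.30/
# From a specific cluster node (assumption: the timeout is node-specific):
# oc debug node/<node-name> -- chroot /host curl -k -s -o /dev/null -w '%{http_code}\n' --max-time 10 https://10.32.97.30/
```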
rbo commented 10 months ago

Double-check the NetApp API:

$ curl -k --user xxx -d "<netapp><system-get-version/></netapp>" https://10.32.97.30/servlets/netapp.servlets.admin.XMLrequest_filer
..
<?xml version='1.0' encoding='UTF-8' ?>
<!DOCTYPE netapp SYSTEM 'file:/etc/netapp_gx.dtd'>
<netapp version='1.170' xmlns='http://www.netapp.com/filer/admin'>
<results status="passed"><build-timestamp>1587134271</build-timestamp><is-clustered>true</is-clustered><version>NetApp Release 9.7P3: Fri Apr 17 14:37:51 UTC 2020</version><version-tuple><system-version-tuple><generation>9</generation><major>7</major><minor>0</minor></system-version-tuple></version-tuple></results></netapp>

=> Looks good.

Node ucs57 had network issues: I mixed up cables during #141.

$ oc get tbc,tbe,torc -A
NAMESPACE        NAME                                                    BACKEND NAME     BACKEND UUID                           PHASE   STATUS
netapp-trident   tridentbackendconfig.trident.netapp.io/coe-netapp-nas   coe-netapp-nas   78d5a96f-ca1b-4863-8e3b-98d68d89cd50   Bound   Success
netapp-trident   tridentbackendconfig.trident.netapp.io/coe-netapp-san   coe-netapp-san   4292a4bd-77cb-4ce0-a356-9d338434e364   Bound   Success

NAMESPACE        NAME                                         BACKEND          BACKEND UUID
netapp-trident   tridentbackend.trident.netapp.io/tbe-78zjw   coe-netapp-san   4292a4bd-77cb-4ce0-a356-9d338434e364
netapp-trident   tridentbackend.trident.netapp.io/tbe-ndzkt   coe-netapp-nas   78d5a96f-ca1b-4863-8e3b-98d68d89cd50

NAMESPACE   NAME                                            AGE
            tridentorchestrator.trident.netapp.io/trident   
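
To avoid eyeballing the STATUS column, something like this could list only the backend configs that are not in Success. A small sketch, assuming STATUS stays the last column of `oc get tbc` output; `not_successful_backends` is a made-up helper name:

```shell
# Hypothetical helper: read `oc get tbc -A --no-headers` rows on stdin and
# print the name and status of every backend config whose last column is not "Success".
not_successful_backends() {
  awk 'NF > 0 && $NF != "Success" { print $2 " (" $NF ")" }'
}

# On a live cluster:
# oc get tbc -A --no-headers | not_successful_backends
```

No output would mean all backend configs report Success.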
rbo commented 7 months ago

The problem was missing topology keys on the StorageClass.
https://github.com/stormshift/clusters/commit/d97a0ef7b50abc3bd4ddd2d9dc62a55db01c9f73
https://github.com/stormshift/clusters/commit/10102a1d5cb27787e622b9aa7968116468fdfda7
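
For anyone checking a similar setup, a quick way to see whether a StorageClass carries topology constraints. This is only a sketch and assumes "topology keys" refers to the StorageClass `allowedTopologies` field, which this thread does not spell out; `sc_topology` is a hypothetical helper:

```shell
# Hypothetical helper: show whether a StorageClass has allowedTopologies set.
# Assumption: the relevant topology settings live in .allowedTopologies.
sc_topology() {
  topo=$(oc get storageclass "$1" -o jsonpath='{.allowedTopologies}') || return 1
  if [ -n "$topo" ]; then
    echo "$1: allowedTopologies=$topo"
  else
    echo "$1: no allowedTopologies set"
  fi
}

# On a live cluster:
# sc_topology coe-netapp-nas
```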

iSCSI/SAN Storage is still not working #151