openebs / mayastor

Dynamically provision Stateful Persistent Replicated Cluster-wide Fabric Volumes & Filesystems for Kubernetes, backed by an optimized NVMe SPDK data storage stack.
Apache License 2.0

default installation results in warnings; CRDs missing #1472

Closed wibed closed 1 year ago

wibed commented 1 year ago

on a freshly set up cluster:

wibed commented 1 year ago

I increased the verbosity level to 8.

It tells me nothing except that it fetches the k8s store and can't find MayastorPool within it.

tiagolobocastro commented 1 year ago

Your etcd pods are all pending, could you check why? If you don't have a default storage class then you'd have to specify one or use manual.
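
For reference, a quick way to see why the etcd pods are pending (pod name taken from output further down this thread) is something like:

kubectl describe pod mayastor-etcd-0 -n mayastor   # check the Events section
kubectl get pvc -n mayastor                        # Pending PVCs usually point at a missing provisioner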

wibed commented 1 year ago

I managed to resort to:

kubectl --kubeconfig ./kubeconfig patch storageclass mayastor -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
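
To verify which class is picked up as the default, it shows up with a "(default)" marker in:

kubectl --kubeconfig ./kubeconfig get storageclass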

This led me to reports of tainted nodes. Let's see if I can reproduce it for you.

tiagolobocastro commented 1 year ago

hmm If I understand that line correctly, that won't help; you can't have mayastor provide etcd storage for mayastor itself..

It either has to come from another storage class, e.g. if you're on the cloud, or "manual", or even openebs localpv. Example of how to set the storage class: helm install ... --set="etcd.persistence.storageClass=manual,loki-stack.loki.persistence.storageClassName=manual"
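
Spelled out in full, that install might look like the following; the helm repo URL and chart name here are assumed from the Mayastor docs, so double-check them for your version:

helm repo add mayastor https://openebs.github.io/mayastor-extensions/   # assumed chart repo
helm install mayastor mayastor/mayastor -n mayastor --create-namespace \
  --set etcd.persistence.storageClass=manual \
  --set loki-stack.loki.persistence.storageClassName=manual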

wibed commented 1 year ago

hmm If I understand that line correctly, that won't help; you can't have mayastor provide etcd storage for mayastor itself..

It either has to come from another storage class, e.g. if you're on the cloud, or "manual", or even openebs localpv. Example of how to set the storage class: helm install ... --set="etcd.persistence.storageClass=manual,loki-stack.loki.persistence.storageClassName=manual"

The line picks the storageclass mayastor and adds the is-default-class flag on top of it. The cluster itself runs on Proxmox.

see the following:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mayastor
parameters:
  repl: '1'
  protocol: 'nvmf'
provisioner: io.openebs.csi-mayastor
volumeBindingMode: WaitForFirstConsumer

EDIT:

If I am honest, I don't understand the correlation between DiskPool and StorageClass. Is it mandatory to provision a DiskPool for etcd to start up?

I'd like to set up storage manually and add my disks later.

tiagolobocastro commented 1 year ago

Yeah, so that's not what we want to do here, please undo that change for now. For Mayastor volumes there's no correlation at all.

But mayastor itself makes use of its own etcd cluster, as well as a loki instance for log collection (useful to generate a support bundle), and these two things need storage. We use 3rd-party helm charts for this, which consume storage via a StorageClass! And this is the storage class we need to give our helm chart when installing mayastor, as by default it uses the default storage class, IIRC.

@Abhinandan-Purkait @avishnu I think we probably need to clarify this in the docs, if it's not already.
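
To undo the is-default-class patch from earlier, flipping the annotation back should be enough, e.g.:

kubectl --kubeconfig ./kubeconfig patch storageclass mayastor -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'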

wibed commented 1 year ago

I come from Talos Linux and don't have a storage class defined by default. Referring to the official documentation, there is no such thing as a "default" storage class:

https://kubernetes.io/docs/concepts/storage/storage-classes

There might be a storage class included in most releases. Could you point out to me which storage class you are referring to?

Abhinandan-Purkait commented 1 year ago

@wibed The reason for the CRDs-missing warning is that mayastor-operator-diskpool-5955fcd645-nr67v is not up and running, because it's waiting for the mayastor-etcd pods to come up. The DiskPool CRD is not part of the helm chart; it gets applied to the cluster by mayastor-operator-diskpool after startup.

Now, the reason the mayastor-etcd pods are pending is that etcd needs a storage provisioner other than mayastor. Mayastor depends on mayastor-etcd and cannot provide storage to it on its own. You would need some other provisioner, as @tiagolobocastro pointed out. If you don't have one you can install one, for example https://openebs.github.io/dynamic-localpv-provisioner/, and use that as storage for etcd.
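
A sketch of installing that provisioner with helm; the repo alias, release name, and chart name are assumed here, so check the linked page for the exact chart:

helm repo add openebs-localpv https://openebs.github.io/dynamic-localpv-provisioner
helm repo update
helm install openebs-localpv openebs-localpv/localpv-provisioner -n openebs --create-namespace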

wibed commented 1 year ago

I have assigned the "meta partition" for the CSI driver to recognize the disk as a candidate for the diskpool.

I named it "test-device", as defined in the storageclass:

Name:            openebs-device-sc
IsDefaultClass:  Yes
Annotations:     kubectl.kubernetes.io/last-applied-configuration={"allowVolumeExpansion":true,"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{},"name":"openebs-device-sc"},"parameters":{"devname":"test-device"},"provisioner":"device.csi.openebs.io","volumeBindingMode":"WaitForFirstConsumer"}
,storageclass.kubernetes.io/is-default-class=true
Provisioner:           device.csi.openebs.io
Parameters:            devname=test-device
AllowVolumeExpansion:  True
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     WaitForFirstConsumer
Events:                <none>
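
For readability, here is the same StorageClass reconstructed as a manifest from the last-applied-configuration annotation above:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-device-sc
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
parameters:
  devname: test-device
provisioner: device.csi.openebs.io
volumeBindingMode: WaitForFirstConsumer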

A describe on the PVC results in a message saying it is waiting for etcd to be scheduled:

  Type    Reason               Age                   From                         Message
  ----    ------               ----                  ----                         -------
  Normal  WaitForPodScheduled  59s (x3123 over 13h)  persistentvolume-controller  waiting for pod mayastor-etcd-0 to be scheduled

Yet etcd reports that there is not enough storage space:

Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  2m23s (x159 over 13h)  default-scheduler  0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 3 node(s) did not have enough free storage. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling..

From my perspective the claim waits for etcd, and etcd waits for available storage, which is predetermined by the PVC.
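
That circular wait is expected with volumeBindingMode: WaitForFirstConsumer: the PVC stays pending until the pod is scheduled, and the pod is only scheduled once a node with enough capacity is found. The real blocker is the scheduler's "did not have enough free storage" message; assuming the device provisioner publishes CSIStorageCapacity objects, the reported capacity can be inspected with:

kubectl get csistoragecapacities -A
kubectl describe node <node-name>   # check capacity/allocatable and taints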

EDIT 1: @Abhinandan-Purkait

@wibed The reason for the CRDs-missing warning is that mayastor-operator-diskpool-5955fcd645-nr67v is not up and running, because it's waiting for the mayastor-etcd pods to come up. The DiskPool CRD is not part of the helm chart; it gets applied to the cluster by mayastor-operator-diskpool after startup.

I first jumped to the conclusion that the diskpool-operator was at fault for not recognizing the available storage, but you mentioned it waits for etcd to come up. I must be missing an essential step somewhere in the midst of it.

EDIT 2: after resetting the whole node, all the services came up as expected.

wibed commented 1 year ago

Yet I do not have any of the CRDs installed.

I do have openebs device-localpv running as the provisioner, yet I am missing the mayastor storageclass, the diskpool, and the CRDs of the named resources.

I tried creating and using the storageclass mayastor as per the documentation, but could not bind any volume to storage. After redirecting the storageclass to the openebs provisioner it worked fine.

Abhinandan-Purkait commented 1 year ago

Can you please send the output of kubectl get pods -n mayastor && kubectl get pvc -n mayastor?

wibed commented 1 year ago
Abhinandan-Purkait commented 1 year ago

I don't see the DaemonSet pods? Are they not running? kubectl get ds -n mayastor

wibed commented 1 year ago

Nope.

After querying it, I get:

  Type     Reason        Age   From                  Message
  ----     ------        ----  ----                  -------
  Warning  FailedCreate  45m   daemonset-controller  Error creating: pods "mayastor-agent-ha-node-m2v9x" is forbidden: violates PodSecurity "baseline:latest": host namespaces (hostNetwork=true), hostPath volumes (volumes "device", "sys", "run-udev", "plugin-dir"), hostPort (container "agent-ha-node" uses hostPort 50053), privileged (container "agent-ha-node" must not set securityContext.privileged=true)

Abhinandan-Purkait commented 1 year ago

I believe you need some configuration on talos for running privileged pods.
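
On clusters that enforce Pod Security admission (Talos enables the baseline profile by default), labelling the mayastor namespace is the usual way to allow these privileged pods, for example:

kubectl label namespace mayastor \
  pod-security.kubernetes.io/enforce=privileged \
  pod-security.kubernetes.io/audit=privileged \
  pod-security.kubernetes.io/warn=privileged \
  --overwrite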

wibed commented 1 year ago

https://github.com/openebs/mayastor/issues/1152

I had to set pod-security to privileged:

kubectl --kubeconfig ./kubeconfig patch namespace mayastor -p '{"metadata": {"labels":{"pod-security.kubernetes.io/enforce":"privileged"}}}'

mayastor         mayastor-agent-core-7c45b7b6c4-n5nzg          2/2     Running   0               18m
mayastor         mayastor-agent-ha-node-5j589                  1/1     Running   0               60s
mayastor         mayastor-agent-ha-node-cchxj                  1/1     Running   0               60s
mayastor         mayastor-agent-ha-node-tpmmm                  1/1     Running   0               60s
mayastor         mayastor-api-rest-754644d4cb-fzmbp            1/1     Running   0               18m
mayastor         mayastor-csi-node-q4lgj                       2/2     Running   0               77s
mayastor         mayastor-csi-node-vnn48                       2/2     Running   0               77s
mayastor         mayastor-csi-node-x4lc9                       2/2     Running   0               77s
mayastor         mayastor-etcd-0                               1/1     Running   0               18m
mayastor         mayastor-etcd-1                               1/1     Running   0               18m
mayastor         mayastor-etcd-2                               1/1     Running   0               18m
mayastor         mayastor-io-engine-2d49m                      0/2     Pending   0               86s
mayastor         mayastor-io-engine-d55w4                      0/2     Pending   0               86s
mayastor         mayastor-io-engine-wqhbh                      0/2     Pending   0               86s
mayastor         mayastor-loki-0                               1/1     Running   0               18m
mayastor         mayastor-obs-callhome-c76f65bd9-xqd5l         2/2     Running   0               18m
mayastor         mayastor-operator-diskpool-5955fcd645-h94w5   1/1     Running   0               18m
mayastor         mayastor-promtail-2v4jf                       1/1     Running   0               101s
mayastor         mayastor-promtail-7wb8h                       1/1     Running   0               101s
mayastor         mayastor-promtail-kc5qp                       1/1     Running   0               101s

If there's a mayastor CSI, does it allocate storage itself? Because after a fresh install I have:

NAMESPACE   NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
mayastor    data-mayastor-etcd-0      Bound    pvc-b943cce9-1ac6-496e-afb8-b50fa607c85a   2Gi        RWO            openebs-device-sc   19m
mayastor    data-mayastor-etcd-1      Bound    pvc-75ea4948-d062-48e7-9183-eb0b387d9999   2Gi        RWO            openebs-device-sc   19m
mayastor    data-mayastor-etcd-2      Bound    pvc-f53e6d91-1ea3-412d-bb78-2067d4006bcc   2Gi        RWO            openebs-device-sc   19m
mayastor    storage-mayastor-loki-0   Bound    pvc-7a19d296-e2e0-4087-a3b5-83e579e674fe   10Gi       RWO            openebs-device-sc   19m

mayastor's dependent components are managed by the openebs CSI driver

Abhinandan-Purkait commented 1 year ago

No, mayastor cannot provide storage to its own components like etcd and loki. For that you would need a different provisioner, such as openebs-local-device, etc.

Can you please describe one of those mayastor-io-engine- pods to see why they are pending?
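
For example, using one of the pending pod names from the listing above:

kubectl describe pod mayastor-io-engine-2d49m -n mayastor   # the Events section shows the scheduling reason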

wibed commented 1 year ago

They didn't have enough resources.

NAMESPACE        NAME                                          READY   STATUS      RESTARTS        AGE
kube-system      coredns-d779cc7ff-b8hrj                       0/1     Completed   0               22h
kube-system      coredns-d779cc7ff-fqdwx                       1/1     Running     0               16m
kube-system      coredns-d779cc7ff-fvd8f                       0/1     Completed   0               22h
kube-system      coredns-d779cc7ff-p65lx                       1/1     Running     0               15m
kube-system      kube-apiserver-talos-j8w-2d2                  1/1     Running     0               22h
kube-system      kube-controller-manager-talos-j8w-2d2         1/1     Running     1 (2d13h ago)   22h
kube-system      kube-flannel-78kkl                            1/1     Running     1 (15m ago)     22h
kube-system      kube-flannel-9mbvg                            1/1     Running     1 (15m ago)     22h
kube-system      kube-flannel-j6fd9                            1/1     Running     0               22h
kube-system      kube-flannel-pzlsl                            1/1     Running     1 (16m ago)     22h
kube-system      kube-proxy-48758                              1/1     Running     0               16m
kube-system      kube-proxy-968v5                              1/1     Running     0               22h
kube-system      kube-proxy-wntqw                              1/1     Running     0               15m
kube-system      kube-proxy-xjzxm                              1/1     Running     0               16m
kube-system      kube-scheduler-talos-j8w-2d2                  1/1     Running     2 (22s ago)     22h
kube-system      openebs-device-controller-0                   2/2     Running     3 (24s ago)     54m
kube-system      openebs-device-node-2fx5h                     2/2     Running     2 (15m ago)     54m
kube-system      openebs-device-node-l98rp                     2/2     Running     2 (15m ago)     54m
kube-system      openebs-device-node-zmr49                     2/2     Running     2 (16m ago)     54m
mayastor         mayastor-agent-core-7c45b7b6c4-67xj4          2/2     Running     0               2m16s
mayastor         mayastor-agent-ha-node-4hlbn                  1/1     Running     0               2m15s
mayastor         mayastor-agent-ha-node-l5v5p                  1/1     Running     0               2m15s
mayastor         mayastor-agent-ha-node-rmfq6                  1/1     Running     0               2m15s
mayastor         mayastor-api-rest-754644d4cb-9zsjm            1/1     Running     0               2m16s
mayastor         mayastor-csi-controller-5bbb99bf6-k2f4m       5/5     Running     0               2m16s
mayastor         mayastor-csi-node-c928c                       2/2     Running     0               2m15s
mayastor         mayastor-csi-node-pm7r5                       2/2     Running     0               2m15s
mayastor         mayastor-csi-node-qrbft                       2/2     Running     0               2m15s
mayastor         mayastor-etcd-0                               1/1     Running     0               2m14s
mayastor         mayastor-etcd-1                               1/1     Running     0               2m8s
mayastor         mayastor-etcd-2                               1/1     Running     0               2m14s
mayastor         mayastor-io-engine-2vmvw                      2/2     Running     0               2m14s
mayastor         mayastor-io-engine-t7lf5                      2/2     Running     0               2m15s
mayastor         mayastor-io-engine-vfwsf                      2/2     Running     0               2m15s
mayastor         mayastor-loki-0                               1/1     Running     0               2m13s
mayastor         mayastor-obs-callhome-c76f65bd9-qvx76         2/2     Running     0               2m15s
mayastor         mayastor-operator-diskpool-5955fcd645-wpfr6   1/1     Running     0               2m15s
mayastor         mayastor-promtail-dhvpz                       1/1     Running     0               105s
mayastor         mayastor-promtail-gkfch                       1/1     Running     0               105s
mayastor         mayastor-promtail-kgnds                       1/1     Running     0               104s
metallb-system   controller-595f88d88f-2lfgp                   0/1     Completed   0               41m
metallb-system   controller-595f88d88f-6qrnd                   1/1     Running     0               16m
metallb-system   controller-595f88d88f-9q62m                   0/1     Completed   0               16m
metallb-system   speaker-8lhn8                                 1/1     Running     1 (14m ago)     14m
metallb-system   speaker-kzfdg                                 1/1     Running     1 (14m ago)     14m
metallb-system   speaker-llhnq                                 1/1     Running     0               41m
metallb-system   speaker-zqk4b                                 1/1     Running     1 (14m ago)     14m

Now they're running.

Abhinandan-Purkait commented 1 year ago

Great. Now your pools should have been created? kubectl get dsp -n mayastor

wibed commented 1 year ago

Sadly not:

error: the server doesn't have a resource type "dsp"
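
A couple of ways to confirm whether the CRD actually exists on the cluster being queried (worth double-checking the kubeconfig context as well):

kubectl get crd diskpools.openebs.io
kubectl api-resources | grep -i diskpool
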
Abhinandan-Purkait commented 1 year ago

Can you send the logs for mayastor-operator-diskpool-5955fcd645-wpfr6 ?

wibed commented 1 year ago

I reset everything again, yet there is still no resource "dsp" to be found.

Abhinandan-Purkait commented 1 year ago

These are the events. Can you send kubectl logs mayastor-operator-diskpool-5955fcd645-wpfr6 -n mayastor?

wibed commented 1 year ago

Defaulted container "operator-diskpool" out of: operator-diskpool, agent-core-grpc-probe (init), etcd-probe (init)
K8S Operator (operator-diskpool) revision 89e839315f62 (v2.3.0+0)
  2023-07-31T07:21:47.157486Z  INFO operator_diskpool::diskpool::client: Replacing CRD: {
  "apiVersion": "apiextensions.k8s.io/v1",
  "kind": "CustomResourceDefinition",
  "metadata": {
    "name": "diskpools.openebs.io",
    "resourceVersion": "435542"
  },
  "spec": {
    "group": "openebs.io",
    "names": {
      "categories": [],
      "kind": "DiskPool",
      "plural": "diskpools",
      "shortNames": [
        "dsp"
      ],
      "singular": "diskpool"
    },
    "scope": "Namespaced",
    "versions": [
      {
        "additionalPrinterColumns": [
          {
            "description": "node the pool is on",
            "jsonPath": ".spec.node",
            "name": "node",
            "type": "string"
          },
          {
            "description": "dsp cr state",
            "jsonPath": ".status.state",
            "name": "state",
            "type": "string"
          },
          {
            "description": "Control plane pool status",
            "jsonPath": ".status.pool_status",
            "name": "pool_status",
            "type": "string"
          },
          {
            "description": "total bytes",
            "format": "int64",
            "jsonPath": ".status.capacity",
            "name": "capacity",
            "type": "integer"
          },
          {
            "description": "used bytes",
            "format": "int64",
            "jsonPath": ".status.used",
            "name": "used",
            "type": "integer"
          },
          {
            "description": "available bytes",
            "format": "int64",
            "jsonPath": ".status.available",
            "name": "available",
            "type": "integer"
          }
        ],
        "name": "v1alpha1",
        "schema": {
          "openAPIV3Schema": {
            "description": "Auto-generated derived type for DiskPoolSpec via `CustomResource`",
            "properties": {
              "spec": {
                "description": "The pool spec which contains the parameters we use when creating the pool",
                "properties": {
                  "disks": {
                    "description": "The disk device the pool is located on",
                    "items": {
                      "type": "string"
                    },
                    "type": "array"
                  },
                  "node": {
                    "description": "The node the pool is placed on",
                    "type": "string"
                  }
                },
                "required": [
                  "disks",
                  "node"
                ],
                "type": "object"
              },
              "status": {
                "description": "Status of the pool which is driven and changed by the controller loop.",
                "nullable": true,
                "properties": {
                  "available": {
                    "description": "Available number of bytes.",
                    "format": "uint64",
                    "minimum": 0.0,
                    "type": "integer"
                  },
                  "capacity": {
                    "description": "Capacity as number of bytes.",
                    "format": "uint64",
                    "minimum": 0.0,
                    "type": "integer"
                  },
                  "cr_state": {
                    "default": "Creating",
                    "description": "The state of the pool.",
                    "enum": [
                      "Creating",
                      "Created",
                      "Terminating"
                    ],
                    "type": "string"
                  },
                  "pool_status": {
                    "description": "Pool status from respective control plane object.",
                    "enum": [
                      "Unknown",
                      "Online",
                      "Degraded",
                      "Faulted"
                    ],
                    "nullable": true,
                    "type": "string"
                  },
                  "state": {
                    "enum": [
                      "Creating",
                      "Created",
                      "Online",
                      "Unknown",
                      "Error"
                    ],
                    "type": "string"
                  },
                  "used": {
                    "description": "Used number of bytes.",
                    "format": "uint64",
                    "minimum": 0.0,
                    "type": "integer"
                  }
                },
                "required": [
                  "available",
                  "capacity",
                  "state",
                  "used"
                ],
                "type": "object"
              }
            },
            "required": [
              "spec"
            ],
            "title": "DiskPool",
            "type": "object"
          }
        },
        "served": true,
        "storage": true,
        "subresources": {
          "status": {}
        }
      }
    ]
  }
}
    at k8s/operators/src/pool/diskpool/client.rs:49

  2023-07-31T07:21:47.170782Z  INFO operator_diskpool: Created, crd: "diskpools.openebs.io"
    at k8s/operators/src/pool/main.rs:655

  2023-07-31T07:21:52.178275Z  INFO operator_diskpool: Migration and Cleanup of CRs from MayastorPool to DiskPool complete
    at k8s/operators/src/pool/main.rs:843

  2023-07-31T07:21:52.178983Z  INFO operator_diskpool: Starting DiskPool Operator (dsp) in namespace mayastor
    at k8s/operators/src/pool/main.rs:708
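
For reference, a minimal DiskPool matching the v1alpha1 schema in the log above might look like the following; the pool name, node name, and disk path are placeholders:

apiVersion: "openebs.io/v1alpha1"
kind: DiskPool
metadata:
  name: pool-on-node-1          # placeholder pool name
  namespace: mayastor
spec:
  node: worker-node-1           # Kubernetes node that owns the disk (placeholder)
  disks: ["/dev/sdb"]           # disk device to build the pool on (placeholder)
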
Abhinandan-Purkait commented 1 year ago

Can you send kubectl get crd?

wibed commented 1 year ago

Nope, they're created! I fetched the ones from the laptop's Rancher Desktop instead of the remote ones.

It certainly was the missing privileged flag.

Thank you for the effort! Awesome, we managed to resolve it!

wibed commented 1 year ago

Maybe you'll find something odd.

For the record

Abhinandan-Purkait commented 1 year ago

Great. Thanks for trying it out.