openshift / lvm-operator

The LVM Operator deploys and manages LVM storage on OpenShift clusters
Apache License 2.0

Quick starting with the operator #83

Closed: jgato closed this issue 2 years ago

jgato commented 2 years ago

Hi there, I am starting to play with this operator with the idea of adding dynamic storage provisioning to an SNO. I managed to build and deploy the operator (I created and pushed my own operator image, because the default one points to a private quay.io repo). The operator seems to be working okay:

$> oc get pods
NAME                                  READY   STATUS    RESTARTS   AGE
controller-manager-66b84d759f-9zpv9   3/3     Running   0          2m54s
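
(For anyone reproducing the image build: a rough sketch of the flow, assuming the repository follows the standard operator-sdk Makefile layout; the registry path is a placeholder.)

$ # build and push a custom operator image, then deploy pointing at it
$ make docker-build docker-push IMG=quay.io/<your-org>/lvm-operator:latest
$ make deploy IMG=quay.io/<your-org>/lvm-operator:latest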

I also created a first StorageClass according to the topolvm documentation:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: topolvm-provisioner
provisioner: topolvm.cybozu.com
parameters:
  "csi.storage.k8s.io/fstype": "xfs"
  "topolvm.cybozu.com/device-class": "ssd"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
$ oc get sc
NAME                            PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-sc                        kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  2d4h
topolvm-provisioner (default)   topolvm.cybozu.com             Delete          WaitForFirstConsumer   true                   8m46s

I have tried to create a PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: topolvm-pv-claim
spec:
  storageClassName: topolvm-provisioner
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi

but it remains Pending waiting for a PV (Pending while waiting for a first consumer pod would be expected):

$ oc get pvc
NAME               STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS          AGE
topolvm-pv-claim   Pending                                      topolvm-provisioner   2m43s
$> oc describe pvc topolvm-pv-claim
Name:          topolvm-pv-claim
Namespace:     lvm-operator-system
StorageClass:  topolvm-provisioner
Status:        Pending
Volume:        
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: topolvm.cybozu.com
               volume.kubernetes.io/selected-node: master-0.apollo2.hpecloud.org
               volume.kubernetes.io/storage-provisioner: topolvm.cybozu.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      
Access Modes:  
VolumeMode:    Filesystem
Used By:       task-pv-pod
Events:
  Type    Reason                Age                   From                         Message
  ----    ------                ----                  ----                         -------
  Normal  WaitForFirstConsumer  3m9s (x2 over 3m19s)  persistentvolume-controller  waiting for first consumer to be created before binding
  Normal  ExternalProvisioning  9s (x14 over 3m6s)    persistentvolume-controller  waiting for a volume to be created, either by external provisioner "topolvm.cybozu.com" or manually created by system administrator

Shouldn't the external provisioner create the PV? Checking the operator logs, it does not seem to be aware of these resources, so I imagine I have to create resources of kind LogicalVolume or LVMCluster.

But now I am not sure how to proceed. I guess I have to create a LogicalVolume pointing to a local volume on my node. Some examples or a quickstart would be really appreciated.

sp98 commented 2 years ago

Hi @jgato, we are planning to work on the documentation soon. In the meantime, you can use the following steps to create persistent volumes.

  1. After deploying the operator, create an LVMCluster resource:

    apiVersion: lvm.topolvm.io/v1alpha1
    kind: LVMCluster
    metadata:
      name: lvmcluster-sample
      namespace: lvm-operator-system    # the namespace the operator is running in
    spec:
      deviceClasses:
      - name: vg1
  2. Wait for the topolvm-node daemonset pods to be running (they might restart a few times while waiting for the volume group to be created). A quick check is shown below the steps.

  3. Deploy a sample application with a PVC and StorageClass:

    
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: topolvm-provisioner-vg1
    provisioner: topolvm.cybozu.com
    parameters:
      "csi.storage.k8s.io/fstype": "xfs"
      "topolvm.cybozu.com/device-class": "vg1"
    volumeBindingMode: WaitForFirstConsumer
    allowVolumeExpansion: true

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: pvc1
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: topolvm-provisioner-vg1
      resources:
        requests:
          storage: 3Gi
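
To verify step 2, something like the following works (the daemonset name comes from the step above; lvm-operator-system is assumed as the install namespace):

$ # watch the topolvm-node daemonset come up after the LVMCluster is created
$ oc get daemonset topolvm-node -n lvm-operator-system
$ oc get pods -n lvm-operator-system --watch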

jgato commented 2 years ago

Many thanks, I will try this afternoon. Meanwhile, I am wondering: how should I point to the physical device on the host?

sp98 commented 2 years ago

Right now you don't have to do anything to point to specific disks on the host. The operator takes all the disks that don't have a filesystem or partitions.

For example, if /dev/sda and /dev/sdb on your host have no filesystem or partitions on them, the volume group will use those disks.
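
For illustration, wipefs (from util-linux) can list the signatures this filter trips on, and clear them. A sketch, run from a node debug shell (the device path is an example, and wiping is destructive):

$ oc debug node/master-0.apollo2.hpecloud.org
sh-4.4# chroot /host
sh-4.4# wipefs /dev/nvme1n1      # list existing filesystem/partition signatures
sh-4.4# wipefs -a /dev/nvme1n1   # erase all signatures so the disk becomes eligible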

jgato commented 2 years ago

OK, I have created the LVMCluster, but nothing seems to happen. The first strange thing I noticed is that it does not create any daemonset, as you said it would:

$> oc get daemonsets.apps -n lvm-operator-system 
No resources found in lvm-operator-system namespace.
$> oc describe lvmclusters.lvm.topolvm.io 
Name:         lvmcluster-sample
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  lvm.topolvm.io/v1alpha1
Kind:         LVMCluster
Metadata:
  Creation Timestamp:  2022-01-21T09:13:16Z
  Generation:          1
  Managed Fields:
    API Version:  lvm.topolvm.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:deviceClasses:
    Manager:         kubectl-client-side-apply
    Operation:       Update
    Time:            2022-01-21T09:13:16Z
  Resource Version:  171955
  UID:               298b48ba-561a-4422-9157-937175c208ca
Spec:
  Device Classes:
    Name:  vg1
Events:    <none>

Maybe because of the device class? On this host I have some available disks:

# lsblk 
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda       8:0    0   2.7T  0 disk 
sdb       8:16   0 931.5G  0 disk 
|-sdb1    8:17   0     1M  0 part 
|-sdb2    8:18   0   127M  0 part 
|-sdb3    8:19   0   384M  0 part /boot
`-sdb4    8:20   0   931G  0 part /sysroot
nvme0n1 259:0    0 745.2G  0 disk 
nvme1n1 259:1    0 745.2G  0 disk 
mulbc commented 2 years ago

Fixed @jgato's issue in an interactive session: the problem was that he created the LVMCluster CR in the default namespace instead of the lvm-operator-system namespace (the sample YAML does not specify a namespace).
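
(For reference, a minimal way to avoid this: pin the namespace on the command line, or set metadata.namespace in the manifest. The file name here is a placeholder.)

$ oc apply -f lvmcluster-sample.yaml -n lvm-operator-system
$ oc get lvmcluster -n lvm-operator-system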

jgato commented 2 years ago

Now it is working, and it has recognized two disks under the VG:

$> pvs
  PV           VG  Fmt  Attr PSize   PFree  
  /dev/nvme0n1 vg1 lvm2 a--  745.21g 745.21g
  /dev/sda     vg1 lvm2 a--   <2.73t  <2.73t

But nvme1n1 is not included in the group, even though it is also an available disk:

$> lsblk 
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda       8:0    0   2.7T  0 disk 
sdb       8:16   0 931.5G  0 disk 
|-sdb1    8:17   0     1M  0 part 
|-sdb2    8:18   0   127M  0 part 
|-sdb3    8:19   0   384M  0 part /boot
`-sdb4    8:20   0   931G  0 part /sysroot
nvme0n1 259:0    0 745.2G  0 disk 
nvme1n1 259:1    0 745.2G  0 disk 

Looking at the logs, I can see disk nvme1n1 under the same conditions as disk nvme0n1, which is included:

{"level":"info","ts":1642757843.7951472,"logger":"controller.lvmvolumegroup.vg-manager","msg":"listing block devices","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg1","namespace":"lvm-operator-system","VGName":"vg1"}
{"level":"info","ts":1642757843.8154957,"logger":"controller.lvmvolumegroup.vg-manager","msg":"does not match filter","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg1","namespace":"lvm-operator-system","Device.Name":"sda","filter.Name":"noFilesystemSignature"}
{"level":"info","ts":1642757843.8248024,"logger":"controller.lvmvolumegroup.vg-manager","msg":"does not match filter","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg1","namespace":"lvm-operator-system","Device.Name":"sdb","filter.Name":"noChildren"}
{"level":"info","ts":1642757843.8248734,"logger":"controller.lvmvolumegroup.vg-manager","msg":"does not match filter","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg1","namespace":"lvm-operator-system","Device.Name":"nvme0n1","filter.Name":"noFilesystemSignature"}
{"level":"info","ts":1642757843.8249245,"logger":"controller.lvmvolumegroup.vg-manager","msg":"does not match filter","reconciler group":"lvm.topolvm.io","reconciler kind":"LVMVolumeGroup","name":"vg1","namespace":"lvm-operator-system","Device.Name":"nvme1n1","filter.Name":"noFilesystemSignature"}

I used this disk earlier for a test with the Local Storage Operator, but now it seems to be free. Is there any way to check why this disk is not being added?

sp98 commented 2 years ago

What's the output of lsblk -o NAME,ROTA,TYPE,SIZE,MODEL,VENDOR,RO,RM,STATE,KNAME,SERIAL,PARTLABEL,FSTYPE for the nvme1n1 disk?

sp98 commented 2 years ago

Also, the output of cat /proc/1/mountinfo | grep nvme1n1

jgato commented 2 years ago

Yes, it still has a filesystem:

#  lsblk -o NAME,ROTA,TYPE,SIZE,MODEL,VENDOR,RO,RM,STATE,KNAME,SERIAL,PARTLABEL,FSTYPE
NAME                                           ROTA TYPE   SIZE MODEL                                    VENDOR   RO RM STATE   KNAME   SERIAL             PARTLABEL  FSTYPE
sda                                               1 disk   2.7T LOGICAL VOLUME                           HP        0  0 running sda     PDNLL0ARH9D05M                LVM2_member
|-vg1-bd0d4467--e58d--4806--a453--90454e92a1da    1 lvm      5G                                                    0  0 running dm-0                                  
|-vg1-381eb9d7--a8dd--404d--9810--52ded982648b    1 lvm      1G                                                    0  0 running dm-1                                  
|-vg1-e04008f1--6eec--426a--aa8b--b9915c2911da    1 lvm      1G                                                    0  0 running dm-2                                  
|-vg1-358bcb04--7684--49d1--9bdf--945413b04cd7    1 lvm      1G                                                    0  0 running dm-3                                  
|-vg1-c93f3766--98fa--446a--a64f--13a6384c8f2a    1 lvm      1G                                                    0  0 running dm-4                                  
|-vg1-61628484--7f35--4d01--9dc2--86ce47c5fd55    1 lvm      1G                                                    0  0 running dm-5                                  
`-vg1-8ab73406--f756--4442--a567--e2d53958c2cd    1 lvm      1G                                                    0  0 running dm-6                                  
sdb                                               1 disk 931.5G LOGICAL VOLUME                           HP        0  0 running sdb     PDNLL0ARH9D05M                
|-sdb1                                            1 part     1M                                                    0  0         sdb1                       BIOS-BOOT  
|-sdb2                                            1 part   127M                                                    0  0         sdb2                       EFI-SYSTEM vfat
|-sdb3                                            1 part   384M                                                    0  0         sdb3                       boot       ext4
`-sdb4                                            1 part   931G                                                    0  0         sdb4                       root       xfs
nvme0n1                                           0 disk 745.2G MT0800KEXUU                                        0  0 live    nvme0n1 PHFT640100R2800CGN            LVM2_member
nvme1n1                                           0 disk 745.2G MT0800KEXUU                                        0  0 live    nvme1n1 PHFT640100PS800CGN            xfs

I thought something in the controller log would have pointed to that. Thanks for everything; I'll continue with some tests.

sp98 commented 2 years ago

> I thought something in the controller log would have pointed to that. Thanks for everything; I'll continue with some tests.

I think the vg-manager controller might have pointed it out in the first reconcile. The controller output you pasted above is probably from a later reconcile, when both nvme0n1 and nvme1n1 had a filesystem signature.
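
(To catch that earlier reconcile, pulling the vg-manager logs right after creating the LVMCluster is one option; this assumes the operator runs vg-manager as a daemonset of that name in the install namespace.)

$ oc logs -n lvm-operator-system daemonset/vg-manager | grep -i filter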

jgato commented 2 years ago

Once I make nvme1n1 not have a filesystem, will the operator detect the newly available disk and include it in the VolumeGroup, or do I have to create a new LVMCluster to detect it?

jgato commented 2 years ago

Something strange happened while I was testing. I wanted to try adding new disks to the existing LVMCluster, or to a new one. I added some more disks to the server, and after rebooting, these new disks were added to the VG I had already created (vg1).

(sdc and sde are the new disks that now appear under the same group.)

[root@master-0 core]# vgs
  VG  #PV #LV #SN Attr   VSize VFree 
  vg1   5   4   0 wz--n- 6.00t <6.00t
[root@master-0 core]# pvs
  PV           VG  Fmt  Attr PSize   PFree  
  /dev/nvme0n1 vg1 lvm2 a--  745.21g 745.21g
  /dev/nvme1n1 vg1 lvm2 a--  745.21g 745.21g
  /dev/sda     vg1 lvm2 a--   <2.73t   2.72t
  /dev/sdc     vg1 lvm2 a--  931.48g 931.48g
  /dev/sde     vg1 lvm2 a--  931.48g 931.48g

At the same time, I created a new LVMCluster, because I expected it to create a new VG (vg2) with the new disks. But that is not happening.

Now there are two LVMClusters, but neither of them reports the new devices in its status.

The LVMCluster that I created first (before adding the new disks):

$ oc get  lvmcluster lvmcluster-sample -o yaml                                                                                                                                                                                  
apiVersion: lvm.topolvm.io/v1alpha1                                                                                                                                                                                                                           
kind: LVMCluster                   
metadata:       
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"lvm.topolvm.io/v1alpha1","kind":"LVMCluster","metadata":{"annotations":{},"name":"lvmcluster-sample","namespace":"lvm-operator-system"},"spec":{"deviceClasses":[{"name":"vg1"}]}}
  creationTimestamp: "2022-02-05T18:56:54Z"                                                                                                                                                                                                                   
  finalizers:                              
  - lvmcluster.topolvm.io
  generation: 1          
  name: lvmcluster-sample
  namespace: lvm-operator-system
  resourceVersion: "6276722"
  uid: fb91728e-1300-4fe1-9041-78d5a8e166d6
spec:
  deviceClasses:
  - name: vg1
status:
  deviceClassStatuses:
  - name: vg1
    nodeStatus:
    - devices:
      - /dev/nvme0n1
      - /dev/nvme1n1
      - /dev/sda
      node: master-0.apollo2.hpecloud.org
      status: Ready
  ready: true

And the new one:

[jgato@infra2 lvm-cr]$ oc get  lvmcluster lvmcluster-sample-2 -o yaml                                                           
apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"lvm.topolvm.io/v1alpha1","kind":"LVMCluster","metadata":{"annotations":{},"name":"lvmcluster-sample-2","namespace":"lvm-operator-system"},"spec":{"deviceClasses":[{"name":"vg2"}]}}
  creationTimestamp: "2022-02-05T19:42:38Z"
  finalizers:
  - lvmcluster.topolvm.io
  generation: 1
  name: lvmcluster-sample-2
  namespace: lvm-operator-system
  resourceVersion: "6286193"
  uid: 61feced0-e258-4d65-ae4b-ec2c8bada8e1
spec:
  deviceClasses:
  - name: vg2
status:
  deviceClassStatuses:
  - name: vg1
    nodeStatus:
    - devices:
      - /dev/nvme0n1
      - /dev/nvme1n1
      - /dev/sda
      node: master-0.apollo2.hpecloud.org
      status: Ready

The status of the second one seems a little confusing, as it shows information about vg1.

What is the correct way of adding new disks?

nbalacha commented 2 years ago

As of now, we only support a single LVMCluster, which will use all available disks on all nodes; that is why the newly added disks were added to the original volume group. Going forward, we will be adding features to allow specifying multiple volume groups and the disks to be used for each.

Please open issues describing the features and use cases you would like to see in the operator. We welcome feedback.

nbalacha commented 2 years ago

To add more info: the proposed CR changes, once implemented, would look something like this:

apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: lvmcluster-sample
  namespace: lvm-operator-system    # the namespace the operator is running in
spec:
  deviceClasses:
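
(For context, the device-selection shape that the operator's API eventually settled on looks roughly like the sketch below; this is based on later releases, not on this thread, so check the current LVMCluster CRD before relying on the field names.)

apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: lvmcluster-sample
  namespace: lvm-operator-system
spec:
  storage:
    deviceClasses:
    - name: vg1
      deviceSelector:
        paths:
        - /dev/nvme0n1
        - /dev/nvme1n1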

jgato commented 2 years ago

OK, I understand that only one LVMCluster is allowed for the moment. But will this LVMCluster take control of new disks appearing in the cluster? If you look at my previous post, the newly added disks were added to the VG (vg1), so there are 5 disks; I guess that is handled at the LVM or topolvm level. But in the OpenShift (SNO) cluster, the LVMCluster with vg1 does not seem to be aware of the new disks, and it only lists 3.

Let me know if you prefer to discuss this in a new thread.

nbalacha commented 2 years ago

Yes, please open a new issue /thread for this.