zilliztech / milvus-operator

The Kubernetes Operator of Milvus.
https://milvus.io
Apache License 2.0

Support for custom storage limits with the Minio deployment in operator #54

Open · Tanchwa opened this issue 7 months ago

Tanchwa commented 7 months ago

Hello, is there a way to specify how much storage is requested by the PVCs for Minio? I'm trying to do a POC deployment on my homelab and 500GiB per instance is WAY more than I can afford to spare.

I checked the values files in the chart, the cluster configuration options, and the docs on the storage component, and didn't find anything. If this already exists and I'm just missing where to configure it, let me know.

haorenfsa commented 7 months ago

Hi @Tanchwa, thank you for the feedback. You can alter the default PVC storage size through the following config:

spec:
  # ... Skipped fields
  dependencies:
    storage: 
      # ... Skipped fields
      inCluster:
        # ... Skipped fields
        values:
          mode: standalone # single-node mode; you probably also want this for a POC
          persistence:
            size: 20Gi # the storage size you want. Note: once created it can't be changed here, but you can resize the PVC directly if the StorageClass supports volume expansion.
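
To double-check that the new size actually took effect after deploying, you can look at the generated PVC directly. A quick sketch (the release and namespace names here are just the ones used later in this thread, adjust to yours):

kubectl get pvc my-release-minio -n ai-tests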
Tanchwa commented 7 months ago

It seems there might be an issue with that. Output from kubectl get milvus:

     Persistance:
            Size:  20Gi
          Persistence:
            Access Mode:     ReadWriteOnce
            Enabled:         true
            Existing Claim:
            Size:            500Gi
            Storage Class:   <nil>

It's recognizing that I'm trying to cap the persistent volume size at 20Gi but still creating a claim at 500Gi. I made sure to do a fresh deploy after nuking all the previous claims.

tanchwa@k8s-controlplane:~$ kubectl get pvc -n ai-tests
NAME                                                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-my-release-etcd-0                                           Bound    pvc-3b4bb295-4e9f-430b-af11-7c81ccc37a5b   10Gi       RWO            longhorn       9m15s
my-release-minio                                                 Bound    pvc-b324fbd4-34d2-4890-997d-e3c7cba20e1b   500Gi      RWO            longhorn       4m27s
my-release-pulsar-bookie-journal-my-release-pulsar-bookie-0      Bound    pvc-85bcaa84-b8dc-4288-a7a5-2c9942f8c8d9   100Gi      RWO            longhorn       9m14s
my-release-pulsar-bookie-journal-my-release-pulsar-bookie-1      Bound    pvc-7996ed99-a0be-49ba-a045-6a33964d8c56   100Gi      RWO            longhorn       9m13s
my-release-pulsar-bookie-ledgers-my-release-pulsar-bookie-0      Bound    pvc-f33fcfe3-f19d-48c0-a61c-477c30291bfa   200Gi      RWO            longhorn       9m14s
my-release-pulsar-bookie-ledgers-my-release-pulsar-bookie-1      Bound    pvc-a52ebbe7-4658-4a6f-bf11-e0921961e311   200Gi      RWO            longhorn       9m13s
my-release-pulsar-zookeeper-data-my-release-pulsar-zookeeper-0   Bound    pvc-34426d40-20fd-43cd-9cb2-e8257c2da5ad   20Gi       RWO            longhorn       9m11s
haorenfsa commented 7 months ago

@Tanchwa It seems you put the size field in the wrong place. I tested it myself and it works fine. Below is my full CR:

apiVersion: milvus.io/v1beta1
kind: Milvus
metadata:
  name: my-release
  labels:
    app: milvus
spec:
  config: {}
  components:
    standalone:
      replicas: 1
      serviceType: LoadBalancer
  dependencies:
    etcd:
      inCluster:
        values:
          replicaCount: 1
        deletionPolicy: Delete
        pvcDeletion: true
    storage:
      inCluster:
        values:
          mode: standalone
          resources:
            requests:
              memory: 100Mi
          persistence:
            size: 20Gi
        deletionPolicy: Delete
        pvcDeletion: true
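
Assuming the CR above is saved as milvus.yaml (just an example filename), applying and verifying it is simply:

kubectl apply -f milvus.yaml
kubectl get pvc   # the minio claim should show a capacity of 20Gi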
Tanchwa commented 7 months ago

Nope, that's where I have it, too.

apiVersion: milvus.io/v1beta1
kind: Milvus
metadata:
  name: my-release
  labels:
    app: milvus
spec:
  mode: cluster
  dependencies:
    etcd:
      inCluster:
        values:
          replicaCount: 1
    pulsar:
      inCluster:
        values:
          components:
            autorecovery: false
            functions: false
            toolset: false
            pulsar_manager: false
          monitoring:
            prometheus: false
            grafana: false
            node_exporter: false
            alert_manager: false
          proxy:
            replicaCount: 1
            resources:
              requests:
                cpu: 0.01
                memory: 256Mi
            configData:
              PULSAR_MEM: >
                -Xms256m -Xmx256m
              PULSAR_GC: >
                -XX:MaxDirectMemorySize=256m
          bookkeeper:
            replicaCount: 2
            resources:
              requests:
                cpu: 0.01
                memory: 256Mi
            configData:
              PULSAR_MEM: >
                -Xms256m
                -Xmx256m
                -XX:MaxDirectMemorySize=256m
              PULSAR_GC: >
                -Dio.netty.leakDetectionLevel=disabled
                -Dio.netty.recycler.linkCapacity=1024
                -XX:+UseG1GC -XX:MaxGCPauseMillis=10
                -XX:+ParallelRefProcEnabled
                -XX:+UnlockExperimentalVMOptions
                -XX:+DoEscapeAnalysis
                -XX:ParallelGCThreads=32
                -XX:ConcGCThreads=32
                -XX:G1NewSizePercent=50
                -XX:+DisableExplicitGC
                -XX:-ResizePLAB
                -XX:+ExitOnOutOfMemoryError
                -XX:+PerfDisableSharedMem
                -XX:+PrintGCDetails
          zookeeper:
            replicaCount: 1
            resources:
              requests:
                cpu: 0.01
                memory: 256Mi
            configData:
              PULSAR_MEM: >
                -Xms256m
                -Xmx256m
              PULSAR_GC: >
                -Dcom.sun.management.jmxremote
                -Djute.maxbuffer=10485760
                -XX:+ParallelRefProcEnabled
                -XX:+UnlockExperimentalVMOptions
                -XX:+DoEscapeAnalysis -XX:+DisableExplicitGC
                -XX:+PerfDisableSharedMem
                -Dzookeeper.forceSync=no
          broker:
            replicaCount: 1
            resources:
              requests:
                cpu: 0.01
                memory: 256Mi
            configData:
              PULSAR_MEM: >
                -Xms256m
                -Xmx256m
              PULSAR_GC: >
                -XX:MaxDirectMemorySize=256m
                -Dio.netty.leakDetectionLevel=disabled
                -Dio.netty.recycler.linkCapacity=1024
                -XX:+ParallelRefProcEnabled
                -XX:+UnlockExperimentalVMOptions
                -XX:+DoEscapeAnalysis
                -XX:ParallelGCThreads=32
                -XX:ConcGCThreads=32
                -XX:G1NewSizePercent=50
                -XX:+DisableExplicitGC
                -XX:-ResizePLAB
                -XX:+ExitOnOutOfMemoryError
    storage:
      inCluster:
        values:
          mode: standalone
          persistance:
            size: 20Gi

haorenfsa commented 7 months ago

@Tanchwa Then it's because you didn't delete the PVC before you redeployed. By default, milvus-operator won't delete the dependencies & data when you delete the Milvus CR.

The Minio release & data are only deleted when you specify the Milvus CR like below.

spec:
  dependencies:
    storage:
      inCluster:
        deletionPolicy: Delete
        pvcDeletion: true

So here is how to fix this: add the deletionPolicy & pvcDeletion values and apply your Milvus CR again, then delete it, wait for the PVCs to be cleaned up, and then create the Milvus CR once more.
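
A rough sketch of those steps, assuming the CR is saved as milvus.yaml and named my-release in namespace ai-tests (adjust names to your setup):

kubectl apply -f milvus.yaml                    # CR now carries deletionPolicy: Delete and pvcDeletion: true
kubectl delete milvus my-release -n ai-tests    # the operator removes the in-cluster minio release and its PVC
kubectl get pvc -n ai-tests                     # repeat until my-release-minio is gone
kubectl apply -f milvus.yaml                    # recreate; the new claim is requested at 20Gi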

Tanchwa commented 6 months ago

I already said that I had deleted them and restarted. Can you think of another reason why it might not be working?

haorenfsa commented 6 months ago

@Tanchwa I'm quite sure about the reason. Did you set your CR like below before you deleted it?

spec:
  dependencies:
    storage:
      inCluster:
        deletionPolicy: Delete
        pvcDeletion: true

If not, use helm list & kubectl get pvc to check what was left behind in your cluster.

Use helm uninstall xxx & kubectl delete pvc xxx to clean them up.
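
For example, with the names from the listing earlier in this thread (adjust to whatever is actually left behind in your cluster):

helm list -n ai-tests                              # look for leftover in-cluster dependency releases
kubectl delete pvc my-release-minio -n ai-tests    # drop the old 500Gi claim so a fresh 20Gi one can be created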

haorenfsa commented 6 months ago

@Tanchwa Oh, there's a typo in your manifest: it's persistence, not persistance.
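
i.e. that block of the CR should read:

    storage:
      inCluster:
        values:
          mode: standalone
          persistence:   # not "persistance"
            size: 20Gi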

Tanchwa commented 6 months ago

Jesus, can you feel me facepalm through the internet?

haorenfsa commented 6 months ago

Yes buddy, it happens...